Most Amazon Web Services users know that EC2 offers many instance types. Fewer know how types such as m5d and r5d differ from their m5 and r5 counterparts. As the title of this post suggests, the d variants come with a certain amount of local storage, which AWS, in its not very down-to-earth terminology, calls instance store.
As most readers know, an EBS volume is not a local disk on the server that hosts the EC2 instance. It is carved out of a large distributed storage system and attached to the host over the network, presumably with a protocol such as iSCSI, and then presented to the instance as a block device. The benefit is that the volume is independent of any particular server, so the instance can move easily between servers, or even between machine rooms and buildings.
Everything has its pros and cons. Reading and writing large files can be resource-intensive, and the two main metrics of storage performance are throughput and IOPS. With EBS, both depend not only on the disks themselves but also on network bandwidth and its fluctuations, and the result is bounded by whichever of these is worst.
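To make the relationship between the two metrics concrete, here is a minimal sketch. It assumes a fixed I/O size and treats a network-attached volume as capped by the slower of the disk and the network path. The function names and all numbers are illustrative assumptions, not measured or published AWS figures.

```python
def throughput_mib_s(iops: float, io_size_kib: float) -> float:
    """Throughput in MiB/s achieved by `iops` operations of a fixed size."""
    return iops * io_size_kib / 1024.0


def effective_throughput(disk_mib_s: float, network_mib_s: float) -> float:
    """A network-attached volume can go no faster than its slowest path."""
    return min(disk_mib_s, network_mib_s)


if __name__ == "__main__":
    # Illustrative values only (assumptions, not AWS specifications).
    print(throughput_mib_s(iops=3000, io_size_kib=16))    # many small I/Os
    print(throughput_mib_s(iops=500, io_size_kib=1024))   # few large I/Os
    print(effective_throughput(disk_mib_s=500, network_mib_s=300))
```

For large sequential files the I/O size is big, so throughput rather than IOPS tends to be the limiting figure.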
Instance store, by contrast, is a disk installed in the very server that hosts the instance, so its performance is not affected by the network. But the disk cannot follow the instance when it moves to another host. That is the underlying reason data on this kind of storage is lost when the instance is shut down: the next time it starts, it may not be running on the same server.
This is easy for IT professionals to understand. Normally EBS and the servers are connected by a fast dedicated network, so the network is not necessarily the storage bottleneck, and in most real workloads users notice little difference between the two. Recently, however, I was debugging a program that makes multiple sequential read/write passes over several terabytes of files. Each run meant waiting half a day, which was painful, and with the same data and the same program the speed seemed to vary from run to run. The instance was in the Tokyo region, and my initial suspicion was that it was being affected by how busy other tenants in the same AZ were.
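A suspicion like this can be checked with a rough measurement before changing anything. The sketch below is not the program I was running; it is a hypothetical helper, `sequential_write_read`, that writes and then reads a temporary file in large blocks and reports MiB/s. Running it several times on the data directory gives a feel for run-to-run variation. Note that the read figure will be inflated by the operating system's page cache unless the file is larger than memory.

```python
import os
import tempfile
import time


def sequential_write_read(directory: str, total_mib: int = 64, block_mib: int = 4):
    """Write, then read, a temporary file sequentially.

    Returns (write_mib_s, read_mib_s). The file is removed afterwards.
    """
    block_bytes = block_mib * 1024 * 1024
    block = os.urandom(block_bytes)
    blocks = total_mib // block_mib
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # force the data out to the device
        write_s = time.perf_counter() - start

        start = time.perf_counter()
        with open(path, "rb") as f:
            while f.read(block_bytes):
                pass
        read_s = time.perf_counter() - start
    finally:
        os.remove(path)
    written_mib = blocks * block_mib
    return written_mib / write_s, written_mib / read_s


if __name__ == "__main__":
    # "." is a placeholder; point it at the directory on the volume under test.
    for run in range(3):
        w, r = sequential_write_read(".", total_mib=256)
        print(f"run {run}: write {w:.0f} MiB/s, read {r:.0f} MiB/s")
```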
So I changed the instance type from r5.large to r5d.large, then, inside the operating system, created a partition and a file system on the local disk and mounted it at the data directory the program uses. When I ran the program again, performance was noticeably better.
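The operating-system side of that change can be sketched roughly as follows. This is a simplified illustration, not the exact commands I ran: it formats the whole device instead of creating a partition first, and the device name `/dev/nvme1n1`, the mount point `/data`, and the ext4 file system are all assumptions. Check the real device name with `lsblk` first, because formatting the wrong device destroys its data. The script defaults to a dry run that only prints the commands.

```python
import subprocess


def instance_store_setup_commands(device: str, mount_point: str, fs_type: str = "ext4"):
    """Build the commands that format `device` and mount it at `mount_point`."""
    return [
        ["mkfs", "-t", fs_type, device],   # create the file system
        ["mkdir", "-p", mount_point],      # make sure the mount point exists
        ["mount", device, mount_point],    # mount the new file system
    ]


def run(commands, dry_run: bool = True) -> None:
    """Print each command, and execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # needs root privileges


if __name__ == "__main__":
    # Hypothetical device and mount point; adjust after checking with lsblk.
    run(instance_store_setup_commands("/dev/nvme1n1", "/data"), dry_run=True)
```

Because the contents of instance store do not survive a stop, these steps have to be repeated each time the instance is started again, for example from a boot-time script.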
On price, the d series costs only a little more than the non-d series. My program's input and output both live on S3, nothing needs to be kept on the machine permanently, and the instance shuts down once the program finishes. In that situation a d-series type not only improves performance and shortens the run, it also means there is no data disk to pay for after shutdown. It is a rare choice with two benefits and no downside, and I recommend considering it.
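The trade-off can be written down as simple arithmetic. The sketch below assumes a simplified billing model: the instance is charged per hour while running, and an EBS data volume is charged per GiB-month for as long as it exists, including while the instance is stopped. Every price and duration in it is a made-up placeholder, not an actual AWS price; substitute the current figures for your region.

```python
def total_cost(hourly_price: float, run_hours: float,
               ebs_gib: float = 0.0, ebs_price_per_gib_month: float = 0.0,
               months_kept: float = 0.0) -> float:
    """Instance hours plus any EBS data volume kept for `months_kept` months."""
    return hourly_price * run_hours + ebs_gib * ebs_price_per_gib_month * months_kept


if __name__ == "__main__":
    # Placeholder numbers only (assumptions, not real prices or measurements).
    non_d = total_cost(hourly_price=1.00, run_hours=10,
                       ebs_gib=100, ebs_price_per_gib_month=0.10, months_kept=1)
    # Assumed: slightly higher hourly price, shorter run, no EBS data volume.
    d_series = total_cost(hourly_price=1.15, run_hours=8)
    print(f"non-d with EBS data volume: {non_d:.2f}")
    print(f"d series with instance store: {d_series:.2f}")
```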
What's the difference between instance store and EBS?
Amazon EC2 instance store (User Guide)