Amazon Aurora 是AWS完全托管的、与MySQL和PostgreSQL 兼容的关系数据库,是 AWS 增长最快的服务之一。鉴于Aurora的设计理念、性能、存储、扩展性、可靠性方面的突出表现,越来越多的客户选择 Aurora 作为核心关系型数据库,处理核心数据业务。本文分析了游戏场景中使用Aurora数据库,调整一个参数值会对数据库系统负载的影响以及成本的影响,提出优化建议。
Aurora成本分析
实例成本
实例价格上,Aurora大约是RDS Mysql的1.2倍。考虑到Aurora相比Mysql的性能优势,在高并发业务场景下,即使Aurora相对Mysql只提升了2倍的性能,成本也会降低很多。例如,Mysql需要4个r4.xlarge节点,Aurora由于性能提升,只需要2个r4.xlarge即可满足业务需求。
总成本上,Aurora更便宜
Aurora 0.58*2=1.16
Mysql 0.48*4=1.92
存储成本
存储价格上,Aurora按实际容量收费,对比RDS在创建实例时就要指定存储容量,价格会更便宜。
IO成本
Mysql Provision IOPS EBS (IO1) 按照预设容量收费,此费用固定。Aurora按照IO总量计算,不同时段的IO请求不一样,费用也不同。实际在高并发业务场景的使用过程中,此项的成本很高。
优化参数介绍
参考mysq的官方文档
https://dev.mysql.com/doc/refman/5.6/en/innodb-parameters.html#sysvar_innodb_flush_log_at_trx_commit
innodb_flush_log_at_trx_commit
简单的说
0:由mysql的main_thread每秒将存储引擎log buffer中的redo日志写入到log file,并调用文件系统的sync操作,将日志刷新到磁盘。
1:每次事务提交时,将存储引擎log buffer中的redo日志写入到log file,并调用文件系统的sync操作,将日志刷新到磁盘。
2:每次事务提交时,将存储引擎log buffer中的redo日志写入到log file,并由存储引擎的main_thread 每秒将日志刷新到磁盘。
取值为1是最安全的。如果取值为0,当数据库程序崩溃的时候,可能会丢失1秒左右的数据。如果取值为2,当操作系统崩溃或者硬件故障的时候会丢失1秒左右数据。
innodb_flush_log_at_trx_commit参数在AWS Aurora中属于cluster参数组,可以动态修改。如果把aurora的cluster参数组由默认参数组换成自定义参数组的时候,可能需要手动重启aurora的cluster数据库。
参数优化测试
服务器采用EC2的C5.large,数据库采用Aurora的R5.large兼容mysql 5.6版本,VolumnWriteIOPS采用CloudWatch API进行采集。特别需要注意的是,数据库不做任何操作,也会有VolumnWriteIOPS,数据量大致是 15000-18000次/小时,同时VolumnWriteIOPS并不是和操作完全同步。这些因素会导致测试的数据有一定偏差。
游戏业务操作场景
游戏服务器一般包含两个线程组,分别是Request Thread Group和后台线程Schedule Group,每一个Request的处理和Schedule event的处理都由数据库的Transaction来保护。本次测试主要评估空Transaction(只有begin和commit,没有任何实质sql)的情况,以便了解Transaction机制的时间消耗以及flush参数的相关影响。
无论flush的参数是1还是2,VolumnWriteIOPS都可以忽略不记。从上述测试看,flush参数值对于相关表现基本上没有任何影响。
游戏日志记录场景
游戏日志现在的实现是采用单线程异步方式用insert sql把数据插入到数据表,不使用transaction。每一个游戏服进程有4-5个日志线程,负责不同的日志记录,10个左右的游戏服共享一个物理数据库。
从这个数据看,flush参数由1改成2以后,能够大幅度的增加服务器处理的吞吐量,同时减少WriteIOPS的数量。
参数优化实战
完成参数切换,重点观察切换的时候数据库CPU、Load以及切换前后每天的WriteIOPS对比。从结果看,这个参数的修改对两个数据库都有明显影响,其中对日志库的影响更加显著。
数据库负载下降
更新之前的aurora_redo_log_flush数据占比接近20%
更新之后的aurora_redo_log_flush数据库占比为0,对于整体负载能力有10-20%提升。
CPU使用率下降
cpu使用率直接降低了一半。
WriteIOPS明显下降
2020.07.16进行的参数修改,对比2020.07.17的数据和2020.07.10到2020.07.15的数据,对于日志库,WriteIOPS降到了原来的1/5,游戏库差不多降低了一半。
优化建议
考虑行业的特点,以及Aurora本身的故障率,1秒的数据丢失风险如果能够接受,可以考虑优化innodb_flush_log_at_trx_commit参数的值为2,实现数据库IO请求数的减少,还可以考虑实例类型减配的操作,进一步达到成本优化的目标。
本文部分内容引用来自亚马逊AWS官方博客:
https://aws.amazon.com/cn/blogs/china/aurora-mysql-redshift-application-scene-cost-analysis/
如需进一步协助或服务,请留言,泰岳云业务会提供自动化工具及专业服务。
Amazon Aurora is a fully managed relational database compatible with MySQL and PostgreSQL, and is one of AWS's fastest growing services. Given Aurora's outstanding performance in design philosophy, performance, storage, scalability, and reliability, more and more customers choose Aurora as their core relational database to handle core data operations. This article analyzes using Aurora database in gaming scenarios, how adjusting a parameter value affects database system load and cost impact, and provides optimization recommendations.
Aurora Cost Analysis
Instance Cost
In terms of instance pricing, Aurora is approximately 1.2 times that of RDS MySQL. Considering Aurora's performance advantage over MySQL, in high-concurrency business scenarios, even if Aurora only improves performance by 2x compared to MySQL, costs will be significantly reduced. For example, MySQL requires 4 r4.xlarge nodes, while Aurora, due to performance improvements, only needs 2 r4.xlarge to meet business requirements.
In total cost, Aurora is cheaper
Aurora 0.58*2=1.16
MySQL 0.48*4=1.92
Storage Cost
In terms of storage pricing, Aurora charges based on actual capacity. Compared to RDS which requires specifying storage capacity when creating an instance, it's cheaper.
IO Cost
MySQL Provision IOPS EBS (IO1) charges based on preset capacity, this fee is fixed. Aurora calculates based on total IO, with different IO requests at different times, resulting in different fees. In actual high-concurrency business scenarios, this cost is quite high.
Optimization Parameter Introduction
Reference MySQL official documentation
https://dev.mysql.com/doc/refman/5.6/en/innodb-parameters.html#sysvar_innodb_flush_log_at_trx_commit
innodb_flush_log_at_trx_commit
Simply put:
0: MySQL's main_thread writes redo logs from the storage engine log buffer to log file every second, and calls the file system's sync operation to flush logs to disk.
1: At each transaction commit, redo logs from the storage engine log buffer are written to log file, and the file system's sync operation is called to flush logs to disk.
2: At each transaction commit, redo logs from the storage engine log buffer are written to log file, and the storage engine's main_thread flushes logs to disk every second.
A value of 1 is the safest. If the value is 0, when the database program crashes, approximately 1 second of data may be lost. If the value is 2, approximately 1 second of data may be lost when the operating system crashes or there's a hardware failure.
The innodb_flush_log_at_trx_commit parameter in AWS Aurora belongs to the cluster parameter group and can be dynamically modified. If you change Aurora's cluster parameter group from the default parameter group to a custom parameter group, you may need to manually restart the Aurora cluster database.
Parameter Optimization Testing
The server uses EC2 C5.large, the database uses Aurora R5.large compatible with MySQL 5.6, and VolumnWriteIOPS is collected using CloudWatch API. It's important to note that even without any database operations, there will be VolumnWriteIOPS, approximately 15000-18000 times/hour. Also, VolumnWriteIOPS is not completely synchronized with operations. These factors cause some deviation in test data.
Gaming Business Operation Scenario
Game servers generally contain two thread groups: Request Thread Group and background thread Schedule Group. Each Request processing and Schedule event processing is protected by database Transaction. This test mainly evaluates empty Transaction (only begin and commit, no actual SQL) to understand the time consumption of Transaction mechanism and the impact of flush parameter.
Whether the flush parameter is 1 or 2, VolumnWriteIOPS can be ignored. From the above test, the flush parameter value has basically no impact on related performance.
Game Log Recording Scenario
Game logs are currently implemented using single-threaded asynchronous insert SQL to insert data into data tables, without using transactions. Each game server process has 4-5 log threads responsible for different log recordings, and about 10 game servers share one physical database.
From this data, after changing the flush parameter from 1 to 2, server processing throughput can be significantly increased while reducing WriteIOPS count.
Parameter Optimization in Practice
After completing parameter switching, focus on observing database CPU, Load during switching, and comparing daily WriteIOPS before and after switching. From the results, this parameter modification has obvious impact on both databases, with more significant impact on the log database.
Database Load Reduction
Before update, aurora_redo_log_flush data ratio was close to 20%
After update, aurora_redo_log_flush database ratio is 0, improving overall load capacity by 10-20%.
CPU Usage Reduction
CPU usage directly reduced by half.
Significant WriteIOPS Reduction
Parameter modification was done on 2020.07.16. Comparing data from 2020.07.17 with data from 2020.07.10 to 2020.07.15, for the log database, WriteIOPS dropped to 1/5 of the original, and the game database dropped by about half.
Optimization Recommendations
Considering industry characteristics and Aurora's own failure rate, if the risk of 1 second data loss is acceptable, consider optimizing the innodb_flush_log_at_trx_commit parameter value to 2 to reduce database IO requests. You can also consider downgrading instance types to further achieve cost optimization goals.
Part of this article references the Amazon AWS Official Blog:
https://aws.amazon.com/cn/blogs/china/aurora-mysql-redshift-application-scene-cost-analysis/
For further assistance or services, please leave a message. Taiyue Cloud Business provides automation tools and professional services.