This article describes a cross-region cold backup disaster recovery (DR) solution on AWS that can restore production after a regional outage without paying for idle standby capacity. Only the network infrastructure is provisioned in advance, and periodic data snapshots are copied to the DR environment. Every other component is created dynamically by script after a disaster occurs.
Assumptions
The solution is modeled on a WordPress cluster deployed on AWS, with Beijing as the production region and Ningxia as the cold backup DR region. Production region components:
- RDS MySQL: WordPress database
- ElastiCache Redis: Caching layer
- S3: WordPress file storage, mounted to EC2 via S3FS
- EC2 + Auto Scaling Group: WordPress application tier
- ELB: Load balancer
- NAT Gateway: Outbound internet access
DR Architecture
Solution Overview
Database: Snapshot Backup
A CloudWatch Rule triggers a Lambda function on a schedule. The function calls the AWS API to take an RDS snapshot and then copies each newly created snapshot to the DR region.
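As a rough illustration of the scheduled job described above, here is a minimal Python sketch of such a Lambda handler using boto3. The region codes are the standard identifiers for Beijing (`cn-north-1`) and Ningxia (`cn-northwest-1`); the instance identifier, account ID, and snapshot naming scheme are hypothetical placeholders, and the article's own Lambda code may look different.

```python
from datetime import datetime, timezone

# Assumed values for illustration only; none of these come from the article.
SOURCE_REGION = "cn-north-1"      # Beijing (production)
DR_REGION = "cn-northwest-1"      # Ningxia (DR)
DB_INSTANCE_ID = "wordpress-db"   # hypothetical RDS instance identifier
ACCOUNT_ID = "123456789012"       # placeholder account ID


def snapshot_id(instance_id, now):
    """Build a timestamped snapshot identifier from the instance name."""
    return f"{instance_id}-{now:%Y%m%d-%H%M}"


def snapshot_arn(region, account_id, snap_id):
    """ARN of a manual RDS snapshot in the AWS China partition (aws-cn)."""
    return f"arn:aws-cn:rds:{region}:{account_id}:snapshot:{snap_id}"


def handler(event, context):
    """Lambda entry point, invoked by the scheduled rule."""
    import boto3  # imported here so the helpers above work without boto3

    snap_id = snapshot_id(DB_INSTANCE_ID, datetime.now(timezone.utc))

    # 1. Take the snapshot in the production region.
    src = boto3.client("rds", region_name=SOURCE_REGION)
    src.create_db_snapshot(
        DBSnapshotIdentifier=snap_id,
        DBInstanceIdentifier=DB_INSTANCE_ID,
    )
    # A snapshot can only be copied once it is available.
    src.get_waiter("db_snapshot_available").wait(DBSnapshotIdentifier=snap_id)

    # 2. Request the copy from the destination (DR) region.
    dst = boto3.client("rds", region_name=DR_REGION)
    dst.copy_db_snapshot(
        SourceDBSnapshotIdentifier=snapshot_arn(SOURCE_REGION, ACCOUNT_ID, snap_id),
        TargetDBSnapshotIdentifier=snap_id,
        SourceRegion=SOURCE_REGION,
    )
    return {"snapshot": snap_id}
```

For a large database the waiter may outlast the Lambda timeout. In that case one option is to split the work into two functions, one that takes the snapshot and one that copies it once the snapshot is available.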
Application Layer: AMI Backup
Instances in the Beijing region are imaged on a schedule, and each newly created AMI is copied to the Ningxia region. The process can be triggered manually or automated end to end with CloudWatch Events and Lambda.
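The AMI copy step could be automated along the following lines. This is a sketch under assumptions: the function names, the `-dr` name suffix, and the default regions are illustrative choices, not taken from the article's scripts. The one firm detail is that `copy_image` is called on an EC2 client in the destination region.

```python
def copy_image_params(image_id, image_name, source_region):
    """Build the arguments for EC2 CopyImage (hypothetical naming scheme)."""
    return {
        "Name": f"{image_name}-dr",
        "SourceImageId": image_id,
        "SourceRegion": source_region,
        "Description": f"DR copy of {image_id} from {source_region}",
    }


def copy_ami_to_dr(image_id, image_name,
                   source_region="cn-north-1", dr_region="cn-northwest-1"):
    """Copy an AMI into the DR region and return the new image ID."""
    import boto3  # imported here so copy_image_params works without boto3

    # The copy is requested from the destination region.
    ec2 = boto3.client("ec2", region_name=dr_region)
    response = ec2.copy_image(
        **copy_image_params(image_id, image_name, source_region)
    )
    return response["ImageId"]
```

The returned image ID is the one the DR launch configuration would need to reference, since AMI IDs differ between regions.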
Media Files: S3 Cross-Region Replication
Enable S3 Cross-Region Replication (CRR) so that objects in the production bucket are replicated to the DR region automatically. The bucket is mounted on the EC2 instances through S3FS and serves as the WordPress media library.
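A replication rule of this kind might be set up with boto3 roughly as follows. The bucket names, rule ID, and IAM role are hypothetical. Choosing `STANDARD_IA` for the replica is an assumption that matches the Infrequent Access pricing used in the cost estimate. Versioning must already be enabled on both buckets for replication to work.

```python
def replication_config(role_arn, dest_bucket):
    """Build a replication configuration that copies every object."""
    return {
        "Role": role_arn,  # IAM role that S3 assumes to replicate objects
        "Rules": [
            {
                "ID": "wordpress-media-to-dr",  # hypothetical rule name
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix matches all objects
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {
                    # aws-cn is the ARN partition for the AWS China regions
                    "Bucket": f"arn:aws-cn:s3:::{dest_bucket}",
                    "StorageClass": "STANDARD_IA",
                },
            }
        ],
    }


def enable_replication(source_bucket, dest_bucket, role_arn,
                       source_region="cn-north-1"):
    """Apply the replication configuration to the source bucket."""
    import boto3  # imported here so replication_config works without boto3

    s3 = boto3.client("s3", region_name=source_region)
    s3.put_bucket_replication(
        Bucket=source_bucket,
        ReplicationConfiguration=replication_config(role_arn, dest_bucket),
    )
```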
DR Scripts
Terraform-based scripts in the aws-dr-samples repository on GitHub help users build the DR environment quickly.
When a disaster occurs, follow this recovery sequence:
- Manually restore the RDS database snapshot to a new database instance
- Run the script to create the DR cluster
- Perform health checks to confirm the DR cluster is running correctly
- Switch DNS to redirect user traffic to the DR cluster
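The health-check step that gates the DNS switch could be scripted along these lines. This is a sketch and not part of the aws-dr-samples scripts: the function names, thresholds, and the idea of requiring several consecutive successes are all assumptions.

```python
import time
import urllib.error
import urllib.request


def http_ok(url, timeout=5.0):
    """Return True if the URL answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Covers HTTP errors, refused connections, DNS failures, timeouts.
        return False


def wait_until_healthy(check, required_successes=3, max_attempts=30,
                       delay=10.0, sleep=time.sleep):
    """Poll check() until it succeeds several times in a row.

    Returns True once the streak is reached, or False when the
    attempts run out. Any failure resets the streak.
    """
    streak = 0
    for attempt in range(max_attempts):
        if check():
            streak += 1
            if streak >= required_successes:
                return True
        else:
            streak = 0
        if attempt < max_attempts - 1:
            sleep(delay)
    return False
```

A caller would pass something like `lambda: http_ok(dr_url)`, where `dr_url` is the DR load balancer address, and proceed to the DNS switch only when the function returns `True`.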
Cost Estimate (Ningxia Region, Reference Only)
| Service | Unit Price | Annual Cost (CNY) |
| --- | --- | --- |
| AMI, 20 GB | ¥0.277/GB/month | 66.48 |
| S3 Infrequent Access, 1 TB | ¥0.1030/GB/month | 1265.7 |
| Cross-region transfer, 200 GB | ¥0.6003/GB | 120.06 |
| Total (incl. tax) | | 1539.37 |
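The figures in the table can be reproduced with the arithmetic below. Two inputs are inferred from the numbers rather than stated in the article: 1 TB is taken as 1024 GB, and the tax-inclusive total corresponds to a 6% rate applied to the subtotal. The transfer charge is counted once, as in the table.

```python
MONTHS = 12
TAX_RATE = 0.06  # inferred from the table's total, not stated in the article

# Storage is billed per GB per month; transfer is billed per GB once.
ami_annual = round(20 * 0.277 * MONTHS, 2)       # 20 GB of AMI storage
s3_annual = round(1024 * 0.1030 * MONTHS, 1)     # 1 TB taken as 1024 GB
transfer = round(200 * 0.6003, 2)                # 200 GB cross-region

subtotal = ami_annual + s3_annual + transfer
total_incl_tax = round(subtotal * (1 + TAX_RATE), 2)

print(f"AMI:      {ami_annual:.2f}")
print(f"S3 IA:    {s3_annual:.2f}")
print(f"Transfer: {transfer:.2f}")
print(f"Total:    {total_incl_tax:.2f}")
```

The S3 line is rounded to one decimal place because that is the precision the table uses. Carrying the unrounded value through would shift the total by a few hundredths.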
Important Notes
- S3 Cross-Region Replication is asynchronous. Most objects replicate within 15 minutes, but in rare cases replication can take longer, so the most recent uploads may be missing from the DR bucket.
- Redis in the DR environment starts cold with no cached data. Plan for the sudden read and write load on the database right after failover, for example with circuit breakers in the application.
- If EC2 instances are changed in the Beijing region, copy the updated AMI to the Ningxia region promptly so that the DR image stays current.
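To make the circuit breaker idea from the Redis note concrete, here is a minimal single-threaded sketch. It is a generic illustration and not something the article or its Terraform scripts provide: the class name, thresholds, and the injectable clock are all assumptions.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Stop calling a failing dependency for a cool-down period."""

    def __init__(self, max_failures=5, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("database circuit is open")
            # Cool-down elapsed: let calls through again until one fails.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open or re-open the circuit
            raise
        # A success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

An application would wrap its database queries in `breaker.call(...)` and serve a fallback, such as a stale page or an error page, when `CircuitOpenError` is raised, giving the database room to recover while the cache refills.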
Summary
This article described how to implement a cross-region cold backup DR solution on AWS and provided Terraform templates to validate its feasibility. With this approach, once the primary region fails, the scripts bring up the entire environment in the DR region, which keeps standby costs to a minimum.