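Reposted from the Amazon Web Services (亚马逊云科技) official blog post 《企业备份&容灾系列 – AWS 多区域 Pilot Light 容灾设计》 (Enterprise Backup & Disaster Recovery Series: AWS Multi-Region Pilot Light DR Design).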
This article details how to implement a cross-region Pilot Light disaster recovery solution on AWS. Aside from pre-configuring the VPC and keeping an RDS hot standby, all other components are dynamically created via scripts when a disaster occurs, minimizing infrastructure costs.
Assumptions
The scenario is a WordPress cluster deployed on AWS and protected by a multi-region Pilot Light DR setup. The production region contains the following components:
- RDS MySQL: WordPress database
- ElastiCache Redis: Caching layer
- S3: WordPress file storage, mounted to EC2 via S3FS
- EC2 + Auto Scaling Group: WordPress application tier
- ELB: Load balancer
- NAT Gateway: Outbound internet access
Architecture
Solution Overview
RDS MySQL Data Replication
Configure an RDS cross-region read replica so that the database is replicated asynchronously to the DR region. When a disaster occurs, promote the read replica to a standalone DB instance so that it can accept writes.
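As a rough illustration of the two RDS operations involved, the sketch below builds the request parameters for boto3's `create_db_instance_read_replica` and `promote_read_replica`. The instance identifiers, account ID, and region pair are hypothetical placeholders rather than values from the original setup, and boto3 is imported lazily so the parameter builders can be inspected without AWS access.

```python
"""Sketch: cross-region RDS read replica and promotion (placeholder names)."""

# Assumed values for illustration only; replace with your own.
PRIMARY_REGION = "cn-north-1"     # Beijing
DR_REGION = "cn-northwest-1"      # Ningxia
ACCOUNT_ID = "123456789012"       # hypothetical account ID
SOURCE_DB = "wordpress-db"        # hypothetical source instance name
REPLICA_DB = "wordpress-db-dr"    # hypothetical replica name


def partition_for(region: str) -> str:
    """AWS China regions use the 'aws-cn' ARN partition."""
    return "aws-cn" if region.startswith("cn-") else "aws"


def replica_request() -> dict:
    """Parameters for rds.create_db_instance_read_replica (called in the DR region)."""
    source_arn = (
        f"arn:{partition_for(PRIMARY_REGION)}:rds:{PRIMARY_REGION}:"
        f"{ACCOUNT_ID}:db:{SOURCE_DB}"
    )
    return {
        "DBInstanceIdentifier": REPLICA_DB,
        # A source in another region must be referenced by its ARN.
        "SourceDBInstanceIdentifier": source_arn,
        "DBInstanceClass": "db.m4.large",
    }


def promote_request() -> dict:
    """Parameters for rds.promote_read_replica, used once a disaster is declared."""
    return {"DBInstanceIdentifier": REPLICA_DB}


def run(action: str) -> dict:
    """Execute 'create' or 'promote' against the DR region (needs credentials)."""
    import boto3  # imported here so the builders above work without boto3

    rds = boto3.client("rds", region_name=DR_REGION)
    if action == "create":
        return rds.create_db_instance_read_replica(**replica_request())
    if action == "promote":
        return rds.promote_read_replica(**promote_request())
    raise ValueError(f"unknown action: {action}")
```

Promotion is one-way: once promoted, the instance stops replicating from the source, so failing back requires setting up a new replica in the other direction.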
S3 Bucket Data Backup
Enable S3 Cross-Region Replication so that objects written to the production bucket are copied automatically to a bucket in the DR region.
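A minimal sketch of what such a replication rule might look like when applied with boto3's `put_bucket_replication`. The bucket names, role ARN, and rule ID are hypothetical, and the `STANDARD_IA` destination storage class is an assumption taken from the Infrequent Access line in the cost table. Replication requires versioning to be enabled on both the source and destination buckets.

```python
"""Sketch: S3 cross-region replication rule (all names are placeholders)."""


def replication_config(role_arn: str, dest_bucket_arn: str) -> dict:
    """Build a ReplicationConfiguration that replicates every object."""
    return {
        "Role": role_arn,  # IAM role that S3 assumes to replicate objects
        "Rules": [
            {
                "ID": "dr-replicate-all",      # hypothetical rule name
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {"Prefix": ""},      # empty prefix matches all keys
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {
                    "Bucket": dest_bucket_arn,
                    "StorageClass": "STANDARD_IA",  # assumption, see lead-in
                },
            }
        ],
    }


def apply(source_bucket: str, config: dict, region: str) -> None:
    """Enable versioning on the source bucket and attach the rule."""
    import boto3  # imported lazily; requires AWS credentials

    s3 = boto3.client("s3", region_name=region)
    s3.put_bucket_versioning(
        Bucket=source_bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )
    s3.put_bucket_replication(
        Bucket=source_bucket,
        ReplicationConfiguration=config,
    )
```

The destination bucket needs versioning enabled as well; that step is omitted here because it runs against the DR region.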
Application AMI
Copy each newly created AMI from the primary region (Beijing in this example) to the DR region (Ningxia). The copy can be triggered manually or automated with CloudWatch Events + Lambda.
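One way the automated path could look is a small Lambda function that calls EC2 `copy_image` from the DR region. This is a sketch under stated assumptions: the event shape (`image_id`, `name`) is hypothetical and would need to be mapped from whatever the triggering rule actually delivers, and the region pair is a placeholder. The EC2 client is injectable so the logic can be exercised without AWS access.

```python
"""Sketch: Lambda handler that copies an AMI into the DR region."""

PRIMARY_REGION = "cn-north-1"    # Beijing (assumed source region)
DR_REGION = "cn-northwest-1"     # Ningxia (assumed destination region)


def copy_ami(ec2_client, image_id: str, name: str, source_region: str) -> str:
    """Copy one AMI into the region the client is bound to; return the new ID."""
    response = ec2_client.copy_image(
        Name=name,
        SourceImageId=image_id,
        SourceRegion=source_region,
        Description=f"DR copy of {image_id} from {source_region}",
    )
    return response["ImageId"]


def handler(event, context, ec2_client=None):
    """Lambda entry point.

    Assumed (hypothetical) event shape: {"image_id": "ami-...", "name": "..."}.
    """
    if ec2_client is None:
        import boto3  # available in the Lambda Python runtime

        # copy_image is called in the DESTINATION region.
        ec2_client = boto3.client("ec2", region_name=DR_REGION)
    new_id = copy_ami(ec2_client, event["image_id"], event["name"], PRIMARY_REGION)
    return {"copied_image_id": new_id}
```

The copy runs asynchronously on the AWS side, so the returned image ID refers to an AMI that may still be in the `pending` state.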
DR Scripts
Executable Terraform-based scripts are provided in the aws-dr-samples repository on GitHub. After a disaster occurs, recover in the following order:
- Manually promote the RDS read replica to a standalone database instance
- Run the script to create the DR cluster
- Perform health checks to confirm the DR cluster is running correctly
- Switch DNS to redirect user traffic to the DR cluster
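The ordering matters: DNS should only be switched once the DR cluster has passed its health checks. The sketch below models the sequence as a fail-fast runbook. It is not taken from the aws-dr-samples scripts; the function names, the Terraform working directory, and the health-check URL are all hypothetical.

```python
"""Sketch: the recovery sequence as an ordered, fail-fast runbook."""
import subprocess
import time
import urllib.request
from typing import Callable, List, Optional, Tuple

Step = Tuple[str, Callable[[], None]]


def http_probe(url: str) -> Callable[[], bool]:
    """Return a probe that is True when the URL answers with HTTP 200."""
    def probe() -> bool:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                return resp.status == 200
        except OSError:  # URLError, HTTPError and timeouts are OSError subclasses
            return False
    return probe


def wait_until_healthy(probe: Callable[[], bool], attempts: int = 10,
                       delay_s: float = 30.0, sleep=time.sleep) -> bool:
    """Poll the probe up to `attempts` times, sleeping between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay_s)
    return False


def require_healthy(probe: Callable[[], bool], **kwargs) -> None:
    """Raise if the cluster never becomes healthy, which aborts the runbook."""
    if not wait_until_healthy(probe, **kwargs):
        raise RuntimeError("DR cluster did not pass health checks")


def terraform_apply(workdir: str) -> None:
    """Create the DR cluster; `workdir` is a hypothetical checkout path."""
    subprocess.run(["terraform", "apply", "-auto-approve"], cwd=workdir, check=True)


def run_runbook(steps: List[Step]) -> Tuple[List[str], Optional[str]]:
    """Run steps in order; stop at the first failure.

    Returns (names of completed steps, name of the failed step or None).
    """
    completed: List[str] = []
    for name, action in steps:
        try:
            action()
        except Exception as exc:
            print(f"step '{name}' failed: {exc}")
            return completed, name
        completed.append(name)
    return completed, None
```

A real run would pass four steps in the order listed above, for example a confirmation prompt for the manual replica promotion, `terraform_apply(...)`, `require_healthy(http_probe(...))`, and finally the DNS change. Because the runbook stops at the first failing step, a cluster that never turns healthy never receives user traffic.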
Cost Estimate (Ningxia Region, Reference Only)
| Service | Type | Unit Price | Annual Cost (CNY) |
| --- | --- | --- | --- |
| AMI, 20 GB | | ¥0.277/GB/month | 66.48 |
| S3 Infrequent Access, 1 TB | | ¥0.1030/GB/month | 1265.7 |
| RDS Read Replica | db.m4.large, Single-AZ | ¥1.1733/hour | 5540 |
| Cross-region traffic, 2 TB | | ¥0.6003/GB | 1229.41 |
| Total incl. tax | | | 8587.69 |
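The arithmetic behind the table can be checked with a few lines of Python. Two assumptions are made here that the source does not state: 1 TB is counted as 1024 GB, and the total is consistent with a 6% tax applied to the sum of the four rows. The RDS annual figure is taken as given from the table rather than derived from the hourly price.

```python
"""Sketch: reproduce the cost table's arithmetic (assumptions in comments)."""
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def money(value: Decimal) -> Decimal:
    """Round to two decimal places, half up."""
    return value.quantize(CENT, rounding=ROUND_HALF_UP)


# Rows that can be derived from unit prices (assuming 1 TB = 1024 GB).
ami = money(Decimal(20) * Decimal("0.277") * 12)       # 20 GB for 12 months
s3_ia = money(Decimal(1024) * Decimal("0.1030") * 12)  # 1 TB for 12 months
traffic = money(Decimal(2048) * Decimal("0.6003"))     # 2 TB transferred

# Annual figures exactly as listed in the table.
table_rows = [
    Decimal("66.48"),    # AMI
    Decimal("1265.7"),   # S3 Infrequent Access
    Decimal("5540"),     # RDS read replica (taken as given)
    Decimal("1229.41"),  # cross-region traffic
]

TAX_RATE = Decimal("0.06")  # assumption inferred from the numbers, not stated
subtotal = sum(table_rows, Decimal(0))
total_incl_tax = money(subtotal * (1 + TAX_RATE))

print(ami, s3_ia, traffic, subtotal, total_incl_tax)
```

Under these assumptions the derived AMI and traffic rows match the table to the cent, the S3 row matches after rounding to one decimal place, and the taxed total matches the table's 8587.69.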
Important Notes
- S3 Cross-Region Replication is asynchronous; most objects are replicated within 15 minutes
- RDS MySQL cross-region replication is also asynchronous; the lag between the replica and the master depends on transaction size and on the load on the master
- The Redis nodes in the DR environment start cold, so the database may see a sudden burst of reads and writes right after failover; implementing a circuit breaker in the application code is recommended
- If EC2 instances in the primary region (Beijing) are changed, copy the updated AMI to the DR region (Ningxia) promptly
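To make the circuit-breaker recommendation concrete, here is a minimal sketch of the pattern in plain Python. It is a generic illustration rather than the mechanism used by any particular WordPress plugin or library; the class name, thresholds, and timeout are hypothetical. After a configurable number of consecutive failures the breaker "opens" and rejects calls immediately, which gives the database room to recover while the cold cache warms up.

```python
"""Sketch: a minimal circuit breaker around database calls."""
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, reset_timeout_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None  # None means the circuit is closed

    def call(self, func: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout_s:
                # Open: fail fast instead of adding load to the database.
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: half-open, let one trial call through.
        try:
            result = func()
        except Exception:
            self._failures += 1
            half_open_failed = self._opened_at is not None
            if half_open_failed or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # (re)open the circuit
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

In application code, the wrapped function would be the database query that runs on a cache miss, and a `CircuitOpenError` would typically be handled by serving a stale or degraded response.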
Summary
This article described how to implement a cross-region Pilot Light DR solution on AWS and pointed to Terraform templates for validating its feasibility. Aside from the pre-configured VPC and the RDS hot standby, all other components are created dynamically by scripts after a disaster occurs, which keeps infrastructure costs to a minimum.
Source: AWS Official Blog