CloudWatch makes it easy to monitor AWS resources and applications. It integrates natively with over 70 AWS services, including Amazon EC2, Amazon DynamoDB, Amazon S3, Amazon ECS, Amazon EKS, and AWS Lambda.
With CloudWatch you can see detailed metrics for your resources, such as the CPU utilization, disk throughput, and network throughput of each EC2 instance. You can also set alarms on any of these metrics so that anomalies are detected promptly.
With dozens, hundreds, or even more instances, creating and maintaining a separate alarm for each one quickly becomes unmanageable. Can a single alarm watch the same metric across all instances instead? It can: set the alarm on aggregated statistics.
Next, we will create an alarm on aggregated statistics that monitors the CPU utilization of all EC2 instances and fires when any instance's CPU utilization exceeds the threshold (80% in the steps below).
Steps:
1. Enable Detailed Monitoring to Generate Aggregated Statistics
EC2 publishes its metrics with several dimensions, such as per instance ID. CloudWatch can also aggregate a metric across all instances, but only instances with detailed monitoring enabled are included in these aggregate statistics. Instances using basic monitoring are left out. By default, EC2 sends metric data at 5-minute intervals. With detailed monitoring enabled, the interval shrinks to 1 minute. You can enable detailed monitoring when launching an instance, or turn it on for an existing instance from that instance's monitoring options in the EC2 console. Both methods are shown in the figures:
Enabling detailed monitoring when launching an EC2 instance
Enabling detailed monitoring on an existing instance
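If you manage many instances, you can also enable detailed monitoring through the EC2 API instead of the console. The sketch below assumes boto3 (the AWS SDK for Python). It uses the client's `monitor_instances` call for existing instances. For instances you have not launched yet, the launch-time equivalent is the `Monitoring={"Enabled": True}` parameter of `run_instances`. The helper names are hypothetical.

```python
"""Sketch: enable EC2 detailed monitoring (1-minute metrics) with boto3.

Assumes an EC2 client such as boto3.client("ec2"); helper names are hypothetical.
"""
from typing import Any, Dict, List


def detailed_monitoring_kwargs(instance_ids: List[str]) -> Dict[str, Any]:
    """Build the arguments for the EC2 MonitorInstances call."""
    if not instance_ids:
        raise ValueError("at least one instance ID is required")
    return {"InstanceIds": list(instance_ids)}


def enable_detailed_monitoring(ec2_client: Any, instance_ids: List[str]) -> Dict[str, str]:
    """Turn on detailed monitoring and return {instance_id: monitoring_state}."""
    response = ec2_client.monitor_instances(**detailed_monitoring_kwargs(instance_ids))
    # Each entry reports the new state, usually "pending" until it becomes "enabled".
    return {
        item["InstanceId"]: item["Monitoring"]["State"]
        for item in response["InstanceMonitorings"]
    }


# For new instances, pass this mapping into ec2_client.run_instances(...)
# together with the usual launch parameters.
LAUNCH_WITH_DETAILED_MONITORING = {"Monitoring": {"Enabled": True}}
```

Typical use would be `enable_detailed_monitoring(boto3.client("ec2"), ["i-0123456789abcdef0"])`, where the instance ID is a placeholder.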
2. Create Alarm
In the CloudWatch console: Alarms > Create alarm > Select metric > EC2 > Across All Instances > CPUUtilization > Select metric.
Create alarm
Select metric
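The "Across All Instances" entry corresponds to the `AWS/EC2` `CPUUtilization` metric published without any dimensions. As a rough sketch, assuming boto3 and a CloudWatch client, you can read the same aggregate series with `get_metric_statistics` by leaving out `Dimensions`. This can help confirm that data is arriving before you create the alarm. The helper names and the 30-minute default window are illustrative choices and are not part of the original steps.

```python
"""Sketch: read the aggregate (all-instances) EC2 CPU metric with boto3."""
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional


def aggregate_cpu_query(minutes: int = 30, statistic: str = "Maximum",
                        now: Optional[datetime] = None) -> Dict[str, Any]:
    """Arguments for GetMetricStatistics on CPUUtilization with no Dimensions."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        # No "Dimensions" key: this selects the "Across All Instances" series.
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": 60,  # 1-minute buckets, matching detailed monitoring
        "Statistics": [statistic],
    }


def latest_value(cloudwatch_client: Any, **kwargs: Any) -> Optional[float]:
    """Return the newest datapoint of the aggregate query, or None if there is none."""
    query = aggregate_cpu_query(**kwargs)
    points = cloudwatch_client.get_metric_statistics(**query)["Datapoints"]
    if not points:
        return None
    # Datapoints are not guaranteed to be sorted, so pick the latest timestamp.
    newest = max(points, key=lambda p: p["Timestamp"])
    return newest[query["Statistics"][0]]
```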
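Note: change the statistic from the default, Average, to Maximum. With Maximum, the aggregate value is the highest CPU utilization reported by any instance in the period, so one busy instance is enough to breach the threshold. With Average, a single overloaded instance can be masked by idle ones. Depending on your needs, you can also choose Minimum or a percentile such as p99 or p95.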
Changing the statistic
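To see why the statistic matters, here is a simplified worked example in plain Python. It uses hypothetical readings that mirror the test in step 3: one instance at 100% and one at 5%. The average is 52.5%, which stays below an 80% threshold. The maximum is 100%, which breaches it. The example assumes one datapoint per instance per period and only illustrates the arithmetic. It is not how CloudWatch computes statistics internally.

```python
"""Worked example: Average vs Maximum aggregation across instances."""
from statistics import mean
from typing import Dict


def breaches(cpu_by_instance: Dict[str, float], threshold: float = 80.0) -> Dict[str, bool]:
    """Report whether each aggregate statistic would exceed the threshold."""
    values = list(cpu_by_instance.values())
    return {
        "Average": mean(values) > threshold,
        "Maximum": max(values) > threshold,
    }


# Hypothetical 1-minute readings: one stressed instance, one idle instance.
sample = {"i-stressed": 100.0, "i-idle": 5.0}
print(breaches(sample))  # {'Average': False, 'Maximum': True}
```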
Set alarm threshold to 80%
Configure the notification (for example, an Amazon SNS topic) > Add a name > Next > Create alarm.
Set name
Create alarm
Complete
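You can also express the console steps above as a single `put_metric_alarm` call. This is a sketch that assumes boto3. The alarm name, the SNS topic ARN, the 60-second period, and the single evaluation period are hypothetical choices that you should adjust. Leaving out `Dimensions` is what makes the alarm evaluate the all-instances aggregate. Percentile statistics such as `p99` go in `ExtendedStatistic` rather than `Statistic`.

```python
"""Sketch: create the aggregate EC2 CPU alarm with boto3's put_metric_alarm."""
from typing import Any, Dict, Optional

# Statistics accepted in the plain "Statistic" field; anything else (e.g. "p99")
# is treated here as a percentile and sent as "ExtendedStatistic".
STANDARD_STATISTICS = {"SampleCount", "Average", "Sum", "Minimum", "Maximum"}


def aggregate_cpu_alarm_kwargs(
    alarm_name: str,
    threshold: float = 80.0,
    statistic: str = "Maximum",
    sns_topic_arn: Optional[str] = None,
) -> Dict[str, Any]:
    """Arguments for PutMetricAlarm on the all-instances CPUUtilization metric."""
    kwargs: Dict[str, Any] = {
        "AlarmName": alarm_name,
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        # No Dimensions: evaluate the aggregate "Across All Instances" series.
        "Period": 60,
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
    }
    if statistic in STANDARD_STATISTICS:
        kwargs["Statistic"] = statistic
    else:
        kwargs["ExtendedStatistic"] = statistic
    if sns_topic_arn:
        # Notify this SNS topic when the alarm enters the ALARM state.
        kwargs["AlarmActions"] = [sns_topic_arn]
    return kwargs


def create_alarm(cloudwatch_client: Any, **kwargs: Any) -> None:
    """Create or update the alarm using the arguments built above."""
    cloudwatch_client.put_metric_alarm(**aggregate_cpu_alarm_kwargs(**kwargs))
```

For example, `create_alarm(boto3.client("cloudwatch"), alarm_name="all-ec2-cpu-high", sns_topic_arn="arn:aws:sns:us-east-1:123456789012:ops-alerts")` uses a placeholder name and ARN. Building the arguments in a separate function keeps the configuration easy to inspect and test without calling AWS.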
3. Test the Alarm
We launch two instances and run a stress test on one of them to push its CPU utilization to 100%.
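The original does not say which tool was used for the stress test. One hypothetical option is the helper below, which pegs CPU cores by starting busy-looping Python child processes. It uses separate processes rather than threads because, in standard CPython builds, the global interpreter lock prevents pure-Python threads from running in parallel. Start one burner per vCPU to drive the instance's overall CPUUtilization close to 100%.

```python
"""Hypothetical helper: saturate CPU cores with busy-loop child processes."""
import os
import subprocess
import sys
from typing import List, Optional

# Each child runs an endless loop, keeping one core fully busy.
BUSY_LOOP = "while True: pass"


def start_burners(count: Optional[int] = None) -> List[subprocess.Popen]:
    """Start `count` busy-looping processes (default: one per logical CPU)."""
    count = count or os.cpu_count() or 1
    return [subprocess.Popen([sys.executable, "-c", BUSY_LOOP]) for _ in range(count)]


def stop_burners(procs: List[subprocess.Popen]) -> None:
    """Terminate the burners and wait for them so no zombie processes remain."""
    for proc in procs:
        proc.terminate()
    for proc in procs:
        proc.wait()
```

On the instance being stressed, something like `procs = start_burners()`, followed by waiting several minutes and then `stop_burners(procs)`, keeps the load high long enough for the alarm's evaluation period to elapse.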
Then observe the alarm state.
Instance under stress
Instance without stress
We can see that the alarm has been successfully triggered.
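Instead of watching the console, you can also poll the alarm state. Here is a minimal sketch assuming boto3. `describe_alarms` returns each metric alarm's `StateValue`, which is `OK`, `ALARM`, or `INSUFFICIENT_DATA`. The helper names are hypothetical, and pagination is ignored because only a few alarm names are requested.

```python
"""Sketch: check whether CloudWatch alarms are firing, using boto3."""
from typing import Any, Dict, List


def alarm_states(cloudwatch_client: Any, alarm_names: List[str]) -> Dict[str, str]:
    """Return {alarm_name: StateValue} for the named metric alarms."""
    response = cloudwatch_client.describe_alarms(AlarmNames=alarm_names)
    return {a["AlarmName"]: a["StateValue"] for a in response["MetricAlarms"]}


def is_firing(cloudwatch_client: Any, alarm_name: str) -> bool:
    """True when the alarm is currently in the ALARM state."""
    return alarm_states(cloudwatch_client, [alarm_name]).get(alarm_name) == "ALARM"
```

For example, `is_firing(boto3.client("cloudwatch"), "all-ec2-cpu-high")` would return True while the stressed instance keeps the aggregate Maximum above the threshold. This assumes the alarm was given that hypothetical name.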