Amazon CloudWatch

Amazon CloudWatch is a monitoring and management service built for developers, system operators, site reliability engineers (SREs), and IT managers. It provides data and actionable insights to monitor applications, understand and respond to system-wide performance changes, optimize resource utilization, and get a unified view of operational health. CloudWatch collects monitoring and operational data in the form of logs, metrics, and events, covering AWS resources, applications and services that run on AWS, and on-premises servers. Users can use CloudWatch to set high-resolution alarms, visualize logs and metrics side by side, take automated actions, troubleshoot issues, and discover insights that help optimize applications and keep them running smoothly.

  • The Amazon CloudWatch home page automatically displays metrics about every AWS service in use. Users can additionally create custom dashboards to display metrics about their own applications, and to display custom collections of metrics that they choose.
  • Users can create alarms that watch metrics and send notifications or automatically make changes to the monitored resources when a threshold is breached. Users can also use this data to stop under-used instances to save money.
  • With Amazon CloudWatch, users gain system-wide visibility into resource utilization, application performance, and operational health.

Amazon CloudWatch Benefits

Set alarms on any metric to send notifications or take other automated actions. For example, when a specific Amazon EC2 metric crosses the alarm threshold, users can use Auto Scaling to dynamically add or remove EC2 instances, or have a notification sent. Amazon CloudWatch Events provides a stream of events describing changes to AWS resources. Users can build workflows that automatically take user-defined actions, such as stopping an Amazon EC2 instance, sending an Amazon SNS message, or adding a message to an Amazon SQS queue, when an event of interest occurs.

Submit custom metrics generated by users’ own applications via a simple API request and have them monitored by Amazon CloudWatch. Users can send and store metrics that are important to an application’s operational performance to help troubleshoot and spot trends. Users can also send existing system, application, and custom log files to Amazon CloudWatch Logs and monitor these logs in near real time.

View metrics for CPU utilization, data transfer, and disk usage activity from Amazon EC2 instances (Basic Monitoring) for no additional charge. For an additional charge, CloudWatch provides Detailed Monitoring for EC2 instances with higher resolution and metric aggregation. Users can also monitor metrics on Amazon DynamoDB tables, Amazon EBS volumes, Amazon RDS DB instances, Amazon Elastic MapReduce job flows, Elastic Load Balancers, Amazon SQS queues, Amazon SNS topics, and more for no additional charge. In both cases, no additional software needs to be installed.

Users can use CloudWatch Logs to store log data in highly durable storage. The Amazon CloudWatch Logs agent makes it easy to quickly send both rotated and non-rotated log data off of a host and into the log service. Users can then access the raw log data when they need it. By default, logs are kept indefinitely and never expire. Users can adjust the retention policy for each log group, keeping the indefinite retention or choosing a retention period between one day and 10 years. Users can also use CloudWatch Logs to log information about the DNS queries that Route 53 receives.

Amazon CloudWatch Features

Monitor Amazon EC2: Users can use CloudWatch Logs to monitor applications and systems using log data. Because CloudWatch Logs works from log data, no code changes are required. Users can monitor application logs for specific literal terms or count the number of occurrences of a literal term at a particular position in log data (such as “404” status codes in an Apache access log). When the term is found, CloudWatch Logs reports the data to a CloudWatch metric that users specify. Log data is encrypted while in transit and while it is at rest.

Monitor EC2 instances automatically, without installing additional software:

  • Basic Monitoring for Amazon EC2 instances: Seven pre-selected metrics at five-minute frequency and three status check metrics at one-minute frequency, for no additional charge.
  • Detailed Monitoring for Amazon EC2 instances: All metrics available to Basic Monitoring at one-minute frequency, for an additional charge. Instances with Detailed Monitoring enabled allow data aggregation by Amazon EC2 AMI ID and instance type.

When using Auto Scaling or Elastic Load Balancing, Amazon CloudWatch also provides Amazon EC2 instance metrics aggregated by Auto Scaling group and by Elastic Load Balancer, regardless of whether users have chosen Basic or Detailed Monitoring. Monitoring data is retained for up to 15 months (see Metrics Retention), even if the AWS resources have been terminated. This enables users to quickly look back at the metrics preceding an event of interest. Basic Monitoring is enabled automatically for all Amazon EC2 instances, and users can access these metrics in either the Amazon EC2 tab or the Amazon CloudWatch tab of the AWS Management Console, or by using the Amazon CloudWatch API.

Monitor Other AWS Resources: Amazon CloudWatch automatically monitors Elastic Load Balancers for metrics such as request count and latency; Amazon EBS volumes for metrics such as read/write latency; Amazon RDS DB instances for metrics such as freeable memory and available storage space; Amazon SQS queues for metrics such as number of messages sent and received; and Amazon SNS topics for metrics such as number of messages published and delivered. No additional software needs to be installed to monitor other AWS resources. The following AWS resources are supported:

  • Auto Scaling groups: Seven pre-selected metrics at one-minute frequency, optional and for no additional charge.
  • Elastic Load Balancers: Thirteen pre-selected metrics at one-minute frequency, for no additional charge.
  • AWS Storage Gateways: Eleven pre-selected gateway metrics and five pre-selected storage volume metrics at five-minute frequency, for no additional charge.
  • Amazon DynamoDB tables: Seven pre-selected metrics at five-minute frequency, for no additional charge.
  • Amazon ElastiCache nodes: Thirty-nine pre-selected metrics at one-minute frequency, for no additional charge.
  • Amazon RDS DB instances: Fourteen pre-selected metrics at one-minute frequency, for no additional charge.
  • Other resources include Amazon Elastic MapReduce job flows, Amazon Redshift, Amazon SNS topics, Amazon SQS queues, Amazon CloudWatch Logs, Amazon EBS PIOPS (SSD) volumes, Amazon EBS General Purpose (SSD) volumes, Amazon EBS Magnetic volumes, Amazon Route 53 health checks, and estimated charges on the AWS bill.

Contributor Insights: Amazon CloudWatch includes Contributor Insights, which analyzes time-series data to provide a view of the top contributors influencing system performance. Once set up, Contributor Insights runs continuously without needing additional user intervention. This helps developers and operators more quickly isolate, diagnose, and remediate issues during an operational event. Contributor Insights helps users understand who or what is impacting system and application performance, such as a specific resource, customer account, or API call. This enables users to pinpoint outliers, find the heaviest traffic patterns, and rank the most utilized system processes.

  • Users can create Contributor Insights rules to evaluate patterns in structured log events as they are sent to CloudWatch Logs, including logs from AWS services like AWS CloudTrail, Amazon Virtual Private Cloud, Amazon API Gateway, and any custom logs sent by their own services or on-premises servers, such as Apache access logs.
  • Contributor Insights evaluates these log events in real time and displays reports that show the top contributors and the number of unique contributors in a dataset. A contributor is an aggregate metric based on dimensions contained as log fields in Amazon CloudWatch Logs, such as account-id or interface-id in VPC Flow Logs, or any other custom set of dimensions.
  • Users can sort and filter contributor data based on their own custom criteria. Contributor Insights report data can be displayed on Amazon CloudWatch dashboards, graphed alongside CloudWatch metrics, and added to CloudWatch alarms.
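
The idea of ranking contributors by a log field can be illustrated with a small, self-contained sketch. This is not how Contributor Insights is implemented; it only shows the kind of report described above (top contributors plus the number of unique contributors). The event records and the `top_contributors` helper are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Mapping, Tuple


def top_contributors(
    events: Iterable[Mapping[str, str]], key: str, n: int = 3
) -> Tuple[int, List[Tuple[str, int]]]:
    """Return (number of unique contributors, top-n (contributor, event count) pairs)."""
    counts = Counter(e[key] for e in events if key in e)
    return len(counts), counts.most_common(n)


# Hypothetical, heavily simplified flow-log style events.
events = [
    {"account-id": "111", "interface-id": "eni-a"},
    {"account-id": "222", "interface-id": "eni-b"},
    {"account-id": "111", "interface-id": "eni-a"},
    {"account-id": "333", "interface-id": "eni-c"},
    {"account-id": "111", "interface-id": "eni-b"},
    {"account-id": "222", "interface-id": "eni-a"},
]

unique, ranking = top_contributors(events, "account-id", n=2)
print(unique, ranking)  # 3 [('111', 3), ('222', 2)]
```

Changing `key` to "interface-id" ranks the same events by a different dimension, which mirrors how a rule chooses which log fields define a contributor.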

View Graphs and Statistics: With Amazon CloudWatch dashboards, users can create reusable dashboards that allow them to monitor AWS resources in one location. Metric data is retained for up to 15 months (see Metrics Retention), enabling users to view both up-to-the-minute and historical data.

Synthetics: Amazon CloudWatch Synthetics allows users to monitor application endpoints more easily. It runs tests on the endpoints every minute, 24×7, and alerts users as soon as the application endpoints don’t behave as expected. These tests can be customized to check for availability, latency, transactions, broken or dead links, step-by-step task completions, page load errors, load latencies for UI assets, complex wizard flows, or checkout flows in the applications. Users can also use Amazon CloudWatch Synthetics to isolate alarming application endpoints and map them back to underlying infrastructure issues to reduce mean time to resolution.

  • With this feature, Amazon CloudWatch collects canary traffic, which can continually verify the end-customer experience even when there is no customer traffic on the applications, enabling users to discover issues before their customers do.
  • Amazon CloudWatch Synthetics supports monitoring of REST APIs, URLs, and website content, checking for unauthorized changes from phishing, code injection and cross-site scripting.

Lambda Insights: Amazon CloudWatch Lambda Insights simplifies the collection and aggregation of curated metrics and logs from AWS Lambda functions. It collects compute performance metrics such as CPU, memory, and network from each Lambda function as performance events, while automatically generating custom metrics used for monitoring and alarming. The performance events are ingested as CloudWatch logs to simplify monitoring and troubleshooting.

  • Amazon CloudWatch custom metrics are automatically extracted from these ingested logs and can be further analyzed using CloudWatch Logs Insights’ advanced query language. 

Monitor Custom Metrics: Submit custom metrics generated by users’ own applications (or by AWS resources not mentioned above) and have them monitored by Amazon CloudWatch. Users can submit these metrics to Amazon CloudWatch via a simple API request. All the same Amazon CloudWatch functionality, including statistics, graphs, and alarms, is available for custom metric data at up to one-minute frequency.

Monitor and Store Logs: Amazon CloudWatch Logs lets users monitor and troubleshoot systems and applications using existing system, application, and custom log files. With Amazon CloudWatch Logs, users can monitor logs, in near real time, for specific phrases, values, or patterns (metrics). For example, users could set an alarm on the number of errors that occur in the system logs or view graphs of web request latencies from the application logs.

  • Users can view the original log data to see the source of the problem if needed. Log data can be stored and accessed for as long as users need using highly durable, low-cost storage.

Set Alarms: Set alarms on any metric to receive notifications or take other automated actions when the metric crosses the specified threshold. Users can use alarms to detect and shut down Amazon EC2 instances that are unused or underutilized.

  • Users can also use Auto Scaling to add or remove Amazon EC2 instances dynamically based on the Amazon CloudWatch metrics.

Composite alarms: Amazon CloudWatch composite alarms allow users to combine multiple alarms and reduce alarm noise. If an application issue affects several resources in an application, users will receive a single alarm notification for the entire application instead of one for each affected service component or resource. This helps users stay focused on finding the root cause of operational issues to reduce application downtime. 

  • Users can provide an overall state for a grouping of resources like an application, AWS Region, or Availability Zone.
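
The effect of a composite alarm can be sketched as a function that reduces several child alarm states to one overall state. This is a toy model under stated assumptions: the `composite_state` helper, the alarm names, and the two combination modes (any child in ALARM, or all children in ALARM) are illustrative and do not reproduce the CloudWatch rule syntax.

```python
from typing import Mapping


def composite_state(child_states: Mapping[str, str], require_all: bool = False) -> str:
    """Combine child alarm states into a single overall state.

    require_all=False: ALARM if any child is in ALARM.
    require_all=True:  ALARM only if every child is in ALARM.
    """
    in_alarm = [state == "ALARM" for state in child_states.values()]
    if not in_alarm:
        return "OK"  # nothing to combine
    triggered = all(in_alarm) if require_all else any(in_alarm)
    return "ALARM" if triggered else "OK"


# Hypothetical child alarms belonging to one application.
states = {"api-5xx": "ALARM", "db-cpu": "ALARM", "queue-depth": "OK"}

print(composite_state(states))                    # ALARM
print(composite_state(states, require_all=True))  # OK
```

Two child alarms are firing here, yet the application-level result is a single state, which is the noise reduction the section describes.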

Amazon CloudWatch Concepts 

The following terminology and concepts are central to understanding and use of Amazon CloudWatch:

Namespaces

A namespace is a container for Amazon CloudWatch metrics. Metrics in different namespaces are isolated from each other, so that metrics from different applications are not mistakenly aggregated into the same statistics. There is no default namespace. Users need to specify a namespace for each data point they publish to Amazon CloudWatch. Users can specify a namespace name when creating a metric. These names must contain valid XML characters and be fewer than 256 characters in length. Possible characters are:

  • Alphanumeric characters (0-9A-Za-z), period (.), hyphen (-), underscore (_), forward slash (/), hash (#), and colon (:).

The AWS namespaces typically use the following naming convention: AWS/service. For example, Amazon EC2 uses the AWS/EC2 namespace.
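
The character and length rules above can be checked before publishing. The following is a minimal sketch that encodes only the rules stated in this section; the `is_valid_namespace` helper is hypothetical, and the service itself may apply additional checks.

```python
import re

# Allowed characters as listed above: 0-9 A-Z a-z . - _ / # :
_NAMESPACE_RE = re.compile(r"[0-9A-Za-z.\-_/#:]+")


def is_valid_namespace(name: str) -> bool:
    """True if the name is non-empty, under 256 characters, and uses only allowed characters."""
    return 0 < len(name) < 256 and _NAMESPACE_RE.fullmatch(name) is not None


print(is_valid_namespace("AWS/EC2"))        # True
print(is_valid_namespace("MyApp/Checkout"))  # True
print(is_valid_namespace("bad name!"))       # False
```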

Metrics

Metrics are the fundamental concept in Amazon CloudWatch. A metric represents a time-ordered set of data points that are published to CloudWatch. Think of a metric as a variable to monitor, and the data points as representing the values of that variable over time. For example, the CPU usage of a particular EC2 instance is one metric provided by Amazon EC2. The data points themselves can come from any application or business activity from which users collect data.

  • By default, many AWS services provide free metrics for resources. For a charge, users can also enable detailed monitoring for some resources, such as Amazon EC2 instances, or publish their own application metrics. For custom metrics, users can add the data points in any order and at any rate they choose.
  • Metrics exist only in the Region in which they are created. Metrics cannot be deleted, but they automatically expire after 15 months if no new data is published to them. Data points older than 15 months expire on a rolling basis; as new data points come in, data older than 15 months is dropped.
  • Metrics are uniquely defined by a name, a namespace, and zero or more dimensions. Each data point in a metric has a time stamp, and (optionally) a unit of measure. Users can retrieve statistics from CloudWatch for any metric.

Time Stamps

Each metric data point must be associated with a time stamp. The time stamp can be up to two weeks in the past and up to two hours into the future. If users do not provide a time stamp, Amazon CloudWatch creates a time stamp for them based on the time the data point was received.
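
The accepted time stamp window can be expressed as a small client-side check. This is a sketch under the limits stated above (two weeks in the past, two hours in the future); the `resolve_timestamp` helper is hypothetical, and it requires timezone-aware values as a design choice so that UTC handling stays explicit.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_PAST = timedelta(weeks=2)    # oldest accepted time stamp
MAX_FUTURE = timedelta(hours=2)  # furthest accepted time stamp


def resolve_timestamp(ts: Optional[datetime], now: Optional[datetime] = None) -> datetime:
    """Return the UTC time stamp to publish, defaulting to the current time."""
    now = now or datetime.now(timezone.utc)
    if ts is None:
        return now  # mirrors the default: time of receipt
    if ts.tzinfo is None:
        raise ValueError("use a timezone-aware datetime (UTC recommended)")
    if not (now - MAX_PAST <= ts <= now + MAX_FUTURE):
        raise ValueError("time stamp is outside the accepted window")
    return ts.astimezone(timezone.utc)


now = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
print(resolve_timestamp(now - timedelta(days=13), now))  # accepted
```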

  • Time stamps are dateTime objects, with the complete date plus hours, minutes, and seconds. 
  • Although it is not required, AWS recommends that customers use Coordinated Universal Time (UTC). When retrieving statistics from Amazon CloudWatch, all times are in UTC.
  • Amazon CloudWatch alarms check metrics based on the current time in UTC. Custom metrics sent to Amazon CloudWatch with time stamps other than the current UTC time can cause alarms to display the Insufficient Data state or result in delayed alarms.

Metrics Retention

Amazon CloudWatch retains metric data as follows:

  • Data points with a period of less than 60 seconds are available for 3 hours. These data points are high-resolution custom metrics.
  • Data points with a period of 60 seconds (1 minute) are available for 15 days.
  • Data points with a period of 300 seconds (5 minutes) are available for 63 days.
  • Data points with a period of 3600 seconds (1 hour) are available for 455 days (15 months).

Data points that are initially published with a shorter period are aggregated together for long-term storage. After 15 days this data is still available, but is aggregated and is retrievable only with a resolution of 5 minutes. After 63 days, the data is further aggregated and is available with a resolution of 1 hour.
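
The retention tiers above amount to a lookup from the age of the data to the finest period that can still be retrieved. The sketch below encodes that table; the `finest_period_seconds` helper is hypothetical, and the sub-minute tier is represented by its finest value of 1 second.

```python
from typing import Optional

# (maximum age in days, finest retrievable period in seconds), from the list above.
RETENTION_TIERS = [
    (3 / 24, 1),    # sub-minute data: 3 hours
    (15, 60),       # 1-minute data: 15 days
    (63, 300),      # 5-minute data: 63 days
    (455, 3600),    # 1-hour data: 455 days
]


def finest_period_seconds(age_days: float, high_resolution: bool = False) -> Optional[int]:
    """Finest period still available for data of the given age, or None if expired."""
    for max_age_days, period in RETENTION_TIERS:
        if period < 60 and not high_resolution:
            continue  # the sub-minute tier only applies to high-resolution metrics
        if age_days <= max_age_days:
            return period
    return None


print(finest_period_seconds(1))    # 60
print(finest_period_seconds(20))   # 300
print(finest_period_seconds(100))  # 3600
print(finest_period_seconds(500))  # None
```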

Statistics

Statistics are metric data aggregations over specified periods of time. Amazon CloudWatch provides statistics based on the metric data points provided by users’ custom data or by other AWS services. Aggregations are made using the namespace, metric name, dimensions, and the data point unit of measure, within the time period users specify. The following list describes the available statistics.

  • Minimum: The lowest value observed during the specified period. Users can use this value to determine low volumes of activity for the application.
  • Maximum: The highest value observed during the specified period. Users can use this value to determine high volumes of activity for the application.
  • Sum: All values submitted for the matching metric added together. This statistic can be useful for determining the total volume of a metric.
  • Average: The value of Sum / SampleCount during the specified period. By comparing this statistic with the Minimum and Maximum, users can determine the full scope of a metric and how close the average use is to the Minimum and Maximum. This comparison helps users to know when to increase or decrease the resources as needed.
  • SampleCount: The count (number) of data points used for the statistical calculation.
  • pNN.NN: The value of the specified percentile. Users can specify any percentile, using up to two decimal places. Percentile statistics are not available for metrics that include any negative values. 

Users can add pre-calculated statistics. Instead of data point values, specify values for SampleCount, Minimum, Maximum, and Sum (CloudWatch calculates the average). The values added in this way are aggregated with any other values associated with the matching metric.
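
The relationship between the statistics, and the way pre-calculated values combine with other values, can be shown with a small data class. This is an illustrative sketch, not CloudWatch code; the `StatisticSet` class and its method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class StatisticSet:
    sample_count: int
    minimum: float
    maximum: float
    total: float  # the Sum statistic

    @property
    def average(self) -> float:
        # Average is derived: Sum / SampleCount.
        return self.total / self.sample_count

    @classmethod
    def from_values(cls, values: Iterable[float]) -> "StatisticSet":
        vals = list(values)
        if not vals:
            raise ValueError("at least one data point is required")
        return cls(len(vals), min(vals), max(vals), sum(vals))

    def merge(self, other: "StatisticSet") -> "StatisticSet":
        """Aggregate two sets that belong to the same metric and period."""
        return StatisticSet(
            self.sample_count + other.sample_count,
            min(self.minimum, other.minimum),
            max(self.maximum, other.maximum),
            self.total + other.total,
        )


raw = StatisticSet.from_values([10, 20, 30])  # built from raw data points
pre = StatisticSet(2, 5, 45, 50)              # pre-calculated elsewhere
merged = raw.merge(pre)
print(merged.sample_count, merged.minimum, merged.maximum, merged.total, merged.average)
# 5 5 45 110 22.0
```

Only the four stored values are needed to merge; the average never has to be submitted because it can always be recomputed.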

Dimensions

A dimension is a name/value pair that is part of the identity of a metric. Users can assign up to 10 dimensions to a metric.

Every metric has specific characteristics that describe it, and dimensions can be thought of as categories for those characteristics. Dimensions help users design a structure for their statistics plan. Because dimensions are part of the unique identifier for a metric, adding a unique name/value pair to a metric creates a new variation of that metric.

  • AWS services that send data to CloudWatch attach dimensions to each metric. Users can use dimensions to filter the results that CloudWatch returns.

For metrics produced by certain AWS services, such as Amazon EC2, CloudWatch can aggregate data across dimensions. For example, when users search for metrics in the AWS/EC2 namespace without specifying any dimensions, CloudWatch aggregates all data for the specified metric to create the requested statistic. CloudWatch does not aggregate across dimensions for custom metrics.

Dimension Combinations

Amazon CloudWatch treats each unique combination of dimensions as a separate metric, even if the metrics have the same metric name. Users can retrieve statistics only for combinations of dimensions that they specifically published. When retrieving statistics, specify the same values for the namespace, metric name, and dimension parameters that were used when the metrics were created. Users can also specify the start and end times for CloudWatch to use for aggregation.
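
A toy in-memory store makes the "each combination is a separate metric" rule concrete for custom metrics. Everything here (the `publish` and `values_for` helpers, the "MyApp" namespace, the dimension names) is hypothetical and only models the lookup behavior described above.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

MetricKey = Tuple[str, str, FrozenSet[Tuple[str, str]]]


def metric_key(namespace: str, name: str, dimensions: Dict[str, str]) -> MetricKey:
    # Namespace, metric name, and the full set of dimensions form the identity.
    return (namespace, name, frozenset(dimensions.items()))


store: Dict[MetricKey, List[float]] = defaultdict(list)


def publish(namespace: str, name: str, dimensions: Dict[str, str], value: float) -> None:
    store[metric_key(namespace, name, dimensions)].append(value)


def values_for(namespace: str, name: str, dimensions: Dict[str, str]) -> List[float]:
    # Only an exact dimension combination that was published returns data.
    return list(store.get(metric_key(namespace, name, dimensions), []))


publish("MyApp", "Latency", {"Server": "prod", "Domain": "frankfurt"}, 105.0)
publish("MyApp", "Latency", {"Server": "beta", "Domain": "frankfurt"}, 115.0)
publish("MyApp", "Latency", {"Server": "prod"}, 95.0)

print(values_for("MyApp", "Latency", {"Server": "prod", "Domain": "frankfurt"}))  # [105.0]
print(values_for("MyApp", "Latency", {"Server": "prod"}))                         # [95.0]
print(values_for("MyApp", "Latency", {"Domain": "frankfurt"}))                    # []
```

The last lookup returns nothing because the combination containing only Domain was never published, even though two published metrics include that dimension.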

Units

Each statistic has a unit of measure. Example units include Bytes, Seconds, Count, and Percent. For the complete list of the units that CloudWatch supports, see the MetricDatum data type in the Amazon CloudWatch API Reference.

  • Users can specify a unit when creating a custom metric. If no unit is specified, CloudWatch uses None as the unit. Units help provide conceptual meaning to the data. Though CloudWatch attaches no significance to a unit internally, other applications can derive semantic information based on the unit.
  • Metric data points that specify a unit of measure are aggregated separately. When statistics are requested without specifying a unit, CloudWatch aggregates all data points of the same unit together.

Periods

A period is the length of time associated with a specific Amazon CloudWatch statistic. Each statistic represents an aggregation of the metrics data collected for a specified period of time. Periods are defined in numbers of seconds, and valid values for period are 1, 5, 10, 30, or any multiple of 60. For example, to specify a period of six minutes, use 360 as the period value. Users can adjust how the data is aggregated by varying the length of the period. A period can be as short as one second or as long as one day (86,400 seconds). The default value is 60 seconds.

  • Only custom metrics defined with a storage resolution of 1 second support sub-minute periods. Even though the option to set a period below 60 is always available in the console, users should select a period that aligns with how the metric is stored.
  • When retrieving statistics, users can specify a period, start time, and end time. These parameters determine the overall length of time associated with the statistics. The default values for the start time and end time return the last hour’s worth of statistics. The values specified for the start time and end time determine how many periods CloudWatch returns.
  • When statistics are aggregated over a period of time, they are stamped with the time corresponding to the beginning of the period.
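
The period rules can be sketched as a validator plus a grouping step that labels each bucket with the start of its period. Both helpers are hypothetical, and aligning buckets to multiples of the period since the Unix epoch is an assumption made for this illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def is_valid_period(seconds: int) -> bool:
    """1, 5, 10, 30, or any multiple of 60 up to one day (86,400 seconds)."""
    if seconds in (1, 5, 10, 30):
        return True
    return 60 <= seconds <= 86_400 and seconds % 60 == 0


def bucket_by_period(
    points: Iterable[Tuple[int, float]], period: int
) -> Dict[int, List[float]]:
    """Group (epoch_seconds, value) pairs under the start time of their period."""
    if not is_valid_period(period):
        raise ValueError(f"unsupported period: {period}")
    buckets: Dict[int, List[float]] = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % period].append(value)  # stamp with beginning of period
    return dict(buckets)


points = [(0, 1.0), (59, 3.0), (60, 5.0), (125, 7.0)]
print(is_valid_period(360), is_valid_period(45))  # True False
print(bucket_by_period(points, 60))  # {0: [1.0, 3.0], 60: [5.0], 120: [7.0]}
```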

Periods are also important for CloudWatch alarms. When creating an alarm to monitor a specific metric, users are asking CloudWatch to compare that metric to the threshold value that was specified. Users have extensive control over how CloudWatch makes that comparison. Not only can users specify the period over which the comparison is made, but they can also specify how many evaluation periods are used to arrive at a conclusion. For example, when specifying three evaluation periods, CloudWatch compares a window of three data points. CloudWatch only notifies if the oldest data point is breaching and the others are breaching or missing. For metrics that are continuously emitted, CloudWatch doesn’t notify users until three failures are found.
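
The evaluation rule in the paragraph above (the oldest data point is breaching, and the others are breaching or missing) can be written as a short predicate. This is a deliberately simplified sketch: the `should_alarm` helper is hypothetical, it assumes a "greater than threshold" comparison, and real alarm evaluation offers more options than are modeled here.

```python
from typing import Optional, Sequence


def should_alarm(window: Sequence[Optional[float]], threshold: float) -> bool:
    """window holds one value per evaluation period, oldest first; None means missing."""
    if not window:
        return False
    oldest = window[0]
    if oldest is None or oldest <= threshold:
        return False  # the oldest data point must be present and breaching
    # Every later data point must be breaching or missing.
    return all(v is None or v > threshold for v in window[1:])


print(should_alarm([91, 95, 99], 90))    # True
print(should_alarm([91, None, 99], 90))  # True
print(should_alarm([85, 95, 99], 90))    # False
print(should_alarm([91, 80, 99], 90))    # False
```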

Aggregation

Amazon CloudWatch aggregates statistics according to the period length that you specify when retrieving statistics. Users can publish as many data points as needed with the same or similar time stamps. CloudWatch aggregates them according to the specified period length. CloudWatch does not automatically aggregate data across Regions, but users can use metric math to aggregate metrics from different Regions.

  • Users can publish data points for a metric that share not only the same time stamp, but also the same namespace and dimensions. CloudWatch returns aggregated statistics for those data points. Users can also publish multiple data points for the same or different metrics, with any time stamp.
  • For large datasets, users can insert a pre-aggregated dataset called a statistic set. With statistic sets, users give CloudWatch the Min, Max, Sum, and SampleCount for a number of data points. This is commonly used when data needs to be collected many times in a minute. For example, suppose users have a metric for the request latency of a webpage. It doesn’t make sense to publish data with every webpage hit. AWS suggests collecting the latency of all hits to that webpage, aggregating them once a minute, and sending that statistic set to CloudWatch.
  • Amazon CloudWatch doesn’t differentiate the source of a metric. When publishing a metric with the same namespace and dimensions from different sources, CloudWatch treats this as a single metric. This can be useful for service metrics in a distributed, scaled system. For example, all the hosts in a web server application could publish identical metrics representing the latency of requests they are processing. CloudWatch treats these as a single metric, allowing users to get the statistics for minimum, maximum, average, and sum of all requests across the application.

Percentiles

A percentile indicates the relative standing of a value in a dataset. For example, the 95th percentile means that 95 percent of the data is lower than this value and 5 percent of the data is higher than this value. Percentiles help users get a better understanding of the distribution of their metric data.

  • Percentiles are often used to isolate anomalies. In a normal distribution, 95 percent of the data is within two standard deviations from the mean and 99.7 percent of the data is within three standard deviations from the mean.
  • Any data that falls outside three standard deviations is often considered to be an anomaly because it differs so greatly from the average value. For example, suppose that users are monitoring the CPU utilization of EC2 instances to ensure that end customers have a good experience. Monitoring the average can hide anomalies. Monitoring the maximum lets a single anomaly skew the results. Using percentiles, users can monitor the 95th percentile of CPU utilization to check for instances with an unusually heavy load.
  • Some Amazon CloudWatch metrics support percentiles as a statistic. For these metrics, users can monitor the system and applications using percentiles just as they would with the other Amazon CloudWatch statistics (Average, Minimum, Maximum, and Sum). For example, when creating an alarm, users can use percentiles as the statistical function. Users can specify the percentile with up to two decimal places.
  • Percentile statistics are available for custom metrics as long as users publish the raw, un-summarized data points for the custom metric. Percentile statistics are not available for metrics when any of the metric values are negative numbers.
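
The difference between the average, the maximum, and a percentile can be seen on a small sample. The sketch below uses the nearest-rank method, which is one common definition and is an assumption here (the exact method CloudWatch uses is not stated in this document). The `percentile` helper and the CPU values are hypothetical.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], p: float) -> float:
    """Nearest-rank percentile for 0 < p <= 100; rejects negative values."""
    if not values:
        raise ValueError("no data points")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    if any(v < 0 for v in values):
        raise ValueError("percentiles are not available for negative values")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical CPU utilization samples: mostly idle, one busy host, one outlier.
cpu = [20] * 18 + [30, 98]

print(sum(cpu) / len(cpu))   # 24.4  (the average hides the outlier)
print(max(cpu))              # 98    (the maximum is driven by one sample)
print(percentile(cpu, 95))   # 30
```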

Amazon CloudWatch needs raw data points to calculate percentiles. When publishing data using a statistic set instead, users can only retrieve percentile statistics for this data if one of the following conditions is true:

  • The SampleCount value of the statistic set is 1 and Min, Max, and Sum are all equal.
  • The Min and Max are equal, and Sum is equal to Min multiplied by SampleCount.
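
The two conditions above translate directly into a check on a statistic set. The `supports_percentiles` helper is hypothetical, and comparing floats with `math.isclose` is a design choice of this sketch rather than a documented CloudWatch behavior.

```python
import math


def supports_percentiles(sample_count: int, minimum: float, maximum: float, total: float) -> bool:
    """True if percentile statistics can be derived from this statistic set."""
    # Condition 1: a single sample, with Min, Max, and Sum all equal.
    single_point = (
        sample_count == 1
        and math.isclose(minimum, maximum)
        and math.isclose(minimum, total)
    )
    # Condition 2: all samples identical, so Sum == Min * SampleCount.
    constant = math.isclose(minimum, maximum) and math.isclose(total, minimum * sample_count)
    return single_point or constant


print(supports_percentiles(1, 7.0, 7.0, 7.0))   # True
print(supports_percentiles(4, 2.5, 2.5, 10.0))  # True
print(supports_percentiles(3, 1.0, 9.0, 15.0))  # False
```

In both accepted cases every underlying data point is known exactly, which is why the raw distribution can be recovered from the summary.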

The following AWS services include metrics that support percentile statistics.

  • API Gateway
  • Application Load Balancer
  • Amazon EC2
  • Elastic Load Balancing
  • Kinesis
  • Amazon RDS
 
Alarms

Users can use an alarm to automatically initiate actions on their behalf. An alarm watches a single metric over a specified time period and performs one or more specified actions, based on the value of the metric relative to a threshold over time. The action is a notification sent to an Amazon SNS topic or an Auto Scaling policy. Users can also add alarms to dashboards.

  • Alarms invoke actions for sustained state changes only. CloudWatch alarms do not invoke actions simply because they are in a particular state. The state must have changed and been maintained for a specified number of periods.
  • When creating an alarm, select an alarm monitoring period that is greater than or equal to the metric’s resolution. For example, basic monitoring for Amazon EC2 provides metrics for instances every 5 minutes.
  • When setting an alarm on a basic monitoring metric, select a period of at least 300 seconds (5 minutes). Detailed monitoring for Amazon EC2 provides metrics for instances with a resolution of 1 minute. When setting an alarm on a detailed monitoring metric, select a period of at least 60 seconds (1 minute).
  • When setting an alarm on a high-resolution metric, users can specify a high-resolution alarm with a period of 10 seconds or 30 seconds, or they can set a regular alarm with a period of any multiple of 60 seconds. There is a higher charge for high-resolution alarms. 

CloudWatch Architecture

[Figure: Amazon CloudWatch architecture]

CloudWatch Logs

Users can use Amazon CloudWatch Logs to monitor, store, and access log files from Amazon Elastic Compute Cloud (Amazon EC2) instances, AWS CloudTrail, Route 53, and other sources. Amazon CloudWatch Logs enables users to centralize the logs from all of the systems, applications, and AWS services that are in use, in a single, highly scalable service. Users can then easily view them, search them for specific error codes or patterns, filter them based on specific fields, or archive them securely for future analysis. CloudWatch Logs enables users to see all of the logs, regardless of their source, as a single and consistent flow of events ordered by time. Users can query and sort them based on other dimensions, group them by specific fields, create custom computations with a powerful query language, and visualize log data in dashboards.

  • Query Your Log Data – Users can use CloudWatch Logs Insights to interactively search and analyze the log data. Queries help users respond to operational issues more efficiently and effectively. CloudWatch Logs Insights includes a purpose-built query language with a few simple but powerful commands. Sample queries, command descriptions, query autocompletion, and log field discovery are provided to help users get started. Sample queries are included for several types of AWS service logs.

  • Monitor Logs from Amazon EC2 Instances – Users can use Amazon CloudWatch Logs to monitor applications and systems using log data. For example, CloudWatch Logs can track the number of errors that occur in the application logs and send users a notification whenever the rate of errors exceeds a threshold they specify. Because CloudWatch Logs works from the log data, no code changes are required. For example, users can monitor application logs for specific literal terms (such as “NullReferenceException”) or count the number of occurrences of a literal term at a particular position in log data (such as “404” status codes in an Apache access log). When the term is found, CloudWatch Logs reports the data to a CloudWatch metric that users specify. Log data is encrypted while in transit and while it is at rest.

  • Monitor AWS CloudTrail Logged Events – Users can create alarms in CloudWatch and receive notifications of particular API activity as captured by CloudTrail and use the notification to perform troubleshooting. 

  • Log Retention – By default, logs are kept indefinitely and never expire. Users can adjust the retention policy for each log group, keeping the indefinite retention, or choosing a retention period between 10 years and one day.

  • Archive Log Data – Users can use Amazon CloudWatch Logs to store the log data in highly durable storage. The CloudWatch Logs agent makes it easy to quickly send both rotated and non-rotated log data off of a host and into the log service. Users can then access the raw log data when you need it.

  • Log Route 53 DNS Queries – Users can use Amazon CloudWatch Logs to log information about the DNS queries that Route 53 receives. 
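
The term matching described under "Monitor Logs from Amazon EC2 Instances" can be pictured with a small plain-Python sketch. It only illustrates the counting idea and is not how CloudWatch Logs implements its filters. The function names and sample log lines are hypothetical, and the access-log lines are assumed to follow the Apache common log format, where the status code is the ninth whitespace-separated field.

```python
def count_literal(lines, term):
    """Count log lines that contain a literal term anywhere in the line."""
    return sum(1 for line in lines if term in line)


def count_field_value(lines, index, value):
    """Count log lines whose whitespace-separated field at `index` equals `value`."""
    count = 0
    for line in lines:
        fields = line.split()
        if len(fields) > index and fields[index] == value:
            count += 1
    return count


# Hypothetical sample data, for illustration only.
app_log = [
    "2024-01-01 12:00:00 ERROR NullReferenceException in OrderService",
    "2024-01-01 12:00:01 INFO request completed",
    "2024-01-01 12:00:02 ERROR NullReferenceException in CartService",
]
access_log = [
    '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '127.0.0.1 - - [10/Oct/2000:13:55:37 -0700] "GET /missing.html HTTP/1.0" 404 209',
    '127.0.0.1 - - [10/Oct/2000:13:55:38 -0700] "GET /404.html HTTP/1.0" 200 512',
]

# Literal term anywhere in the line.
exception_count = count_literal(app_log, "NullReferenceException")
# Literal term at a particular position: field 8 is the HTTP status code.
not_found_count = count_field_value(access_log, 8, "404")
print(exception_count, not_found_count)
```

Matching on position matters here: the third access-log line contains "404" in its URL but has a 200 status, so a positional match does not count it, while a plain substring search would.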

Users can publish their own metrics to Amazon CloudWatch using the AWS CLI or an API, and view statistical graphs of the published metrics in the AWS Management Console. Amazon CloudWatch stores data about a metric as a series of data points, each with an associated time stamp. Users can also publish an aggregated set of data points called a statistic set.

High-Resolution Metrics

Each metric is one of the following:

  • Standard resolution, with data having a one-minute granularity
  • High resolution, with data at a granularity of one second

Metrics produced by AWS services are standard resolution by default. When users publish a custom metric, they can define it as either standard resolution or high resolution. When users publish a high-resolution metric, Amazon CloudWatch stores it with a resolution of 1 second, and users can read and retrieve it with a period of 1 second, 5 seconds, 10 seconds, 30 seconds, or any multiple of 60 seconds. High-resolution metrics can give users more immediate insight into the application’s sub-minute activity. Keep in mind that every PutMetricData call for a custom metric is charged, so calling PutMetricData more often on a high-resolution metric can lead to higher charges.

  • When setting an alarm on a high-resolution metric, users can specify a high-resolution alarm with a period of 10 seconds or 30 seconds, or set a regular alarm with a period of any multiple of 60 seconds. High-resolution alarms with a period of 10 or 30 seconds incur a higher charge.
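
The period rules above can be captured in a short check. This is a minimal sketch that encodes only the values stated in this section. The function and constant names are hypothetical and are not part of any AWS SDK.

```python
# Sub-minute periods that can be used when retrieving a high-resolution metric.
SUB_MINUTE_RETRIEVAL_PERIODS = {1, 5, 10, 30}
# Periods that make an alarm a high-resolution alarm.
HIGH_RESOLUTION_ALARM_PERIODS = {10, 30}


def is_multiple_of_60(seconds):
    """True for 60, 120, 180, ... seconds."""
    return seconds > 0 and seconds % 60 == 0


def is_valid_retrieval_period(seconds):
    """True if a high-resolution metric can be read with this period."""
    return seconds in SUB_MINUTE_RETRIEVAL_PERIODS or is_multiple_of_60(seconds)


def classify_alarm_period(seconds):
    """Return the kind of alarm a period implies, or raise ValueError if unsupported."""
    if seconds in HIGH_RESOLUTION_ALARM_PERIODS:
        return "high-resolution"
    if is_multiple_of_60(seconds):
        return "regular"
    raise ValueError(f"unsupported alarm period: {seconds} seconds")


print(is_valid_retrieval_period(5), classify_alarm_period(30), classify_alarm_period(300))
```

Note that a 5-second period is valid for retrieval but not for an alarm, which is why the two checks are kept separate.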

Using Dimensions

The --dimensions parameter is commonly used with custom metrics. A dimension further clarifies what the metric is and what data it stores. Users can have up to 10 dimensions in one metric, and each dimension is defined by a name and value pair. The way a dimension is specified differs between commands.

  • With put-metric-data, users specify each dimension as MyName=MyValue, and with get-metric-statistics or put-metric-alarm they use the format Name=MyName, Value=MyValue. For example, a Buffers metric can be published with two dimensions named InstanceId and InstanceType.
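
The two dimension formats can be compared side by side with a small helper. This is a sketch that only assembles the argument strings and does not call the AWS CLI. The function names and the instance ID and type values are hypothetical, and the separators (a comma between put-metric-data pairs, a space between get-metric-statistics entries) follow the AWS CLI shorthand syntax as I understand it.

```python
MAX_DIMENSIONS = 10  # limit stated in this section


def dimensions_for_put_metric_data(dimensions):
    """Render dimensions as MyName=MyValue pairs, separated by commas."""
    if len(dimensions) > MAX_DIMENSIONS:
        raise ValueError("too many dimensions for one metric")
    return ",".join(f"{name}={value}" for name, value in dimensions.items())


def dimensions_for_get_metric_statistics(dimensions):
    """Render dimensions as Name=MyName,Value=MyValue entries, separated by spaces."""
    if len(dimensions) > MAX_DIMENSIONS:
        raise ValueError("too many dimensions for one metric")
    return " ".join(f"Name={name},Value={value}" for name, value in dimensions.items())


# Hypothetical dimension values for a Buffers metric.
dimensions = {"InstanceId": "i-0123456789abcdef0", "InstanceType": "m5.large"}

put_format = dimensions_for_put_metric_data(dimensions)
get_format = dimensions_for_get_metric_statistics(dimensions)
print(put_format)
print(get_format)
```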

Publishing Single Data Points

To publish a single data point for a new or existing metric, use the put-metric-data command with one value and time stamp. When the command is called with a new metric name, Amazon CloudWatch creates the metric. Otherwise, CloudWatch associates the data with the existing metric that was specified. After a metric is created, it can take up to 2 minutes before users can retrieve statistics for it using the get-metric-statistics command, and up to 15 minutes before it appears in the list of metrics retrieved using the list-metrics command.

  • Although users can publish data points with time stamps as granular as one-thousandth of a second, CloudWatch aggregates the data to a minimum granularity of 1 second. CloudWatch records the average (sum of all items divided by number of items) of the values received for each period, as well as the number of samples, maximum value, and minimum value for the same time period.
  • If the period is set to 1 minute, CloudWatch aggregates all data points whose time stamps fall within the same 1-minute period. For example, three data points published within one minute are aggregated together.
  • Users can use the get-metric-statistics command to retrieve statistics based on the data points that are published.
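
The per-period aggregation described above can be illustrated in plain Python. This is a sketch of the idea only, assuming time stamps expressed as seconds and three hypothetical data points. It does not reproduce CloudWatch's internal behavior, and the `aggregate` function is not an AWS API.

```python
from collections import defaultdict


def aggregate(points, period_seconds):
    """Group (timestamp_seconds, value) pairs by period and summarize each period."""
    buckets = defaultdict(list)
    for timestamp, value in points:
        # Start of the period that contains this time stamp.
        start = int(timestamp // period_seconds) * period_seconds
        buckets[start].append(value)
    return {
        start: {
            "SampleCount": len(values),
            "Average": sum(values) / len(values),
            "Minimum": min(values),
            "Maximum": max(values),
        }
        for start, values in sorted(buckets.items())
    }


# Three hypothetical data points that fall within the same minute.
points = [(0.250, 2.0), (20.0, 4.0), (40.5, 6.0)]

one_minute = aggregate(points, 60)  # all three points share one period
one_second = aggregate(points, 1)   # each point lands in its own period
print(one_minute)
print(len(one_second))
```

With a 60-second period the three points collapse into one summary, while a 1-second period keeps them apart, which is the difference between the two granularities discussed above.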

Publishing Statistic Sets

Users can aggregate the data before publishing it to Amazon CloudWatch. When there are multiple data points per minute, aggregating the data minimizes the number of calls to put-metric-data. For example, instead of calling put-metric-data multiple times for three data points that are within 3 seconds of each other, users can aggregate the data into a statistic set that they publish with one call, using the --statistic-values parameter. CloudWatch needs raw data points to calculate percentiles. When data is published as a statistic set instead, users can’t retrieve percentile statistics for this data unless one of the following conditions is true:

  • The SampleCount of the statistic set is 1
  • The Minimum and the Maximum of the statistic set are equal
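
A statistic set and the percentile conditions above can be sketched as follows. The helper names are hypothetical, the sample values are made up, and the rendered string assumes the comma-separated Key=Value shorthand that the --statistic-values parameter accepts.

```python
def to_statistic_set(values):
    """Summarize raw values into the four fields of a statistic set."""
    return {
        "SampleCount": len(values),
        "Sum": sum(values),
        "Minimum": min(values),
        "Maximum": max(values),
    }


def percentiles_available(statistic_set):
    """Apply the two conditions under which percentiles can still be retrieved."""
    return (
        statistic_set["SampleCount"] == 1
        or statistic_set["Minimum"] == statistic_set["Maximum"]
    )


def render_statistic_values(statistic_set):
    """Render the set as a Key=Value list for a single put-metric-data call."""
    order = ("Sum", "Minimum", "Maximum", "SampleCount")
    return ",".join(f"{key}={statistic_set[key]}" for key in order)


# Three hypothetical data points collected within 3 seconds of each other.
stats = to_statistic_set([2, 4, 5])
print(render_statistic_values(stats))
print(percentiles_available(stats))
```

Three separate calls are replaced by one, at the cost of percentile statistics, because the individual values can no longer be recovered from the summary.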

Publishing the Value Zero

When data is sporadic and some periods have no associated data, users can choose to publish the value zero (0) for those periods or no value at all. When using periodic calls to PutMetricData to monitor the health of an application, users might want to publish zero instead of no value. For example, users can set a CloudWatch alarm to notify them when the application fails to publish metrics every five minutes. In that case, the application should publish zeros for periods with no associated data so that a missing data point reliably signals a failure.

  • Users might also publish zeros when tracking the total number of data points, or when they want statistics such as minimum and average to include data points with the value 0.
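
The effect of publishing zeros on the statistics can be seen in a small comparison. This is an illustrative sketch with hypothetical values, where `None` stands for a period in which the application had nothing to report.

```python
def summarize(values):
    """Compute the statistics discussed above for a list of published values."""
    return {
        "SampleCount": len(values),
        "Minimum": min(values),
        "Average": sum(values) / len(values),
    }


# Hypothetical observations for three consecutive periods; None means no data.
observed = [12, None, 6]

# Option 1: publish nothing for the empty period.
skipping_gaps = summarize([v for v in observed if v is not None])
# Option 2: publish the value zero for the empty period.
publishing_zeros = summarize([0 if v is None else v for v in observed])

print(skipping_gaps)
print(publishing_zeros)
```

Publishing zeros raises the sample count and pulls the minimum and average down, so the choice depends on whether an empty period should count as a measurement of zero.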

Custom Metrics

CloudWatch Alarms

Users can use an alarm to automatically initiate actions on their behalf. An alarm watches a single metric over a specified time period and performs one or more specified actions, based on the value of the metric relative to a threshold over time. The action is a notification sent to an Amazon SNS topic or an Auto Scaling policy. Users can also add alarms to dashboards. Alarms invoke actions for sustained state changes only. CloudWatch alarms do not invoke actions simply because they are in a particular state. The state must have changed and been maintained for a specified number of periods. The following are common features of Amazon CloudWatch alarms:

  • Users can create up to 5000 alarms per Region per AWS account. To create or update an alarm, users use the CloudWatch console, the PutMetricAlarm API action, or the put-metric-alarm command in the AWS CLI.
  • Alarm names must contain only ASCII characters.
  • Users can list any or all of the currently configured alarms, and list any alarms in a particular state, by using the CloudWatch console, the DescribeAlarms API action, or the describe-alarms command in the AWS CLI.
  • Users can disable and enable alarms by using the CloudWatch console, the DisableAlarmActions and EnableAlarmActions API actions, or the disable-alarm-actions and enable-alarm-actions commands in the AWS CLI.
  • Users can test an alarm by setting it to any state using the SetAlarmState API action or the set-alarm-state command in the AWS CLI. This temporary state change lasts only until the next alarm comparison occurs.
  • Users can create an alarm for a custom metric before creating that custom metric. For the alarm to be valid, users must include all of the dimensions for the custom metric, in addition to the metric namespace and metric name, in the alarm definition. To do this, users can use the PutMetricAlarm API action or the put-metric-alarm command in the AWS CLI.
  • Users can view an alarm’s history using the CloudWatch console, the DescribeAlarmHistory API action, or the describe-alarm-history command in the AWS CLI. CloudWatch preserves alarm history for two weeks. Each state transition is marked with a unique timestamp. In rare cases, the history might show more than one notification for a state change. The timestamp enables users to confirm unique state changes.
  • The number of evaluation periods for an alarm multiplied by the length of each evaluation period can’t exceed one day.
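
Two of the rules above, the one-day limit on the evaluation window and the requirement that a state be sustained for a number of periods, can be sketched in plain Python. This is a simplified model with hypothetical function names. It assumes a greater-than comparison and that every one of the most recent periods must breach the threshold, whereas real alarms offer other comparison operators.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def evaluation_window_ok(period_seconds, evaluation_periods):
    """Check that evaluation periods multiplied by period length stay within one day."""
    return period_seconds * evaluation_periods <= SECONDS_PER_DAY


def breach_is_sustained(values, threshold, evaluation_periods):
    """True only if the most recent `evaluation_periods` values all exceed the threshold."""
    if len(values) < evaluation_periods:
        return False
    recent = values[-evaluation_periods:]
    return all(value > threshold for value in recent)


# Hypothetical CPU utilization values, one per period, oldest first.
cpu = [50, 85, 90, 95]

print(evaluation_window_ok(300, 288))   # 288 periods of 5 minutes is exactly one day
print(breach_is_sustained(cpu, 80, 3))  # last three periods all above 80
```

A single spike above the threshold does not satisfy `breach_is_sustained`, which mirrors the statement that alarms act on sustained state changes only.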

Users can add alarms to Amazon CloudWatch dashboards and monitor them visually. When an alarm is on a dashboard, it turns red when it is in the ALARM state, making it easier for users to monitor its status proactively. An alarm invokes actions only when the alarm changes state. The exception is alarms with Auto Scaling actions: for these, the alarm continues to invoke the action once per minute for as long as the alarm remains in the new state. Users can create both metric alarms and composite alarms in CloudWatch.

  • A metric alarm watches a single CloudWatch metric or the result of a math expression based on CloudWatch metrics. The alarm performs one or more actions based on the value of the metric or expression relative to a threshold over a number of time periods. The action can be sending a notification to an Amazon SNS topic, performing an Amazon EC2 action or an Auto Scaling action, or creating a Systems Manager OpsItem.
  • A composite alarm includes a rule expression that takes into account the alarm states of other alarms that users have created. The composite alarm goes into ALARM state only if all conditions of the rule are met. The alarms specified in a composite alarm’s rule expression can include metric alarms and other composite alarms.

Using composite alarms can reduce alarm noise. Users can create multiple metric alarms, and also create a composite alarm and set up alerts only for the composite alarm. Composite alarms can send Amazon SNS notifications when they change state, and can create Systems Manager OpsItems when they go into ALARM state, but can’t perform EC2 actions or Auto Scaling actions.
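
The way a composite alarm derives its state from other alarms can be modeled with a short sketch. Here the rule is written as an ordinary Python function rather than as a CloudWatch rule expression, so this illustrates the logic only. The alarm names and the `composite_state` helper are hypothetical.

```python
def composite_state(child_states, rule):
    """Return "ALARM" if the rule holds for the given child alarm states, else "OK"."""

    def in_alarm(name):
        # A child alarm that is unknown or not in ALARM counts as not alarming.
        return child_states.get(name) == "ALARM"

    return "ALARM" if rule(in_alarm) else "OK"


def cpu_and_latency(in_alarm):
    """Hypothetical rule: both underlying metric alarms must be in ALARM."""
    return in_alarm("cpu-high") and in_alarm("latency-high")


only_cpu = {"cpu-high": "ALARM", "latency-high": "OK"}
both = {"cpu-high": "ALARM", "latency-high": "ALARM"}

print(composite_state(only_cpu, cpu_and_latency))
print(composite_state(both, cpu_and_latency))
```

Alerting only on the composite state means a CPU spike without a latency problem stays quiet, which is the noise reduction described above.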


Amazon CloudWatch Events enables users to respond quickly to application availability issues or resource changes, with notifications from AWS services delivered in near-real-time. Users write rules to indicate which events are of interest to the application and what automated action to take when a rule matches an event. Users can also emit events on a schedule. Using simple rules that can be set up quickly, users can match events and route them to one or more target functions or streams. Amazon CloudWatch Events becomes aware of operational changes as they occur. It responds to these changes and takes corrective action as necessary by sending messages to respond to the environment, activating functions, making changes, and capturing state information. Before using CloudWatch Events, users should understand the following concepts:

  • Events – An event indicates a change in the user's AWS environment. AWS resources can generate events when their state changes. AWS CloudTrail publishes events when users make API calls. Users can generate custom application-level events and publish them to CloudWatch Events. Users can also set up scheduled events that are generated on a periodic basis.
  • Rules – A rule matches incoming events and routes them to targets for processing. A single rule can route to multiple targets, all of which are processed in parallel. Rules are not processed in a particular order. This enables different parts of an organization to look for and process the events that are of interest to them. A rule can customize the JSON sent to the target, by passing only certain parts or by overwriting it with a constant.
  • Targets – A target processes events. Targets can include Amazon EC2 instances, AWS Lambda functions, Kinesis streams, Amazon ECS tasks, Step Functions state machines, Amazon SNS topics, Amazon SQS queues, and built-in targets. A target receives events in JSON format.
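
The relationship between events, rules, and targets can be sketched with a simplified matcher. This is an illustration under stated assumptions, not the CloudWatch Events matching engine: a pattern is assumed to be a dictionary in which each field lists its acceptable values, nested dictionaries are matched recursively, and the target names and instance ID are hypothetical.

```python
def matches(pattern, event):
    """Return True if every field in the pattern is present in the event with an allowed value."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: the event must have a nested object that matches too.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


def route(event, rules):
    """Collect the targets of every rule whose pattern matches the event."""
    targets = []
    for pattern, rule_targets in rules:
        if matches(pattern, event):
            targets.extend(rule_targets)
    return targets


# Hypothetical rules: each is a (pattern, list of target names) pair.
rules = [
    ({"source": ["aws.ec2"], "detail": {"state": ["stopped", "terminated"]}},
     ["notify-ops-topic", "cleanup-function"]),
    ({"source": ["aws.autoscaling"]}, ["scaling-audit-queue"]),
]

event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "stopped"},
}

print(route(event, rules))
```

One matching rule fans the event out to both of its targets, while the second rule is skipped, reflecting that a single rule can route to multiple targets and that rules are evaluated independently.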

Users can use CloudWatch Events to schedule automated actions that self-trigger at certain times using cron or rate expressions. Users can configure the following AWS services as targets for CloudWatch Events:

  • Amazon EC2 instances
  • AWS Lambda functions
  • Streams in Amazon Kinesis Data Streams
  • Delivery streams in Amazon Kinesis Data Firehose
  • Log groups in Amazon CloudWatch Logs
  • Amazon ECS tasks
  • Systems Manager Run Command
  • Systems Manager Automation
  • AWS Batch jobs
  • Step Functions state machines
  • Pipelines in CodePipeline
  • CodeBuild projects
  • Amazon Inspector assessment templates
  • Amazon SNS topics
  • Amazon SQS queues
  • Built-in targets: EC2 CreateSnapshot API call, EC2 RebootInstances API call, EC2 StopInstances API call, and EC2 TerminateInstances API call
  • The default event bus of another AWS account
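
The rate expressions mentioned above for scheduled actions can be made concrete with a small parser. This sketch assumes the `rate(value unit)` form with minute, hour, and day units, and the convention that the unit is singular only when the value is 1. The `rate_to_seconds` helper is hypothetical, and cron expressions are not covered.

```python
import re

# Matches expressions such as "rate(5 minutes)" or "rate(1 hour)".
_RATE_PATTERN = re.compile(r"^rate\((\d+) (minute|minutes|hour|hours|day|days)\)$")
_SECONDS_PER_UNIT = {"minute": 60, "hour": 3600, "day": 86400}


def rate_to_seconds(expression):
    """Convert a rate expression into the interval in seconds between scheduled events."""
    match = _RATE_PATTERN.match(expression)
    if match is None:
        raise ValueError(f"not a rate expression: {expression!r}")
    value = int(match.group(1))
    unit = match.group(2)
    is_singular = not unit.endswith("s")
    # The value must be positive, and the unit is singular exactly when the value is 1.
    if value < 1 or is_singular != (value == 1):
        raise ValueError(f"value and unit do not agree: {expression!r}")
    return value * _SECONDS_PER_UNIT[unit.rstrip("s")]


print(rate_to_seconds("rate(5 minutes)"))
print(rate_to_seconds("rate(1 hour)"))
```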

Analysis and Visualization


Integration 

Users can use CloudWatch Logs to monitor and troubleshoot the systems and applications using the existing system, application, and custom log files. Users can send the existing system, application, and custom log files to CloudWatch Logs and monitor these logs in near real-time. This can help users better understand and operate their systems and applications, and store the logs using highly durable, low-cost storage for later access. 

AWS CloudFormation enables users to model and set up their AWS resources. Users create a template that describes the AWS resources needed, and AWS CloudFormation takes care of provisioning and configuring those resources. Users can include CloudWatch Events rules in their AWS CloudFormation templates.

AWS Config enables users to record configuration changes to their AWS resources. This includes how resources relate to one another and how they were configured in the past, so that users can see how the configurations and relationships change over time. Users can also create AWS Config rules to check whether the resources are compliant or noncompliant with the organization’s policies.

AWS Lambda enables users to build applications that respond quickly to new information. Users upload their application code as Lambda functions, and Lambda runs the code on high-availability compute infrastructure. Lambda performs all the administration of the compute resources, including server and operating system maintenance, capacity provisioning, automatic scaling, code and security patch deployment, and code monitoring and logging.

Amazon Kinesis Data Streams enables rapid and nearly continuous data intake and aggregation. The type of data used includes IT infrastructure log data, application logs, social media, market data feeds, and web clickstream data. Because the response time for the data intake and processing is in real time, processing is typically lightweight.

AWS Identity and Access Management (IAM) helps users securely control access to AWS resources. Use IAM to control who can use the AWS resources (authentication), what resources they can use, and how they can use them (authorization).