Amazon CloudWatch is a fully managed observability service that provides data and actionable insights for AWS resources, applications, and services. It enables you to collect and search logs, track metrics, raise alarms, and visualize system health on dashboards.
CloudWatch plays a central role in diagnosing operational issues and optimizing performance in a distributed, event-driven, or serverless architecture.
Establishing proper logging and observability with CloudWatch is essential for the following reasons:
Troubleshooting and Debugging: logs let you trace an issue to the exact function or event that caused it, so you can resolve errors faster.
Operational Monitoring: metrics track system health, including function invocation rates, latency, and error frequency.
Failure Detection: alarms notify you proactively of problems such as message backlogs in queues, throttling, or messages accumulating in a dead-letter queue (DLQ).
Audit and Compliance: logs serve as a historical record of system behavior for audit trails and post-incident reviews.
Performance Optimization: metrics help you identify bottlenecks, right-size compute and memory, and reduce cost.
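Implement centralized monitoring and observability for your serverless application using Amazon CloudWatch. This includes collecting logs, querying them with CloudWatch Logs Insights, configuring alarms, and building dashboards.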
The following AWS components should be integrated with CloudWatch:
Component | Metrics and Logs to Monitor |
---|---|
AWS Lambda | Invocations, duration, errors, throttles |
Amazon SQS | Messages sent, visible messages, DLQ depth (messages waiting in the dead-letter queue) |
Amazon SNS | Messages published, notifications delivered and failed |
Amazon API Gateway | Request count, 4xx/5xx errors, latency |
Amazon S3 | Optional: access logs, request metrics |
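As a sketch of how the table maps onto CloudWatch's own metric names, the Python below builds the `MetricDataQueries` list that boto3's `cloudwatch.get_metric_data` accepts. It assumes the standard `AWS/Lambda`, `AWS/SQS`, and `AWS/SNS` namespaces; the function, queue, and topic names are hypothetical, and the statistic chosen for each metric is a judgment call. API Gateway and S3 are omitted because their metric names depend on the API type and on whether S3 request metrics are enabled.

```python
from datetime import datetime, timedelta, timezone

# (namespace, dimension name, [(metric name, statistic), ...]) per component.
METRICS = {
    "lambda": ("AWS/Lambda", "FunctionName", [
        ("Invocations", "Sum"),
        ("Duration", "Average"),
        ("Errors", "Sum"),
        ("Throttles", "Sum"),
    ]),
    "sqs": ("AWS/SQS", "QueueName", [
        ("NumberOfMessagesSent", "Sum"),
        ("ApproximateNumberOfMessagesVisible", "Maximum"),
    ]),
    "sns": ("AWS/SNS", "TopicName", [
        ("NumberOfMessagesPublished", "Sum"),
        ("NumberOfNotificationsFailed", "Sum"),
    ]),
}


def build_metric_queries(resources, period=300):
    """Build MetricDataQueries for get_metric_data.

    resources maps a METRICS key ("lambda", "sqs", "sns") to resource names,
    e.g. {"lambda": ["orders-fn"]}.
    """
    queries = []
    for kind, names in resources.items():
        namespace, dimension, metrics = METRICS[kind]
        for name in names:
            for metric_name, stat in metrics:
                queries.append({
                    # Query ids must start with a lowercase letter.
                    "Id": f"m{len(queries)}",
                    "Label": f"{name} {metric_name}",
                    "MetricStat": {
                        "Metric": {
                            "Namespace": namespace,
                            "MetricName": metric_name,
                            "Dimensions": [{"Name": dimension, "Value": name}],
                        },
                        "Period": period,
                        "Stat": stat,
                    },
                    "ReturnData": True,
                })
    return queries


end = datetime.now(timezone.utc)
request = {
    "MetricDataQueries": build_metric_queries({
        "lambda": ["orders-fn"],                # hypothetical function name
        "sqs": ["orders-queue", "orders-dlq"],  # hypothetical queue names
        "sns": ["orders-topic"],                # hypothetical topic name
    }),
    "StartTime": end - timedelta(hours=3),
    "EndTime": end,
}
# With boto3 installed and credentials configured:
# boto3.client("cloudwatch").get_metric_data(**request)
```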
Lambda automatically integrates with CloudWatch Logs if the execution role includes the appropriate permissions.
Ensure your Lambda function’s execution role includes the following permissions (the AWS managed policy AWSLambdaBasicExecutionRole grants the same three actions):
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "*"
    }
  ]
}
```
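Where the role is defined in code, the policy can be generated rather than pasted. This Python sketch emits the same three log actions as a complete policy document but, as an optional tightening beyond what this guide requires, limits log stream and log event access to the function's own log group (`/aws/lambda/<function-name>`) instead of `*`. The region, account ID, and function name are hypothetical.

```python
import json


def lambda_logging_policy(region, account_id, function_name):
    """Return an IAM policy document letting a Lambda function write its own logs."""
    log_group_arn = (
        f"arn:aws:logs:{region}:{account_id}:log-group:/aws/lambda/{function_name}"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Creating the log group is allowed within this account and region.
                "Effect": "Allow",
                "Action": "logs:CreateLogGroup",
                "Resource": f"arn:aws:logs:{region}:{account_id}:*",
            },
            {
                # Streams and events are limited to this function's log group.
                "Effect": "Allow",
                "Action": ["logs:CreateLogStream", "logs:PutLogEvents"],
                "Resource": f"{log_group_arn}:*",
            },
        ],
    }


# Hypothetical values for illustration.
policy = lambda_logging_policy("us-east-1", "123456789012", "orders-fn")
print(json.dumps(policy, indent=2))
```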
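Verification: Lambda writes each function's logs to the log group /aws/lambda/<function-name>, creating it on first invocation if it does not already exist. To create a log group manually, for example to set its retention in advance: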
Navigate to Amazon CloudWatch in the AWS Management Console.
In the left-hand navigation pane, choose Logs → Log groups, then choose Create log group.
Enter a name for the log group, set the retention setting to 1 day, and set the log class to Standard.
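The same console steps can be scripted. This sketch only builds the keyword arguments for boto3's CloudWatch Logs `create_log_group` and `put_retention_policy` calls, so it runs without AWS access. The log group name is hypothetical, and the set of allowed retention values is assumed to match the PutRetentionPolicy API reference; check it before relying on the validation.

```python
# Retention periods (in days) assumed to be accepted by PutRetentionPolicy.
VALID_RETENTION_DAYS = {
    1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545,
    731, 1096, 1827, 2192, 2557, 2922, 3288, 3653,
}


def log_group_requests(name, retention_days=1, log_class="STANDARD"):
    """Return (create_log_group kwargs, put_retention_policy kwargs)."""
    if retention_days not in VALID_RETENTION_DAYS:
        raise ValueError(f"unsupported retention: {retention_days} days")
    create = {"logGroupName": name, "logGroupClass": log_class}
    retention = {"logGroupName": name, "retentionInDays": retention_days}
    return create, retention


# Hypothetical log group name.
create_req, retention_req = log_group_requests("/aws/lambda/orders-fn")
# With boto3 installed and credentials configured:
# logs = boto3.client("logs")
# logs.create_log_group(**create_req)
# logs.put_retention_policy(**retention_req)
```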
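For HTTP APIs, enable access logging on the stage and send it to a CloudWatch log group. An example access log format, using $context.routeKey (the HTTP API variable for the matched route):
```
$context.requestId $context.httpMethod $context.routeKey $context.status
```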
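Access logging can also be enabled programmatically. This Python sketch builds the arguments for boto3's `apigatewayv2` `update_stage` call, which sets `AccessLogSettings` (destination log group ARN and format) on an HTTP API stage. It uses `$context.routeKey`, the HTTP API variable for the matched route (for example `GET /items`); the API ID, stage name, and log group ARN are hypothetical.

```python
# Space-separated access log format; $context.routeKey is the matched HTTP API route.
ACCESS_LOG_FORMAT = (
    "$context.requestId $context.httpMethod $context.routeKey $context.status"
)


def access_log_settings(api_id, stage_name, log_group_arn, fmt=ACCESS_LOG_FORMAT):
    """Build kwargs for the apigatewayv2 UpdateStage call that enables access logging."""
    return {
        "ApiId": api_id,
        "StageName": stage_name,
        "AccessLogSettings": {"DestinationArn": log_group_arn, "Format": fmt},
    }


# Hypothetical API ID, stage, and destination log group.
request = access_log_settings(
    "a1b2c3d4e5",
    "$default",
    "arn:aws:logs:us-east-1:123456789012:log-group:/aws/apigateway/orders-api",
)
# With boto3 installed and credentials configured:
# boto3.client("apigatewayv2").update_stage(**request)
```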
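CloudWatch Logs Insights provides a query interface for searching logs. The following query returns log events whose message contains "error", newest first. The regular expression is case-sensitive; use /(?i)error/ to also match "ERROR" or "Error".
```
fields @timestamp, @message
| filter @message like /error/
| sort @timestamp desc
```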
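To run the query outside the console, CloudWatch Logs exposes `StartQuery` and `GetQueryResults`, which boto3 surfaces as `start_query` and `get_query_results` on the `logs` client. The sketch below starts the query over the last hour, polls until it finishes, and converts each result row into a dictionary. The polling interval and limits are assumptions, the log group name is hypothetical, and the stub client with its canned row exists only so the example runs offline.

```python
import time

QUERY = """fields @timestamp, @message
| filter @message like /error/
| sort @timestamp desc"""


def run_insights_query(logs_client, log_group, query=QUERY, minutes=60,
                       poll_seconds=1.0, max_polls=60):
    """Run a Logs Insights query and return its rows as dicts.

    logs_client is expected to behave like boto3.client("logs").
    """
    end = int(time.time())
    started = logs_client.start_query(
        logGroupName=log_group,
        startTime=end - minutes * 60,  # epoch seconds
        endTime=end,
        queryString=query,
    )
    for _ in range(max_polls):
        response = logs_client.get_query_results(queryId=started["queryId"])
        status = response["status"]
        if status == "Complete":
            # Each row is a list of {"field": ..., "value": ...} pairs.
            return [{c["field"]: c["value"] for c in row} for row in response["results"]]
        if status in ("Failed", "Cancelled", "Timeout", "Unknown"):
            raise RuntimeError(f"query ended with status {status}")
        time.sleep(poll_seconds)  # still Scheduled or Running
    raise TimeoutError("query did not complete in time")


class StubLogsClient:
    """Offline stand-in that records the request and returns one canned row."""

    def start_query(self, **kwargs):
        self.kwargs = kwargs
        return {"queryId": "q-1"}

    def get_query_results(self, queryId):
        return {
            "status": "Complete",
            "results": [[
                {"field": "@timestamp", "value": "2024-01-01 00:00:00.000"},
                {"field": "@message", "value": "error: timeout"},
            ]],
        }


stub = StubLogsClient()
rows = run_insights_query(stub, "/aws/lambda/orders-fn")
# Against AWS: run_insights_query(boto3.client("logs"), "/aws/lambda/orders-fn")
```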
Set up alarms under CloudWatch → Alarms to detect operational issues such as the following:
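Alarm | Description |
---|---|
Lambda Errors | Fires when the function's Errors metric is greater than 0 |
DLQ message count | Fires when messages accumulate in the DLQ (ApproximateNumberOfMessagesVisible on the DLQ is greater than 0) |
API Gateway 5xx errors | Identifies upstream or internal service issues |
SQS message backlog | Fires when ApproximateNumberOfMessagesVisible on the main queue exceeds a chosen threshold |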
Route each alarm to an actionable notification, for example an SNS topic with an email subscription.
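A sketch of three of the alarms above as arguments for boto3's `cloudwatch.put_metric_alarm`, each notifying an SNS topic. The function, queue, and topic names are hypothetical, as is the backlog threshold of 100 messages, which should be tuned to the workload. The API Gateway alarm is left out because its metric name and dimensions differ between REST APIs (`5XXError`) and HTTP APIs (`5xx`).

```python
def metric_alarm(name, namespace, metric, dimension, value, threshold,
                 topic_arn, stat="Sum", period=300, evaluation_periods=1):
    """Build PutMetricAlarm kwargs: alarm when the metric exceeds threshold."""
    return {
        "AlarmName": name,
        "Namespace": namespace,
        "MetricName": metric,
        "Dimensions": [{"Name": dimension, "Value": value}],
        "Statistic": stat,
        "Period": period,
        "EvaluationPeriods": evaluation_periods,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        # Periods with no data points are treated as healthy.
        "TreatMissingData": "notBreaching",
        "AlarmActions": [topic_arn],  # publish to the SNS topic when the alarm fires
    }


TOPIC = "arn:aws:sns:us-east-1:123456789012:ops-alerts"  # hypothetical topic

alarms = [
    # Any Lambda error within a 5-minute period.
    metric_alarm("orders-fn-errors", "AWS/Lambda", "Errors",
                 "FunctionName", "orders-fn", 0, TOPIC),
    # Any message sitting in the dead-letter queue.
    metric_alarm("orders-dlq-not-empty", "AWS/SQS",
                 "ApproximateNumberOfMessagesVisible",
                 "QueueName", "orders-dlq", 0, TOPIC, stat="Maximum"),
    # Sustained backlog on the main queue (threshold of 100 is an assumption).
    metric_alarm("orders-queue-backlog", "AWS/SQS",
                 "ApproximateNumberOfMessagesVisible",
                 "QueueName", "orders-queue", 100, TOPIC, stat="Maximum",
                 evaluation_periods=3),
]
# With boto3 installed and credentials configured:
# cw = boto3.client("cloudwatch")
# for alarm in alarms:
#     cw.put_metric_alarm(**alarm)
```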
Create a custom dashboard under CloudWatch → Dashboards with the following widgets:
Widget Title | Metrics Displayed |
---|---|
Lambda Performance | Invocations, Errors, Duration |
API Gateway Error Rates | 4xx and 5xx error counts and latency |
SQS Queue Depth | Message count in queues |
DLQ Monitoring | Number of messages in the DLQ over time |
Use these widgets for a near-real-time overview of system health and performance.
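The dashboard can also be defined as JSON and created with boto3's `cloudwatch.put_dashboard`. The Python below builds a dashboard body for three of the widgets above on the 24-column dashboard grid, two widgets per row. The region, dashboard name, and resource names are hypothetical, and the API Gateway widget is omitted because its metric names differ between REST and HTTP APIs.

```python
import json

REGION = "us-east-1"  # hypothetical region


def metric_widget(title, metrics, index, stat="Sum", period=300):
    """One time-series widget, placed two per row on the 24-column grid."""
    return {
        "type": "metric",
        "x": (index % 2) * 12,
        "y": (index // 2) * 6,
        "width": 12,
        "height": 6,
        "properties": {
            "title": title,
            "metrics": metrics,
            "view": "timeSeries",
            "stat": stat,
            "period": period,
            "region": REGION,
        },
    }


fn, queue, dlq = "orders-fn", "orders-queue", "orders-dlq"  # hypothetical names
widgets = [
    metric_widget("Lambda Performance", [
        ["AWS/Lambda", "Invocations", "FunctionName", fn],
        ["AWS/Lambda", "Errors", "FunctionName", fn],
        # Per-metric override: average duration, plotted on the right axis.
        ["AWS/Lambda", "Duration", "FunctionName", fn,
         {"stat": "Average", "yAxis": "right"}],
    ], 0),
    metric_widget("SQS Queue Depth", [
        ["AWS/SQS", "ApproximateNumberOfMessagesVisible", "QueueName", queue],
    ], 1, stat="Maximum"),
    metric_widget("DLQ Monitoring", [
        ["AWS/SQS", "ApproximateNumberOfMessagesVisible", "QueueName", dlq],
    ], 2, stat="Maximum"),
]
dashboard_body = json.dumps({"widgets": widgets})
# With boto3 installed and credentials configured:
# boto3.client("cloudwatch").put_dashboard(
#     DashboardName="orders-overview", DashboardBody=dashboard_body)
```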
After implementing the above: