Understand Lambda Logging and Unlock CloudWatch Logs
I think it’s fair to say that everyone who uses Lambda is aware of CloudWatch. It’s well known that a call to your language’s version of `console.log`, `print`, or `Console.WriteLine` will result in text appearing in CloudWatch Logs. This is usually where a lot of people’s understanding ends, though.
Don’t get me wrong, this is usually enough to debug a Lambda function successfully. However, I think a deeper understanding can help you reason about your function’s logs more easily and unlock new possibilities.
In this post, I’ll explain how Groups, Streams, and Events fit together, describe some relevant parts of the CloudWatch API and how Lambda interacts with it, then explain how to make use of your logs after they’re sent to CloudWatch.
Log Groups, Log Streams, and Log Events
Let’s take a step back and look at how CloudWatch organises logs. The lowest-level units in CloudWatch Logs are events. Each event represents a single log record and is made up of a `timestamp` and a `message`.
Events are uploaded to CloudWatch in batches of up to 10,000 chronologically ordered events using `PutLogEvents`.
In the screenshot below, each row displays an event. The `message` can contain multiple lines of text, which you can reveal using the arrow on the left.
Log events always belong to a single log stream. A log stream is simply a named sequence of log events. That name can be anything, but Lambda has its own naming convention that we’ll take a look at. Log streams are created using the `CreateLogStream` API action.
Log streams have something called a sequence token, which is an opaque string (meaning you shouldn’t care what the characters mean). A new sequence token is returned each time you call `PutLogEvents`. Each `PutLogEvents` call must include the sequence token returned by the previous call, except the very first call on a newly created stream.
The sequence token helps keep log events in order. It also makes it practically impossible for a log stream to have more than one writer at a time: even though `DescribeLogStreams` returns the current sequence token, it may already be invalid by the time you call `PutLogEvents`, because someone else made the call before you.
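To make that concrete, here’s a minimal boto3 sketch of writing two batches to a fresh stream and chaining the sequence token between calls. The group and stream names are hypothetical, and the group is assumed to already exist:

```python
import time
import boto3

logs = boto3.client("logs")

group = "/aws/lambda/zac"   # hypothetical group, assumed to exist
stream = "example-stream"   # hypothetical stream name

logs.create_log_stream(logGroupName=group, logStreamName=stream)

def now_ms():
    # Event timestamps are milliseconds since the epoch.
    return int(time.time() * 1000)

# The very first call on a new stream needs no sequence token.
resp = logs.put_log_events(
    logGroupName=group,
    logStreamName=stream,
    logEvents=[{"timestamp": now_ms(), "message": "first batch"}],
)

# Every later call must pass the token returned by the previous call.
logs.put_log_events(
    logGroupName=group,
    logStreamName=stream,
    logEvents=[{"timestamp": now_ms(), "message": "second batch"}],
    sequenceToken=resp["nextSequenceToken"],
)
```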
Lastly, a log group is a named collection of log streams. Lambda also has its own naming convention for groups. It’s here at the log group level that you set the number of days to retain log events and can configure encryption.
Log groups are created using the `CreateLogGroup` action. They can have subscription filters and metric filters associated with them, but we’ll talk about those near the end of this post.
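As a quick sketch, creating a group and setting its retention with boto3 looks something like this (the group name and the 14-day retention are placeholders):

```python
import boto3

logs = boto3.client("logs")

logs.create_log_group(logGroupName="/aws/lambda/zac")  # hypothetical name

# Retention is set at the group level and applies to every stream in it.
logs.put_retention_policy(logGroupName="/aws/lambda/zac", retentionInDays=14)
```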
Lambda Logs
Now that we know more about groups, streams, and events, let’s take a look at how Lambda uses them.
If you want your function to log to CloudWatch, its IAM policy must allow the `logs:CreateLogGroup`, `logs:CreateLogStream`, and `logs:PutLogEvents` actions. You’ve probably seen a policy like this before:
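```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "arn:aws:logs:*:*:*"
        }
    ]
}
```

This is essentially the same set of permissions granted by the managed `AWSLambdaBasicExecutionRole` policy, though you can scope the `Resource` down to your function’s own log group.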
Lambda creates a new log stream per container. That makes sense when you think about the way sequence tokens work. With this one-to-one mapping, each container simply keeps track of its own sequence token.
Lambda’s log stream naming convention is `date/[version]uuid`.
Therefore, `2019/07/12/[$LATEST]312c2d81e2e64af58dbe557754f9aa13` will contain logs from a container that started on `2019/07/12` and runs the latest version of the function. If you invoked version `1` instead, the name would have been `2019/07/12/[1]312c2d81e2e64af58dbe557754f9aa13`.
Log streams must belong to a log group, so just before creating a log stream, Lambda creates a log group for the function if one doesn’t already exist.
The group naming convention is `/aws/lambda/` followed by your function’s name, so Lambda would create `/aws/lambda/zac` for the `zac` function.
Log events come from your function’s standard output and standard error streams. These two streams are automatically redirected to another process that is responsible for asynchronously pushing batches of log events to CloudWatch.
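For instance, in a Python function, both of these writes end up as log events (a trivial, hypothetical handler):

```python
import sys

def handler(event, context):
    # Both lines become log events in the container's log stream.
    print("written to standard output")
    print("written to standard error", file=sys.stderr)
    return {"ok": True}
```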
Unlocking CloudWatch Logs
Once your function’s output is pushed to CloudWatch, there is actually quite a lot more you can do.
Searching
With parallel containers logging to multiple streams, it can be hard to find the logs you’re looking for, such as those of a particular customer or order. That’s especially true if you’ve never noticed the Search Log Group button (I don’t blame you if you haven’t).
This button is above the list of a log group’s streams. It takes you to a page that shows all of the events in all of the streams in the group. From there, you can use the Filter events text box to search the entire log group and click Show in stream to see surrounding events.
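The same group-wide search is available programmatically through the `FilterLogEvents` API. Here’s a minimal boto3 sketch, assuming a hypothetical group name and an order ID you’re hunting for:

```python
import boto3

logs = boto3.client("logs")

# Search every stream in the group for a hypothetical order ID.
paginator = logs.get_paginator("filter_log_events")
for page in paginator.paginate(
    logGroupName="/aws/lambda/zac",  # hypothetical group
    filterPattern='"order-1234"',    # quoted terms match literal text
):
    for event in page["events"]:
        print(event["logStreamName"], event["message"])
```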
Insights
CloudWatch Logs Insights enables you to search and analyse your log data using a query language that can aggregate, filter, and project log events across the entire log group. You can get the results as a table or graph.
One of the coolest things about Insights is that it understands TSV and JSON log events, letting you do even more advanced queries. I find Insights really useful when I don’t want to publish a custom metric or forgot to.
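Insights queries can also be run from code. As a rough sketch, here’s how you might pull the most recent errors from the last hour with boto3; the group name is hypothetical, but the query string uses ordinary Insights syntax and works the same in the console:

```python
import time
import boto3

logs = boto3.client("logs")

query = """
fields @timestamp, @message
| filter @message like /ERROR/
| sort @timestamp desc
| limit 20
"""

start = logs.start_query(
    logGroupName="/aws/lambda/zac",     # hypothetical group
    startTime=int(time.time()) - 3600,  # last hour
    endTime=int(time.time()),
    queryString=query,
)

# Insights queries run asynchronously, so poll until this one completes.
while True:
    results = logs.get_query_results(queryId=start["queryId"])
    if results["status"] == "Complete":
        break
    time.sleep(1)

for row in results["results"]:
    print(row)
```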
Metric Filters
Metric filters define terms and patterns to look for in log data as it is sent to CloudWatch. CloudWatch uses metric filters to turn log data into numerical metrics that you can graph or set alarms on.
One of the most common uses of metric filters with Lambda is extracting a consumed-memory metric from the `REPORT` log entries.
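Here’s a hedged sketch of such a filter using boto3. The field labels in the pattern are arbitrary names I’ve chosen; the pattern targets the classic `REPORT` line format, and newer runtimes can append extra fields (such as init duration on cold starts), so treat it as a starting point rather than a drop-in solution:

```python
import boto3

logs = boto3.client("logs")

# Labels on the left of each "=" are arbitrary; the quoted values must line
# up with the whitespace-separated words of the REPORT line.
pattern = (
    '[label="REPORT", request_id_label="RequestId:", request_id, '
    'duration_label="Duration:", duration, duration_unit="ms", '
    'billed_label1="Billed", billed_label2="Duration:", billed_duration, '
    'billed_unit="ms", memory_label1="Memory", memory_label2="Size:", '
    'memory_size, memory_unit="MB", max_label1="Max", max_label2="Memory", '
    'max_label3="Used:", max_memory_used, max_unit="MB"]'
)

logs.put_metric_filter(
    logGroupName="/aws/lambda/zac",          # hypothetical group
    filterName="max-memory-used",
    filterPattern=pattern,
    metricTransformations=[{
        "metricName": "MaxMemoryUsed",
        "metricNamespace": "Custom/Lambda",  # hypothetical namespace
        "metricValue": "$max_memory_used",   # publish the matched field
    }],
)
```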
Subscription Filters
You can use subscription filters to get a real-time feed of log events delivered to other services such as a Kinesis Data Stream, Kinesis Firehose, or a Lambda function for custom processing, analysis, or loading to other systems.
Each log group can only have one subscription, but by using a Kinesis Data Stream you can have multiple consumers (especially with enhanced fan-out).
It’s common to use a subscription to copy logs from CloudWatch into a more powerful system such as the Elastic Stack.
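Wiring a subscription up with boto3 might look like the sketch below; the Kinesis stream ARN and the IAM role (which must allow CloudWatch Logs to write to the stream) are hypothetical:

```python
import boto3

logs = boto3.client("logs")

logs.put_subscription_filter(
    logGroupName="/aws/lambda/zac",  # hypothetical group
    filterName="ship-to-kinesis",
    filterPattern="",                # an empty pattern forwards every event
    destinationArn="arn:aws:kinesis:us-east-1:123456789012:stream/logs",
    roleArn="arn:aws:iam::123456789012:role/cwl-to-kinesis",
)
```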
I may write more about using subscriptions in a future post, so let me know if that sounds interesting.
Batch Export
In some cases, real-time streaming isn’t necessary and a big dump of log data is just what the doctor ordered. In that case, CloudWatch can export to S3.
This could be used for things such as archiving, batch analysis, or importing to other systems. Don’t forget to save costs by setting an S3 lifecycle policy.
Log data can take up to 12 hours to become available for export. If you need files in S3 sooner, perhaps consider Kinesis Firehose to S3 (see above).
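Exports are asynchronous tasks created with `CreateExportTask`. A minimal sketch, assuming a hypothetical bucket whose policy allows CloudWatch Logs to write to it:

```python
import time
import boto3

logs = boto3.client("logs")

# Export the last 24 hours of a group to S3. Times are in milliseconds.
now = int(time.time() * 1000)
logs.create_export_task(
    taskName="daily-export",
    logGroupName="/aws/lambda/zac",  # hypothetical group
    fromTime=now - 24 * 60 * 60 * 1000,
    to=now,
    destination="my-log-archive",    # hypothetical bucket
    destinationPrefix="lambda/zac",
)
```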
Conclusion
CloudWatch is a large and often neglected service. It’s given just enough attention to survive, then pushed aside for shinier services.
However, I think there’s real value in understanding it a bit more, so I hope you learned something useful from this post, even if it’s just the location of the log group search button.
Please let me know if you have any questions or would like to hear more about a particular topic.