This is the first post in a two-part series. The second part presents an alternative solution that is generally better but has some trade-offs. I recommend reading both.
Last week, Chase Douglas sent out a serverless bat signal asking if anyone knows of a way to trigger a Lambda function at a frequency of 5–10 seconds.
Of course, the most common way to invoke a Lambda function on a recurring schedule is with an EventBridge rule. However, a rule can run at most once per minute.
Alex DeBrie suggested setting up a single Lambda function that is invoked every minute. The function would send a message every 5–10 seconds to an SNS topic. Another function would be subscribed to the SNS topic and do actual work.
This would probably work, but you’d effectively have a Lambda function running continuously. It would be sleeping most of the time and you’d be paying for that.
My solution was based on Alex’s, but modified to use SQS message timers. The Lambda function that is invoked every minute quickly publishes messages to an SQS queue with various delays and stops. The messages then become visible on the queue every 5–10 seconds and trigger another function.
I’ve set this up and tested it.
I used Serverless Framework to put everything together. From line 20 onward, the serverless.yml below shows the interval functions being created, along with the SQS queue.
Note that I set
10so that the EventBridge rule is created using CloudFormation instead of a Custom Resource. This requires Serverless Framework
As the name suggests, this function is triggered every minute by an EventBridge rule. The everyminute handler code below creates a series of delayed messages and sends them to the SQS queue in batches of up to
This could have been simpler, but I wanted to be able to configure the rate using the RATE_IN_SECONDS environment variable.
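To make the idea concrete, here is a minimal Python sketch of what such a handler could look like. It is not the code from the repository: the function names, the QUEUE_URL environment variable, the default rate of 10 and the message body are all assumptions for illustration. The only hard facts it relies on are that SQS accepts a per-message DelaySeconds and that SendMessageBatch takes at most 10 entries per call.

```python
import json
import os

SQS_MAX_BATCH = 10  # SendMessageBatch accepts at most 10 entries per call


def build_entries(rate_in_seconds, window_seconds=60):
    """Build one delayed message per tick in the coming minute."""
    entries = []
    for index, delay in enumerate(range(0, window_seconds, rate_in_seconds)):
        entries.append({
            "Id": str(index),  # must be unique within a batch
            "MessageBody": json.dumps({"delaySeconds": delay}),
            "DelaySeconds": delay,  # SQS message timer, whole seconds
        })
    return entries


def chunk(items, size=SQS_MAX_BATCH):
    """Split a list into consecutive slices of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def everyminute(event, context, sqs=None, queue_url=None):
    """Hypothetical handler: publish the minute's delayed messages and stop."""
    if sqs is None:
        import boto3  # imported lazily so the helpers work without it
        sqs = boto3.client("sqs")
    queue_url = queue_url or os.environ["QUEUE_URL"]  # assumed variable name
    rate = int(os.environ.get("RATE_IN_SECONDS", "10"))  # assumed default
    batches = chunk(build_entries(rate))
    for batch in batches:
        sqs.send_message_batch(QueueUrl=queue_url, Entries=batch)
    return len(batches)
```

With a rate of 10 seconds this produces six messages in a single batch; with a rate of 5 seconds it produces twelve messages, sent as one batch of 10 and one of 2.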
This function is configured to consume messages from the SQS queue.
I’ve given it a batch size of 10 messages so that it can quickly catch up if it falls behind. I’ve also configured a reserved concurrency of 1, as we’re only publishing enough messages to support one consumer (see Enhancements for a fan-out idea).
The interval handler code simply logs its input and the current time.
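A handler like that can be very small. The following Python sketch is an assumption about its shape rather than the actual code: it reads the standard Records list that Lambda passes for an SQS event source and prints each message body with the current UTC time.

```python
from datetime import datetime, timezone


def interval(event, context):
    """Hypothetical handler: log each SQS message alongside the current time."""
    now = datetime.now(timezone.utc).isoformat()
    records = event.get("Records", [])
    for record in records:
        # Each SQS record carries the message payload in its "body" field.
        print(f"{now} received: {record['body']}")
    return len(records)
```

Because the batch size is 10, a single invocation may receive several records if the function has fallen behind.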
I want to explicitly call out that I set the message retention period to 70 seconds (the default retention period is 4 days). I did this because I don’t want to build up a large backlog of visible messages if something breaks.
I originally set the retention period to 60 seconds thinking each message would be deleted just before the next one is visible. For example, the message published for 08:13:50 would be deleted just as the one for 08:14:50 becomes visible. However, it turns out the retention period starts as soon as the message is sent, not when it becomes visible. So I padded it a little.
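The arithmetic is easy to check. Since retention counts from the moment a message is sent, the time a message can actually sit visible on the queue is the retention period minus its delay. This small Python sketch (the function name is mine) works that out for a 10-second rate:

```python
def visible_window(delay_seconds, retention_seconds):
    """Seconds a message can stay visible before SQS expires it.

    Retention is counted from send time, so the delay eats into it.
    """
    return max(0, retention_seconds - delay_seconds)


for retention in (60, 70):
    windows = [visible_window(delay, retention) for delay in range(0, 60, 10)]
    print(f"retention={retention}s -> visible windows: {windows}")
```

With 60 seconds of retention, the message delayed by 50 seconds has only 10 seconds to be consumed. Padding the retention to 70 seconds doubles that to 20.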
How accurate is it?
There are two aspects of accuracy to look at. The first is how accurate (or consistent) EventBridge is when invoking the
EventBridge only guarantees a rule will run within the specified minute, not at the 0th second. From my testing, this appears to mean that rather than moving around within the minute, EventBridge picks a second and aims for that every minute.
I don’t think the documentation guarantees this, but an hour of testing showed EventBridge invoked our function on the same second of every minute, give or take ~500 milliseconds. Not bad at all.
Lambda & SQS
Next we’ll look at the delayed messages being processed by the interval function. SQS messages reliably become visible after the delay, so the main concern is the one raised by Chase Douglas on Twitter about Lambda’s polling behaviour. Is it frequent enough?
Lambda’s documentation says it uses long polling with a minimum of 5 processes (this is what causes the behaviour I wrote about in one of my most popular posts). This is good news as it means Lambda will be waiting with a connection open to SQS and will invoke the interval function as soon as a message is visible in the queue.
The following graph shows how far away from “every 10 seconds” the invocations were over an hour. I’d call this within a second, and usually within 250 ms.
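If you want to run a similar check yourself, one simple way (an assumption on my part, not necessarily how the graph was produced) is to take the logged invocation times and compare each gap with the target rate:

```python
def interval_errors(timestamps, rate_in_seconds=10):
    """Deviation of each gap between consecutive invocations from the target rate.

    `timestamps` are invocation times in seconds, in ascending order.
    A positive value means the invocation was late, a negative one early.
    """
    return [
        (later - earlier) - rate_in_seconds
        for earlier, later in zip(timestamps, timestamps[1:])
    ]
```

For example, invocations at 0.0, 10.25 and 20.0 seconds give errors of 0.25 and -0.25 seconds.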
To take this idea further, it might be worth changing the everyminute function to intelligently adjust the delays in case it’s invoked earlier or later. EventBridge was consistent during testing, but maybe it’s less consistent over a longer period.
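Here is one way that adjustment might look, as a rough Python sketch. It assumes you know the second of the minute that EventBridge normally fires on (the hypothetical expected_second parameter) and shifts every delay by however early or late the current invocation is. SQS delays are whole seconds, so sub-second drift can't be corrected this way.

```python
def adjusted_delays(now_epoch, expected_second, rate_in_seconds, window_seconds=60):
    """Shift the message delays to compensate for an early or late invocation.

    `expected_second` is the second of the minute the rule usually fires on
    (an assumption; you would have to observe or configure it).
    """
    # Signed drift in the range [-30, 30): positive means we were invoked late.
    drift = ((now_epoch % 60) - expected_second + 30) % 60 - 30
    delays = []
    for base in range(0, window_seconds, rate_in_seconds):
        # Late invocation: shorten the delay. Early: lengthen it. Never negative.
        delays.append(max(0, round(base - drift)))
    return delays
```

If the function normally runs at second 3 but is invoked at second 5.2, the delays become 0, 8, 18, 28, 38 and 48 instead of 0, 10, 20, 30, 40 and 50. The first message can't be sent into the past, so it is simply late.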
I’m not sure if I would do any actual work in the interval function. It might be better to publish an SNS message and subscribe to that. That way, the business logic is separated from the scheduling magic and you could fan out to multiple functions.
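As a sketch of that idea, the interval function could do nothing but forward each tick to a topic. The function name and the TOPIC_ARN environment variable below are assumptions for illustration; the SNS client is passed in so the logic can be exercised without AWS.

```python
import os


def interval_publisher(event, context, sns=None, topic_arn=None):
    """Hypothetical handler: forward each SQS tick to an SNS topic."""
    if sns is None:
        import boto3  # imported lazily so the function can be tested with a stub
        sns = boto3.client("sns")
    topic_arn = topic_arn or os.environ["TOPIC_ARN"]  # assumed variable name
    published = 0
    for record in event.get("Records", []):
        # Pass the tick payload through unchanged; subscribers do the real work.
        sns.publish(TopicArn=topic_arn, Message=record["body"])
        published += 1
    return published
```

Any number of functions can then subscribe to the topic without touching the scheduling side.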
If you needed to guarantee you only invoked the function once per X seconds, you could introduce a DynamoDB table and Stream to debounce the invocations, but that’s getting a bit more complex.
Honestly, I think this is a success. I’m impressed by the accuracy. If you need more accuracy than this, you may be better off with a different solution.
All of the code shown here is available on GitHub.