Making .NET AWS Lambda Functions Start 10x Faster using LambdaNative

Cold starts are one of the most often misunderstood aspects of AWS Lambda, yet they can have the largest impact on a function’s performance. Many developers have tried to solve the problem by invoking their function on a schedule to keep it warm. This doesn’t actually work and misses the point of functions as a service. You can increase the memory limit to speed things up, but that can cost more and only makes things slightly better.

Today, I’m presenting LambdaNative, a solution that gives you the best of both worlds: lower cost and better performance.

What are cold starts and why are they slow?

AWS Lambda functions run inside containers (Firecracker microVMs). Each container can run a single invocation of your function at a time. At any given time, Lambda has a pool of zero or more containers for each function. When a function is called, either an existing container is reused or a new one is created. The colloquial term “cold start” refers to a new container starting.

Because you need a container for each parallel request, and because containers can be terminated at any time, you can’t reliably keep a function warm. Not that you should want to.
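To make the concurrency point concrete, here is a toy model in Python. It is only a sketch of the reasoning above, not how Lambda is implemented; `ContainerPool` and `handle_burst` are names made up for illustration, and the model ignores containers being terminated.

```python
class ContainerPool:
    """Toy model: each container serves one invocation at a time."""

    def __init__(self):
        self.idle = 0          # warm containers waiting for work
        self.cold_starts = 0   # containers created so far

    def acquire(self):
        # Reuse an idle container if there is one, otherwise create a new one.
        if self.idle > 0:
            self.idle -= 1
            return "warm"
        self.cold_starts += 1
        return "cold"

    def release(self):
        # The invocation finished, so its container becomes idle (warm).
        self.idle += 1


def handle_burst(pool, concurrent_requests):
    """Serve N requests that overlap in time, then release every container."""
    kinds = [pool.acquire() for _ in range(concurrent_requests)]
    for _ in kinds:
        pool.release()
    return kinds


pool = ContainerPool()
handle_burst(pool, 1)          # a scheduled "keep warm" ping: one cold start
print(handle_burst(pool, 3))   # three parallel requests: one warm, two cold
```

In this model a keep-warm ping helps exactly one of the three concurrent requests; the other two still pay for a cold start.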

Some of the reasons cold starts are slow, such as copying your code from S3 or VPC networking, apply to all of the supported runtimes, not just .NET Core. The biggest performance hit for .NET functions, however, comes from converting the Common Intermediate Language (CIL) code into machine code via just-in-time (JIT) compilation.

The more code and libraries your function uses, the more there is to JIT-compile and the slower its cold start will be. Less abstraction and more action leads to faster cold starts.

How slow is slow?

I’m glad you asked because I’ve got the numbers! To get this data, I created a test Lambda function that…

This is a realistic use-case which involves libraries such as AWSSDK.DynamoDBv2, AWSSDK.SimpleNotificationService, Newtonsoft.Json, and Amazon.Lambda.APIGatewayEvents, all of which will need to be compiled to machine code.

I then created an AWS Step Function to continuously modify and directly invoke the above function. The modification is changing the timeout, which currently causes Lambda to perform a cold start.

The ModifyLambda state executes a different function that performs the modification and returns an APIGatewayProxyRequest. The output of each state is used as the input to the next, so InvokeLambda invokes the test function with the request it expects.
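The two steps can be approximated in plain Python. This is a rough stand-in for the state machine, not the code used for the measurements. It assumes a boto3 Lambda client is passed in, the two timeout values are arbitrary, and the returned request is a placeholder because the real test payload isn't shown in this post.

```python
import json


def modify_lambda(client, function_name):
    """Change the timeout so the next invocation lands on a new container."""
    config = client.get_function_configuration(FunctionName=function_name)
    # Alternate between two arbitrary values; only the change itself matters.
    new_timeout = 31 if config["Timeout"] == 30 else 30
    client.update_function_configuration(
        FunctionName=function_name, Timeout=new_timeout
    )
    # Wait until the configuration update has been applied before invoking.
    client.get_waiter("function_updated").wait(FunctionName=function_name)
    # Placeholder API Gateway proxy request (hypothetical contents).
    return {"httpMethod": "GET", "path": "/", "headers": {}, "body": None}


def invoke_lambda(client, function_name, request):
    """Directly invoke the test function with the request from the last step."""
    response = client.invoke(
        FunctionName=function_name, Payload=json.dumps(request).encode()
    )
    return response["StatusCode"]
```

With real credentials, the client would come from `boto3.client("lambda")`, and calling `invoke_lambda(client, name, modify_lambda(client, name))` in a loop mirrors the ModifyLambda and InvokeLambda states.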

Finally, I used AWS X-Ray to record accurate timings for each invocation.

A standard cold start consists of initialization and invocation.
Initialization, which only happens on cold starts, is everything up until your class method is executed (including instantiating your class if it’s not static).
Invocation includes deserializing input, running your code, and serializing output.

The graphs in this post all show 25 sequential invocations of the test function along the x-axis and timings on the y-axis. The lines represent different memory limit configurations available at the time of testing.

Remember, the amount of CPU your function can use increases with memory.

[Graph: initialization time for standard .NET functions. Minor gridlines represent 50 ms.]

As you can see, there’s not much difference between the memory limits. Initialization is between 200 ms and 350 ms for standard .NET functions.

[Graph: invocation time for standard .NET functions. Minor gridlines represent 500 ms.]

In the invocation phase, however, we see a huge difference between memory limits. This difference is caused by JIT’ing and running the actual code.
It’s clear that 128 MB is the slowest with an average of 10,671 ms, while the fastest is 3008 MB with an average of 802 ms. The fact that there is very little improvement after 1024 MB is interesting.

What does fast look like?

The following graphs show the same function running under LambdaNative.

There’s not a lot of performance to be gained during the initialization phase, but there is still some:

[Graph: initialization time under LambdaNative. Minor gridlines represent 50 ms.]

Under LambdaNative, initialization is between 150 ms and 250 ms, which is a 25% improvement. That’s nice, but we’re just getting started. The real savings are made during the invocation step.

[Graph: invocation time under LambdaNative. Minor gridlines represent 500 ms.]

Under LambdaNative, 128 MB is now averaging 1,656 ms (a 10x improvement) and the rest are all faster than 3008 MB was previously.
3008 MB itself is down to 91 ms on average (an 8.8x improvement).

The graph below shows standard (dotted lines) and LambdaNative (solid lines) overlaid for comparison.

[Graph: standard (dotted) and LambdaNative (solid) timings overlaid. Minor gridlines represent 500 ms.]

LambdaNative also has faster warm starts. The graph below shows the first 24 warm invocations after a cold start.

Notice how the first warm start of a standard function is slower than the rest? LambdaNative also mitigates that strange behaviour.

[Graph: first 24 warm invocations after a cold start. Minor gridlines represent 25 ms.]

What is LambdaNative?

At the end of 2018, AWS announced Custom Runtimes and the Runtime API that enables them. In a nutshell, you select Custom Runtime and provide an executable file named bootstrap (in your .zip file), which AWS Lambda will execute instead of its own. You’re then responsible for interacting with an HTTP API to fetch invocations, running the handler code, and reporting the result back to the API.
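The loop a custom runtime has to implement can be sketched in a few lines. The version below is in Python purely for illustration and is not LambdaNative’s code. The endpoint paths, the `Lambda-Runtime-Aws-Request-Id` header and the `AWS_LAMBDA_RUNTIME_API` environment variable come from the Runtime API; the function names and the injectable `urlopen` parameter are my own assumptions, added to make the sketch easy to test.

```python
import json
import os
import urllib.request

API_VERSION = "2018-06-01"


def process_next_invocation(runtime_api, handler, urlopen=urllib.request.urlopen):
    """Fetch one invocation, run the handler, and report the outcome."""
    base = f"http://{runtime_api}/{API_VERSION}/runtime/invocation"

    # Long-poll for the next event; the request ID arrives in a response header.
    with urlopen(urllib.request.Request(f"{base}/next")) as resp:
        request_id = resp.headers["Lambda-Runtime-Aws-Request-Id"]
        event = json.loads(resp.read())

    try:
        body = json.dumps(handler(event)).encode()
        url = f"{base}/{request_id}/response"
    except Exception as exc:
        # Report handler failures to the error endpoint instead of crashing.
        error = {"errorMessage": str(exc), "errorType": type(exc).__name__}
        body = json.dumps(error).encode()
        url = f"{base}/{request_id}/error"

    with urlopen(urllib.request.Request(url, data=body, method="POST")):
        pass
    return request_id


def main(handler):
    # Lambda sets this variable to the host:port of the Runtime API.
    runtime_api = os.environ["AWS_LAMBDA_RUNTIME_API"]
    while True:
        process_next_invocation(runtime_api, handler)
```

A real bootstrap would also report failures during initialization and pass context such as the deadline to the handler. That plumbing is what a library like LambdaNative takes off your hands.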

LambdaNative is a library that handles the API interaction and error handling for you. All you need to do is tell it which handler to execute by implementing an interface and calling LambdaNative.Run.

You can then use CoreRT to perform ahead-of-time (AOT) compilation, producing a native executable that doesn’t need any JIT compilation at runtime.

I’m very happy to announce that v1.0.0 is now publicly available!
The README in the example directory on GitHub has very detailed instructions on how to get started with LambdaNative.

This is obviously a bit more work, but the results speak for themselves. Having said that, integrating it all into your build system would hide most of the added complexity.

That’s it! If you try it out or have any feedback, please let me know!

For more like this, please follow me on Medium and Twitter.

Principal Engineer @ Just Eat | AWS Community Builder