.NET Core 3.0 has been out of preview for about a month now. In that time, I’ve had several people ask me to follow up on my previous benchmarks and provide new recommendations. It’s taken some time, but the results are in!
Just like .NET Core 2.2, AWS Lambda won’t have native support for 3.0 because it’s not an LTS version. Instead, you must use AWS Lambda’s Custom Runtime, which is quite easy.
The other big differences with 3.0 are the compile-time options that have been added and changed. The four most interesting options in 3.0 are ReadyToRun, IL Trimming, Tiered Compilation, and Quick JIT.
ReadyToRun (R2R) is a form of ahead-of-time compilation. It’s off by default, but when enabled, your application assemblies are compiled as ReadyToRun binaries. Microsoft claim this improves startup performance by reducing the amount of work the Just-In-Time (JIT) compiler must do as your application is loading.
To use 2.2 or 3.0 on Lambda, your deployment package must include the whole .NET runtime, whereas a .NET Core 2.1 Lambda function runs on a runtime that is already pre-installed. As a result, the deployment packages are much bigger. For the function these tests were performed with, the 2.1 package was 890 KB while the 2.2 and 3.0 packages were just over 30 MB each!
IL Trimming is off by default, but when enabled, it analyzes your application and trims away any unused assemblies. In the test case, it roughly halved the package size, down to 16.3 MB!
Tiered Compilation was actually added in 2.1, but didn’t really get much attention until the preview stages of 3.0 when it was changed to be enabled by default. I talked about it quite a bit during my previous benchmarks. The idea is that with this enabled, the runtime can get better performance at startup and maximize throughput over time. It theoretically achieves this by compiling your code with fewer optimizations until it detects parts that are being called a lot.
Quick JIT is closely related to tiered compilation. It makes the compiler apply fewer optimizations to speed up compilation. I’ve listed it separately as it can be turned on and off separately. I believe it’s off by default as it has been since preview 4.
Four options that can be set to true or false means sixteen different combinations. Which combination should you use to get the best performance from your Lambda functions? Well, I tested them all.
Test Function Code
In my experience, .NET Lambda performance, especially during cold starts, is affected most by the amount of code that needs to be compiled at runtime. Therefore, I performed these tests with a function that uses Autofac for dependency injection and instantiates clients for DynamoDB and S3. It doesn’t actually make any network calls, but due to the way the JIT compiler works, this gives it enough to chew on and produces more meaningful results.
The test function just takes in a simple JSON object (which it needs to deserialize using JSON.NET) and returns the invocation’s AWS X-Ray trace ID so I can retrieve the timings more easily.
I tested the exact same code compiled against .NET Core 2.1.802, 2.2.402, and sixteen configurations of 3.0.100. I added AWS’ Runtime Support library on top for the non-2.1 deployments, but the rest of the code remained the same.
In the results, I’ve used a sort of code to differentiate the configurations. I’ve given each option a letter. If the option was enabled, the letter is present.
t: IL Trimming
r: ReadyToRun
c: Tiered Compilation
q: Quick JIT
For example, 30-tcq is .NET Core 3.0 with IL Trimming, Tiered Compilation, and Quick JIT enabled. Another example is 30-r, which only has ReadyToRun enabled and everything else switched off. Lastly, 30 has everything off.
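To make the naming scheme concrete, here is a rough sketch of how 30-tcq could be expressed in a project file. The four property names are the standard MSBuild switches for these features in .NET Core 3.0. The linux-x64 runtime identifier is an assumption for illustration and is not taken from the test project.

```xml
<!-- Sketch of the 30-tcq configuration; not copied from the test project -->
<PropertyGroup>
  <TargetFramework>netcoreapp3.0</TargetFramework>
  <!-- Assumed runtime identifier: trimming needs a self-contained, RID-specific publish -->
  <RuntimeIdentifier>linux-x64</RuntimeIdentifier>
  <!-- t: IL Trimming enabled -->
  <PublishTrimmed>true</PublishTrimmed>
  <!-- r: ReadyToRun disabled (letter absent from the code) -->
  <PublishReadyToRun>false</PublishReadyToRun>
  <!-- c: Tiered Compilation enabled -->
  <TieredCompilation>true</TieredCompilation>
  <!-- q: Quick JIT enabled -->
  <TieredCompilationQuickJit>true</TieredCompilationQuickJit>
</PropertyGroup>
```

Flipping each value between true and false gives the sixteen 3.0 configurations.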
2.1 and 2.2 just used the default options. Nothing fancy.
Graphs For Ants
As you’ll see, representing 18 different series on a single graph can make graphs harder to read. I’ve done my best with colour coding and I’ll call out the important bits, so don’t worry.
However, under each graph there is a link to an interactive version of the graph. If you go to that, you’ll be able to hover over the series and get more information.
In the line graph, if you hover over the dots you’ll see the series name. If you hover over a line, it will highlight the whole line so it’s easier to see.
Now, let’s take a look at the results!
Of course we’re going to start with cold starts. They’re everyone’s favourite topic. For .NET Core 3.0, it’s actually very important to look at both cold and warm performance. I’ll show you why shortly.
In the graph below, we see there is actually quite a bit of difference between the 3.0 configurations and that many of them are much better than 2.2. That’s awesome!
The first recommendation is don’t bother using 2.2, just go straight to 3.0.
It’s clear that 2.1 is still quite a bit faster than 3.0. Even with 1 GB of memory, there is a difference of almost 200 ms on average. At 2 GB and above, that difference becomes more and more negligible.
The next recommendation is that you should keep using 2.1 if you want maximum performance. If you use 3.0, add more memory for more performance!
Lastly, let’s look at the three brighter, highlighted configurations.
30-trcq is .NET Core 3.0 with everything turned on. It’s the fastest configuration. It’s red because it’s a red herring (bad joke). I’ll go into that, as well as the blue (30-r) and purple (30-tr) options, in a minute. The suspense is building, I know.
Initialization + Overhead
Before we move on to the warm invocations, let’s drill down into the cold starts a bit.
Remember, initialization is the time the Lambda service spends preparing your function for a cold invocation. When using a custom runtime, overhead happens on every invocation after the custom runtime sends the function result to the Lambda Runtime API.
Here we see similar results to the totals above with just a few configurations having moved around. Nothing to write home about, but worth seeing.
I measure warm starts with 128 MB of memory. I skip the first two invocations because one is cold and the second is sporadic for some reason.
In this first graph, I want to show you that tiered compilation (c) is causing inferior performance for up to 20 seconds. This is similar to what we saw in preview 3.
Remember I said we’d come back to the highlighted configurations? The red one (30-trcq) was the fastest in the cold start results. However, it has tiered compilation enabled, so it has about 15 seconds of worse warm performance.
The reason for this is that tiered compilation assumes you may hit some code repeatedly during startup then never again. It tries to avoid optimizing the wrong code by using a timer. Clearly the timer is somewhere around 15–20 seconds, after which it starts optimizing our frequently called code.
Make sure you have tiered compilation turned off to avoid 15–20 seconds of lower performance after the cold start.
If you shouldn’t use trcq, then what should you use? Well, let’s go back to the purple (30-tr) and blue (30-r) configurations. Both of these have tiered compilation disabled and both performed very well in the cold start test.
The reason I have two here is that even though purple (30-tr) performs slightly better than blue (30-r) in cold starts, I still recommend the latter.
The difference between the two is IL Trimming (t). Microsoft’s documentation says the following about IL Trimming:
It’s important to consider that applications or frameworks (including ASP.NET Core and WPF) that use reflection or related dynamic features, will often break when trimmed. This breakage occurs because the linker doesn’t know about this dynamic behavior and can’t determine which framework types are required for reflection. The IL Linker tool can be configured to be aware of this scenario.
Personally, I’d rather avoid the fiddly extra configuration and just disable IL Trimming given the performance is so similar.
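For reference, the extra configuration the documentation alludes to lives in the project file. This is a minimal sketch, assuming your function reaches an assembly only through reflection. System.Security is purely a placeholder here, and which assemblies (if any) need rooting depends on your application.

```xml
<!-- Sketch: tell the IL Linker to keep an assembly it would otherwise trim -->
<ItemGroup>
  <!-- Placeholder assembly name; replace with whatever your code loads via reflection -->
  <TrimmerRootAssembly Include="System.Security" />
</ItemGroup>
```

Each rooted assembly is kept in the package, so the more you have to list, the smaller the size benefit of trimming becomes.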
My recommendation is that unless you need IL Trimming to fit inside the Lambda package size limit, have ReadyToRun enabled and everything else disabled.
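As a sketch, that recommendation (the 30-r configuration) would look something like this in the .csproj. ReadyToRun needs a runtime identifier at publish time. The linux-x64 value is an assumption, so substitute whatever your custom runtime deployment targets.

```xml
<!-- Sketch of the recommended 30-r configuration; adjust to your own project -->
<PropertyGroup>
  <TargetFramework>netcoreapp3.0</TargetFramework>
  <!-- Assumed runtime identifier; ReadyToRun images are compiled for a specific platform -->
  <RuntimeIdentifier>linux-x64</RuntimeIdentifier>
  <!-- ReadyToRun on: the big win for cold starts -->
  <PublishReadyToRun>true</PublishReadyToRun>
  <!-- IL Trimming off unless you need it to fit the package size limit -->
  <PublishTrimmed>false</PublishTrimmed>
  <!-- Tiered Compilation off to avoid the slow period after a cold start -->
  <TieredCompilation>false</TieredCompilation>
  <!-- Quick JIT off; it showed no measurable effect with tiered compilation disabled -->
  <TieredCompilationQuickJit>false</TieredCompilationQuickJit>
</PropertyGroup>
```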
The .NET Core 3.0 configuration with the best cold start performance is the one with all four options turned on. However, with Tiered Compilation enabled, functions perform badly for 15–20 seconds after the cold start until your code is optimized.
Quick JIT seemingly has no impact with Tiered Compilation disabled and IL Trimming didn’t have much of an impact on the test results.
ReadyToRun is the big hitter and should be enabled. Windows users will find this makes building a pain since cross-compilation doesn’t seem to work. This means you’ll need to compile and package your function on Linux. I can write a guide on how to do this if you’re interested. Please let me know.
- Don’t bother using 2.2, just go straight to 3.0.
- Keep using 2.1 if you want maximum performance.
- If you use 3.0, add more memory for more performance!
- Make sure you have tiered compilation turned off to avoid 15–20 seconds of lower performance after the cold start.
- Don’t use IL Trimming unless you need it to fit inside the Lambda package size limit.
- When using 3.0, have ReadyToRun enabled and everything else disabled.