Faster and Lazier Container Startups

Mar. 22, 2025 • Last updated on Mar. 23, 2025

I was reading about the recently introduced “NVIDIA Inference Microservices (NIMs)” and how they can be deployed on Azure Container Apps using “serverless GPUs”. In a tutorial in the official docs, there’s a dedicated section on the importance of enabling what Microsoft calls “artifact streaming”, which sparked my curiosity about how it works.

In very simplified terms, it’s a strategy where only the essential container image layers are pulled first, allowing workloads to initialize faster. The remaining layers are downloaded subsequently (at least in AKS) or only when needed (GKE implementation).
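To make the idea concrete, here is a toy Python model of it. This is only a sketch: the `Layer` and `LazyImage` classes, the `needed_at_startup` flag, and the layer sizes are all made up for illustration, and the real implementations in AKS and GKE are considerably more involved than flagging whole layers.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Layer:
    name: str
    size_mb: int
    needed_at_startup: bool  # hypothetical flag, assumed known in advance


@dataclass
class LazyImage:
    layers: List[Layer]
    fetched: Set[str] = field(default_factory=set)

    def _fetch(self, layer: Layer) -> int:
        """Mark a layer as downloaded and return the MB it cost (0 if cached)."""
        if layer.name in self.fetched:
            return 0
        self.fetched.add(layer.name)
        return layer.size_mb

    def eager_pull_mb(self) -> int:
        """What a traditional pull downloads before the container can start."""
        return sum(layer.size_mb for layer in self.layers)

    def start(self) -> int:
        """Streaming start: fetch only the layers needed to initialize."""
        return sum(self._fetch(l) for l in self.layers if l.needed_at_startup)

    def prefetch_rest(self) -> int:
        """Download everything else after startup (the "subsequently" variant)."""
        return sum(self._fetch(l) for l in self.layers)

    def read(self, name: str) -> int:
        """Download a single layer the first time it is touched (on-demand variant)."""
        layer = next(l for l in self.layers if l.name == name)
        return self._fetch(layer)


# Made-up sizes, purely illustrative.
image = LazyImage([
    Layer("runtime", 50, True),
    Layer("model-weights", 4000, False),
    Layer("debug-tools", 200, False),
])
print("eager pull before start:", image.eager_pull_mb(), "MB")
print("streamed before start:  ", image.start(), "MB")
```

In this model the container can start after 50 MB instead of 4250 MB. Whether the rest arrives via `prefetch_rest()` or one `read()` at a time is the difference between the two approaches described above.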

The earliest mention I found of this idea was in a 2016 study that brought up an interesting statistic:

“Image download accounts for 76% of container startup time, but on average, only 6.4% of the fetched data is actually needed for the container to start doing useful work.”

In 2021, Google implemented image streaming on GKE and, a year later, Amazon open-sourced a solution that brings this capability to containerd. Microsoft is still catching up: its version of the feature has been in preview in ACR since late 2023.

If you are interested in learning more, here are some links: