The Kubernetes Deployment Lifecycle
How many of us actually know how the Kubernetes pod lifecycle, or the Kubernetes container lifecycle, works? Kubernetes workloads are often described as 'ephemeral' or 'short-lived,' and there is no shortage of repetitions of the 'cattle versus pets' analogy for servers in cloud native app development. In this blog, we will discuss the lifecycle of the Kubernetes workload and why it matters.
See why real-time KSPM is a requirement
The top 14 KSPM misconfigurations across hardening and incident response make real-time Kubernetes Security Posture Management (KSPM) a non-negotiable for any Kubernetes security program.
Pod or container lifecycle?
Pods can be confusing little things. They are the atomic unit of Kubernetes, and like atoms in physics there are certain sub-atomic "particles" within them. In the case of pods, these are containers and init-containers. The room for confusion comes from conflating containers and pods into an equivalent unit. Indeed, most guides (i.e., "hello, world" demos) use single-container pods with no init-containers, making it easy to think of a pod as just another word for a container. However, because pods operate at a higher level of abstraction, mapping container constructs onto a pod does not always work. This article will look at one such mismatch: the stages of the Pod lifecycle.
Background: Kubernetes Container Lifecycle
Let’s start by making sure we are all on the same page regarding the stages of a container’s lifecycle. Kubernetes defines three basic phases for a container: Waiting, Running, and Terminated. The status of the containers in a pod can be seen by using the kubectl describe pod <pod-name> command. Waiting is a slightly complex phase, as it encompasses everything that happens before a container is “running,” including waiting to be scheduled, pulling an image, and the startup process (things like applying a Kubernetes secret object). Kubernetes uses a separate Reason field to track why a container is in the waiting phase. Running and Terminated are more or less what they say on the tin; the only nuance is that Terminated encompasses both containers that have exited normally and those that have exited due to some sort of error. As with the Waiting phase, kubectl reveals a Reason field and the exit code for terminated containers to shed more light on this.
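As an illustration, here is roughly what the container state portion of a pod’s status might look like in the output of kubectl get pod <pod-name> -o yaml. The pod, container names, and timestamps are invented for the example:

```yaml
# Illustrative status fragment (names and timestamps are invented)
status:
  containerStatuses:
  - name: web
    state:
      waiting:
        reason: ImagePullBackOff      # the Reason field explains why the container is Waiting
        message: Back-off pulling image "example/web:1.0"
    restartCount: 0
  - name: migrate
    state:
      terminated:
        reason: Error                 # Terminated covers both clean exits and failures
        exitCode: 1                   # a non-zero exit code indicates an error state
        finishedAt: "2023-01-01T00:00:30Z"
    restartCount: 2
```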
Mapping Container Phases to the Kubernetes Pod Lifecycle: A Simple Example
In the most basic case— a single-container pod— the container phases map pretty easily to pod lifecycle phases. The pod lifecycle begins with a Pending phase— in a single-container pod this essentially encompasses the Waiting phase of the container lifecycle. Both pods and containers have a Running phase, and in a single-container pod these are essentially equivalent (we’ll get to some nuance here in a moment). Finally, Kubernetes provides two pod lifecycle phases that encompass the Terminated container status: Succeeded, which in a single-container pod indicates the container has exited normally/successfully, and Failed, which would indicate an exit in an error state.
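For a single-container pod that runs to completion, the mapping looks like this in the pod’s status (a hypothetical Job-style pod; the names are invented):

```yaml
# Illustrative single-container pod that ran to completion
status:
  phase: Succeeded                # the pod phase mirrors the lone container's outcome
  containerStatuses:
  - name: batch-task
    state:
      terminated:
        reason: Completed
        exitCode: 0               # a clean exit maps to Succeeded rather than Failed
```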
Viewed from the perspective of a single-container pod, this is an almost 1:1 mapping, and one might be tempted to conflate the container and pod lifecycle phases. Remember, though, that pods are a different level of abstraction from containers. This might seem like a distinction without a difference, but it turns out to be quite important. Let’s look at why.
Understanding the Pod Lifecycle Stages at the Right Level of Abstraction
At first glance, perhaps the most important distinction between a Pod and a container is that Pods can contain more than one container, including init-containers. The biggest consequence of this for our purposes is around defining the Running phase of the Pod lifecycle. It turns out that the Running phase for a pod does not indicate that all of its containers are up and running, but rather that at least one of the containers has made it at least as far as the Running phase of the container lifecycle. There are some nuances here to unpack. “At least one container” means that the containers in the pod can be at multiple stages in the container lifecycle. In fact, they could be at any phase in the container lifecycle. As an example, you could have containers that are Running while others are still Waiting (perhaps still pulling their image). This case makes a fair amount of sense— we intuitively expect a larger image to take longer to pull than a smaller one, so the fact that the smaller container might make it to the Running phase first is not surprising. Perhaps more surprising, however: “at least as far as the Running phase” means that we can also have Terminated containers in a Pod that is at the Running lifecycle phase, so long as those containers are in the process of being restarted.
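That surprising case can be seen directly in a pod’s status. In this invented example, the pod is in the Running phase even though one of its two containers is crash-looping:

```yaml
# Illustrative multi-container pod in the Running phase
# even though one container is being restarted (names are invented)
status:
  phase: Running                  # "at least one container" has reached Running
  containerStatuses:
  - name: app
    state:
      running:
        startedAt: "2023-01-01T00:00:05Z"
  - name: sidecar
    state:
      waiting:
        reason: CrashLoopBackOff  # previously Terminated, now waiting to be restarted
    lastState:
      terminated:
        exitCode: 1
    restartCount: 3
```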
At first glance, this seems counter-intuitive, but it stems from the level of abstraction that a pod represents. A Pod is not a container (or even a collection of containers), it is a Kubernetes API object. It represents an instance of an application or service composed of one or more containers. As a Kubernetes API object, it also represents the “desired state” of the cluster, not the “actual state” of the containers declared in the Pod manifest. In other words, the Running phase of a Pod’s lifecycle is only indirectly connected to the status of the containers in the Pod. What the pod’s lifecycle represents is whether the API object “pod”, representing a desired instance of an application, has been successfully instantiated in the cluster. Instantiated here means something like: scheduled, images pulled, and manifest in the hands of the kubelet so that the desired state of deployment can be maintained. In other words, it is answering the question “is this object now being managed by Kubernetes?” Kubernetes does not guarantee that an object it is managing is actually running or available at any given moment (in fact, it explicitly treats many objects as ephemeral), only that the cluster is continuously striving to make it running/available according to the specifications in the object’s manifest.
Seen at this level of abstraction, we need new definitions for the pod lifecycle stages:
- Pending: the object is not yet ready for management. This could be because (a) it hasn’t been scheduled yet, (b) the images have not yet been pulled (possibly indicating an image pull error), or (c) the pod is waiting on init-containers to complete their tasks.
- Running: the object is now under management. This indicates that it is assigned to a host, images have been pulled, init-containers have completed, and at least one of the containers in the pod has successfully achieved at least the running stage of the container lifecycle. Kubernetes can now use the object manifest to maintain desired state for the object.
- Succeeded: the object is no longer under management (its run is finished) and it exited successfully.
- Failed: the object is no longer under management because it terminated with an error.
These pod lifecycle stages are taken from the point of view of the Kubernetes API, which is concerned about “desired state.” That’s not necessarily the most helpful view from the perspective of a user, though, who mostly wants to know the “actual state” of the objects in a cluster (i.e., can I connect to this object right now?). In addition to seeing the status of actual containers in the pod using kubectl, Kubernetes also provides “Pod Conditions” that provide additional information about the Pod’s readiness for use (its “actual state”). These are:
- PodScheduled - A pod in the Pending phase of its lifecycle that has been scheduled to go on a particular node in the cluster. No containers are yet available for work, but the pod is probably starting to pull images and process things like Kubernetes secrets.
- Initialized - A pod transitioning between the Pending and Running phases of its lifecycle whose init containers have completed successfully. Again, the main workload containers are not yet available or running, but are now probably being started up.
- ContainersReady - A pod in the Running phase with all of its containers in the Running phase. This is one step below Ready, and indicates that there are probably custom readiness checks configured for the Pod that may not yet have succeeded.
- Ready - A pod in the Running phase that is now fully ready to serve requests via Services. All its containers are Running and any custom readiness checks have been achieved. In other words, this is where “desired state” and “actual state” meet and hold hands.
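Put together, the conditions appear in the pod’s status as a list. This invented snippet shows a pod that is scheduled and initialized but whose readiness probe has not yet succeeded:

```yaml
# Illustrative condition list for a Running pod that is not yet Ready
status:
  phase: Running
  conditions:
  - type: PodScheduled
    status: "True"
  - type: Initialized
    status: "True"
  - type: ContainersReady
    status: "False"               # a container's readiness probe is still failing
  - type: Ready
    status: "False"               # the pod will not receive Service traffic yet
```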
How long does Kubernetes keep pod logs?
If you want to understand the lifecycle of your Kubernetes clusters in detail, you will want to keep your pod logs. The kubelet keeps logs for terminated containers and rotates them to ensure this doesn’t consume all available storage on the node. However, this is not a fail-safe method. For example, if a container fails many times, or the pod is evicted or rescheduled onto a different node, the containers and their corresponding logs are lost.
You will want to have some observability solution or other way to keep your pod logs so you can understand what is happening in your clusters over time, beyond what Kubernetes offers by default. This is called cluster-level logging, and the storage and lifecycle of these logs will be separate from that of your actual environment.
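One common cluster-level logging pattern is a node-level agent run as a DaemonSet that tails container log files from each node and ships them to external storage. The sketch below is a minimal, hypothetical skeleton: the image name is a placeholder, and any real agent would also need configuration for its logging backend.

```yaml
# Minimal sketch of a node-level logging agent DaemonSet
# (the image is a placeholder, not a recommendation of a specific tool)
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-agent
  namespace: kube-system
spec:
  selector:
    matchLabels: {app: log-agent}
  template:
    metadata:
      labels: {app: log-agent}
    spec:
      containers:
      - name: agent
        image: example.com/log-agent:latest  # placeholder image
        volumeMounts:
        - name: varlog
          mountPath: /var/log                # container logs live under /var/log/containers
          readOnly: true
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
```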
Kubernetes pod lifecycle hooks
Kubernetes allows you to trigger events at certain points in the container lifecycle. You can do this right after a container is created (the postStart hook) or right before a container is terminated (the preStop hook). There are differences between init containers and lifecycle hooks here, in accordance with the nuances described above, so it’s important to keep those in mind.
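A minimal sketch of both hooks on a container follows; the pod name and hook commands are illustrative, not prescriptive:

```yaml
# Illustrative postStart/preStop lifecycle hooks (commands are placeholders)
apiVersion: v1
kind: Pod
metadata:
  name: hook-demo
spec:
  containers:
  - name: app
    image: nginx:1.25               # example image
    lifecycle:
      postStart:                    # runs immediately after the container is created
        exec:
          command: ["/bin/sh", "-c", "echo started > /tmp/started"]
      preStop:                      # runs before the container is sent SIGTERM
        exec:
          command: ["/bin/sh", "-c", "nginx -s quit; sleep 5"]
```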
TL;DR: Kubernetes deployment lifecycle
In discussions of the Kubernetes workload lifecycle, pods are often thought of as just another name for containers. But understood correctly, pods are Kubernetes API objects representing “desired state” for the most atomic unit of work on a cluster. As a result, the phases or stages of the Pod lifecycle indicate the status of that API object (is it being “managed” by Kubernetes to achieve desired state?), not the “actual state” of the workload or its readiness to perform work. This is a subtle but important distinction that points to the level of abstraction of Kubernetes objects and to Kubernetes’ promise: not that a workload is available at any given moment, but that the cluster is continuously striving to make it available.
The Kubernetes lifecycle matters for security
KSOC is the first and only real-time security solution for Kubernetes that takes the Kubernetes lifecycle into account. Given how complicated the lifecycle is and how quickly workloads cycle through its stages on any given day, it is impossible to secure Kubernetes without also taking it into account. A point-in-time Kubernetes scanner creates unactionable results by definition, because the workloads will have changed since the scan was performed. And it is impossible to figure out where to remediate in the environment when the container associated with the manifest misconfiguration no longer exists. See this blog for more information on how accounting for the lifecycle can make Kubernetes security actionable and accurate.
Get a KSOC demo for real-time KSPM
If you are ready to see real-time Kubernetes security in action, contact us for a demo.