How to track state in your Kubernetes Operator
State can get very messy in a distributed environment like Kubernetes.
In simple environments, state machines are adequate for defining and tracking state. You quickly understand whether your computation is misbehaving, and can report progress to users or other programs.
But, in the context of an Operator, you can’t just pay attention to your own workload. You also need to be aware of external factors monitored by Kubernetes itself.
Here’s how to think about state when building your own Operator (based on lessons learnt from building our own).
Let’s get into it!
Spec is desired, status is actual
Let’s start with the basics. All Kubernetes objects define `spec` and `status` sub-objects to manage state. This includes custom resources monitored by your own Operator’s custom controller.

Here, `spec` is the specification of the object, where you declare the desired state of the resource. The `status`, on the other hand, is where Kubernetes provides visibility into the actual state of the resource.
For example, our Artillery Operator monitors a custom resource named LoadTest. As above, our LoadTest has a `spec` field specifying the number of load tests to run in parallel. Our custom controller processes our load test. It creates the required parallel workers (Pods) to match the spec’s desired state, thereby updating the actual state.
The key takeaway from this diagram is that observing state is a constant, ongoing activity that ensures the desired state becomes the actual state.
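To make the split concrete, here’s a minimal sketch of what the Go types behind such a custom resource might look like. The field names (such as `Count`) are illustrative, not the Artillery Operator’s actual API:

```go
package v1alpha1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// LoadTestSpec declares the desired state. Count is illustrative, standing
// in for "how many load test workers to run in parallel".
type LoadTestSpec struct {
	Count int32 `json:"count"`
}

// LoadTestStatus reports the actual state observed by the controller.
// What exactly belongs in here is the subject of the rest of this article.
type LoadTestStatus struct {
}

// LoadTest wires the two together, like any other Kubernetes object.
type LoadTest struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   LoadTestSpec   `json:"spec,omitempty"`
	Status LoadTestStatus `json:"status,omitempty"`
}
```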
But we need to understand how our custom resource is progressing. So, what exactly should we track in the `status` sub-object?
State machines are rigid
The answer to the previous question is: do not use a single field with state-machine-like values to track state.
Kubernetes advocates for managing state using Conditions in our `status` sub-object. This gives us a less rigid and more ‘open-world’ perspective.

This makes sense. Processes in a distributed system interact in unforeseen ways, creating unforeseen states. It’s hard to preemptively determine what these states will be.
Monitoring aggregate state
Let’s look at the example of a typical custom controller.
This controller creates and manages a few child objects. It is aware of state across all of these objects. And, it constantly examines its ‘aggregate’ state to inform its actual state.
We can define these aggregate states using a state machine. As an example, let’s say we have the following states: `initializing`, `running`, `completed`, `stoppedFailed`.

We’ll create a field called `CurrentState` in our `status` object, and update it with the current state. Our `CurrentState` is now `running`.
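As a rough sketch (the field and value names are purely illustrative), this approach boils down to a single status field drawing from a fixed, pre-defined set of values:

```go
package v1alpha1

// MyResourceStatus is a sketch of the single-field, state-machine style of
// tracking state. CurrentState holds exactly one value from a fixed set.
type MyResourceStatus struct {
	CurrentState string `json:"currentState,omitempty"`
}

// The fixed set of states the controller knows how to report.
const (
	StateInitializing  = "initializing"
	StateRunning       = "running"
	StateCompleted     = "completed"
	StateStoppedFailed = "stoppedFailed"
)
```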
Unforeseen states
... Out of the blue a Kubernetes node running our workload fails!!
This drops a child object into an unforeseen (to us) state. Our custom controller’s ‘aggregate’ state has now shifted. But, we continue to report it as `running` to our users and downstream clients.
We need a fix! We update our custom controller with logic to better examine node failure for a child object. We introduce a new `CurrentState` for this scenario: `awaitingRestart`.
Some downstream clients will find this new controller state useful. We now have to inform them to monitor for this new state. All is well in the world, for now.
... What!? Out of the blue, another unaccounted-for state happens!
Our `CurrentState` can’t explain the issue. So, we update the controller logic, add an adequate state for ... etc., etc.
You can see how this state-machine-based approach quickly becomes rigid.
Conditions
Conditions 👀 provide a richer set of information about a controller’s actual state over time. You can think of them as a controller’s personal changelog.
Conditions are present in all Kubernetes objects. Here’s an example from a Deployment. Have a read, I’ll wait... 👀
Say our typical custom controller, from the last section, had a Deployment child object exhibiting the following conditions:
status:
  availableReplicas: 2
  conditions:
  - lastTransitionTime: 2016-10-04T12:25:39Z
    lastUpdateTime: 2016-10-04T12:25:39Z
    message: Replica set "nginx-deployment-4262182780" is progressing.
    reason: ReplicaSetUpdated
    status: 'True'
    type: Progressing
  - lastTransitionTime: 2016-10-04T12:25:42Z
    lastUpdateTime: 2016-10-04T12:25:42Z
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: 'True'
    type: Available
It could use the following logic to figure out its own actual state:
- Go through the Deployment’s `.status.conditions`.
- Find the latest `.status.conditions[..].type` and its `.status.conditions[..].status`.
- Weigh whether the other Deployment `.status.conditions` had any alarming statuses.
Here, it determines the Deployment is still progressing and not yet completed. Then it can add a condition with a ‘Progressing’-like type to its own Conditions list.
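As a sketch, that logic could look something like this in Go. It illustrates the approach rather than any particular controller’s code:

```go
package controller

import (
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
)

// latestCondition returns the Deployment condition with the most recent
// transition time, or nil if no conditions have been recorded yet.
func latestCondition(deploy *appsv1.Deployment) *appsv1.DeploymentCondition {
	var latest *appsv1.DeploymentCondition
	for i := range deploy.Status.Conditions {
		c := &deploy.Status.Conditions[i]
		if latest == nil || c.LastTransitionTime.After(latest.LastTransitionTime.Time) {
			latest = c
		}
	}
	return latest
}

// hasAlarmingConditions weighs the remaining conditions, e.g. a replica
// failure or a loss of minimum availability.
func hasAlarmingConditions(deploy *appsv1.Deployment) bool {
	for _, c := range deploy.Status.Conditions {
		switch c.Type {
		case appsv1.DeploymentReplicaFailure:
			if c.Status == corev1.ConditionTrue {
				return true
			}
		case appsv1.DeploymentAvailable:
			if c.Status == corev1.ConditionFalse {
				return true
			}
		}
	}
	return false
}
```

From the latest condition’s type and status, plus any alarming conditions, the custom controller can decide which condition to record on its own resource.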
Creating your own Conditions
So, how do we create conditions for our own custom controller?
We’ve just seen an example of a Deployment’s Conditions, specifically how open-ended they are. E.g. `type: Progressing` may mean either progressing or complete.
As a general rule of thumb:
- We have a small set of types (`.status.conditions[..].type`) that explain general behaviour.
- Unforeseen states can easily be signalled using the `Unknown` value for `.status.conditions[..].status`.
- Failure for a condition type (`.status.conditions[..].type`) can be explained using `False` as the `.status.conditions[..].status` value, with an informative `.status.conditions[..].reason`.
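In Go, the apimachinery condition helpers make these rules easy to apply. Below is a sketch of recording an unforeseen state with an `Unknown` status; the condition type and reason names are made up for illustration:

```go
package controller

import (
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// markProgressUnknown records that we can no longer tell whether the
// workload is progressing, instead of inventing a brand new state.
// The type and reason strings here are illustrative.
func markProgressUnknown(conditions *[]metav1.Condition) {
	meta.SetStatusCondition(conditions, metav1.Condition{
		Type:    "Progressing",
		Status:  metav1.ConditionUnknown,
		Reason:  "NodeFailure", // an informative reason for the Unknown status
		Message: "a worker node stopped reporting; progress cannot be determined",
	})
}
```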
Let’s check out an example and see how these tips play out.
Conditions by example
If you recall, we used the Artillery Operator earlier to explore the `spec` and `status` sub-objects. Let’s get into the details of how it makes use of Conditions to manage state.
The Artillery Operator uses a LoadTest custom resource to spec out a load test run. The spec includes the number of workers to run in parallel. This information is used to generate a Kubernetes Job and Pods to run the actual load test workloads.
The Job is a key component here. Tracking our aggregate state relies on the progress of the Job.
We also observed that:
- There are three statuses we care about based on a Job’s `status`... Namely, `LoadTestInactive`, `LoadTestActive` and `LoadTestCompleted`.
- We should add extra fields to `status`... These fields will give us a fine-grained view of how many workers are active, succeeded or failed:
// The number of actively running LoadTest worker pods.
Active int32 `json:"active,omitempty"`
// The number of LoadTest worker pods which reached phase Succeeded.
Succeeded int32 `json:"succeeded,omitempty"`
// The number of LoadTest worker pods which reached phase Failed.
Failed int32 `json:"failed,omitempty"`
Using our Conditions rule of thumb, two Condition types are required:
- `Progressing`, with `True`, `False` or `Unknown` as a value.
- `Completed`, with `True`, `False` or `Unknown` as a value.
These easily explained the progressing and completed states. But, how do we explain that a Load Test has failed?
In our distributed load testing domain, a load test always completes. And, it will only flag failed workers. Restarting a failed worker/Pod messes with the load test metrics, so we avoid it at all costs.
So, for the controller, the `Completed` condition set to `true` was more than enough. Failed workers were flagged in the `status` field (using a `failed` field to track the failed count).
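Here’s a sketch of how that might come together, copying the Job’s counts into the status and setting the `Completed` condition once the Job finishes. The condition types, reasons and helper names are illustrative, not the operator’s exact code:

```go
package controller

import (
	batchv1 "k8s.io/api/batch/v1"
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// LoadTestStatus mirrors the fields discussed above, plus a Conditions list.
type LoadTestStatus struct {
	Active     int32              `json:"active,omitempty"`
	Succeeded  int32              `json:"succeeded,omitempty"`
	Failed     int32              `json:"failed,omitempty"`
	Conditions []metav1.Condition `json:"conditions,omitempty"`
}

// updateStatusFromJob aggregates the underlying Job's status into the
// LoadTest's status. Failed workers are only counted, never restarted.
func updateStatusFromJob(status *LoadTestStatus, job *batchv1.Job) {
	// Copy the worker counts straight from the Job.
	status.Active = job.Status.Active
	status.Succeeded = job.Status.Succeeded
	status.Failed = job.Status.Failed

	if job.Status.CompletionTime != nil {
		// A load test always completes, even if some workers failed.
		meta.SetStatusCondition(&status.Conditions, metav1.Condition{
			Type:    "Completed",
			Status:  metav1.ConditionTrue,
			Reason:  "LoadTestCompleted",
			Message: "all load test workers have finished",
		})
	} else {
		meta.SetStatusCondition(&status.Conditions, metav1.Condition{
			Type:    "Progressing",
			Status:  metav1.ConditionTrue,
			Reason:  "LoadTestActive",
			Message: "load test workers are still running",
		})
	}
}
```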
Other implementations may treat the failure condition differently (e.g. Deployment uses the Progressing Condition type with value false).
Our Conditions ensure users can track a Load Test’s observed status to infer progress, while helping clients monitor the conditions that matter to them.
Feel free to check the full Conditions implementation. 👀
Conditions all the way down
The more you deal with Kubernetes objects to track any form of state, the more you deal with Conditions. They’re everywhere.
They help create a simple reactive system that is open to change. Hopefully, this article gives you a good understanding of how to get started.