12 November 2020


How to create a custom resource with Kubernetes Operator

17-minute read


While developing projects on the Kubernetes platform I came across an interesting problem. I had quite a few scripts that ran in containers and needed to be triggered only once on every node in my Kubernetes cluster. This could not be solved using default Kubernetes resources such as DaemonSet and Job. So I decided to write my own resource using Kubernetes Operator Framework. How I went about it is the subject of this blog post.

When I confronted this problem, my first thought was to use a DaemonSet resource that utilizes initContainers and then starts a dummy busybox container running tail -f /dev/null or another command that does nothing. However, after analyzing the question more thoroughly, I realized that it would be very problematic on a production setup with many such DaemonSets. Each dummy container eats up resources to execute a useless task. Moreover, from the perspective of the cluster administrator, such a DaemonSet seems exactly the same as other legitimate services. Should an error occur, it may be confusing and require a deep dive into running resources.
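Sketched as a manifest, that workaround would look something like this (names, images, and commands are illustrative, not taken from a real deployment):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: run-once-workaround
spec:
  selector:
    matchLabels:
      app: run-once-workaround
  template:
    metadata:
      labels:
        app: run-once-workaround
    spec:
      initContainers:
        - name: actual-work          # the script that should run once per node
          image: busybox
          command: ["sh", "-c", "echo doing the real work"]
      containers:
        - name: pause                # dummy container that only keeps the Pod alive
          image: busybox
          command: ["tail", "-f", "/dev/null"]
```

The real work happens in the initContainer; the pause container exists only because a DaemonSet Pod must keep at least one long-running container alive.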

My second idea was to use a simple Job with a huge number of Completions and Parallelism. Using an anti-affinity specification to ensure that only one Pod runs on a single node at a time, I would simulate the DaemonSet’s behavior: whenever a new node appeared in the cluster, a new Pod would be scheduled onto it.

Yet this idea, with its fixed number of Completions, seemed a bit amateurish. Even if I set it to 1000, what if a 1001st node appeared in the cluster? Further, from the perspective of the cluster administrator, such a Job looks like a defective resource that can never finish all its Completions.
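As a manifest, that approach might be sketched like this (the label and the completions count are illustrative; the topologyKey of kubernetes.io/hostname is what forces at most one Pod per node):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: run-once-job
spec:
  completions: 1000        # arbitrary upper bound - the flaw discussed above
  parallelism: 1000
  template:
    metadata:
      labels:
        app: run-once-job
    spec:
      affinity:
        podAntiAffinity:
          # Refuse to schedule two Pods with this label on the same node.
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: run-once-job
              topologyKey: kubernetes.io/hostname
      containers:
        - name: work
          image: busybox
          command: ["sh", "-c", "echo doing the real work"]
      restartPolicy: OnFailure
```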


Creating a custom resource in Kubernetes

But what if I could go a bit more low-level and create a resource that merges the functionality of the DaemonSet and the Job? I went online to find out whether someone had already created such a resource, but all I turned up were a few issues on the Kubernetes GitHub posted by people who’d had the same problem. For example, in this issue the idea of adding such a DaemonJob to the official Kubernetes resources had been discussed for over two years, but a conclusion remained elusive. Long story short, I would have to create such a resource myself. I chose to use the Operator Framework and code a resource that would be defined in a style very similar to a standard Kubernetes Job but would automatically schedule a Pod on every node, à la DaemonSet. Before going further, let’s have a brief overview of the Kubernetes operator architecture.

An overview of Kubernetes operator architecture

A Kubernetes operator has two main components: a Custom Resource Definition (CRD) and an operator controller. A Custom Resource Definition is a standard Kubernetes resource that allows users to extend the Kubernetes API with new resources. On its own, however, it only lets users apply manifests of the newly defined resource; those manifests have no effect on the cluster. This is because the logic behind every built-in resource lives in the Kubernetes controller manager, and since a user-defined resource is not coded there, the user has to provide its controller separately. This is where the operator controller comes in: it is usually deployed as a separate Pod on the cluster and contains the logic for managing the custom resource.
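For illustration, a hand-written CRD registering the DaemonJob kind used later in this post might begin like this (the schema is trimmed to a permissive placeholder; the tooling generates a far more detailed one from the Go types):

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: daemonjobs.dj.dysproz.io    # must be <plural>.<group>
spec:
  group: dj.dysproz.io
  names:
    kind: DaemonJob
    plural: daemonjobs
    singular: daemonjob
  scope: Namespaced
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          # Placeholder: accept any spec until a real schema is generated.
          x-kubernetes-preserve-unknown-fields: true
```

Applying this CRD makes `kubectl apply` accept DaemonJob manifests, but nothing happens on the cluster until a controller reacts to them.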

The controller has two main parts that are important for every operator developer. The first is the definition of what the controller watches. It specifies the changes in particular resources, or the events, that will trigger the operator’s control logic. For example, if a custom resource’s work is based on information provided in a ConfigMap, then the operator should watch that ConfigMap so the code is triggered on every change and the resource is adjusted to the new configuration. The other important part is the Reconcile loop. This is in fact a function that contains the whole logic behind the operator and is triggered by changes in the watched resources.
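To make the pattern concrete, here is a purely illustrative, stdlib-only Go sketch of how watches drive the Reconcile loop; the Event type and reconcile function are invented for this example and are not controller-runtime APIs, but the real machinery shown later follows the same shape: an event names an object, and reconciliation fetches everything else itself.

```go
package main

import "fmt"

// Event is a stand-in for a change notification on a watched resource.
type Event struct{ Name, Namespace string }

// reconcile is a stand-in for the operator's Reconcile loop: it receives
// only the name and namespace of the object to reconcile and must look up
// the rest of the cluster state on its own.
func reconcile(e Event) string {
	return fmt.Sprintf("reconciled %s/%s", e.Namespace, e.Name)
}

func main() {
	// A channel plays the role of the watch: every event on a watched
	// resource triggers one pass of the reconcile function.
	events := make(chan Event, 2)
	events <- Event{Name: "daemonjob-sample", Namespace: "default"}
	events <- Event{Name: "daemonjob-sample", Namespace: "default"}
	close(events)

	for e := range events {
		fmt.Println(reconcile(e))
	}
}
```

Note that reconciliation is level-based rather than edge-based: the same event can arrive many times, and each pass should converge toward the desired state rather than assume what changed.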

How to start working with Kubernetes Operator Framework

Developing such a project scaffold from scratch may seem a tremendous challenge. However, the Operator Framework provides a clever CLI tool called operator-sdk, which does most of the job. The whole code infrastructure can be generated with just two commands; the developer then only edits specific structures and functions to add the resource-specific logic.

The Operator Framework allows users to create multiple Kubernetes custom resources within the scope of a single operator. In that case, each resource is managed by its own controller, which makes it possible to build more complicated projects that often have multiple components, each deployed as a separate custom resource. To initialize an empty operator, simply run the operator-sdk init (docs) command with the proper flags for your project, and the whole basic structure will be created.
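Assuming operator-sdk is installed, initialization might look like this (the domain matches the dj.dysproz.io group used later in this post; the repo path is purely illustrative):

```shell
# Scaffold an empty operator project in the current directory.
operator-sdk init --domain dysproz.io --repo github.com/dysproz/daemon-job
```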

Having initialized the operator, you can add code for specific resources; for the problem described in this blog post, I need just one. A new resource is added with the operator-sdk create api (docs) command, which generates API code for the resource described by the flags passed after the command. By default, this lets the developer specify only the structure of the custom resource; no code is generated for the controller part. To also generate controller code, pass the --controller=true flag when creating the API. Why doesn’t operator-sdk do that by default? In more advanced projects that periodically release new versions of the product, the API may change and leave old manifests broken, with every customer still using them facing a migration to the new version. Because a new API version can be created alongside the old one, both versions of a resource can be served, and both can be managed by a single controller, at least until legacy support is no longer required.
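With the project initialized, the scaffolding for the resource built in this post could be generated like this (group, version, and kind match the sample manifest shown later):

```shell
# Scaffold both the API types (--resource) and the controller (--controller).
operator-sdk create api --group dj --version v1 --kind DaemonJob \
    --resource=true --controller=true
```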

With the project structure ready for development of the new custom resource, you can start writing the code. All API definitions go in files ending with _types.go under the api/<API version>/ directory, and all controller code lives under the controllers/ directory in files ending with _controller.go. These are the two file types developers should edit; all other files are generated automatically. When you finish editing the custom resource’s code, all you have to do is run the make manifests command, and the operator will be ready to be installed on the cluster (docs).
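For this resource, the edit-and-regenerate cycle boils down to one command (the file names below are the scaffold's defaults for this kind):

```shell
# Edit api/v1/daemonjob_types.go (API structs) and
# controllers/daemonjob_controller.go (Reconcile logic), then:
make manifests    # regenerate the CRD manifests from the Go types
```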

DaemonJob step-by-step

Given all this, I started implementing the idea of a DaemonJob. The API was the easy part—just remove the Completions and Parallelism fields from the Job API and leave everything as is. The API code in the Kubernetes resources is written as Go structs and most of it looks the same (code). From the user’s perspective, a sample manifest to be applied on the cluster would look like this:

apiVersion: dj.dysproz.io/v1
kind: DaemonJob
metadata:
  name: daemonjob-sample
spec:
  template:
    spec:
      containers:
        - name: test-job
          image: busybox
          command:
            - sleep
            - "20"
      nodeSelector:
        app: v1
      restartPolicy: OnFailure

In this example, on every node with the label app=v1, a busybox container will be created and will run the sleep command for 20 seconds. As you can see, the above manifest is very similar to one a user would create for a simple Kubernetes Job resource.

However, the logic behind the controller for such a resource presented a more daunting challenge (code).

First of all, the resources to watch had to be defined. In this case, the operator has to look only for changes on nodes, which may seem fairly easy. This brings me to the first problem: Kubernetes operator code is usually designed to create resources that are owned by a single custom resource and to watch only those resources. For example, if a custom resource creates a ConfigMap with configuration and watches ConfigMaps, it should look only for changes on ConfigMaps owned by that custom resource.

When a watched resource triggers the Reconcile loop of a custom resource, a request argument is passed containing two fields: the custom resource’s name and namespace. Although nodes in a Kubernetes cluster are represented as regular resources, there is no built-in function to watch resources that are not owned by the custom resource. As a result, the Reconcile loop would be triggered with the request name pointing to the node that triggered the action rather than to a specific custom resource.

To keep up with the original design of developing operators, it is necessary to build a custom method that will trigger the Reconcile loop with a proper request name for every DaemonJob resource when any node changes its state. This can be achieved with the following code:

func (r *DaemonJobReconciler) SetupWithManager(mgr ctrl.Manager) error {
    return ctrl.NewControllerManagedBy(mgr).
        For(&djv1.DaemonJob{}).
        // Trigger the Reconcile loop of every DaemonJob whenever any Node changes.
        Watches(&source.Kind{Type: &corev1.Node{}}, &handler.EnqueueRequestsFromMapFunc{
            ToRequests: handler.ToRequestsFunc(func(nodeObject handler.MapObject) []reconcile.Request {
                // List all DaemonJob resources in the cluster...
                var djObjects djv1.DaemonJobList
                _ = mgr.GetClient().List(context.TODO(), &djObjects)
                // ...and enqueue one reconcile request per DaemonJob,
                // so the request carries a DaemonJob name, not a node name.
                requests := []reconcile.Request{}
                for _, djObject := range djObjects.Items {
                    requests = append(requests, reconcile.Request{
                        NamespacedName: types.NamespacedName{
                            Name:      djObject.Name,
                            Namespace: djObject.Namespace,
                        },
                    })
                }
                return requests
            }),
        }).
        Complete(r)
}