Crossplane Components

Following my Crossplane introduction post from last week, this post is an attempt to outline some of the components you will interact with if you decide to roll out a Crossplane-based platform in your org. We’ll start with the basics and work our way up to some of the less obvious (and more consequential) features available to you while writing compositions. Let’s get started.

Providers

You can think of Providers the same way you would a Kubernetes operator. They are workloads that run in your cluster and provide custom resource types for specific cloud platforms, allowing you to create and manage infrastructure declaratively. There are providers for the major cloud platforms you would expect, like GCP, AWS, and Github. There are also providers for managing resources related to specific apps like Kafka, Grafana, or ArgoCD.

While these providers offer a lot of benefit, it should be noted that, as of Crossplane v2, they are entirely optional. If you are only building compositions for Kubernetes resources provided by other operators like cert-manager or CloudNativePG, or using built-in resources like Deployments and ServiceAccounts, you can skip them entirely.

What providers do enable though is an opportunity to holistically manage your entire platform. Everything from a Kubernetes Deployment, to your company’s SSO configuration, to VPC peering connections can all be treated exactly the same way. You apply your resources to a Kubernetes cluster and the controllers in the cluster handle the rest. Kubernetes becomes a universal control plane for everything you need to manage.

So how do they work? Well, you install a provider like anything else in Kubernetes: write a manifest and apply it to a cluster where Crossplane is installed. Here is an example of how to install the Github provider:

apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
  name: crossplane-contrib-provider-upjet-github
spec:
  package: xpkg.upbound.io/crossplane-contrib/provider-upjet-github:v0.18.7

When you apply this manifest, a few notable things will happen:

First, a number of CRDs will be installed. These are the resource types the Provider offers. You can see which resources are included by checking the Upbound Marketplace; each provider has a list of Managed Resources (MRs) it supports. Here is the Github page.

Second, a Deployment is created. This is the workload that knows what to do when a new Managed Resource is applied to the cluster.

My nodes all have taints on them. I can’t get the pods to schedule.

No worries. The Crossplane developers thought of this. You resolve this issue by creating a DeploymentRuntimeConfig manifest and applying that to the cluster as well. This is essentially a patch against the Deployment that the Provider creates. You can add your tolerations here like this:

apiVersion: pkg.crossplane.io/v1beta1
kind: DeploymentRuntimeConfig
metadata:
  name: tolerate-operator-nodes
spec:
  deploymentTemplate:
    spec:
      template:
        spec:
          nodeSelector:
            # nodeSelector takes a map of node labels; this label is an example
            workload: "operator"
          tolerations:
            - key: "operator"
              operator: "Exists"
              effect: "NoSchedule"
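
One detail that is easy to miss: the DeploymentRuntimeConfig does nothing on its own. The Provider has to opt into it by referencing it by name via spec.runtimeConfigRef:

```yaml
apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
  name: crossplane-contrib-provider-upjet-github
spec:
  package: xpkg.upbound.io/crossplane-contrib/provider-upjet-github:v0.18.7
  # Point the Provider at the DeploymentRuntimeConfig defined above
  runtimeConfigRef:
    name: tolerate-operator-nodes
```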

Ok, the Provider is installed but the repositories in my organization are private. How can I grant this provider access to my Github organization?

You can do this by writing a ProviderConfig manifest and applying it to the cluster. ProviderConfigs are how you provide credentials to a Provider, and you can have multiple ProviderConfigs in place for a single Provider. For example, imagine you have different AWS accounts for dev, staging, and production, and each requires its own authentication method. You can create a ProviderConfig for each of them and reference them in your MRs like this:

apiVersion: s3.aws.m.upbound.io/v1beta1
kind: Bucket
metadata:
  name: example-production-bucket
spec:
  forProvider:
    region: us-east-1
  providerConfigRef:
    kind: ProviderConfig
    name: production
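
For completeness, here is a sketch of what a ProviderConfig for the Github provider might look like. The secret name, key, and namespace below are assumptions; check the provider's documentation for the exact credential format it expects:

```yaml
apiVersion: github.upbound.io/v1beta1
kind: ProviderConfig
metadata:
  name: default
spec:
  credentials:
    source: Secret
    secretRef:
      # Assumed secret; it would hold your Github token and organization
      namespace: crossplane-system
      name: github-credentials
      key: credentials
```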

Functions and Compositions

I covered the basics of Compositions and Functions in my previous post but I want to expand on them a bit more here. One very important aspect that involves both compositions and functions needs to be explored: composition pipelines.

The composition pipeline is honestly one of my favorite features of Crossplane. Basically you define steps for how your resources should be composed and subsequent steps in the pipeline build on work accomplished in the previous steps.

Simple enough, right? It is, but it unlocks potential that you won't find in many other tools.

On the surface, it behaves much like a CI pipeline. The difference is that conventional CI pipelines are general purpose, while a composition pipeline has one narrow goal: compiling the resources you want applied to the Kubernetes cluster.

So how do they work?

First, we need to understand that a composition pipeline is a collection of functions executed in the order that they appear in the composition. The output of one function is submitted to the next function as input. Think of this behavior like the | character in bash. Information is passed from one function in the pipeline to the next until all of your resource manifests are generated.

What it passes includes (but is not limited to) the following:

  • The observed resource state. This is the XR that Crossplane saw in the Kubernetes cluster and needs to render. It contains the complete contents of the resource’s YAML body. Think of this as the primary input to your composition.
  • The desired state. These are our output resources. It’s a list of named Kubernetes resources that we expect to apply to the cluster. Whatever is in the desired state when the pipeline is finished is what gets applied to the Kubernetes cluster.
  • The pipeline context. This is a special attribute that functions can write to if they want. Because it is passed along through each step in the pipeline, one function can write data to it and a subsequent function can read that data, using it to inform how resources are rendered before they are put into the desired state.
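
To make that concrete, here is a simplified sketch of the state that flows between functions. This is not a manifest you apply; it only approximates the request each function receives, with field names abbreviated for illustration:

```yaml
# Approximate shape of the state passed between pipeline steps
observed:
  composite:
    resource:        # the XR as it exists in the cluster
      apiVersion: example.crossplane.io/v1
      kind: Database
      metadata:
        name: my-database
desired:
  resources:         # accumulates as each function runs
    my-database:
      resource:
        apiVersion: rds.aws.m.upbound.io/v1beta1
        kind: Cluster
context: {}          # scratch space any function can read or write
```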

The pipeline context is one of the most useful attributes of compositions. A classic example is function-extra-resources. This function takes a request for a Kubernetes resource as input, fetches the details of that resource, and puts them into the pipeline context so they can be used by functions further down the pipeline.

I’m having trouble visualizing this. Can you give an example?

Imagine you want to standardize the tags on your AWS resources so that they are always associated with a specific team. You could create a ConfigMap with the key/value pairs you want in your tags, one ConfigMap for each team. They could be used like this:

Your ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-tags-payment-workflow
  labels:
    team: payment-workflow
data:
  division: engineering
  department: backend
  team: payment-workflow
  jiraBoard: PAY
  slackChannel: "#payments"
  email: "[email protected]"

Your XR:

apiVersion: example.crossplane.io/v1
kind: Database
metadata:
  name: my-database
  labels:
    team: payment-workflow
spec:
  ...

Your composition:

apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: database
spec:
  compositeTypeRef:
    apiVersion: crossplane.my-company.com/v1alpha1
    kind: Database
  mode: Pipeline
  pipeline:
    - step: fetch-tag-details
      functionRef:
        name: function-extra-resources
      input:
        apiVersion: extra-resources.fn.crossplane.io/v1beta1
        kind: Input
        spec:
          extraResources:
            - apiVersion: v1
              kind: ConfigMap
              type: Selector
              selector:
                matchLabels:
                  # We want any ConfigMap with a 'team' label that matches
                  # 'metadata.labels.team' on our XR.
                  - type: FromCompositeFieldPath
                    # 'key' is the label on the ConfigMap
                    key: team 
                    # This reads the value from our XR
                    valueFromFieldPath: metadata.labels.team
              # This is the key in the pipeline context we want to put our
              # discovered data into
              into: discoveredTags

    - step: database-template
      functionRef:
        name: function-cue
      input:
        apiVersion: function-cue/v1
        kind: CueFunctionInput
        source: Inline
        script: |
          #request: {...}

          // The observed state is the body of the XR Crossplane found in the
          // cluster
          xr: #request.observed.composite.resource

          // The context was populated by function-extra-resources in the
          // previous step
          discoveredTags: #request.context["apiextensions.crossplane.io/extra-resources"].discoveredTags

          // This illustrates the response as "desired" resources. This is our
          // output.
          response: desired: resources: {

            // Each resource needs a unique name so that the composition can
            // keep track of it. We call this one "my-database"
            "my-database": resource: {
              apiVersion: "rds.aws.m.upbound.io/v1beta1"
              kind: "Cluster"
              metadata: {
                name: xr.metadata.name
                namespace: xr.metadata.namespace
              }
              spec: {
                forProvider: {
                  // The tags are applied to the resource automatically
                  tags: discoveredTags
                }
              }
            }

          }

This is a simple example with one resource being generated. Imagine if your composition were rendering 20 resources, or if you had 30 teams that needed to be tracked.

This pattern can be extended to almost anything. Think about automatically attaching IAM Policies to specific IAM Roles, applying a providerConfigRef to every resource automatically, or creating a Github Organization for every AWS Account that exists.

If tagging resources like this were mandatory, this pattern would provide a standardized resource type with a simplified spec that any engineer can understand.

  • Developers are happy because they don’t need to worry about tracking all of those tags. They can create resources themselves without getting the Ops team involved.
  • The SRE team is happy because it’s easy to predict the state of a cloud resource. In the middle of an incident, they know exactly who to contact because the contact details are attached directly to the resource. No context switching necessary.
  • The platform team is happy because the standardized nature of the resources means it’s easier to introduce more automations that increase velocity even further.
  • Management is happy because it is easy to see how much each team is spending on their infrastructure.

This is a hugely powerful pattern and it’s simply one example. There are many more functions available and there are SDKs if you’d like to build your own.

In my next post I’ll be documenting my experience building a composition function of my own and how I see this pattern affecting some of the most common elements that exist in the ops field today.