
How We Do Releases in Auth0’s New Private Platform

Designing for quality, reproducibility, and determinism.

TL;DR: Releasing individual services today is largely a solved problem. The complexity arises when we need to factor in the combinations of services, the configs and secrets necessary for a given service version, and the underlying infrastructure supporting those services, all while ensuring quality, reproducibility, and determinism.

What’s in a Service?

When we talk about services, we largely focus on the code, but there are other things we might need to account for:

  • Config and Secrets
  • Database-related migrations
  • Related infrastructure dependencies for a particular service version
  • Implicit dependencies with other services

Introducing the Concept of a Release Manifest

Given these problem constraints, we introduced the concept of a release manifest, which we define as a versioned JSON file describing a point-in-time representation of the entire Auth0 product stack, including all service versions and the version of the infrastructure they were deployed on.

Configs and secrets are also versioned for every release manifest and are snapshotted in our config/secrets storage.

Just to make it more concrete, here’s a hypothetical snippet:

{
    "version": "v202221.99.0",
    "services": {
        "productpage": {
            "entity": {
                "name": "productpage",
                "default_vcs_branch": "main",
                "stacks": ["STACK_AUTH0"]
            },
            "artifact": {
                "version": "1.464.0",
                "vcs_revision": "dc62808d3a75a814a0748827d74e0936669dfec9",
                "vcs_url": "https://github.com/auth0/productpage",
                "reference": "sample.jfrog.io/docker/productpage:1.464.0",
                "metadata": {
                    "build_number": "464",
                    "build_started": "2022-05-24T18:52:34.753+0000",
                    "build_url": "https://samplebuilds.auth0.net/job/productpage/job/main/464/"
                }
            }
        },
        "reviews": {
            "entity": {
                "name": "reviews",
                "default_vcs_branch": "main",
                "stacks": ["STACK_AUTH0"]
            },
            "artifact": {
                "version": "1.31.0",
                "vcs_revision": "dc62808d3a75a814a0748827d74e0936669dfec9",
                "vcs_url": "https://github.com/auth0/reviews",
                "reference": "sample.jfrog.io/docker/reviews:1.31.0",
                "metadata": {
                    "build_number": "31",
                    "build_started": "2022-05-24T18:52:34.753+0000",
                    "build_url": "https://samplebuilds.auth0.net/job/reviews/job/main/31/"
                }
            }
        }
    }
}

A couple of notes:

  • The version v202221.99.0 is based on a year/week scheme (e.g., year 2022, week 21) plus a monotonically increasing minor version that represents an atomic change.
  • In this example, we only have two services called productpage (versioned 1.464.0) and reviews (versioned 1.31.0).
  • This file is committed to our central GitOps repository with a matching tag v202221.99.0, where the underlying infrastructure as code is also versioned for that specific point in time.
  • Config/secrets aren't versioned in git; instead, they are stored separately in our config/secrets storage, where we can snapshot them based on a specific version (in this case, all secrets/config have the same version v202221.99.0 attached as metadata).
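
To make the shape of this file explicit, here's a minimal TypeScript sketch of the manifest schema, plus a parser for the year/week version scheme described above. The type and function names are ours, purely for illustration:

interface ReleaseManifest {
    version: string; // e.g., "v202221.99.0" (vYYYYWW.minor.patch)
    services: Record<string, ServiceEntry>;
}

interface ServiceEntry {
    entity: {
        name: string;
        default_vcs_branch: string;
        stacks: string[];
    };
    artifact: {
        version: string;
        vcs_revision: string;
        vcs_url: string;
        reference: string; // container image reference
        metadata: Record<string, string>;
    };
}

// Parse the year/week version scheme, e.g., "v202221.99.0"
// => { year: 2022, week: 21, minor: 99, patch: 0 }.
function parseManifestVersion(version: string) {
    const match = /^v(\d{4})(\d{2})\.(\d+)\.(\d+)$/.exec(version);
    if (!match) throw new Error(`invalid manifest version: ${version}`);
    const [, year, week, minor, patch] = match;
    return {
        year: Number(year),
        week: Number(week),
        minor: Number(minor), // monotonic; bumped for every atomic change
        patch: Number(patch),
    };
}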

With this construct defined, we now have a baseline primitive for ensuring the properties we set out to achieve:

  • Quality — we can easily run our suites of system tests against a given version to ensure that our features work as expected.
  • Reproducibility — we can create as many spaces as we need for testing or for onboarding new customers.
  • Determinism — we're guaranteed the same combination of service versions, config/secrets, database migrations, and infrastructure.

A Note about Database Migrations

Database migrations themselves are a vast topic that is beyond the scope of this blog post, but the core properties we need to ensure are:

  • Forwards and backwards compatibility – the new schema can be read by old code, and new code can work with the old schema.
  • Idempotency – each migration is wrapped in a transaction so that after a transient failure we can retry the entire job, and re-running a job that has already completed is a no-op.

We use Kubernetes Jobs as the underlying primitive to run these database migrations and write them so that retries during transient failures are safe, and so that if we have to roll back, the previous service version will still work with the new database migration.
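
To make those properties concrete, here's a minimal TypeScript sketch of such a migration, assuming a Postgres-style database and a hypothetical rating column on the reviews table; it illustrates the pattern rather than our actual migration code:

import { Client } from "pg";

// An additive, idempotent migration: safe to retry, and the previous
// service version can still read and write this schema.
async function migrateAddReviewRating(db: Client): Promise<void> {
    await db.query("BEGIN");
    try {
        // IF NOT EXISTS makes re-running a completed job a no-op.
        await db.query(
            "ALTER TABLE reviews ADD COLUMN IF NOT EXISTS rating INTEGER"
        );
        // Backfill with a default so existing rows stay readable by new code.
        await db.query("UPDATE reviews SET rating = 0 WHERE rating IS NULL");
        await db.query("COMMIT");
    } catch (err) {
        // On a transient failure, roll back so the Kubernetes Job can
        // retry the entire step from a clean state.
        await db.query("ROLLBACK");
        throw err;
    }
}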

Rollback is only supported within a reasonable constraint – in our case, one major version away, which means one weekly version. So we only need to ensure backward/forward compatibility one major version away.

Deploying a Release Manifest Version

Given that we have all the necessary information codified in a release manifest version, the next step is deploying from an older version to a newer one. Taking our hypothetical example:

  • We have the current version v202221.99.0
  • We need to deploy to a new version v202222.105.0
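
Conceptually, the first step of that deployment is computing the delta between the two manifests. A rough TypeScript sketch, reusing the hypothetical ReleaseManifest type from earlier (this is illustrative, not our actual deployer):

// Compute which services changed between two manifest versions and
// therefore need to be deployed.
function diffManifests(current: ReleaseManifest, next: ReleaseManifest): string[] {
    const changed: string[] = [];
    for (const [name, entry] of Object.entries(next.services)) {
        const previous = current.services[name];
        if (!previous || previous.artifact.version !== entry.artifact.version) {
            changed.push(name);
        }
    }
    return changed;
}

// diffManifests(v202221_99_0, v202222_105_0)
// => e.g., ["reviews"] if only the reviews artifact changed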

Maintenance windows

Our private cloud customers can define a maintenance window during which their space gets deployed every week. These versions can only go out during that maintenance window, and if anything goes wrong, we have the ability to stop the release and roll back to the prior version.
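
As a sketch of how that gate might look in code (the window shape here is a simplifying assumption; real windows may span days or time zones):

// Hypothetical weekly maintenance window, e.g., Tuesdays 02:00-06:00 UTC.
interface MaintenanceWindow {
    dayOfWeek: number; // 0 = Sunday ... 6 = Saturday
    startHourUtc: number;
    endHourUtc: number;
}

// A release for a space may only proceed while its window is open.
function isWindowOpen(window: MaintenanceWindow, now: Date): boolean {
    return (
        now.getUTCDay() === window.dayOfWeek &&
        now.getUTCHours() >= window.startHourUtc &&
        now.getUTCHours() < window.endHourUtc
    );
}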

Promotion pipeline

Like any release pipeline, we define a hierarchical set of channels; let’s imagine the following hypothetical example:

stage
+- internal
   +- free
      +- enterprise
         +- private cloud dev
            +- private cloud prod

So, in this case, if stage is on version v202221.99.0, we must first deploy stage to v202222.105.0 before we can promote internal to v202222.105.0, and so forth down the hierarchy.
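
The gating rule can be expressed as a small TypeScript sketch: a channel may only be promoted to a version once its parent channel is already running it (the channel list and function are illustrative):

// Hypothetical channel hierarchy, ordered parent-first.
const channels = [
    "stage",
    "internal",
    "free",
    "enterprise",
    "private cloud dev",
    "private cloud prod",
] as const;

type Channel = (typeof channels)[number];

// A channel may be promoted only once its parent already runs the version.
function canPromote(
    channel: Channel,
    version: string,
    deployed: Record<Channel, string>
): boolean {
    const index = channels.indexOf(channel);
    if (index === 0) return true; // stage is the root of the hierarchy
    return deployed[channels[index - 1]] === version;
}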

Rolling back

If, for any reason, a deployment encounters an error, we can roll back by shifting traffic back to the previous stable cluster.
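
In a blue/green setup, for instance, that rollback amounts to flipping traffic weights back to the known-good cluster. The routing shape below is an assumption for illustration, not our actual traffic-management API:

// Hypothetical blue/green routing state for a space.
interface TrafficSplit {
    stableCluster: string;    // cluster running the last known-good version
    candidateCluster: string; // cluster running the new version
    candidateWeight: number;  // 0..100, percent of traffic to the candidate
}

// Rolling back means sending all traffic to the stable cluster again.
function rollBack(split: TrafficSplit): TrafficSplit {
    return { ...split, candidateWeight: 0 };
}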

What’s Next?

The new release pipeline we’ve developed is only the beginning – and we’re planning to build more innovative tools for our platform in order to consistently deliver excellent service and reliability to our customers.

We are publishing a series of technical posts on how and why we replatformed. They'll be linked at the bottom of this post as they're published and can also be found on our blog. If you have technical questions or ideas for future posts, please reach out to us via the Auth0 Community. (Our marketing team also tells us we would be remiss if we didn't include a link to talk to Sales, so that is here, too.)