Multi-Subscriber (Public Cloud) Deployment on Converged Architecture

Efficiency on a robust foundation: Exploring multi-subscriber (Public Cloud) deployment on Auth0's converged platform architecture.

TL;DR: Auth0’s SaaS services are available via two modes of cloud deployment:

  • Multi-subscriber (shared infrastructure, a.k.a. Public Cloud)
  • Single-subscriber (dedicated infrastructure, a.k.a. Private Cloud)

These offerings now have a converged platform architecture that increases business agility, developer productivity, scalability, reliability, and security by using industry-leading tools and fully automated operations.

This article continues an earlier series focused on Private Cloud deployments. Unlike the single-subscriber Private Cloud, the Public Cloud hosts thousands of subscribers. In multi-subscriber deployments, data isolation, security, and the elimination of noisy-neighbor problems are paramount.

Public Cloud uses the converged architecture, which is composed of three tiers: the Central tier, Points of Presence (PoPs), and Spaces. The Central tier and PoPs handle control, operations, and shared services. Spaces are the environments that provide customer identity services tailored to customers’ needs.

We use Kubernetes for lifecycle and runtime management of Auth0’s cloud-native, containerized solution. Auth0 is composed of stateless services, which are easier to scale. We use MongoDB, PostgreSQL, and Redis as each workload requires. We ensure a robust and performant solution through network isolation and extensive use of edge networks for DDoS protection, rate limiting, and low-latency DNS resolution (including custom domains). The converged architecture’s intra-region High Availability (HA) and inter-region geo-failover characteristics help us meet the availability SLAs of Public Cloud deployments.
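Edge rate limiting is commonly built on a token bucket. A token bucket allows short bursts up to a fixed capacity while enforcing a steady average rate. The sketch below shows only that general technique. The article does not describe Auth0's edge rate-limiting algorithm, and the capacity and refill values are illustrative assumptions.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TokenBucket:
    """Generic token-bucket rate limiter (illustrative, not Auth0's edge code)."""

    capacity: float        # maximum burst size, in requests
    refill_per_sec: float  # sustained request rate allowed per second
    tokens: float = field(init=False)
    last: float = field(init=False)

    def __post_init__(self) -> None:
        # Start full so an idle client can burst up to `capacity` requests.
        self.tokens = self.capacity
        self.last = time.monotonic()

    def allow(self, now: Optional[float] = None) -> bool:
        """Consume one token and return True if the request may proceed."""
        now = time.monotonic() if now is None else now
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

In practice, a limiter keeps one bucket per key, such as a client IP address or a subscriber identifier.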

Auth0 Services on Multi-Subscriber (Public Cloud) and Single-Subscriber (Private Cloud) Deployments

Auth0 is a Software-as-a-Service (SaaS) solution with two deployment models: Public Cloud and Private Cloud. The Public Cloud is a multi-subscriber environment where resources are shared between customers. It is available in the United States, Europe, Australia, Japan, and now the United Kingdom. Private Cloud deployments, on the other hand, are single-subscriber environments that provide customers with dedicated infrastructure and can be deployed in most regions across Amazon Web Services (AWS) and Microsoft Azure.

The Public Cloud offering is deployed on AWS and addresses the needs of a vast majority of customers. However, customers may choose Private Cloud for a variety of reasons, such as:

  • Geographical requirements that dictate deploying in an area not served by Public Cloud (e.g., for data residency and/or latency reduction)
  • Industry-specific needs (e.g., PCI compliance)
  • Preference for Microsoft Azure instead of AWS
  • Higher transaction volume than allowed on Public Cloud

Why Converged Architecture?

The converged architecture allows us to deploy and manage both Public and Private Cloud offerings on the same underpinning. This architecture is built on the following premises:

  • Multi-cloud: AWS and Azure cloud support through a common interface
  • Container orchestrated: use of container images instead of installing packages
  • Stateless: scalable and replaceable by design
  • Immutable: run only trusted code and configuration
  • Fully automated: provisioning, upgrades, and decommissioning are done by machines
  • GitOps: enhanced DevOps, push code and safely reproduce it thousands of times
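One common way to enforce the "Immutable" premise is to admit only container images that are pinned by content digest and pulled from trusted registries. The check below is a minimal sketch of that idea. It assumes a hypothetical allowlisted registry, `registry.example.com`, and it is not Auth0's actual admission policy. In Kubernetes, a rule like this would typically live in an admission controller.

```python
import re

# Hypothetical allowlist; the registry host is illustrative only.
TRUSTED_REGISTRIES = {"registry.example.com"}

# "<registry>/<repo>@sha256:<64 hex chars>" pins an image to immutable content,
# unlike a mutable tag such as ":latest".
DIGEST_REF = re.compile(
    r"^(?P<registry>[^/]+)/(?P<repo>[a-z0-9._/-]+)@sha256:[0-9a-f]{64}$"
)


def is_immutable_trusted(image_ref: str) -> bool:
    """True only if the image is digest-pinned and from a trusted registry."""
    match = DIGEST_REF.match(image_ref)
    return bool(match) and match.group("registry") in TRUSTED_REGISTRIES
```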

Architecture

A single underlying architecture for developing, deploying, and operating services across different environments has many benefits. The primary benefit is the ability to offer new capabilities to customers faster and at lower cost. The architecture also makes it straightforward to maintain feature parity across deployment environments.

The converged architecture helps focus the organization’s energy on a common infrastructure, which yields higher developer productivity and velocity. It improves platform scalability to address the different scaling demands of multi-subscriber and single-subscriber deployments. It also provides a constructive space for assessing and adopting security practices that might seem unique to one deployment type but have practical implications for the other. Finally, it helps drive reliability refinements and enhancements across the board.

Besides optimizing engineering resources, a converged architecture brings several other internal benefits through unified automation, toolsets, and processes. It has sped up the identification and realization of infrastructure and operational cost optimizations across the board, which helps us deliver good value to our customers.

Deployment Characteristics of Multi-Subscriber Services

As a reminder, the Public Cloud is designed to host thousands of subscribers over shared resources, whereas a Private Cloud deployment is designed for a single subscriber.

Ensuring optimal performance and security for each Public Cloud subscriber is paramount. In a multi-subscriber deployment, the underlying compute, storage, database, networking, domain name, and other resources are shared. Even so, the solution keeps each subscriber isolated within this shared environment.

The database and application workloads are partitioned to ensure an appropriate level of data isolation and security. The Auth0 Public Cloud combines pooled partitioning (i.e., partitioning of data by subscriber identifier) and on-demand subscriber-specific workload instances. The architecture is refined continuously to eliminate “noisy neighbor” problems.
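Pooled partitioning can be sketched as routing by subscriber identifier:

  • Most subscribers hash onto shared partitions.
  • A few subscribers are mapped to dedicated, on-demand instances.
  • Every query against shared storage is forced to carry the subscriber's identifier.

The names below (`TenantRouter`, `scoped_query`, and the partition names) are hypothetical. This is a minimal illustration, not Auth0's data layer.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class TenantRouter:
    """Maps a subscriber to a pooled partition or a dedicated instance (hypothetical)."""

    pooled_partitions: list[str]
    # Subscribers that were given their own workload instance on demand.
    dedicated: dict[str, str] = field(default_factory=dict)

    def route(self, tenant_id: str) -> str:
        if not tenant_id:
            raise ValueError("every request must carry a subscriber identifier")
        if tenant_id in self.dedicated:
            return self.dedicated[tenant_id]
        # A stable hash keeps each subscriber on the same pooled partition.
        digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.pooled_partitions)
        return self.pooled_partitions[index]


def scoped_query(tenant_id: str, filters: dict) -> dict:
    """Return a query filter that is always restricted to one subscriber."""
    if filters.get("tenant_id", tenant_id) != tenant_id:
        raise PermissionError("cross-subscriber query rejected")
    return {**filters, "tenant_id": tenant_id}
```

Modulo hashing is used here only for brevity. When the partition count changes, it moves most subscribers to a different partition. A real system would more likely use consistent hashing or an explicit lookup table.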

Public Cloud Platform Infrastructure Architecture

Tiers - Central, PoPs, and Spaces

As noted earlier, the converged architecture has three tiers: Central, Points of Presence (PoPs), and Spaces. Two further pieces complete the picture: the Edge network, which provides DNS and DDoS protection, and the Auth0 HQ-based systems for analytics and operations.

It’s important to note that the solution has been designed with isolated fault domains. Because the control plane and data plane are separated, the central control plane can be offline or under maintenance without impacting customers’ authentication flows.

  • Central: A globally unique component that contains the platform control plane and other centralized global infrastructure. The control plane manages the creation of Public Cloud environments and keeps them updated with continuous (daily) releases.
  • PoPs: An intermediate point of infrastructure pinned to a specific cloud region (e.g., AWS us-east-1). The PoP hosts internal visibility tools and securely proxies messages from Central down to the regional Public Cloud environments.
  • Spaces: A Space is the Public Cloud environment located ‘beneath’ a PoP in a particular AWS region. Spaces have cross-region geo-failover capabilities to ensure resiliency.
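The control plane/data plane split can be illustrated with a small sketch. In it, a Space applies configuration delivered from Central through its PoP. When that path is unavailable, the Space keeps serving with its last applied configuration. The class and field names are hypothetical and do not describe Auth0's internal components.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SpaceConfigCache:
    """Data-plane view of configuration owned by the control plane (hypothetical).

    Requests are served from the last applied configuration, so a control
    plane outage delays updates but does not stop authentication traffic.
    """

    fetch_from_control_plane: Callable[[], dict]
    current: Optional[dict] = None

    def refresh(self) -> bool:
        """Try to pull newer configuration; return True if it was applied."""
        try:
            self.current = self.fetch_from_control_plane()
            return True
        except ConnectionError:
            # Control plane offline or under maintenance: keep last-known-good.
            return False

    def config_for_request(self) -> dict:
        if self.current is None:
            raise RuntimeError("no configuration has been applied yet")
        return self.current
```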

Aspects - control and runtime, scalability, datastores, networking, and reliability

When Auth0 took on the converged platform project, we started with a small team initially focused on our single-subscriber (Private Cloud) offerings. The team drew on a cross-section of our platform teams, with expertise ranging from Kubernetes to APIs, tooling, infrastructure, and datastores. It focused on developing a proof of concept for the converged platform.

We learned from our experiments, and the converged platform evolved and matured. We then launched single-subscriber environments for general availability. You can read more about this in the series of posts we shared outlining the technical, security, and architectural decisions we made.

The logical next step was to adopt the converged platform for Public Cloud. Behind the scenes, the team continued to grow. As the converged platform gained more traction, we organized the team into specialties to best support that growth.

  • Runtime - Heavily focused on our Terraform and Kubernetes layers. Ensures that the core of the platform functions and that automation is flexible enough to meet the needs of product delivery teams.
  • Control - Owners of our control API and platform CLI, the tooling used to operate the platform in all situations, from normal operations to incidents.
  • Datastores - Focused on our data storage technologies. Works with product delivery teams to use the available stores as efficiently as possible.
  • Networking - Owners of our edge technologies, supporting the many ways in which our customers leverage the edge.
  • Sizing - Partners with other internal teams to establish the performance characteristics of our offerings. Also adds platform improvements, including auto-scaling.
  • Services - Owners of the tooling and services that product delivery teams use in their development, including abstractions that allow easier access to the underlying platform.
  • Data - Responsible for the data flow from the individual environments to the regional Points of Presence (PoPs) and beyond, including application logs for support as well as abstracted metrics.
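The article does not say how auto-scaling is implemented. One common Kubernetes mechanism is the Horizontal Pod Autoscaler (HPA). The function below reproduces the core of the HPA's documented replica calculation, including its default 10% tolerance. It is an illustration, not a description of Auth0's setup, and the min/max replica bounds are arbitrary defaults.

```python
import math


def hpa_desired_replicas(current_replicas: int, current_metric: float,
                         target_metric: float, min_replicas: int = 1,
                         max_replicas: int = 100, tolerance: float = 0.1) -> int:
    """Core of the Kubernetes HPA replica calculation.

    desired = ceil(current_replicas * current_metric / target_metric),
    skipped when the ratio is within the tolerance band, then clamped.
    """
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: do not scale
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

For example, 4 replicas averaging 90% CPU against a 60% target give a ratio of 1.5, so the calculation asks for 6 replicas.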

To get the technology behind a solution right, it's critical to fully enable the people behind it to co-create innovative solutions. At Okta, we cultivate a BLOC culture, which stands for “Builder, Learner, Owner, and Collaborator.” These values foster a high-functioning, well-oiled organization that can take on challenges big and small effectively and efficiently.

With the different team specializations, collaboration becomes all the more important. At Auth0, we value documentation not only for informing but also for decision-making. Our Engineering teams follow a simple mantra for collaboration: "Write. It. Down." We use Request For Comment (RFC) documents to record architectural designs and Decision Record documents to discuss and align ideas across teams. These documents also give new team members important context about what we're building and how we arrived at our decisions. They served as an important knowledge base when we began onboarding service owners to the new converged platform. These core tenets have become critical for sustaining the growth of the platform and introducing new capabilities over time.

Reliability

The converged architecture provides a highly reliable and available solution leveraging not only the High Availability (HA) constructs across three Availability Zones within an AWS region, but also geo-failover capability between Active and Failover regions.

The geo-failover regions are always synchronized from the datastores’ perspective, with demanding RPO/RTO and 99.99% availability SLA targets. Once the failover decision is made, a fully automated process is initiated for the failover to take effect.
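A 99.99% availability target bounds how long any outage, including a failover, can take, so the arithmetic is worth spelling out. The snippet below assumes availability is measured over calendar periods of 30 and 365 days. Auth0's actual SLA measurement window is defined in its SLA terms and is not stated in this article.

```python
def downtime_budget_minutes(availability: float, period_days: float) -> float:
    """Minutes of unavailability permitted by an availability target over a period."""
    minutes_in_period = period_days * 24 * 60
    return (1.0 - availability) * minutes_in_period


for label, days in (("30-day month", 30), ("365-day year", 365)):
    print(f"99.99% over a {label}: {downtime_budget_minutes(0.9999, days):.2f} min")
```

At 99.99%, the budget is about 4.32 minutes per 30-day period, or about 52.56 minutes per year. That leaves little room for manual intervention and is consistent with the emphasis on a fully automated failover process.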

In the event of a multi-region failure (an extremely rare event), we can perform an automated restore from the last backup.

The historical availability metrics are a testament to the robustness of the Auth0 solution. We continue to meet our targets in Private Cloud with the converged architecture and expect to continue meeting or exceeding these targets with Public Cloud deployments on this architecture.

What Next?

Auth0’s Public Cloud offering continues to improve on several fronts. Public Cloud availability across geographies will continue to expand. The launch of the UK Public Cloud region in April 2023 is the latest example, giving customers with data residency and latency needs another regional option. In addition, support for a Public Performance tier in H2 2023 caters to customers’ higher performance needs on Public Cloud.

We look forward to continuing to deliver new and enhanced customer identity capabilities, ease of use, platform resilience, scalability, and security in our Private Cloud and Public Cloud offerings through the new converged architecture.