TL;DR: Auth0 replatformed our private cloud offering into a multi-cloud Kubernetes architecture. This large architectural change brings with it new threats and new security challenges. We took the opportunity to design and integrate security into the platform from the start, as opposed to "bolting it on" at the end.
In this post, we'll discuss how we approached securely replatforming the Auth0 Identity Platform into a large, multi-cloud architecture. We'll walk through the risks that most concern Auth0's Security team and the threats that could result in those risks becoming real, tangible, negative outcomes. After that, we'll discuss the challenges our team encountered while addressing those threats in a multi-cloud environment alongside the principles we applied to address them. Finally, we'll review the controls Auth0 put in place to mitigate the identified threats.
If you haven't read the previous posts yet, A Technical Primer of Auth0's New Private Cloud Platform and The Architect's View of Auth0's New Private Cloud Platform, I encourage you to check them out first. They provide foundational knowledge and context which will help you better understand this post.
Understanding What Risks Affect Us
At the highest level, Auth0's Security team is responsible for identifying, assessing, and mitigating risks to our business and, in effect, our customers. It's our job to reduce the impact and/or likelihood of undesirable business outcomes caused by security events. Specifically, we're interested in mitigating security and privacy risks, such as those involving:
- Product/platform compromises: bad actors gaining control of a product or platform component
- Customer or Auth0 data loss: unintended deletion of, or unauthorized access to, customer or Auth0 data
- Service disruptions and outages: disrupting the Auth0 service and causing an outage for our customers
There are more risks than the three listed above and more nuance as well, but we'll use them for brevity. These high-level risks help us enumerate our list of threats.
Identifying How the Platform Will be Attacked
This new Kubernetes-native, multi-cloud architecture is large and complex, which makes it challenging to secure. Our approach is to start by enumerating the areas of the architecture most likely to be targeted and how they can be attacked. This process, called threat modeling, helps us determine where to focus our resources and implement countermeasures (also known as security controls). These "areas to focus on," or threats, are circumstances or events that could cause one of the risks highlighted above to occur.
The threat modeling process results in a long list of threats of varying severity. For the purposes of this post, we'll focus on a short list of significant ones:
- Employee account takeover: an attacker compromises or gains access to employee credentials and utilizes them to access Auth0 systems
- Malicious insiders: an internal employee abuses existing authorized access to Auth0 systems
- Service or application compromise: an attacker gains control of an Auth0 product or platform service, third-party managed service, or a cloud provider service
- Software supply chain compromise: an attacker tampers with or exploits misconfigurations/vulnerabilities in software dependencies used by the Auth0 product or platform
This list of threats informs what secure design patterns and security controls we deploy into the platform's architecture.
Securing Multi-Cloud Architecture is Hard
Now that we have defined the list of threats we're protecting against (the "what"), we can start determining the design changes, secure design patterns, and security controls that will address those threats (the "how"). Before jumping into that, let's take a step back and understand the challenges with securing a platform that utilizes multiple cloud providers. In Auth0's case, we use both Amazon Web Services (AWS) and Microsoft Azure.
Using multiple cloud providers increases the architecture's overall complexity. Each cloud provider has similar general capabilities such as computing, networking, storage, and logging. However, each cloud provider implements these capabilities in its own way. What we end up with are multiple implementations of the same core capability, like virtual machines or blob storage. These nuances between cloud providers introduce complexities that are challenging to secure.
Let's use an example to illustrate these differences. The Auth0 product requires blob storage, so let's compare AWS S3 and Azure Storage Accounts and only focus on the different access methods.
Another example is AWS Security Groups (SG) vs. Azure Network Security Groups (NSG) and how each handles rule evaluation. AWS SG rules are additive (i.e., allow the superset of all applicable rules), while Azure NSG rules are assigned a priority value and evaluated in that order. These are only two examples of the many differences between AWS and Azure we accounted for in the new platform's architectural design.
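To make the difference in rule evaluation concrete, here is a minimal sketch of the two models. It is a deliberately simplified illustration, not either provider's real engine: the `Rule` type, the function names, and the port-only matching are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    port: int
    allow: bool
    priority: int = 0  # only meaningful for the NSG-style evaluator


def sg_allows(rules, port):
    """AWS SG style: rules are allow-only and additive, so any match permits."""
    return any(r.allow and r.port == port for r in rules)


def nsg_allows(rules, port):
    """Azure NSG style: lowest priority number is evaluated first and the
    first matching rule decides the outcome."""
    for r in sorted(rules, key=lambda r: r.priority):
        if r.port == port:
            return r.allow
    return False  # simplification: no matching rule is treated as deny


# The same pair of rules gives different answers depending on priority order.
deny_first = [Rule(443, allow=True, priority=200), Rule(443, allow=False, priority=100)]
allow_first = [Rule(443, allow=True, priority=100), Rule(443, allow=False, priority=200)]
```

With `deny_first`, the NSG-style evaluator blocks port 443 because the deny rule has the lower priority number. The SG-style evaluator has no notion of ordering at all, which is why a rule set cannot simply be copied from one cloud to the other.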
Larger attack surface
Using multiple clouds means potentially securing multiple, similar services for each cloud provider, e.g., AWS S3 and Azure Storage Accounts. The complexity mentioned above and an increased number of services result in a larger attack surface. Today, for instance, we need to secure two cloud service providers (AWS and Azure), and that means securing two sets of each cloud-specific service required to run Auth0. It also means our Detection and Response team needs to ingest and monitor multiple sets of cloud-specific log sources to properly monitor and investigate the activity. That's a large surface area to secure.
Communication between services within the same cloud is straightforward. We simply need to use the cloud provider's own mechanisms, and it should "just work." However, several points in our new platform's architecture require some form of communication between AWS and Azure. Network connectivity, for example, needs to be established between Azure Points-of-Presence (PoPs) and Auth0 Central (which runs in AWS). We can't use either cloud's PrivateLink service in this scenario (which would be preferred). In another example, Azure Spaces need to upload usage metrics to Auth0 Central (AWS). In a single-cloud architecture, we could simply use Azure RBAC configured with the proper role assignments. However, that's not possible in this scenario; the Azure Kubernetes Service (AKS) cluster/pod needs to authenticate using AWS S3 authentication methods.
Addressing Multi-Cloud Challenges
So far, we reviewed the risks Auth0 is most worried about, enumerated a short list of threats to help guide our security decision-making, and discussed some of the challenges Auth0 experienced securing a multi-cloud architecture. Next, let's walk through the principles we applied to determine how we address those threats in the context of those multi-cloud challenges.
View all security-related decisions through a risk-based and threat-centric lens.
That means every secure design decision and security control we implemented relates directly to the risks and threats we enumerated in the previous sections. Prioritization is key, and using this lens influences where we need to spend available resources and which countermeasures would be most effective.
Apply secure design principles and integrate security into the platform early and often.
The Auth0 Security team is involved throughout the product life cycle, from ideation to design to build to operation and finally to deprecation. We are heavily involved in the design and build phases. This is important because it's easiest to make architectural changes before the technology is built.
Establish clear, enforceable trust boundaries.
These trust boundaries inform how we implement various countermeasures such as access controls. They also encapsulate areas of risk, making them easier to manage. For example, our development and production environments are completely isolated and separate from one another. Operators cannot use development access to affect any part of the production environment.
Balance cloud-native and cloud-agnostic capabilities.
This is a tricky one because each has its own benefits and challenges. Cloud-native capabilities (e.g., AWS managed databases) are tightly integrated with the cloud provider, which means you'll get those integrations — like GuardDuty or Microsoft Defender — out-of-the-box. On the other hand, cloud-native services are tightly coupled to their cloud, and deploying them for both AWS and Azure increases the complexity and attack surface (as we discussed above). Compare that to cloud-agnostic capabilities, which can provide a single pattern and enable a simpler architecture with less attack surface. However, integrating with cloud provider services tends to require more work or "glue" to stitch them together. For example, our Auth0 product consists of Kubernetes workloads (cloud agnostic). We use managed Kubernetes services (AKS and EKS), and it's very easy to use the same services across both clouds. That means less complexity (i.e., one change to the Auth0 product will be applied to both clouds) and less attack surface (i.e., one service to secure).
Selecting the Right Set of Security Capabilities
Finally, let's review the various security capabilities Auth0 implemented into the new platform to address the threats and multi-cloud challenges mentioned above. In addition to our own security capabilities, Auth0 takes advantage of the cloud's shared responsibility model. The shared responsibility model allows us to focus on securing other areas of the platform instead of worrying about the underlying cloud service provider infrastructure.
All Auth0 Spaces, PoPs, Central, and Control Plane technology stacks are isolated into their own AWS Account or Azure Subscription. This provides clean and clear resource isolation, especially for single-tenant Spaces. We used AWS Organizations to create an Organizational Unit (OU) hierarchy based on environment and functionality and added each account to a specific OU. We used Azure Management Groups to create a similar hierarchy for Azure Subscriptions. The hierarchies we created enable scalable, granular access controls and policy enforcement; we grant access or apply policies (Azure Policy and AWS Service Control Policies) to specific Management Groups / OUs or Spaces. For AWS, we grant access based on OU conditions and assign SCPs to OUs. For Azure, we assign roles and policies scoped to specific Management Groups. The access and policies are then inherited by the intended accounts or subscriptions.
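The inheritance idea can be sketched in a few lines. This is only a toy model of a policy hierarchy, not the AWS Organizations or Azure Management Groups API; the `Node` type, the node names, and the policy names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """An OU / Management Group, or a leaf account / subscription."""
    name: str
    parent: Optional["Node"] = None
    policies: List[str] = field(default_factory=list)


def effective_policies(node):
    """Collect every policy attached between the root and this node."""
    chain = []
    while node is not None:
        chain.append(node)
        node = node.parent
    result = []
    for n in reversed(chain):  # root first, leaf last
        result.extend(n.policies)
    return result


# Hypothetical hierarchy: root -> production -> a single-tenant Space account.
root = Node("root", policies=["deny-leaving-organization"])
production = Node("production", parent=root, policies=["deny-public-storage"])
space_account = Node("space-account-1", parent=production)
```

Here `space_account` has no policies of its own, yet `effective_policies(space_account)` returns both guardrails because they are attached higher in the tree. That is what keeps policy management scalable as the number of accounts and subscriptions grows.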
Identity and Access Management
Auth0 was built to secure identities, so we leverage Auth0 + Okta to secure authentication and authorization to our platform tooling. We manage all cloud provider and supporting application access using a least-privilege RBAC model. We also require multi-factor authentication (MFA) for all operator access to the platform; this includes cloud providers and all supporting applications. Operators are required to use Azure Privileged Identity Management (PIM), which provides just-in-time access, for any administrative activities. Azure AD Identity Protection identifies and blocks any risky Azure tenant sign-ins.
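As a rough illustration of how these three controls combine, the sketch below models a single access decision. It is not how Okta, Auth0, or Azure PIM work internally; the role names, permission strings, and the `Session` type are assumptions chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical role-to-permission mapping (least privilege: grant only what is needed).
ROLE_PERMISSIONS = {
    "viewer": {"logs:read"},
    "operator": {"logs:read", "cluster:deploy"},
    "admin": {"logs:read", "cluster:deploy", "iam:modify"},
}
# Actions that additionally require a time-boxed, just-in-time elevation.
ADMIN_ACTIONS = {"iam:modify"}


@dataclass
class Session:
    role: str
    mfa_verified: bool
    elevation_expires: Optional[datetime] = None  # set when JIT access is granted


def is_allowed(session, action, now):
    """Allow only if MFA passed, the role grants the action, and any
    administrative action falls inside an unexpired elevation window."""
    if not session.mfa_verified:
        return False
    if action not in ROLE_PERMISSIONS.get(session.role, set()):
        return False
    if action in ADMIN_ACTIONS:
        return session.elevation_expires is not None and now < session.elevation_expires
    return True


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
elevated_admin = Session("admin", True, elevation_expires=now + timedelta(hours=1))
standing_admin = Session("admin", True)  # admin role, but no active elevation
```

In this model `standing_admin` can still read logs but cannot modify IAM until an elevation is granted, which mirrors the intent of just-in-time access: holding the role is not enough on its own.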
All single-tenant and multi-tenant Spaces are logically isolated from one another with no interconnectivity between them. This strong isolation drastically reduces an attacker's ability to pivot into other Spaces or PoPs and further their attack. We further restrict inbound connectivity to Spaces by using AWS and Azure PrivateLink to connect outbound from Spaces to a PoP. All traffic is encrypted in transit using TLS 1.2+. This includes network traffic between all cloud services, Kubernetes pod-to-pod traffic, and Kubernetes cluster-to-cluster traffic. For some workloads, we utilize mutual TLS (mTLS). On the network edge, we protect against DDoS attacks on our infrastructure, identify and block bots, enforce rate limiting, and block malicious traffic with web application firewall (WAF) rules. Operators must use a VPN to access platform infrastructure, adding another layer to our defense-in-depth strategy.
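For readers who want to see what a TLS 1.2+ floor and mTLS look like at the application level, here is a minimal sketch using Python's standard `ssl` module. It is an illustration only and says nothing about how the platform itself configures TLS; the function names and file-path parameters are assumptions.

```python
import ssl


def client_context():
    """Client-side context that refuses anything older than TLS 1.2."""
    ctx = ssl.create_default_context()  # verifies server certificate and hostname
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def mtls_server_context(cert_file=None, key_file=None, client_ca_file=None):
    """Server-side context for mutual TLS: clients must present a certificate.

    The file paths are placeholders; without them the context is configured
    but cannot yet serve traffic.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.verify_mode = ssl.CERT_REQUIRED  # reject clients without a valid certificate
    if cert_file:
        ctx.load_cert_chain(cert_file, key_file)
    if client_ca_file:
        ctx.load_verify_locations(cafile=client_ca_file)
    return ctx
```

The difference between ordinary TLS and mTLS in this sketch comes down to `verify_mode` on the server side: with `ssl.CERT_REQUIRED`, the handshake fails unless the client proves its identity too.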
All data is encrypted at rest using AES-256 or better, and encryption occurs at multiple layers (e.g., disk, file, database, and application layers). All data in transit is encrypted using TLS 1.2+. Auth0 encrypts platform secrets using HashiCorp Vault and takes full advantage of Azure Key Vault and AWS KMS for encrypting data in cloud services.
Detection and Response
Auth0 monitors the platform using a combination of cloud services and our own monitoring tools. We take advantage of AWS GuardDuty and Microsoft Defender for Cloud in addition to our own tooling. Auth0 ingests a vast amount of AWS and Azure platform logs for all of the cloud services used by the platform. These logs are ingested into a security data lake and monitored with detections defined as code.
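"Detections defined as code" can be pictured as plain functions that run over log events and live in version control alongside tests. The sketch below is a hypothetical example, not one of Auth0's detections; the event shape is loosely modeled on AWS CloudTrail sign-in records and should be treated as an assumption.

```python
def console_login_without_mfa(event):
    """Flag console sign-ins that did not use MFA (simplified event shape)."""
    return (
        event.get("eventName") == "ConsoleLogin"
        and event.get("additionalEventData", {}).get("MFAUsed") == "No"
    )


# Registry of detections; adding a rule is a reviewed code change.
DETECTIONS = {
    "console_login_without_mfa": console_login_without_mfa,
}


def run_detections(events):
    """Return (detection name, event id) pairs for every rule that fires."""
    alerts = []
    for event in events:
        for name, rule in DETECTIONS.items():
            if rule(event):
                alerts.append((name, event.get("eventID")))
    return alerts


sample_events = [
    {"eventID": "e1", "eventName": "ConsoleLogin", "additionalEventData": {"MFAUsed": "No"}},
    {"eventID": "e2", "eventName": "ConsoleLogin", "additionalEventData": {"MFAUsed": "Yes"}},
    {"eventID": "e3", "eventName": "GetObject"},
]
```

Because each rule is an ordinary function, it can be unit tested against sample events such as `sample_events` before it is ever pointed at production logs.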
The 4C's of Cloud Native security is a good framework to use when thinking through Kubernetes security. We are, in varying aspects, responsible for security in the clouds we're using, the clusters we're running, the containers running on those clusters, and the code running within the containers. Auth0 uses the managed Kubernetes services Azure Kubernetes Service (AKS) and Amazon Elastic Kubernetes Service (EKS). These offerings from Azure and AWS are security feature-rich. Our worker nodes meet Kubernetes CIS Level 1 Benchmarks. Cluster access is restricted to operators, and least privilege is enforced using specific RBAC roles. We use Kubernetes NetworkPolicy objects to restrict pod network ingress and egress to authorized sources and destinations.
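To show what such a NetworkPolicy object looks like, the sketch below builds one as a Python dictionary that could be serialized to JSON and applied to a cluster. The manifest fields follow the `networking.k8s.io/v1` API, but the helper function and all names, namespaces, labels, and ports are hypothetical.

```python
import json


def network_policy(name, namespace, pod_labels, allowed_from_labels, port):
    """Build a NetworkPolicy that admits ingress from one set of pods only."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Pods this policy applies to.
            "podSelector": {"matchLabels": pod_labels},
            # Listing both types means both directions are restricted.
            "policyTypes": ["Ingress", "Egress"],
            "ingress": [
                {
                    "from": [{"podSelector": {"matchLabels": allowed_from_labels}}],
                    "ports": [{"protocol": "TCP", "port": port}],
                }
            ],
            # No egress rules: selected pods cannot initiate outbound traffic.
            # A real policy would usually need to allow DNS at minimum.
            "egress": [],
        },
    }


# Hypothetical example: only "gateway" pods may reach "api" pods on port 8080.
example_policy = network_policy(
    "api-allow-gateway", "example-namespace", {"app": "api"}, {"app": "gateway"}, 8080
)
example_json = json.dumps(example_policy, indent=2)
```

Note that a NetworkPolicy only takes effect when the cluster's network plugin enforces it, so the manifest alone is not a guarantee of isolation.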
Our build and deploy pipelines continuously scan our cluster, containers, and code for vulnerabilities. Our security capabilities include the usual controls: static application security testing (SAST) on both code and infrastructure-as-code, dynamic application security testing (DAST) on our application and API endpoints, and software composition analysis (SCA) on our software dependencies and containers. Cloud Security Posture Management (CSPM) tooling identifies and responds to cloud misconfigurations (e.g., publicly accessible S3 buckets). Patching is a foundational security control; we continuously upgrade our containers and software dependencies. In the shared responsibility model, Azure/AWS are responsible for building upgraded node images, and Auth0 is responsible for deploying the images to our nodes. We accomplish this by rebuilding our Kubernetes nodes with the latest image during each Auth0 product release cycle.
Automation is a must
We are dealing with a scale that makes it impossible to configure all of these security capabilities manually. To address this challenge, the Auth0 Security team created an in-house service that automates AWS account and Azure subscription creation and configuration. Similar to a production line, when new Spaces and PoPs are needed, this service creates AWS accounts and Azure subscriptions to meet our strict security standards using predefined templates. It effectively "stamps out" completely configured new accounts/subscriptions on-demand. When the account or subscription is no longer needed, the service will remove all resources, revoke all access, and decommission it.
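The "stamping" lifecycle can be sketched as follows. This is a toy model with no cloud API calls and is not the in-house service itself; the baseline settings, the `Account` type, and the function names are assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical security baselines applied to every new account or subscription.
BASELINE = {
    "aws": {"logging_enabled": True, "mfa_required": True, "public_storage_blocked": True},
    "azure": {"logging_enabled": True, "mfa_required": True, "public_storage_blocked": True},
}


@dataclass
class Account:
    name: str
    cloud: str
    settings: dict
    active: bool = True
    resources: list = field(default_factory=list)
    access_grants: list = field(default_factory=list)


def provision(name, cloud):
    """Stamp out a new account from the predefined template for its cloud."""
    if cloud not in BASELINE:
        raise ValueError(f"no baseline template for cloud: {cloud}")
    return Account(name=name, cloud=cloud, settings=dict(BASELINE[cloud]))


def decommission(account):
    """Remove all resources, revoke all access, then retire the account."""
    account.resources.clear()
    account.access_grants.clear()
    account.active = False
    return account
```

Because every account starts from the same template, a Space in Azure and a Space in AWS begin life with equivalent guardrails, and drift can be measured against a single known baseline.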
Continuously Improving Auth0's Security Posture
All of this work has culminated in a more secure, scalable platform that continues to meet or exceed the many requirements needed to achieve ISO 27001/27018, SOC 2, PCI, STAR, and HIPAA compliance.
The threat landscape is constantly evolving. The Auth0 Security team's job is never done, and continuous improvement is a core principle. There are always ways to mitigate more risk and enhance our security posture, and we continually investigate additional security capabilities to make our platform and product even more secure for our customers.
We are publishing a series of technical posts on how and why we re-platformed. They'll be linked at the bottom of this post as they publish and are also findable on our blog. If you have technical questions or ideas for future posts, please reach out to us via the Auth0 Community. (Our marketing team also tells us we would be remiss if we didn't include a link to talk to Sales, so that is here, too.)
The Auth0 Identity Platform, a product unit within Okta, takes a modern approach to identity and enables organizations to provide secure access to any application, for any user. Auth0 is a highly customizable platform that is as simple as development teams want, and as flexible as they need. Safeguarding billions of login transactions each month, Auth0 delivers convenience, privacy, and security so customers can focus on innovation. For more information, visit https://auth0.com.