Resiliency, reliability, and adaptability are foundational tenets of how we build at Auth0. Last year, we recognized the need to deliberately double down on our resilience efforts in order to address some of the failures we had seen. In November 2021, I wrote about our plans to improve the resilience of our platform and the additional steps we were taking to address the peak holiday traffic. I’m proud of the hard work and diligence of our teams during the holiday season, which moved us considerably closer to our goal of providing a Tier 0 service. Today, I want to provide an update on that work, share our progress to date, and offer insight into our upcoming efforts.
Update on Our Resiliency Focus Areas to Date
As a reminder, a big focus area for our team was to reduce the load on our datastores. We made significant progress in this area:
- Reduced the database read/write workload by 50%. Our initial target was a 10% reduction, so we greatly surpassed our initial estimates
- Reduced inserts/deletes in our database by 45% in US-1, 50% in EU, and 20% in the rest of our environments. The team's initial target was 40% in US-1, so we also beat this estimate
We also actioned all the initially identified resiliency priorities. These were all implemented either prior to the Thanksgiving holiday in the US or in early December:
- Added noisy neighbor protection for our extensibility stack (Webtask)
- Expanded our caching approach for common Authentication API requests
- Enforced new Auth API global rate limits for all production tenants
- Enforced 100 RPS for non-production enterprise tenants
- Refactored our connections schema and deleted unused entries/rows; in US-1, we reduced the table size by ~25X
- Overprovisioned our stack in anticipation of holiday traffic; we’ve since downsized based on usage data and some of the other protections put in place
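To make the rate-limit items above more concrete, here is a minimal sketch of a token bucket, one common way to enforce a requests-per-second cap. It is an illustration only and not a description of how Auth0 enforces its limits: the `TokenBucket` class, the `make_bucket` helper, and the choice of a burst capacity equal to the per-second rate are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Hypothetical per-tenant limiter; not Auth0's actual implementation."""
    rate: float      # tokens added per second (e.g. 100 for a 100 RPS cap)
    capacity: float  # maximum number of tokens the bucket can hold
    tokens: float    # tokens currently available
    last: float      # timestamp (in seconds) of the previous check

    def allow(self, now: float) -> bool:
        # Refill in proportion to the elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        # Each admitted request spends one token; otherwise reject it.
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def make_bucket(rps: float, now: float = 0.0) -> TokenBucket:
    # Assumption: the bucket starts full and allows a burst of up to `rps`.
    return TokenBucket(rate=rps, capacity=rps, tokens=rps, last=now)


# Usage: a hypothetical non-production tenant capped at 100 RPS.
bucket = make_bucket(100)
admitted = sum(bucket.allow(now=0.0) for _ in range(150))
print(f"admitted {admitted} of 150 requests in the initial burst")
```

In this sketch the caller passes the clock value explicitly, which keeps the behavior deterministic and easy to test; a real service would more likely read a monotonic clock and keep one bucket per tenant.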
While we made great progress on our improvement initiatives, I want to acknowledge that in December we experienced a few issues related to our network edge, which caused unfortunate customer impact. We have met with our network edge provider to ensure our expectations around incident response and service levels are understood and upheld. We’ve also initiated a new internal effort to review our Edge resiliency plan and kicked off the discovery work. We’ll provide further updates as we make meaningful progress on this effort.
What’s Next in 2022
Resiliency has been and remains a top priority for Auth0 in 2022. The work that I outlined above has laid strong foundations that will serve as a springboard for our ongoing priorities. Below are a few key highlights of what to expect this year.
Launching new production environments is one of our top priorities for the first half of 2022. Work has kicked off on three new production environments that will enable us to shift load as our demand response strategies require and limit the blast radius of disruptions and outages. These environments will launch in 1H 2022: two new options in the US and one in the EU.
Automated Tenant Migration Process
In addition to new environments, our team has been working on our ability to move tenants from one environment to another in order to further reduce the blast radius. The early design work began in Q3 2021, with a focus on high-touch migration of select targeted customers based on feature usage, workload, and configuration. As we iterate and learn from that effort, we will evolve our tenant migration process from a high-touch, manually intensive effort into a more automated one.
New Platform Rollout
We are laser-focused on our platform modernization efforts, which are built on an underlying architecture that is responsive and resilient enough to meet all current and anticipated customer needs. The new platform is already powering our product that launched on MS Azure in November 2021. I will share details about the availability of this new platform on AWS as we get closer to launch in Q1 2022.
Thank you for the continued trust that you place in Auth0 to meet your identity needs. We have exciting things in the works that will enable us to continue our delivery of a Tier 0 service with all of the product innovation you have come to rely on for your applications.