Staff Site Reliability Engineer

Location

Spain

Description

We are looking for a Tech Lead reporting to the SRE Manager to act as a major contributor in building a brand new top-tier SRE team. As a founding member you will shape the future of Auth0 by helping define what reliability and incident response mean.
You may be a Team Lead, a Manager looking to return to hands-on work, or a Senior SRE/Platform/Software Engineer seeking the next step on the technical track. You are excited by the opportunity to support the migration, availability, security, and releases of large-scale, highly available, distributed systems running on multiple global cloud products.

Skills

  • Participate in regular on-call rotations to ensure 24/7 coverage of all critical systems
  • Deep understanding of the Software Development Lifecycle (SDLC) and experience implementing observability as a first-class citizen
  • Experience designing and operating large-scale distributed systems (preference given to those with experience in multi-region, multi-tenant microservice systems)
  • Strong understanding of the Agile software development methodology, including leading sprints or previous experience as a Scrum Master or Coach
  • Team members naturally gravitate to you for leadership and mentorship

Responsibilities

  • Own the day-to-day operations of the Auth0 SRE Team
  • Collaborate with other Engineering teams to support services before they go live through activities such as system design consulting, developing software platforms and frameworks, monitoring/alerting, capacity planning and launch reviews
  • Implement automation and champion the evolution of systems by facilitating changes that improve reliability and velocity
  • Provide technical leadership to a growing team focused on applying reliability engineering practices to mission-critical operations at scale
  • Triage and troubleshoot complex production issues to ensure reliability and performance
  • Develop and maintain technical documentation, runbooks, and procedures

Experience

  • 2+ years on an SRE team
  • 2+ years of team leadership experience or 5+ years of experience as a senior technical resource
  • 5+ years of professional software development experience (Go, Python)
  • 5+ years of professional experience in cloud infrastructure automation/orchestration (Terraform, Kubernetes)
  • 5+ years in a production environment supporting mission-critical applications
  • Experience with observability practices such as time series, tracing, logging and alerting (Datadog, Kibana, Lightstep, PagerDuty, CloudWatch)
  • Experience defining and implementing incident response processes
  • Experience with Jira, GitHub and similar development tools

Okta is an Equal Opportunity Employer.

Okta is rethinking the traditional work environment, providing our employees with the flexibility to be their most creative and successful versions of themselves, no matter where they are located. We enable a flexible approach to work, meaning for roles where it makes sense, you can work from the office, or from home, regardless of where you live. Okta invests in the best technologies and provides flexible benefits and collaborative work environments/experiences, empowering employees to work productively in a setting that best and uniquely suits their needs. Find your place at Okta https://www.okta.com/company/careers/.

By submitting an application, you agree to the retention of your personal data for consideration for a future position at Okta. More details about Okta’s privacy practices can be found at: https://www.okta.com/privacy-policy.
