We are looking for an experienced Site Reliability Engineer to join our Technical Operations team. At Okta, we are "Always On." The core of that starts with this team, ensuring that customers never worry about the Okta service. They strive to build the most reliable & performant systems on the planet.
As a member of the FAST team at Okta, you'll be at the center of our commitment to Always On. Our responsibilities span a number of Okta's most crucial services, & we have significant ownership of our customer-facing infrastructure. We're a collaborative, supportive, & highly skilled team of engineers who take our role seriously & craft tooling & playbooks to uphold Okta's legendary reliability.
What You'll Do:
- Be a collaborative member of a team that is responsible for Okta's production infrastructure, with a focus on scaling our impact & lowering our operational overhead.
- Promote & apply best practices for building scalable & reliable tooling across engineering.
- Be a subject matter expert & partner with our team at Amazon Web Services (AWS).
- Design, build, run, & monitor Okta's production infrastructure.
- Drive initiatives to evolve our current platform to increase efficiency & keep it in line with current standards & best practices.
- Respond to production incidents & determine how we can prevent them in the future.
- Identify & automate manual processes.
- Support a 24x7 online environment as part of an on-call rotation.
- Develop & maintain technical documentation, runbooks, & procedures.
Qualifications for the role:
- 3+ years of experience managing large-scale AWS deployments.
- Familiarity with running large codebases in a containerized environment, & the tradeoffs & benefits of such.
- Real-world experience running a modern web stack in production, including HTTP tiers such as HAProxy, nginx, or Envoy, application tiers such as Tomcat or Jetty, & NoSQL data or cache tiers such as Elasticsearch or Redis.
- Knowledge & experience with persistent data stores likely to be utilized by a large web application, both SQL & NoSQL.
- Excellent Linux fundamentals.
- Exposure to FedRAMP, SOC 2, or other compliance programs.
- 3+ years of experience with automating systems & infrastructure via Ansible, Chef, or Terraform.
- Experience automating & running large-scale production services in AWS or other cloud providers.
- Ability to code to a good standard in any programming language, especially Ruby, Python, or Go, using source control & Agile methodologies.
- Excellent written & oral communication skills, with the ability to influence others.
Education & Training:
- BS in Computer Science (a plus) or relevant experience.
Okta is an Equal Opportunity Employer.
Okta is rethinking the traditional work environment, providing our employees with the flexibility to be their most creative & successful versions of themselves, no matter where they are located. We enable a flexible approach to work, meaning that for roles where it makes sense, you can work from the office or from home, regardless of where you live. Okta invests in the best technologies & provides flexible benefits & collaborative work environments/experiences, empowering employees to work productively in a setting that best & uniquely suits their needs. Find your place at Okta: https://www.okta.com/company/careers/.
By submitting an application, you agree to the retention of your personal data for consideration for a future position at Okta. More details about Okta's privacy practices can be found at: https://www.okta.com/privacy-policy.