Peloton is looking for a Site Reliability Engineer with a focus on Kubernetes operations to work with teams across the organization to help build & maintain a monitorable, performant, reliable & highly-scalable deployment platform. We are a growing team of engineers tackling challenging problems with scaling Kubernetes to handle thousands of nodes & pods spread across many deployments.
The Kubernetes working group at Peloton works closely with development teams to ensure that the platform is robust, stable, & delivers features that include the following:
- Fast, automatic scaling for live rides & large special events
- Hosting critical infrastructure on tens of thousands of pods across multiple clusters, so that our Members have the best possible experience
- A platform for machine learning (and other awesome workloads) that keeps us at the forefront of the industry
- Freedom for developers to move quickly & experiment without the platform getting in the way
What You'll Be Doing:
- Evangelize best practices for building & operating highly reliable systems
- Serve as subject matter expert in observability & monitoring
- Consult in system design to meet reliability & capacity requirements
- Automate everything, from infrastructure down to day-to-day tasks
- Conduct timely post-mortems of infrastructure incidents
- Assist with all aspects of operational security & compliance
- Seek out potential threats to security & reliability & advocate solutions
- Work with our stack: Amazon Web Services, Chef, Python, Ubuntu, Nginx, Jenkins, & Terraform
What We're Looking For:
- Experience maintaining scalable & stable Kubernetes clusters.
- Knowledge of observability & monitoring best practices for running Kubernetes at scale.
- Knowledge of best practices for securing a Kubernetes cluster & its deployments at scale.
- A passion for helping development teams make the transition to a container-native world.
- Experience with CI/CD systems such as Jenkins, ArgoCD, Harness, or Tekton.
- Experience deploying infrastructure with Infrastructure as Code tools such as Terraform or Pulumi.
- Judgment about when to triage & when to dive into a root-cause analysis.
- Passion for reliable, scalable, observable software, with a strong sense of ownership.
- Experience with a programming language such as Python, Golang, Java, or C.
Peloton is the largest interactive fitness platform in the world with a loyal community of more than 2.6 million Members. The company pioneered connected, technology-enabled fitness, & the streaming of immersive, instructor-led boutique classes for its Members anytime, anywhere. Peloton makes fitness entertaining, approachable, effective, & convenient, while fostering social connections that encourage its Members to be the best versions of themselves. An innovator at the nexus of fitness, technology, & media, Peloton has reinvented the fitness industry by developing a first-of-its-kind subscription platform that seamlessly combines the best equipment, proprietary networked software, & world-class streaming digital fitness & wellness content, creating a product that its Members love. The brand's immersive content is accessible through the Peloton Bike, Peloton Tread, & Peloton App, which allows access to a full slate of fitness classes across disciplines, on any iOS or Android device, Fire TV, Roku, Chromecast & Android TV. Founded in 2012 & headquartered in New York City, Peloton has a growing number of retail showrooms across the US, UK, Canada & Germany. For more information, visit www.onepeloton.com.