DataRobot manages a variety of deployments for our cutting-edge AutoML, Time Series, & MLOps products. While we have several multi-tenant SaaS production environments in AWS, we also ship regular enterprise software releases for the diverse environments of our on-prem customers. You will play a key role in how DataRobot's tools & practices enable seamless scaling while preventing failures through world-class observability. The Code & Architecture team is looking for an Infrastructure DevOps Engineer to help us build a world-class observability framework for complex multi-cloud environments. You'll work in close collaboration with engineering technical leadership to develop the best monitoring & scalability tooling. We value engineers who are experts in DevOps tools & practices, who know how to build scalable & highly available infrastructure, & who are eager to chase challenges no matter where they lead. We are excited to share our unique culture in a fast-moving startup environment.
- Drive adoption of the multi-account, cross-region AWS infrastructure.
- Develop & improve instrumentation for monitoring & logging the health & availability of services.
- Manage infrastructure & configuration as code.
- Improve operational efficiencies via scripting, bots & integrations.
- Automate & maintain the existing infrastructure.
- Motivate, encourage, & provide technical leadership to team members.
- 3+ years of experience with AWS (multi-account, cross-region)
- 3+ years of experience with Docker & container orchestration (Kubernetes, Mesos, etc.)
- A passion for DevOps methodology & automation
- Experience maintaining large-scale, geo-distributed infrastructure (1k+ servers)
- Expertise in running complex monitoring & logging systems (Prometheus/Grafana, ELK, etc.)
- 3+ years of Unix systems administration
- 3+ years of experience with Terraform/CloudFormation or Ansible
- Solid experience in automating with Python, Go
- Understanding of SLI/SLO fundamentals
- A passion for collaborating & tearing down communication silos
- Experience as a technical lead