 
 
DataRobot // machine learning automation software for enterprise
 
Engineering, Full Time    Chicago, IL, USA    Posted: Saturday, June 12, 2021
 
   
 
 
 
JOB DETAILS
 

Job Summary
DataRobot manages a variety of deployments for our cutting-edge AutoML, Time Series, & MLOps products. While we run several multi-tenant SaaS production environments in AWS, we also ship regular enterprise software releases for the diverse environments of our on-prem customers. You will play a key role in how DataRobot's tools & practices enable seamless scaling while preventing failures through world-class observability. The Code & Architecture team is looking for an Infrastructure DevOps Engineer to help us build an observability framework for complex multi-cloud environments. You'll work in close collaboration with engineering technical leadership to develop best-in-class monitoring & scalability tooling. We value engineers who are experts with DevOps tools & practices, who know how to build scalable & highly available infrastructure, & who are eager to chase challenges no matter where they lead. We are excited to share our unique culture in a fast-moving startup environment.

Responsibilities

  • Drive adoption of the multi-account, cross-region AWS infrastructure.
  • Develop & improve instrumentation for monitoring & logging the health & availability of services.
  • Manage infrastructure & configuration as code.
  • Improve operational efficiencies via scripting, bots & integrations.
  • Automate & maintain the existing infrastructure.
  • Motivate, encourage, & provide technical leadership to team members.

Main Requirements

  • 3+ years of experience with AWS (multi-account, cross-region)
  • 3+ years of experience with Docker & container orchestration (Kubernetes, Mesos, etc.)
  • A passion for DevOps methodology & automation
  • Experience maintaining large-scale, geo-distributed infrastructure (1k+ servers)
  • Expertise in running complex monitoring & logging systems (Prometheus/Grafana, ELK, etc.)

Desired Skills

  • 3+ years of Unix systems administration
  • 3+ years of experience with Terraform/CloudFormation or Ansible
  • Solid experience in automating with Python, Go
  • Understanding of SLI/SLO fundamentals
  • A passion for collaborating & tearing down communication silos
  • Experience as a technical lead
 
 
 
 
 
 
 
 