DataRobot // machine learning automation software for enterprise
Engineering, Full Time    New York City, NY, USA    Posted: Friday, March 26, 2021

As a Senior Service Reliability Engineer, you will own and improve the service reliability and availability of DataRobot's AI platform. You will be tasked with making DataRobot's AI/ML platform more reliable, efficient, & scalable. You will play a key role in how DataRobot's tools & practices enable seamless scale while preventing failures. As an SRE, you will be part of the team that builds & enables the DevSecOps toolchain while continuously improving our ML/AI platform at scale. You will contribute to the full service lifecycle, from service development to live service response, as we continuously deploy new & innovative functionality for our customers.



  • Must be familiar with AWS, GCP, & Azure architecture patterns & capabilities

  • Well versed in Software Defined Network definitions, capabilities, & limitations

  • Handle high-pressure situations in a calm & professional manner

  • Lead the resolution of complex service problems, from the network layer to the application, at scale

  • Motivate, encourage, & provide technical leadership to team members 

  • Work hand-in-hand with software developers to facilitate the adoption of "Paved Road" solutions 

  • Build & support large-scale services across multiple platforms (Azure, AWS, and GCP) 

  • Diagnose & repair issues by editing Node.js code, modifying MongoDB, Postgres, and Redis, and changing configuration in cloud service providers

  • Create, edit, & maintain ad hoc scripts to resolve issues quickly with minimal user impact 

  • Contribute to the development of new tools & automation that ensure the service can be optimized & tuned with minimal human intervention

  • Support periodic on-call duty 


  • MongoDB, Mongo MMS, node.js/IIS on AWS/GCP/Azure 

  • Demonstrable experience in one or more of the following languages is a plus: Python, Perl, PHP

  • Strong knowledge of TCP/IP networking, SMTP, HTTP, load-balancers, highly available network servers 

  • GitHub/Artifactory/RabbitMQ, Application Performance Monitoring principles, CDN, DNS 

  • Knowledge of IP networking and of analyzing network, performance, & application issues using tools like Fiddler and Wireshark



  • A passion for automating everything

  • A passion for collaborating & tearing down communication silos

  • Experience maintaining large-scale infrastructure (100+ servers minimum)

  • 5+ years of experience with AWS

  • 3+ years of experience with Terraform or CloudFormation

  • 5+ years of experience with Linux (Ubuntu, RedHat, or similar)


Bachelor's degree in CS, MIS, or equivalent experience; 6+ years of relevant experience with Windows/Unix systems fundamentals, monitoring, cloud services, networking, storage, database, & application knowledge; solid communication skills.


Individuals seeking employment at DataRobot are considered without regard to race, color, religion, national origin, age, sex, marital status, ancestry, physical or mental disability, veteran status, gender identity, or sexual orientation.

