As a Team Lead for the ML Ops team, you will work closely with Core Infra, Machine Learning, SRE & other engineering teams to codify best practices, standardize processes, build tools & frameworks that improve developers' & modelers' workflows, & help manage our growing inventory of state-of-the-art machine learning models.
What is ML Ops?
ML Ops is a set of practices that combines Machine Learning & DevOps to deploy & maintain ML systems in production reliably & efficiently. Software engineers develop a set of tools or practices for versioning, validating (data & model), deploying, scaling, & monitoring Kensho's ML services. ML Ops includes managing the consumption & use of models in conjunction with the ML, API Services, SRE & Core Infra teams. ML Ops engineers may create tools to help prototype or build new models, monitor or retrain models over their lifespan (e.g. due to drift), & decommission models that are obsolete or no longer meet performance requirements. ML Ops may help create multiple versions of a model, trained or tuned for specific use cases or customers. ML Ops may also develop tools or processes to help end users discover ML.