We are looking for a strong Data Scientist or Machine Learning Engineer (MLE), a proven 'doer', to develop, implement & extend data-intensive ML software for real-time auctioning, ad inventory estimation, & audience segmentation.
You will design & implement core components of our algorithms, as well as model & monetize the terabytes of structured data that PubMatic generates daily.
Working with our Data Science & Ad Serving teams, you will apply ML to deliver working solutions to these problems.
Development & implementation of data-intensive ML algorithms & software for real-time auctioning, ad inventory estimation, audience segmentation, & related AdTech applications.
Working with data scientists, product managers, & software engineers to develop & support the software for new ML products.
Ensuring excellence in delivery to internal & external customers.
MS / PhD in STEM field
3+ years of hands-on industry experience designing & building large-scale ML algorithms & ETL that are well-designed, cleanly coded, well-documented, operationally stable, & delivered on time
5+ years of total analytical work experience, including academic research
Solid experience with:
Python or R, including ML libraries (scikit-learn, NumPy, caret, e1071), CPU/GPU parallelization, matrix algebra, vectorization, linear programming, lambda programming, & OOP
At least one of the DL frameworks (TensorFlow, PyTorch, Caffe, Theano, Keras, or similar)
Solid understanding of:
Graduate statistics & probability (inference, hypothesis testing, p-value, ANOVA, CLT, LLN, Bayes' theorem, A/B testing, combinatorics, PDF/CDF, joint/conditional/marginal densities)
Vector calculus (gradients, Jacobians, partial derivatives & integrals, optimization)
Linear algebra (eigenvalues/eigenvectors, inverses, decompositions, orthogonality, multilinear algebra)
Time series (ARIMA, GARCH, forecasting, Kalman filter)
Shallow ML algorithms: regressions, SVM, k-means, k-NN, NB, HMM, PCA, NMF, SVD, XGBoost, decision trees, ensemble methods (random forest)
Deep NN algorithms: MLP, RNN, LSTM, CNN, GRU
ML concepts: backprop, hyperparameter tuning (Bayesian optimization, grid/random search), regularization, learning rate, optimization
Advanced work with SQL or NoSQL, including nested/join/aggregate queries, stored procedures, window functions (OVER ... PARTITION BY), & basic statistical functions
Cloud compute engines (AWS, Azure, GCP & similar), ML on clusters of GPUs, SageMaker, Jupyter
Excellent communication skills, cultural fit & natural curiosity about new ML developments & the AdTech domain
Nice to have:
Prior experience with programmatic advertising & RTB
Deep reinforcement learning (Bellman equations, MDP, policy optimization, credit assignment, multi-agent)
Proficiency with Spark (MLlib, GraphX), Hadoop, Kafka, and/or Hive
Proficiency with Scala, Java, and/or C/C++
Record of STEM publications in top journals or conferences
High ranking in Kaggle competitions
What's the first step?
Please complete this quick self-ranking of your strengths, & we can get you started!