NLP Data Scientist - Summarization/Search/NER
Manan AI was founded by a Ph.D. CEO. The team is made up of AI and digital media professionals with backgrounds at Stanford, Facebook, and Google, and is advised by a former partner at a top-tier Silicon Valley VC firm. We are building a strategic framework for the impact of disruptive content and empowering creators with synthetic media AI generators. All video and voice content will be created with generative methods, like a decentralized Bloomberg and Hollywood on your phone or laptop. It can be used to make creative content, interactive games, videos, virtual personas, movies, and immersive experiences. We apply cutting-edge ML/AI to speech-to-text (S2T), NLP, summarization, semantic search, speech recognition, language translation, NER, and synthetic media generation. Through collaborative fiction with GPT-2 and GPT-3 (OpenAI) and Facebook AI, we explore our shared humanity as creators. Target markets: B2B, B2C, entertainment, marketing and customer service, advertising, and security and privacy against deepfakes.
We are a team of millennial entrepreneurs, developers, and AI scientists working on the equation AI x Search x Editing x Share = video, voices, and thoughts generated in collaboration with AI. We are looking for a full-time NLP Data Scientist with 1-2 years of AI experience.
We work with data from a wide variety of sources, including text, voice, news feeds, tweets and stories, user behavior, and real-time data. The team's senior advisors include several US professors with world-class expertise in machine learning, statistics, optimization, and stochastic control. They also bring research and development experience in real-world NLU/NLP, conversational AI, chatbots, dialog management, search, and voice/dialog systems, and they provide advice and mentorship to every member of the distributed team. Your contributions will drive content discovery and personalization through voice, video, search, and summary interactions across mobile apps, third-party devices (e.g., Alexa, Google Home, Roku), and automotive products.
Why Join Us:
We are a fully distributed team with a New York HQ. Flexible work and a flexible schedule are possible
You will turn bleeding-edge research on generating voice, video, and deepfakes into commercial products
As part of the Search and Voice Science team, you will design and build the next generation of voice, text, video, and search experiences. As a Scientist, you will be an expert in areas spanning speech recognition, natural language processing and understanding, dialog management, personalization, natural language generation, and information retrieval
Open to candidates internationally. A no-micromanagement environment for highly self-sufficient individuals
Powerful workstation with GPUs
Responsibilities and what we are looking for:
Develop cutting-edge ML for automatic text summarization, keyword search, semantic search, voice/speech recognition, language translation, and text generation
Design criteria for evaluating text/voice performance and enhance existing methodologies
Research, design, experiment with and build ML systems, particularly related to text/voice and search products.
Prototype new features: rapidly build end-to-end prototypes, including storage, business logic, and user experience.
Conduct R&D in text summarization, semantic search, and NER. Read, understand, and implement research papers. Assemble prototypes and MVPs. Compress models and optimize inference.
Initial work can be done remotely, with daily Zoom standups with the full team and in-person meetings
Preferably, you will be based in New York and work from our New York, NY office
Advanced STEM degree (M.S. or Ph.D.) in Computer Science, Math, Statistics, Physics, Economics, Computational Linguistics, Neuroscience, Engineering, or a related field, with extensive relevant AI/NLP experience
Extensive experience utilizing deep learning & NLP methodologies, building data pipelines, exploratory data analysis, and other aspects of the data science process
Experience with cutting-edge NLP techniques and models (e.g., word2vec, RNNs, and transformers such as BERT, XLM, and XLNet). Experience with ML libraries and frameworks (e.g., TensorFlow, Keras, PyTorch, CUDA, TensorFlow Serving, Vowpal Wabbit, scikit-learn)
Familiarity with tools such as Python, R, Julia, or MATLAB. Familiarity with AWS or another cloud infrastructure provider (GCP, Azure, etc.) and with technologies such as Kafka, Airflow, and Composer
Production experience implementing machine learning pipelines and models at scale in Python, Java, Scala, or similar languages
Proficiency with distributed processing and warehousing frameworks (e.g., Spark, Hadoop, Hive, Tez, etc.). Experience with the research and development workflow/life-cycle for large-scale batch and streaming machine learning systems
Excellent written and verbal communication skills and the ability to collaborate effectively with non-technical team members and stakeholders
Self-motivated, growth-oriented, and driven to pursue solutions to challenging problems
A big plus: experience working in the advertising or media industry
You can be located anywhere. Spoken and written English (B2+) is a must
Our Tech Stack:
PyTorch and TensorFlow wrapped in Flask and running in a Kubernetes cluster
Flutter, Node.js with TypeScript running on Firebase Functions and Google Cloud Storage
Libraries and frameworks: NLTK, PyBrain, Caffe, NumPy, SciPy, Pandas, Matplotlib, Keras