AI Data Lead
Clarifai // advanced image recognition using neural networks

About Clarifai

Clarifai is a leading, full-lifecycle deep learning AI platform for computer vision, natural language processing, & audio recognition. We help organizations transform unstructured image, video, text, & audio data into structured data significantly faster & more accurately than humans could on their own. Founded in 2013 by Matt Zeiler, Ph.D., Clarifai has been a market leader in AI since winning the top five places in image classification at the 2013 ImageNet Challenge. Clarifai continues to grow, with employees based remotely throughout the United States & in Tallinn, Estonia.

We have raised $100M in funding to date, with $60M coming from our most recent Series C, & are backed by industry leaders like Menlo Ventures, Union Square Ventures, Lux Capital, New Enterprise Associates, LDV Capital, Corazon Capital, Google Ventures, NVIDIA, Qualcomm & Osage.

Clarifai is proud to be an equal opportunity workplace dedicated to pursuing, hiring, & retaining a diverse workforce.

Impact

We believe that world-class AI is built on a foundation of world-class data. The AI Data Lead will own the critical, end-to-end process of creating & curating the high-quality datasets that fuel our models. You will be a power user of Clarifai's suite of automated data labeling products, providing direct feedback to our product & engineering teams to drive continuous improvement.

Initially, this role will concentrate on building our next-generation vision datasets, with a heavy emphasis on full-motion video. Over time, the scope will strategically expand to include the development of our large-scale language datasets for advanced NLP models.

Opportunity

Dataset Strategy & Pipeline Development:

Collaborate with ML & product teams to define data requirements, starting with complex video & image use cases & expanding into text & language.
Design & execute a comprehensive strategy for data acquisition & augmentation.
Build, scale, & maintain robust data pipelines to ingest, process, & version large-scale multimedia datasets.

Third-Party Labeling & Internal Tool Management (Primary Focus):

Leverage Clarifai's automated & AI-assisted labeling tools to efficiently pre-label data & manage human-in-the-loop workflows.
Serve as the primary lead for external data labeling vendors, who will often verify or enrich AI-generated labels, & ensure their projects stay on time & within budget.
Author crystal-clear labeling instructions for complex tasks, from object tracking in video to, eventually, named entity recognition in text.
Implement & manage a rigorous quality assurance (QA) framework for both AI- & human-generated labels.

Product Feedback & Improvement Loop:

Act as a key internal customer for Clarifai's data labeling products.
Provide structured, expert feedback to our product & engineering teams to identify bugs, suggest feature enhancements, & guide the product roadmap.
Continuously evaluate & pioneer new strategies for combining automated labeling with human verification to maximize quality & efficiency.

Leadership & Collaboration:

Lead & mentor a focused set of data labeling partners.
Foster a culture of data excellence, ownership, & continuous improvement.
Communicate project status, challenges, & outcomes effectively to all stakeholders, & keep track of budgets.

Requirements

3+ years in data engineering, with a proven history of building & managing complex data pipelines.
Direct, hands-on experience managing third-party data labeling services or in-house annotation teams.
Experience working with large-scale vision datasets (image or video).
Deep understanding of data labeling processes & quality metrics.
Strong proficiency in Python & SQL.
Experience with cloud data services (AWS, GCP, or Azure).
Exceptional project management, communication, & vendor management skills.
A meticulous eye for detail & an unwavering commitment to data quality.

Great to Have

Specific experience with the complexities of full-motion video datasets & annotation (e.g., temporal consistency, event tagging).
Experience in an environment where you regularly used internal tools & provided feedback for their improvement ("dogfooding").
Experience with large-scale language or text datasets.
Previous experience in a technical leadership or mentorship role.
Experience using a variety of data annotation platforms & tools.