About Clarifai
Clarifai is a leading, full-lifecycle deep learning AI platform for computer vision, natural language processing, & audio recognition. We help organizations transform unstructured images, video, text, & audio data into structured data significantly faster & more accurately than humans could on their own. Founded in 2013 by Matt Zeiler, Ph.D., Clarifai has been a market leader in AI since winning the top five places in image classification at the 2013 ImageNet Challenge. Clarifai continues to grow, with employees based remotely throughout the United States & in Tallinn, Estonia.
We have raised $100M in funding to date, with $60M coming from our most recent Series C, & are backed by industry leaders like Menlo Ventures, Union Square Ventures, Lux Capital, New Enterprise Associates, LDV Capital, Corazon Capital, Google Ventures, NVIDIA, Qualcomm, & Osage.
Clarifai is proud to be an equal opportunity workplace dedicated to pursuing, hiring, & retaining a diverse workforce.
Impact
We believe that world-class AI is built on a foundation of world-class data. The AI Data Lead will own the critical, end-to-end process of creating & curating the high-quality datasets that fuel our models. You will be a power user of Clarifai's suite of automated data labeling products, providing direct feedback to our product & engineering teams to drive continuous improvement.
Initially, this role will concentrate on building our next-generation vision datasets, with a heavy emphasis on full-motion video. Over time, the scope will strategically expand to include the development of our large-scale language datasets for advanced NLP models.
Opportunity
- Dataset Strategy & Pipeline Development:
  - Collaborate with ML & product teams to define data requirements, starting with complex video & image use cases & expanding into text & language.
  - Design & execute a comprehensive strategy for data acquisition & augmentation.
  - Build, scale, & maintain robust data pipelines to ingest, process, & version large-scale multimedia datasets.
- Third-Party Labeling & Internal Tool Management (Primary Focus):
  - Leverage Clarifai's automated & AI-assisted labeling tools to efficiently pre-label data & manage human-in-the-loop workflows.
  - Serve as the primary lead for external data labeling vendors, who will often verify or enrich AI-generated labels, ensuring projects are on time & within budget.
  - Author crystal-clear labeling instructions for complex tasks, from object tracking in video to, eventually, named entity recognition in text.
  - Implement & manage a rigorous quality assurance (QA) framework for both AI- & human-generated labels.
- Product Feedback & Improvement Loop:
  - Act as a key internal customer for Clarifai's data labeling products.
  - Provide structured, expert feedback to our product & engineering teams to identify bugs, suggest feature enhancements, & guide the product roadmap.
  - Continuously evaluate & pioneer new strategies for combining automated labeling with human verification to maximize quality & efficiency.
- Leadership & Collaboration:
  - Lead & mentor a focused set of data labeling partners.
  - Foster a culture of data excellence, ownership, & continuous improvement.
  - Communicate project status, challenges, & outcomes effectively to all stakeholders, & keep track of budgets.
Requirements
- 3+ years in data engineering, with a proven history of building & managing complex data pipelines.
- Direct, hands-on experience managing third-party data labeling services or in-house annotation teams.
- Experience working with large-scale vision datasets (image or video).
- Deep understanding of data labeling processes & quality metrics.
- Strong proficiency in Python & SQL.
- Experience with cloud data services (AWS, GCP, or Azure).
- Exceptional project management, communication, & vendor management skills.
- A meticulous eye for detail & an unwavering commitment to data quality.
Great to Have
- Specific experience with the complexities of full-motion video datasets & annotation (e.g., temporal consistency, event tagging).
- Experience in an environment where you regularly used internal tools & provided feedback for their improvement ("dogfooding").
- Experience with large-scale language or text datasets.
- Previous experience in a technical leadership or mentorship role.
- Experience using a variety of data annotation platforms & tools.