[ SF Tech Week ]
Multimodal Data w/ Modern Tools
With Sammy Sidhu (Founder/CEO, Eventual), Chang She (Founder/CEO, LanceDB), Stu Stewart (Head of ML/AI Engineering, Twelve Labs), Ramesh Chandra (Principal Engineer, Databricks), Paul George (ML Engineer, Twelve Labs).
Venue: To Be Announced, San Francisco
Oct 08 (Tue) @ 05:00 PM
FREE
DETAILS
Modern problems = Modern solutions!
Modern AI/ML workloads require data infrastructure that is capable of handling the complexity of multimodal data. AI/ML is no longer about simple tabular features or clickstream data - modern AI/ML demands data infrastructure that can handle messy unstructured text, documents, images & even video.
Meet the team behind the Daft project & a panel of experts at the forefront of these new technologies that make up the next generation of scalable AI/ML systems. We will be exploring new & exciting work from storage to data curation, large model training, & evaluation.
Thank you to our sponsor CRV for providing food & drinks for this meetup!
This event is a part of #SFTechWeek - a week of events hosted by VCs & startups to bring together the tech ecosystem.
Agenda
5:00p - 5:45p: Doors Open & Networking
5:45p - 6:50p: Welcome Remarks & Presentations!
6:50p - 8:30p: More Networking
About Daft
Daft is an open-source framework that powers ETL, analytics, & ML/AI at scale. Its familiar Dataframe API is built to beat Spark on both performance & ease of use.
Join Distributed Data Community Slack
Check out Daft Engineering Blog
Follow Daft on LinkedIn & Twitter
Subscribe to Daft YouTube
We're hiring, join our team
Presentations
Distributed Data Tools Should Be Easy To Use: Entreaties From an End-User
The Twelve Labs team has been hard at work scaling multimodal AI application inference, & will discuss their recent work prototyping Ray Serve as a foundation for such workloads! Ray is a great tool, but building applications on Ray Serve exposed a number of challenges that could have been avoided with a more streamlined local development experience. The Twelve Labs team will share tips & tricks for developing distributed AI applications locally, including how to overcome scale-down limitations in "exascale first" platforms (Ray, Spark, etc.). Data product folks, please take note :)
Stu (Michael) Stewart is the Head of ML/AI Engineering at Twelve Labs, where he & his team are building foundation models for video search & video-guided language generation. Previously, he worked on autonomous vehicles at Cruise, & on home pricing algorithms at Opendoor. Twelve Labs is hiring! https://www.twelvelabs.io/careers
Paul George is a Senior Staff ML Engineer at Twelve Labs working on data & inference infrastructure. Previously, he worked on data infrastructure & pricing algorithms at Perpetua Labs &, before that, at Opendoor.
Why Multimodal Data Requires New Tools
Existing query engines like Spark & Trino have historically excelled at processing analytical data at scale; however, they are a poor choice for handling the complexities of unstructured or multimodal data.
In this talk, we will uncover some of the challenges these engines face when processing multimodal data & introduce how we designed Daft to solve many of these problems in a distributed fashion. Dive into the internals of Daft & its architecture, discover why we chose Rust to power our fast & distributed Python query engine, & unlock new workloads & possibilities for multimodal by leveraging Daft.
Sammy Sidhu is the co-founder & CEO of Eventual, the company behind Daft. Sammy's background is in High Performance Computing (HPC) & Deep Learning, & he has over a dozen patents/publications in the space. Prior to Eventual, Sammy worked in autonomous driving for 6 years & sold a startup to Tesla Autopilot in the process.
A New Open Source Foundation for AI Data
The current data stack is built on top of foundations laid down a decade ago for tabular data. But AI datasets are much more complex & workloads are much more diverse. Enterprises scaling AI in production often find data management prohibitively expensive & overly complicated. The Lance columnar format is an open-source project designed to provide the new data foundations for AI: it delivers much better performance & scalability for AI datasets & makes them natively searchable using vector or full-text queries. In this talk we'll dive into the main challenges that AI data poses, how the Lance format works, & the value it delivers to AI teams training models or putting applications into production.
Chang She is the CEO & cofounder of LanceDB, the developer-friendly, open-source database for multi-modal AI. A serial entrepreneur, Chang has been building DS/ML tooling for nearly two decades & is one of the original contributors to the pandas library. Prior to founding LanceDB, Chang was VP of Engineering at TubiTV, where he focused on personalized recommendations & ML experimentation.
Unity Catalog: the Universal Catalog for Data + AI
Come & discover how Unity Catalog provides a unified solution to manage your unstructured data, like images & documents, & GenAI tools with a single, universal catalog for data + AI. Unity Catalog is multimodal & provides interoperability across lakehouse formats like Delta & Iceberg, & various compute engines. It comes with built-in governance & security, including strong authentication, secure credential vending, & asset-level access control, to protect your data & AI assets. In this talk, we'll present an overview of Unity Catalog & showcase its multimodal capabilities.
Ramesh Chandra is a Principal Engineer at Databricks, building Unity Catalog & Governance. His background is in distributed systems, storage, & security. Previously, he was a tech lead for the Cloud AI platform & Cloud Identity teams at Google, & built distributed storage systems at Nutanix.