Multimodal Data w/ Modern Tools | SF Tech Events - GarysGuide

Multimodal Data w/ Modern Tools

With Sammy Sidhu (Founder/CEO, Eventual), Chang She (Founder/CEO, LanceDB), Stu Stewart (Head of ML/AI Engineering, Twelve Labs), Ramesh Chandra (Principal Engineer, Databricks), Paul George (ML Engineer, Twelve Labs).

	Venue, To Be Announced, San Francisco
	Oct 08 (Tue) @ 05:00 PM
	FREE

DETAILS

Modern problems = Modern solutions!
Modern AI/ML workloads require data infrastructure that is capable of handling the complexity of multimodal data. AI/ML is no longer about simple tabular features or clickstream data - modern AI/ML demands data infrastructure that can handle messy unstructured text, documents, images & even video.

Meet the team behind the Daft project & a panel of experts at the forefront of these new technologies that make up the next generation of scalable AI/ML systems. We will be exploring new & exciting work from storage to data curation, large model training, & evaluation.

Thank you to our sponsor CRV for providing food & drinks for this meetup!

This event is a part of #SFTechWeek - a week of events hosted by VCs & startups to bring together the tech ecosystem.

Agenda
5:00p - 5:45p: Doors Open & Networking

5:45p - 6:50p: Welcome Remarks & Presentations!

6:50p - 8:30p: More Networking

About Daft
Daft is an open-source framework that powers ETL, analytics, & ML/AI at scale. Its familiar DataFrame API is built to beat Spark on both performance & ease of use.

Join Distributed Data Community Slack

Check out Daft Engineering Blog

Follow Daft on LinkedIn & Twitter

Subscribe to Daft YouTube

We're hiring, join our team

Presentations
Distributed Data Tools Should Be Easy To Use: Entreaties From an End-User

The Twelve Labs team has been hard at work scaling multimodal AI application inference, & will discuss their recent work prototyping Ray Serve as a foundation for such workloads! Ray is a great tool, but building applications on Ray Serve exposed a number of challenges that could have been avoided with a more streamlined local development experience. The Twelve Labs team will share tips & tricks for developing distributed AI applications locally, including how to overcome scale-down limitations in "exascale first" platforms (Ray, Spark, etc.). Data product folks, please take note :)

Stu (Michael) Stewart is the Head of ML/AI Engineering at Twelve Labs, where he & his team are building foundation models for video search & video-guided language generation. Previously, he worked on autonomous vehicles at Cruise, & on home pricing algorithms at Opendoor. Twelve Labs is hiring! https://www.twelvelabs.io/careers

Paul George is a Senior Staff ML Engineer at Twelve Labs working on data & inference infrastructure. Previously, he worked on data infrastructure & pricing algorithms at Perpetua Labs &, before that, at Opendoor.

Why Multimodal Data Requires New Tools

Existing query engines like Spark & Trino have historically excelled at processing analytical data at scale; however, they are a poor choice for handling the complexities of unstructured or multimodal data.

In this talk, we will uncover some of the challenges these engines face when processing multimodal data & introduce how we designed Daft to solve many of these problems in a distributed fashion. Dive into the internals of Daft & its architecture, discover why we chose Rust to power our fast & distributed Python query engine, & unlock new workloads & possibilities for multimodal data by leveraging Daft.

Sammy Sidhu is the co-founder & CEO of Eventual, the company behind Daft. Sammy's background is in High Performance Computing (HPC) & Deep Learning, & he has over a dozen patents/publications in the space. Prior to Eventual, Sammy worked in autonomous driving for 6 years & sold a startup to Tesla Autopilot in the process.

A New Open Source Foundation for AI Data

The current data stack is built on top of foundations laid down a decade ago for tabular data. But AI datasets are much more complex & workloads are much more diverse. Enterprises scaling AI in production often find data management prohibitively expensive & overly complicated. The Lance columnar format is an open-source project designed to provide the new data foundations for AI, delivering much better performance & scalability for AI datasets & making them natively searchable using vector or full-text queries. In this talk we'll dive into the main challenges that AI data poses, how the Lance format works, & the value it delivers to AI teams training models or putting applications into production.

Chang She is the CEO & cofounder of LanceDB, the developer-friendly, open-source database for multi-modal AI. A serial entrepreneur, Chang has been building DS/ML tooling for nearly two decades & is one of the original contributors to the pandas library. Prior to founding LanceDB, Chang was VP of Engineering at TubiTV, where he focused on personalized recommendations & ML experimentation.

Unity Catalog: the Universal Catalog for Data + AI

Come & discover how Unity Catalog provides a unified solution to manage your unstructured data, like images & documents, & GenAI tools with a single, universal catalog for data + AI. Unity Catalog is multimodal & provides interoperability across lakehouse formats like Delta & Iceberg, & various compute engines. It comes with built-in governance & security, including strong authentication, secure credential vending, & asset-level access control, to protect your data & AI assets. In this talk, we'll present an overview of Unity Catalog & showcase its multimodal capabilities.

Ramesh Chandra is a Principal Engineer at Databricks, building Unity Catalog & Governance. His background is in distributed systems, storage, & security. Previously, he was a tech lead for the Cloud AI platform & Cloud Identity teams at Google, & built distributed storage systems at Nutanix.