Apache Pinot On Object Storage - Variants In Apache Doris | SF Tech Events - GarysGuide

Apache Pinot On Object Storage - Variants In Apache Doris

With Owen Xiao (Co-founder, VeloDB), Songqiao Su (Staff Software Engineer, StarTree.AI).

	Adobe Founders Tower, 333 W San Fernando St, San Jose
	Oct 27 (Mon), 2025 @ 06:00 PM
	FREE

DETAILS

Welcome to another edition of South Bay Systems! This time it's a double feature: first, Songqiao Su & Raghav Yadav will talk about optimizing Apache Pinot for real-time analytics, then Owen Xiao will cover variants & semi-structured data in Apache Doris.

Agenda
6:00 PM: Doors open, food & socializing

6:30 PM - 7:00 PM: Apache Pinot Talk

7:00 PM - 7:30 PM: Apache Doris Talk

7:30 PM onward: Community socializing!

Food & beverages will be provided, courtesy of our hosts, Adobe.

Low-Latency Serving on Cloud Object Stores with Apache Pinot
In this talk, we present the evolution of Apache Pinot's architecture: first from tightly coupled storage & compute, to decoupled cloud storage, & now toward native support for Parquet as a first-class segment format. We will discuss key technical innovations such as the implementation of a Parquet-compatible forward index reader, which enables all of Pinot's indexing strategies to operate directly on Parquet files. Additional optimizations include index pinning, Parquet page-level selective reads, page prefetching for efficient I/O parallelism, & page caching. Together, these enhancements allow Pinot's indexing & query execution framework to deliver sub-second performance directly on Parquet data, going far beyond conventional metadata-based pruning approaches.
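
For readers new to the idea, here is a toy sketch of how an index can drive page-level selective reads. It is not Pinot's implementation: the page layout, the inverted index contents, and the names `PAGE_FIRST_ROWS`, `pages_for_rows`, and `plan_reads` are all hypothetical, chosen only to show why knowing the matching row ids lets a reader fetch a subset of pages instead of a whole column chunk.

```python
from bisect import bisect_right

# Hypothetical layout: one column chunk of 10 rows, split into pages.
# Each entry is the first row id stored in that page (pages 0, 1, 2).
PAGE_FIRST_ROWS = [0, 4, 8]

# Hypothetical inverted index: column value -> row ids that hold it.
INVERTED_INDEX = {
    "US": [1, 2, 9],
    "DE": [5],
    "JP": [0, 3, 4, 6, 7, 8],
}


def pages_for_rows(row_ids, page_first_rows):
    """Return the sorted page numbers that contain at least one of row_ids."""
    return sorted({bisect_right(page_first_rows, r) - 1 for r in row_ids})


def plan_reads(value):
    """Look up matching rows in the index, then pick only the pages to fetch."""
    rows = INVERTED_INDEX.get(value, [])
    return pages_for_rows(rows, PAGE_FIRST_ROWS)


if __name__ == "__main__":
    for v in ("US", "DE", "FR"):
        print(v, "-> read pages", plan_reads(v))
```

In this toy setup a lookup for "DE" touches one page out of three, and a value absent from the index touches none. On an object store each skipped page is a byte range that never has to be requested.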

Speaker Bio
Songqiao Su is a Staff Software Engineer at StarTree.AI, working on building tiered storage & improving compute-storage decoupling in Apache Pinot & StarTree Cloud. His work focuses on large-scale, high-performance distributed systems. Before joining StarTree, he worked on network & RPC infrastructure at Facebook & Databricks.

Raghav Yadav is a Staff Software Engineer at StarTree.AI, working on building a low-latency serving layer on Iceberg in Apache Pinot & StarTree Cloud. His expertise spans distributed databases & large-scale systems, with experience in cloud-scale data infrastructure at Microsoft Azure, real-time streaming databases as a founding engineer at Grainite, & now real-time OLAP analytics at StarTree.

The Evolution of Semi-Structured Data Analytics: From Text, JSON to VARIANT
Abstract
Semi-structured data, such as JSON, is gaining widespread adoption due to its flexibility. However, traditional databases & data warehouses are built for structured schemas, creating new challenges in storing & analyzing semi-structured formats. In this session, we'll explore:

Characteristics & challenges of semi-structured data

Limitations of traditional approaches

Apache Doris' native solution for semi-structured analytics

Comparison with Snowflake, Iceberg (VARIANT type), & Elasticsearch

Real-world applications in Log Analytics, Distributed Tracing, & IoT
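
As a rough illustration of the storage challenge, the sketch below "shreds" JSON records with differing keys into per-path columns, padding missing values with None. This is a simplified picture of the general columnar idea behind VARIANT-style types, not Apache Doris' actual implementation; the function names `flatten` and `shred` and the sample log records are made up for the example.

```python
import json


def flatten(obj, prefix=""):
    """Yield (dotted_path, value) pairs for every non-object value in a JSON object."""
    for key, value in obj.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            yield from flatten(value, path + ".")
        else:
            yield path, value


def shred(json_rows):
    """Convert JSON strings into columns keyed by path; absent values become None."""
    columns = {}
    for i, raw in enumerate(json_rows):
        for path, value in flatten(json.loads(raw)):
            # A path seen for the first time is back-filled with None for earlier rows.
            columns.setdefault(path, [None] * i).append(value)
        # Pad columns whose path did not appear in this row.
        for col in columns.values():
            if len(col) <= i:
                col.append(None)
    return columns


if __name__ == "__main__":
    logs = [
        '{"level": "info", "http": {"status": 200}}',
        '{"level": "error", "msg": "timeout"}',
    ]
    for path, values in shred(logs).items():
        print(path, values)
```

Each record may introduce new paths, so the set of columns is only known after the data arrives. That is the tension with fixed, structured schemas that the session abstract describes.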

Speaker Bio
Owen Xiao is a co-founder of VeloDB & a PMC member of Apache Doris, where he leads product strategy, observability, & AI-driven R&D for both open-source & enterprise data platforms. With over 10 years of experience in database kernel development & distributed systems architecture, he has helped scale analytical databases for global enterprises.