DETAILS
Welcome to another edition of South Bay Systems! This time, we have a double feature: first, Songqiao Su & Raghav Yadav will talk about optimizing Apache Pinot for real-time analytics; then Owen Xiao will talk about variants & semi-structured data in Apache Doris.
Agenda
6:00 PM: Doors open, food & socializing
6:30 PM - 7:00 PM: Apache Pinot Talk
7:00 PM - 7:30 PM: Apache Doris Talk
7:30 PM onward: Community socializing!
Food & beverages will be provided, courtesy of our hosts, Adobe.
Low-Latency Serving on Cloud Object Stores with Apache Pinot
Abstract
In this talk, we present the evolution of Apache Pinot's architecture: first from tightly coupled storage & compute, to decoupled cloud storage, & now toward native support for Parquet as a first-class segment format. We will discuss key technical innovations such as the implementation of a Parquet-compatible forward index reader, which enables all of Pinot's indexing strategies to operate directly on Parquet files. Additional optimizations include index pinning, Parquet page-level selective reads, page prefetching for efficient I/O parallelism, & page caching. Together, these enhancements allow Pinot's indexing & query execution framework to deliver sub-second performance directly on Parquet data, going far beyond conventional metadata-based pruning approaches.
Speaker Bios
Songqiao Su is a Staff Software Engineer at StarTree.AI, working on building tiered storage & improving compute-storage decoupling in Apache Pinot & StarTree Cloud. His work focuses on large-scale, high-performance distributed systems. Before joining StarTree, he worked on network & RPC infrastructure at Facebook & Databricks.
Raghav Yadav is a Staff Software Engineer at StarTree.AI, working on building a low-latency serving layer on Iceberg in Apache Pinot & StarTree Cloud. His expertise spans distributed databases & large-scale systems, with experience in cloud-scale data infrastructure at Microsoft Azure, real-time streaming databases as a founding engineer at Grainite, & now real-time OLAP analytics at StarTree.
The Evolution of Semi-Structured Data Analytics: From Text, JSON to VARIANT
Abstract
Semi-structured data, such as JSON, is gaining widespread adoption due to its flexibility. However, traditional databases & data warehouses are built for structured schemas, creating new challenges in storing & analyzing semi-structured formats. In this session, we'll explore:
Characteristics & challenges of semi-structured data
Limitations of traditional approaches
Apache Doris' native solution for semi-structured analytics
Comparison with Snowflake, Iceberg (VARIANT type), & Elasticsearch
Real-world applications in Log Analytics, Distributed Tracing, & IoT
Speaker Bio
Owen Xiao is a co-founder of VeloDB & a PMC member of Apache Doris, where he leads product strategy, observability, & AI-driven R&D for both open-source & enterprise data platforms. With over 10 years of experience in database kernel development & distributed systems architecture, he has helped scale analytical databases for global enterprises.