SF Tech Events - GarysGuide | The #1 Resource for SF Tech

EVENT DETAILS

Important Note: You must register for the event (free) on ti.to before the event. You will then be sent an eNDA, which needs to be signed 24 hours before the event for security reasons. A badge will be pre-printed for you when you arrive at the event. Please register here (https://ti.to/big-data/data-infrastructure/with/dbud9l-da7a). If for some reason you are not able to sign the eNDA online, you can still attend; however, you may have to wait in a long line at the sign-in desk.

Talk #1: Introducing Iceberg, Tables designed for object stores

This talk will focus on Iceberg, a new table metadata format that is designed for managing huge tables backed by S3 storage. Iceberg decreases job planning time from minutes to under a second, while also isolating reads from writes & fixing longstanding problems like reliable schema evolution. This talk will include an overview of how Iceberg works & details about how Netflix is using Iceberg to make big data easier & more reliable.

Speaker Bio:
Ryan Blue works on Netflix's big data platform team. He contributes to Apache Spark & is a PMC member of Apache Parquet & Apache Avro.

Talk #2: Scaling Apache Spark Usage at Lyft

In this talk, Li will discuss how Apache Spark is currently used at Lyft & how Lyft scales that usage for machine learning & ETL-type workloads through a managed multi-cluster model. The talk will also show how we operate Apache Spark with autoscaling & high availability support, & how Spark coexists with our Apache Hive & other data infrastructure services as a portfolio offered to a wide range of customers.

Speaker Bio:
Li Gao is the tech lead in the Apache Spark domain in the Data Infrastructure org at Lyft. Prior to Lyft, Li worked at Salesforce, Fitbit, Marin Software, & a few startups in various technical leadership positions on cloud native & hybrid cloud data platforms at scale. Besides Spark, Li has scaled & productionized other open source projects, such as Presto, Apache HBase, Apache Phoenix, Apache Kafka, Apache Airflow, Apache Hive, & Apache Cassandra.

Talk #3: From flat files to deconstructed database: The evolution & future of the big data ecosystem

In this talk, Julien discusses the key open source components of the big data ecosystem, including Apache Calcite, Parquet, Arrow, Avro, & Kafka as well as batch & streaming systems, and explains how they relate to each other & how they make the ecosystem more of a database & less of a filesystem. (Parquet is the columnar data layout to optimize data at rest for querying. Arrow is the in-memory representation for maximum throughput execution & overhead-free data exchange. Calcite is the optimizer to make the most of our infrastructure capabilities.) Julien also explores the emerging components that are still missing or haven't become standard yet to fully materialize the transformation to an extremely flexible database that lets you innovate with your data.

Speaker Bio:
Julien Le Dem is the coauthor of Apache Parquet & the PMC chair of the project. He is also a committer & PMC member on Apache Pig, Apache Arrow, & a few other projects. Julien is a principal engineer at WeWork where he works on the data platform architecture. Previously, he was an architect at Dremio; tech lead for Twitter's data processing tools, where he also obtained a two-character Twitter handle (@J_); & a principal engineer & tech lead working on content platforms at Yahoo, where he received his Hadoop initiation. His French accent makes his talks particularly attractive.