Apache Hudi is a data platform technology that helps build reliable & scalable data lakes. Hudi brings stream processing to big data, supercharging your data lakes & making them orders of magnitude more efficient. Hudi is widely used at Uber & other companies to build transactional data lakes.
Please join us for a virtual meetup hosted by Uber & the Apache Hudi community. We will kick off with an update on the Apache Hudi 0.7.0 release, followed by talks from speakers at Uber, City Storage Systems & AWS.
12:00pm - 12:15pm - Welcome + Hudi at Uber - Prashant Wason/Satish Kotha (Uber)
12:15pm - 12:40pm - Hudi at City Storage Systems - Alexander Filipchik (City Storage Systems)
12:40pm - 01:05pm - Hudi at AWS - Udit Mehrotra/Wenning Ding (AWS)
Hudi at Uber
In this talk, we will discuss the latest Apache Hudi 0.7.0 release, how we roll out changes in production & how some of its features are used at Uber.
Prashant Wason is a software engineer at Uber & an Apache Hudi committer. Satish Kotha is a software engineer at Uber & an Apache Hudi committer.
Hudi at City Storage Systems
Rebuilding the stack from a monolith to microservices drastically affects how we do analytics. Data moves from a single database into multiple disconnected instances and, sooner or later, there is a need to ETL it into a separate DW, built to separate production workloads from analytical ones. That move is not without pain. Traditional data lake systems that run over DFS are poorly suited for fast-moving data, especially if you need the ability to update records, which we did. In this talk, I will share our journey from a single DB to a distributed DW & show how we run Hudi on Kubernetes.
Alexander Filipchik spent the last 10 years tinkering with distributed systems at scale & even participated in the launch of a gaming console. Over the years he slowly moved from working on products that users touch to building the data infrastructure that powers those products. When he is not thinking about the data world, you can find him snowboarding, enjoying water sports, or hiking with his family.
Hudi at AWS
In this talk, we describe how Apache Hudi is integrated across various services offered at AWS to help customers build data ingestion pipelines for their Amazon S3-based data lakes that can handle updated records from streaming inputs & Change Data Capture (CDC) from transactional systems, and comply with data privacy regulations. We will dive deeper into how Apache Hudi has been integrated with Amazon EMR, a cloud big data platform for processing vast amounts of data. We will also talk about some of the challenges we have faced at Amazon EMR while supporting our customers, the contributions we have been making to Apache Hudi & our roadmap ahead.
Wenning Ding is a software engineer at Amazon EMR & an Apache Hudi contributor, working on the EMR release team that supports packaging, building & testing of new releases of Amazon EMR. He currently focuses on releasing Apache Hudi on EMR & building new features for it.
Udit Mehrotra is a software engineer at Amazon EMR & an Apache Hudi committer, working on the EMR Applications team that supports & builds features for open source applications offered on Amazon EMR. He currently leads the engineering effort for supporting Apache Hudi on EMR. He has worked in the distributed computing & big data analytics domains for the last 5 years, contributing to several features of Amazon EMR such as Autoscaling, Managed Notebooks, LakeFormation, Apache Spark Resiliency, etc.