SF Tech Events - GarysGuide | The #1 Resource for SF Tech

EVENT DETAILS

Let's kick off the New Year 2019 with our first BASM Meetup!

Join us for an evening at the Bay Area Apache Spark Meetup (BASM), featuring tech talks about Apache Spark at scale from Unravel Data & Databricks.

Agenda:

6:30 - 7:00 pm: Social Hour with Food, Drinks, Beer & Wine
7:00 - 7:05 pm: Jules Introduction & Announcements
7:05 - 7:50 pm: Tech Talk from Unravel Data
8:00 - 8:45 pm: Tech Talk from Databricks
8:45 - 9:00 pm: Additional Networking, Q&A

Tech Talk 1: Putting AI to Work on Apache Spark

Presenter: Shivnath Babu

Abstract: Apache Spark simplifies AI, but why not use AI to simplify Spark performance & operations management? An AI-driven approach can drastically reduce the time Spark application developers & operations teams spend troubleshooting problems.

This talk will discuss algorithms that run real-time streaming pipelines and build ML models in batch, enabling Spark users to automatically solve problems such as: (i) fixing a failed Spark application, (ii) auto-tuning SLA-bound Spark streaming pipelines, (iii) identifying the best broadcast joins & caching for SparkSQL queries & tables, (iv) picking cost-effective machine types & container sizes to run Spark workloads on the AWS, Azure, & Google clouds, & more.

Bio: Shivnath Babu is the CTO & Co-Founder of Unravel Data Systems & an adjunct professor of computer science at Duke University. Shivnath co-founded Unravel to solve the application management challenges that companies face when they adopt systems like Hadoop & Spark. Unravel originated from the Starfish platform built at Duke, which has been downloaded by over 100 companies. Shivnath has won a US National Science Foundation CAREER Award, three IBM Faculty Awards, & an HP Labs Innovation Research Award.

Tech Talk 2: Project Hydrogen, HorovodRunner, & Pandas UDF: Distributed Deep Learning Training & Inference on Apache Spark

Presenter: Lu Wang

Abstract: Big data & AI are joined at the hip: the best AI applications require massive amounts of constantly updated training data to build state-of-the-art models. AI has always been one of the most exciting applications of big data. Project Hydrogen is a major Apache Spark initiative to bring the best AI & big data solutions together. It introduced barrier execution mode in the Apache Spark 2.4.0 release to support distributed model training, & it explores optimized data exchange to accelerate distributed model inference.

In this talk, we will explain why barrier execution mode is needed, how it works, & how to use it to integrate distributed deep learning (DL) training with Spark. We will demonstrate HorovodRunner, the first Spark+AI integration powered by Project Hydrogen. It builds on the Horovod framework developed by Uber & on Databricks Runtime 5.0 for Machine Learning.

We will also share our experience & performance tips on how to combine Pandas UDFs from Spark with AI frameworks to scale complex model inference workloads.

Bio: Lu Wang is a software engineer at Databricks. His main research interests are developing high-performance parallel algorithms for scientific computing & applications. He has been actively involved in the development of Project Hydrogen, Spark Deep Learning Pipelines, & Spark MLlib since he joined Databricks. Before Databricks, he worked at Lawrence Livermore National Laboratory on parallel multigrid linear solvers for exascale parallel machines, used to solve the linear systems that arise in reservoir simulations. He received his Ph.D. from the Pennsylvania State University in 2014.