We are excited to host two talks. The first speaker is traveling from Europe to the US, and we are lucky to have him speak at our meetup.
Agenda
6:00 pm -- 6:30 pm Light dinner + networking
6:35 pm -- 7:20 pm Talk 1 + Q&A
7:25 pm -- 8:10 pm Talk 2 + Q&A
8:10 pm -- 8:30 pm Networking
8:30 pm -- 8:45 pm Closing
Talk 1: Introduction to Vespa, the open source big data serving engine (Yahoo)
Offline and stream processing of big data sets can be done with tools such as Hadoop, Spark, and Storm, but what if you need to process big data at the time a user is making a request? This talk introduces Vespa, the open source big data serving engine. Vespa allows you to search, organize, and evaluate machine-learned models from, e.g., TensorFlow over large, evolving data sets with latencies in the tens of milliseconds. Vespa is behind the recommendation, ad targeting, and search at Yahoo, where it handles billions of daily queries over billions of documents. It was recently open sourced at http://vespa.ai.
Speaker: Jon Bratseth
Jon Bratseth is a distinguished architect at Oath (formerly Yahoo), and the architect of and one of the main contributors to Vespa, the open source big data serving engine. Jon has 20 years of experience as an architect and programmer on large distributed systems. He has a master's degree in computer science from the Norwegian University of Science and Technology.
Talk 2: Lambda Architecture in Practice (Amplitude)
Over the last few years, Lambda Architecture has emerged as a common paradigm for building distributed data processing systems. We'll look at two case studies of custom analytics use cases to understand Lambda Architecture in practice and how it affects the complexity of managing long-term data.
The first is Sumo Logic, where we built a distributed full-text search and aggregation system on top of Lucene. This was before Lambda Architecture was popularized, and the infrastructure used batch processing only. However, because of the real-time requirements of log management, the batch layer had to commit new data quickly (approximately every minute), which led to fragmentation. To combat that, an additional reindexing mechanism was introduced, which added significant complexity to the long-term data layer.
The second is Amplitude, where we have built a distributed column store using Lambda Architecture. Our system, Nova, is inspired by Druid, with streaming and batch processing layers that can handle the same query patterns. Some of the key practical problems we solved include avoiding the duplication of query logic across the two layers and ensuring proper handoffs of data, both potential pitfalls of using Lambda Architecture. While the resulting system is more complex operationally, it has made managing the long-term data much simpler and less error-prone.
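For readers new to the pattern, here is a minimal Python sketch of the two pitfalls named above: one query function shared by both layers, and a handoff that moves each row from the streaming layer to the batch layer exactly once. It is an illustration only, not how Nova is implemented; `LambdaStore`, `count_by_type`, `ingest`, and `hand_off` are hypothetical names, and in-memory lists stand in for real storage.

```python
from collections import Counter


def count_by_type(rows):
    """Single query implementation, reused by both layers (no duplicated logic)."""
    return Counter(event_type for _, event_type in rows)


class LambdaStore:
    """Toy store with a streaming layer (recent rows) and a batch layer (older rows)."""

    def __init__(self):
        self.batch_rows = []      # long-term data, (timestamp, event_type) tuples
        self.streaming_rows = []  # recent data not yet handed off

    def ingest(self, ts, event_type):
        # New events always land in the streaming layer first.
        self.streaming_rows.append((ts, event_type))

    def hand_off(self, up_to_ts):
        # Move rows with timestamp < up_to_ts into the batch layer.
        # Each row ends up in exactly one layer, so queries never double count.
        moved = [r for r in self.streaming_rows if r[0] < up_to_ts]
        self.streaming_rows = [r for r in self.streaming_rows if r[0] >= up_to_ts]
        self.batch_rows.extend(moved)

    def query(self):
        # Run the same query on both layers and merge the partial results.
        return count_by_type(self.batch_rows) + count_by_type(self.streaming_rows)
```

In this sketch, a query returns the same answer before and after a handoff, which is the property a correct handoff needs to preserve.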
Speaker: Jeffrey Wang
Jeffrey Wang is a co-founder and Chief Architect at Amplitude Analytics. He works on a variety of product and engineering problems, including building out the columnar data store that powers Amplitude. He studied CS at Stanford and previously worked at Palantir and Sumo Logic.