This month, we will be joining Tapad for an evening of talks on the intersection of Scala and Data Engineering.
Talk 1 - KNN with Apache Flink
Dan Blazevski, Insight Data Engineering
About the Talk:
We will present some recent progress on Apache Flink's machine learning library, focusing on a new implementation of the k-nearest neighbors (knn) algorithm for Flink. In the spirit of the Kappa Architecture, Apache Flink is a distributed batch and stream processing tool that treats batch as a special case of stream processing. We will discuss a few ways, both exact and approximate, to do distributed knn queries, focusing on two techniques: using quadtrees to spatially partition the training set, and using z-value-based hashing to reduce dimensionality.
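For readers unfamiliar with z-values, here is a minimal sketch of the general idea in plain Scala. It is not taken from the talk or from Flink's library; the names `ZOrder` and `zValue` and the `bits` parameter are hypothetical, and the sketch assumes points have already been scaled to non-negative integer grid coordinates. A z-value (Morton code) interleaves the bits of a point's coordinates, so a 2D point maps to a single number and points that are close in space tend to get nearby numbers.

```scala
// Illustrative only: ZOrder and zValue are hypothetical names, not Flink API.
object ZOrder {
  // Interleave the low `bits` bits of x and y into one z-value.
  // Bit i of x goes to position 2*i, bit i of y goes to position 2*i + 1.
  def zValue(x: Int, y: Int, bits: Int = 16): Long = {
    require(bits >= 1 && bits <= 31, "bits must be between 1 and 31")
    require(x >= 0 && y >= 0, "coordinates must be non-negative")
    var z = 0L
    for (i <- 0 until bits) {
      z |= ((x >> i) & 1).toLong << (2 * i)
      z |= ((y >> i) & 1).toLong << (2 * i + 1)
    }
    z
  }
}
```

Sorting points by `ZOrder.zValue` gives a one-dimensional ordering in which candidate neighbors of a query point can be looked up near its position in the sorted list. The result is approximate, because two points that are close in space can occasionally end up far apart in z-order.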
Bio:
Dan Blazevski loves distributed computing. After completing his PhD in Mathematics at UT Austin, he worked in academic and lab settings at ETH Zurich and Oak Ridge National Laboratory on computational physics and engineering. Although he still occasionally misses the good ol' days of Fortran and MPI, he's pretty excited to have made the transition to industry as a Data Engineering Insight Fellow in 2015, where he started working on Flink. He now helps lead the Fellows program in NYC.
Talk 2 - An efficient way to find connected components in a graph using MapReduce (more info TBA)
Schedule:
6:30pm - Doors open
7:00pm - Talk 1 and Q&A
7:30pm - Talk 2 and Q&A
8:00pm - Socializing with speakers and attendees
8:30pm - Close