Hey everyone,
We're excited to be partnering with Alexy Khrabrov, of Scale By the Bay, to be a part of several meetups brought together under the umbrella of Cognifest.org!
Cognifest.org is a city-wide event joining meetups that build the systems of tomorrow. Such systems need both data engineering & machine learning. Spotify is a great example of a Cognifest company: it both processes data at global scale & learns your preferences with data mining. Spotify will be both hosting & speaking at this evening's meetup, with the talks described below.
Scale By the Bay, the home community of Cognifest, is the oldest conference where we show the best practices in data pipelines connecting data engineering & data science. Cognifest SF started By the Bay as a collaboration with the top companies implementing these pipelines, beginning with IBM, Salesforce, NVIDIA, Cisco, Lightbend, & many others. Now Cognifest comes to New York!
We're adding new meetups to cognifest.org as we go along. There will be at least one meetup each evening of the week of 10/23. Join us & build the future with us!
Talks:
Title: From prototyping to deploying, building ML systems & how Featran can help
Speaker: Samantha Hansen
The talk focuses on how to use Featran both for model training & within a service that evaluates the model. A challenge when requesting predictions from a model is ensuring that the input features used during training are identical to the features constructed on the fly within the service. This requires coordination between offline training & online serving of predictions. To highlight this issue, we detail a Spotify use case: building a classification model that predicts the probability of a playlist being streamed. We outline how & where Featran was used in going from conception to model deployment.
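To make the training/serving consistency problem concrete, here is a minimal sketch in plain Python. It is not Featran's API (Featran is a Scala library); the names `MinMaxSettings`, `fit_min_max` & `transform`, and the track-count feature, are hypothetical. The idea it illustrates is that statistics learned from the training data are stored once & then reused unchanged by the service, so both sides compute the same feature value for the same input.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MinMaxSettings:
    """Statistics learned from training data, saved and reused when serving."""
    lo: float
    hi: float


def fit_min_max(values: List[float]) -> MinMaxSettings:
    # Offline step: learn the scaling range from the training set only.
    return MinMaxSettings(lo=min(values), hi=max(values))


def transform(value: float, settings: MinMaxSettings) -> float:
    # Shared by training and serving, so the two cannot drift apart.
    span = settings.hi - settings.lo
    if span == 0:
        return 0.0
    # Clip values outside the range seen during training.
    return min(1.0, max(0.0, (value - settings.lo) / span))


# Offline training: fit the settings, then build the training features.
training_track_counts = [10.0, 20.0, 50.0]  # hypothetical playlist feature
settings = fit_min_max(training_track_counts)
training_features = [transform(v, settings) for v in training_track_counts]

# Online serving: reuse the stored settings instead of recomputing them.
request_feature = transform(20.0, settings)
```

Because the service calls the same `transform` with the same stored settings, a playlist with 20 tracks is scaled to the same value online as it was during training.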
Title: Featran77
Speaker: Fallon Chen
Generic Feature Transformer for Data Pipelines
Featran, a.k.a. Feature Transformer, Featran77 or F77, is a generic feature engineering library for Scala data pipeline frameworks, including Scio, Spark, Scalding & Flink. We'll talk about the design & implementation of the library, including its use of Algebird Semigroups & Aggregators, Breeze, & ScalaCheck. We'll also cover other relevant topics that make our data/ML pipelines scalable & type safe.
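For readers new to semigroups & aggregators, here is a rough sketch of the general idea in plain Python. It is neither Algebird's nor Featran's API, and the names `MinMax`, `prepare`, `combine` & `aggregate` are hypothetical. A semigroup is a type with an associative combine operation, so partial results computed on separate partitions can be merged in any grouping & still match a single pass over all the data.

```python
from dataclasses import dataclass
from functools import reduce
from typing import List


@dataclass(frozen=True)
class MinMax:
    """Running minimum and maximum of the values seen so far."""
    lo: float
    hi: float


def prepare(x: float) -> MinMax:
    # Lift a single value into the aggregate type.
    return MinMax(x, x)


def combine(a: MinMax, b: MinMax) -> MinMax:
    # Associative: combine(combine(a, b), c) == combine(a, combine(b, c)).
    return MinMax(min(a.lo, b.lo), max(a.hi, b.hi))


def aggregate(values: List[float]) -> MinMax:
    # Requires a non-empty list, since a semigroup has no empty element.
    return reduce(combine, map(prepare, values))


# Each partition is aggregated independently, as a distributed job would do.
partitions = [[3.0, 9.0], [1.0, 4.0], [7.0]]
partials = [aggregate(p) for p in partitions]
merged = reduce(combine, partials)

# Merging the partial results matches one pass over all values.
single_pass = aggregate([x for p in partitions for x in p])
```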