Websites can now easily track & store user events such as queries, result clicks & purchases. How can we use this collective behavior to guide us toward better search? In this talk, we will walk through several applications of signal data analysis, such as spell checking, synonym detection, phrase detection & tail query rewriting. We will also demo how to generate these analytical results & use them for run-time query rewriting in a system that combines the power of a search engine (in our case Apache Solr) with the power of a fast distributed compute engine like Apache Spark, to bring data science into production.
Chao Han, Lucidworks, VP of Research
Chao is a data scientist with over 10 years of analytical experience in both academia & industry. She received a PhD in Statistics from Virginia Tech in 2012 (with 8 publications). After graduation, she worked at JPMorgan Chase R&D, supporting projects in the areas of transaction text mining, social media sentiment analysis, fraud detection, default prediction & target marketing. She also initiated & led the "Robot Modeler" project to reduce predictive modeling time from months to days. She joined SAS in 2015 to help develop a new platform, an in-memory multi-threaded analytic engine that enables fast model implementation calculations on a gridded network. Currently, Chao is the head of R&D at Lucidworks, where she helps build a new product called Fusion AI, with functionality such as recommendation, query analytics, automatic document clustering & a QA system.