Despite the many amazing applications of statistics, machine learning, & visualization in industry, many attempts at doing "data science" are anything but scientific. Specifically, data science processes often lack reproducibility, a key tenet of science in general & a precursor to having true collaboration in a scientific (or engineering) community.
In this session, Daniel Whitenack, Data Scientist & Lead Developer Advocate at Pachyderm, will discuss the importance of reproducibility & data provenance in any data science organization & will provide some practical steps to help data science organizations produce reproducible data analyses & maintain integrity in their data science applications. He will also demo a reproducible data science workflow that includes complete provenance explaining the entire process that produced specific results.
Event Schedule:
6:30 - 7:00: Enjoy food, drinks, & networking
7:00 - 7:45: Daniel presents + Q&A
8:00: Additional networking + event ends
Speaker Bio:
Daniel (@dwhitena) is a Ph.D.-trained data scientist working with Pachyderm (@pachydermIO), where he develops innovative, distributed data pipelines that include predictive models, data visualizations, statistical analyses, & more. He's spoken at conferences around the world (Datapalooza, DevFest Siberia, GopherCon, & more), maintains the Go kernel for Jupyter, & is actively helping to organize contributions to various open source data science projects.
_____
Connect with us on Twitter before the event! @thisismetis