NYC Tech Events - GarysGuide | The #1 Resource for NYC Tech

EVENT DETAILS

Co-Hosted by NYC Open Data meetup and NYC Data Science Academy meetup.

Presented by NYC Data Science Academy students who just finished the 12-week full-time program. Apply now for the April 2017 bootcamp.

++++++++++++++++++++++++

During this event you will see some of the best machine learning and big data projects created by NYC Data Science Academy 12-week Data Science bootcamp students.

You will also have an opportunity to meet our bootcamp students, find out what it is like to be a student at NYC Data Science Academy, and get an overview of the program. Join us for data wrangling tips, fun facts, and in-depth discussions.

Event schedule:

6:30 pm - 7:00 pm Check in, mingle, enjoy food & drinks

7:00 pm - 8:30 pm Student presentation

8:30 pm - 9:00 pm Network and meet our students

====================

#ThrowBack Projects

Project #1: Enigma, by Spencer Stebbins

Enigma is a step-by-step graphical user interface that directs the data science process and advises users on models and ensembles fitting the structure of the user's uploaded dataset. Enigma is a Python Flask application that uses the caret package from R. Its front end is built with React.js and Material-UI, uses Plotly.js for diagnostic plots, and sports a Three.js network graph that visually illustrates the recommendation process.

----------------------------------

Project #2: Predicting Consumer Credit Default - the Kaggle Challenge

#1 ranked solution in Kaggle

by Bernard Ong, Jielei Emma Zhu, Miaozhi Trinity Yu, Nanda Trichy Rajarathinam

The team participated in a closed Kaggle competition to predict consumer credit default. Banks continue to look for the best credit scoring algorithm, and one of its tenets is to predict the probability of an individual defaulting on their loan or undergoing financial distress. With this information, banks can make better decisions and borrowers can do better financial planning. This challenge allowed the team to employ the best of machine learning models and algorithms to accurately predict the probability of default. Besides the technical aspects of the project, the team also used an Agile process designed for machine learning so that tasks could be done in parallel and in small chunks, failing fast and iterating fast. The team ultimately achieved the top AUC score using six highly tuned models. They not only reached the top tiers of the scoreboard, but also smashed through to garner the #1 ranking in the challenge. This is the story of how they did it.

Capstone Project Blog Link:

http://blog.nycdatascience.com/student-works/kaggle-predict-consumer-credit-default/
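For readers unfamiliar with the AUC metric mentioned in the Project #2 description, here is a minimal sketch of how it can be computed from predicted default probabilities. It is not the team's code: the function name `auc_score` and the sample labels and scores are made up for illustration. AUC is the probability that a randomly chosen defaulter receives a higher score than a randomly chosen non-defaulter, with ties counted as half.

```python
def auc_score(labels, scores):
    """Area under the ROC curve, computed by pairwise comparison.

    labels: 1 for default, 0 for no default.
    scores: predicted probability of default for each borrower.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # defaulter ranked above non-defaulter
            elif p == n:
                wins += 0.5      # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and scores, for illustration only.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc_score(labels, scores))
```

The pairwise loop is quadratic in the number of borrowers, so it only suits small samples; a rank-based version would be the usual choice for a full competition dataset.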

------------

Project #3: Where To? - An Uber Driver Optimization System

Given by Christian Holmes and Shuheng (Shawn) Li

Driving an Uber is expensive: gas, car maintenance, and free candy for your passengers all add up, not to mention the opportunity cost you incur while you're working. Thus, when drivers are actually working, it's important that they don't waste time searching for their next fare. In our project, we seek to help Uber drivers by determining where the most people are looking for a ride.

We started with a massive dataset of taxi and Uber rides: 200 gigabytes containing over 1.1 billion rides all across New York City. We used Python Spark and Hive to handle and organize our data, and then merged in data on weather, restaurant information, and traffic conditions. We used this data to build an algorithm that predicts fares in New York by neighborhood at any time of the week.

Since our algorithm predicts demand, Uber and other rideshare services can use it to set surge prices and send drivers to optimal locations. We wanted the results of our algorithm to be accessible to all cab drivers, though, so we built an app using the Flask framework to make this a reality. The app automatically brings in data from Google Maps, local gas stations, and current traffic conditions to help Uber drivers get to where they need to be. We hope you enjoy it!

-----------------

Project #4: Yelp Nearby - A Multiuser Restaurant Recommendation Engine

Team Members: Aiko Liu, Amy Tzu-Yu Chen, David Steinmetz, Greg Domingo

You want to go out to eat with friends, but picking a restaurant that satisfies everyone's taste is frustrating and difficult. You wonder whether there is a better way to find common ground.

Our latest product, Yelp Nearby, will be exactly what you are looking for. Yelp Nearby uses Yelp data in conjunction with collaborative filtering to find restaurants that suit the tastes of multiple people. The output of our GraphLab-based recommendation engine is fed into a Flask front end, which allows users to search in any US city and visualize recommendation results.
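The description does not say how Yelp Nearby combines individual tastes into a group pick. As one possible illustration, the sketch below assumes each person already has predicted ratings from a collaborative filtering model and simply averages them per restaurant. The function `recommend_for_group`, the user names, and all ratings are hypothetical.

```python
def recommend_for_group(predicted, group, top_n=3):
    """Rank restaurants by the average predicted rating across a group.

    predicted: {user: {restaurant: predicted_rating}}, assumed to come from
               a collaborative filtering model.
    group:     list of users who want to eat together.
    """
    if not group:
        return []
    # Only consider restaurants with a predicted rating for every member.
    common = set.intersection(*(set(predicted[u]) for u in group))
    averages = {
        r: sum(predicted[u][r] for u in group) / len(group) for r in common
    }
    # Highest average first; break ties alphabetically for stable output.
    ranked = sorted(averages, key=lambda r: (-averages[r], r))
    return ranked[:top_n]


# Hypothetical predicted ratings, for illustration only.
predicted = {
    "ann": {"Pizza Place": 4.5, "Sushi Bar": 2.0, "Taco Stand": 4.0},
    "bob": {"Pizza Place": 3.0, "Sushi Bar": 5.0, "Taco Stand": 4.0},
}
print(recommend_for_group(predicted, ["ann", "bob"]))
```

Averaging favors restaurants that everyone finds acceptable; taking the minimum rating per restaurant instead would penalize any option that one person strongly dislikes.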

----------

Project #5: Yelper: A recommendation system based on collaborative filtering by Chuan Sun

"Getting information off the internet is like taking a drink from a fire hydrant." (Mitchell Kapor) Information overload is a real phenomenon that prevents us from making good decisions or taking action. This is why recommendation systems are becoming common and extremely useful in products such as Netflix, Amazon Echo, and Facebook NewsFeed.
In this capstone project, Chuan built Yelper, a business recommendation system written in Python and Scala on top of the Spark ecosystem. Here are some features of Yelper:

1. Matrix Factorization based recommendation using Spark MLlib

2. Functional webserver to recommend highly rated businesses for users

3. Simulation of real-time user requests handled by Spark Streaming and Apache Kafka

4. User-business graph visualization using D3 and graph-tool library

5. Graphical analysis on user-business interactions using Spark GraphX in Scala
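To give a feel for the matrix factorization idea in feature 1, here is a small standalone sketch. It is not Yelper's code and does not use Spark MLlib: it fits user and business factor vectors with plain stochastic gradient descent in pure Python, whereas MLlib's recommender is based on alternating least squares. The function names, hyperparameters, and ratings are all made up for illustration.

```python
import random


def factorize(ratings, n_users, n_items, rank=2, epochs=500,
              lr=0.05, reg=0.02, seed=0):
    """Learn factors P (users) and Q (items) so that P[u] . Q[i] ~ rating."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_users)]
    Q = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(P[u][k] * Q[i][k] for k in range(rank))
            for k in range(rank):
                pu, qi = P[u][k], Q[i][k]
                # Gradient step on squared error with L2 regularization.
                P[u][k] += lr * (err * qi - reg * pu)
                Q[i][k] += lr * (err * pu - reg * qi)
    return P, Q


def rmse(ratings, P, Q):
    """Root mean squared error of the factor model on observed ratings."""
    total = 0.0
    for u, i, r in ratings:
        pred = sum(pk * qk for pk, qk in zip(P[u], Q[i]))
        total += (r - pred) ** 2
    return (total / len(ratings)) ** 0.5


# Hypothetical (user, business, stars) triples, for illustration only.
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1),
           (2, 1, 2), (2, 2, 5), (3, 0, 1), (3, 2, 4)]
P, Q = factorize(ratings, n_users=4, n_items=3)
print("training RMSE:", rmse(ratings, P, Q))
# Predicted rating for a pair with no observed rating (user 0, business 2).
print("user 0, business 2:", sum(p * q for p, q in zip(P[0], Q[2])))
```

Once the factors are learned, recommending businesses for a user amounts to scoring every unrated business with the dot product and returning the highest-scoring ones.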

---------------

Project #6: Predicting Horse Race Results in India

By Ben Townson and Sharan Duggal

For their final project as part of cohort #6 at the New York City Data Science Academy, Ben Townson and Sharan Duggal set out to build a predictive model for horse racing in India. They used nearly all of the techniques taught during the bootcamp, beginning with web scraping to obtain the data, continuing with visual analysis, and ending with a stacked predictive model utilizing several machine learning techniques. During the process, they encountered numerous challenges, from obtaining data, to handling missing values, to finding the best formats to predict and interpret the results of their models. They are excited to present their process and findings to the Meetup audience.