Taught by Vivian Zhang (CTO of SupStat Inc and Founder of NYC Data Science Academy)
Sunday, April 27, 2014 at 10:00 AM    Cost: $2100
AlleyNYC, 500 7th Ave, 17th Fl

Please RSVP for this class at

Sign up for the newsletter for free Data Science learning material & upcoming classes at

Sign up for NYC Open Data Meetup for free workshops twice a week!

Length of time: 35 hours

Date: April 27th, May 4th, 11th, 18th (four Sundays)

Time: 10:00am - 5:30pm

Extra teaching: 1 hour of video per week for 5 weeks

Instructor: Vivian Zhang, CTO of SupStat Inc & Founder of NYC Data Science Academy

Course Overview

NYC Data Science Academy is now offering an R Intensive Intermediate course: a five-week course designed for students who have taken NYC Data Science Academy's R Beginner course, or who already have a firm skill set in & understanding of R. The goal of this course is to bring students to a near-expert level.
Be sure to read the course syllabus below to ensure your level is appropriate.

Why R is important

R is a free, full-featured, & dynamic programming language that, since its release in 1996, has been on course to eclipse traditional statistical packages as the dominant interface for computational statistics, visualization, & data science. As an open-source platform, R has grown into an incredibly flexible tool that can be applied to nearly every graphical & statistical problem at virtually no cost to the user, and its user community continues to build new functionality.

Project Demo Day & Certificates

The course ends with Project Demo Day, where you will showcase a project of your choosing, anywhere from data mining to time series models, that uses the tools & skill sets taught throughout the course.
We encourage you to be creative! Students have chosen projects ranging from digital marketing simulation to finding relationships between people using natural language processing. The possibilities are nearly endless!
After successfully completing the course, you will qualify for one of three certificates: Extraordinary Standing pass, Honorable Graduation pass, or Active Participation pass.
Certificates are awarded according to your understanding, skill, & participation.


1: Introducing Data Mining (6 hours)

● What data mining is & how to do it.
○ steps to apply data mining to your data
○ supervised versus unsupervised learning
○ regression versus classification problems
● Review of linear models.
○ simple linear regression
○ logistic regression
○ generalized linear models
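As a refresher on the linear-model review above: in R, such models are typically fit with `lm()` and `glm()`. To show what the simplest case actually computes, here is a minimal, language-neutral sketch in Python (the helper `fit_simple_ols` is a hypothetical name, not course material) of the closed-form least-squares estimates for simple linear regression: slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², intercept = ȳ − slope · x̄.

```python
def fit_simple_ols(xs, ys):
    """Return (intercept, slope) that minimize the sum of squared residuals."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope = sample covariance / sample variance of x (the 1/n factors cancel).
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x has zero variance, so the slope is undefined")
    slope = sxy / sxx
    # The fitted line always passes through the point (x_bar, y_bar).
    intercept = y_bar - slope * x_bar
    return intercept, slope


if __name__ == "__main__":
    # Toy data lying exactly on y = 1 + 2x.
    print(fit_simple_ols([1, 2, 3, 4], [3, 5, 7, 9]))  # (1.0, 2.0)
```

Logistic regression and other generalized linear models have no such closed form and are fit iteratively, which is why they are usually left to library routines.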

2: Performance Measure & Dimension Reduction (6 hours)

● Evaluating model performance.
○ confusion matrices
○ beyond accuracy
○ estimating future performance
● Extension of linear models.
○ subset selection
○ shrinkage methods
○ dimension reduction methods
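To illustrate why this module looks beyond accuracy, the following Python sketch (the helpers `confusion_counts` and `summarize` are hypothetical names, not the course's code) builds a binary confusion matrix and derives precision, recall, and F1 from it. On an imbalanced sample, a classifier that never predicts the positive class can still score high accuracy while finding none of the positives.

```python
from collections import Counter


def confusion_counts(actual, predicted, positive):
    """Tally TP, FP, FN and TN, treating `positive` as the class of interest."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["TP" if a == positive else "FP"] += 1
        else:
            counts["FN" if a == positive else "TN"] += 1
    return counts


def summarize(counts):
    """Derive accuracy, precision, recall and F1; undefined ratios are reported as 0.0."""
    tp, fp, fn, tn = counts["TP"], counts["FP"], counts["FN"], counts["TN"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Imbalanced toy sample: 2 positives, 8 negatives, and a model that always says "ham".
    actual = ["spam"] * 2 + ["ham"] * 8
    predicted = ["ham"] * 10
    print(summarize(confusion_counts(actual, predicted, positive="spam")))
```

Here accuracy is 0.8 while recall is 0.0, which is exactly the gap that metrics beyond accuracy are meant to expose.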

3: kNN & Naive Bayes Models (6 hours)

● K-nearest neighbors models.
○ understanding kNN algorithm
○ calculating distance
○ choosing an appropriate k
○ case study
● Naive Bayes models.
○ understanding joint probability
○ the Naive Bayes algorithm
○ the Laplace estimator
○ case study
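The kNN steps listed above (calculate distances, pick k, vote) fit in a few lines. This is a minimal Python sketch assuming Euclidean distance and a simple majority vote; the function names are hypothetical, and the course's R case study may use different tooling.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two equal-length numeric feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training points")
    # Rank training rows by distance to the query and keep the k closest.
    nearest = sorted(range(len(train_X)), key=lambda i: euclidean(train_X[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    # most_common keeps first-encountered order among equal counts, so a tie goes
    # to whichever tied label appears first in distance order.
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    X = [(1, 1), (1, 2), (2, 1), (6, 6), (6, 7), (7, 6)]
    y = ["a", "a", "a", "b", "b", "b"]
    print(knn_predict(X, y, (1.5, 1.5), k=3))
```

Because Euclidean distance is scale-sensitive, features are usually rescaled (for example min-max or z-score) first; k is commonly chosen by comparing validation error across candidate values, and an odd k avoids tied votes in two-class problems.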

4: Trees & SVMs (6 hours)

● Tree models.
○ regression trees & classification trees
○ tree models with the party package
○ tree models with the rpart package
○ random forest model
○ GBM model
● Support Vector Machines.
○ maximal margin classifier
○ support vector classifiers
○ support vector machines
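Classification trees grow by choosing the split that most reduces impurity. As a minimal sketch, assuming the Gini index as the impurity measure (to our knowledge the default for classification in rpart, whereas party's `ctree` selects splits with conditional inference tests), the Python below scores every candidate threshold on a single numeric feature. The helper names are hypothetical; random forests and GBMs repeat this kind of search across many trees.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(xs, labels):
    """Search one numeric feature for the split x <= t minimizing weighted child impurity.

    Returns (threshold, weighted_gini); threshold is None if no split beats the parent node.
    """
    pairs = sorted(zip(xs, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


if __name__ == "__main__":
    print(best_threshold([1, 2, 3, 10, 11, 12], ["a", "a", "a", "b", "b", "b"]))
```

A full tree applies this search to every feature at every node and recurses on the two children until a stopping rule is met.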

5: Association Rules & More Models (6 hours)

● Market basket analysis.
○ understanding association rules
○ the apriori algorithm
○ case study
● Unsupervised learning.
○ K-means clustering
○ Hierarchical clustering
○ case study
● Time series models.
○ fundamental concepts
○ stationary time series
○ ARIMA model
○ seasonal model
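For the market basket topics above: in R, association rules are commonly mined with the arules package's `apriori()`. To show the underlying idea, here is a minimal Python sketch (hypothetical helper names, toy data) of support, confidence, and the apriori level-wise search, which only extends itemsets whose every subset is already frequent.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def apriori(transactions, min_support):
    """Return {frequent itemset: support}, growing itemsets one item per level."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    current = [frozenset([item]) for item in items
               if support([item], transactions) >= min_support]
    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset, transactions)
        k += 1
        # Join: merge frequent (k-1)-itemsets whose union has exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: a k-itemset can only be frequent if all its (k-1)-subsets are.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        current = [c for c in candidates if support(c, transactions) >= min_support]
    return frequent


def confidence(lhs, rhs, transactions):
    """Estimated P(rhs | lhs) = support(lhs and rhs together) / support(lhs)."""
    return support(set(lhs) | set(rhs), transactions) / support(lhs, transactions)


if __name__ == "__main__":
    baskets = [{"bread", "milk"}, {"bread", "diapers", "beer", "eggs"},
               {"milk", "diapers", "beer", "cola"}, {"bread", "milk", "diapers", "beer"},
               {"bread", "milk", "diapers", "cola"}]
    for itemset, s in apriori(baskets, min_support=0.6).items():
        print(sorted(itemset), s)
    print(confidence({"diapers"}, {"beer"}, baskets))
```

The prune step is what makes apriori practical: once {bread, beer} falls below the support threshold, no larger basket containing both is ever counted.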

If we finish the class early, we will cover selected topics based on your needs.
Elementary statistical methods:
Abstract: How to use R for statistical analysis & regression analysis, so that students master the basics of statistical significance & modeling.
Cases & Exercises: using regression to predict commodity prices; simulating the winner of a casino game.

Descriptive Statistics
Statistical Distributions
Frequency & contingency
Logistic Regression
Non-parametric statistics
Linear Regression
Regression Diagnostics
Robust Regression
Nonlinear regression
Principal Component
Statistical Simulation