Please RSVP for this class at http://nycdatascience.com/course/r-programming-intensive-intermediate/
Sign up for the newsletter for free Data Science learning material and upcoming classes at http://nycdatascience.com/
Sign up for the NYC Open Data Meetup for free workshops twice a week!
Length of time: 35 hours
Date: April 27th, May 4th, 11th, 18th (four Sundays)
Time: 10:00am - 5:30pm
Extra teaching: one 1-hour video per week for 5 weeks
Instructor: Vivian Zhang, CTO of Supstat Inc and Founder of NYC Data Science Academy
NYC Data Science Academy is now offering an R Intensive Intermediate course: a five-week course designed for students who have taken NYC Data Science Academy's R Beginner course or who already have a firm understanding of R. The goal of this course is to bring our students to a near-expert level.
Be sure to read the course syllabus below to ensure your level is appropriate.
Why R is important
R is a free, full-featured, and dynamic programming language that, since its release in 1996, has been on course to eclipse traditional statistical packages as the dominant interface in computational statistics, visualization, and data science. As an open-source platform, R has grown into an incredibly flexible tool that can be applied to nearly every graphical and statistical problem, at virtually no cost to the user. The community of R users continues to build new functionality.
Project Demo Day and Certificates
The course covers topics from data mining to time series models and ends with a demonstration of a project of your choice on Project Demo Day.
On Demo Day you will showcase a project of your choosing, using the tools and skills taught throughout this course. We encourage you to be creative! Students have chosen projects ranging from digital marketing simulation to finding relationships between people using natural language processing. The possibilities are nearly endless!
After successfully completing the course, you will qualify for one of three certificates: Extraordinary Standing pass, Honorable Graduation pass, or Active Participation pass.
Certificates are awarded according to your understanding, skill, and participation.
1: Introducing Data Mining (6 hours)
What data mining is and how to do it.
steps to apply data mining to your data
supervised versus unsupervised learning
regression versus classification problems
Review of linear models.
simple linear regression
generalized linear models
2: Performance Measure and Dimension Reduction (6 hours)
Evaluating model performance.
estimating future performance
Extensions of linear models.
dimension reduction methods
3: kNN and Naive Bayes Models (6 hours)
K-nearest neighbors models.
understanding the kNN algorithm
choosing an appropriate k
Naive Bayes models.
understanding joint probability
the Naive Bayes algorithm
the Laplace estimator
4: Tree and SVM (6 hours)
regression trees
classification trees
tree model with party
tree model with rpart
random forest model
Support vector machines.
maximal margin classifier
support vector classifiers
support vector machines
5: Association Rules and More Models (6 hours)
Market basket analysis.
understanding association rules
the apriori algorithm
Time series models.
stationary time series
If we finish the class early, we will cover selected topics based on your needs.
Elementary statistical methods:
Abstract: An introduction to using R for statistical analysis and regression analysis, so that students can master the basics of statistical significance and the role of models.
Case and Exercise: Using regression to predict commodity prices; simulating the winner of a casino game.
Frequency and contingency