EVENT DETAILS
The class is 35 hours of classroom guidance, with an optional three-week showcase project on a topic of each student's choice and an optional presentation of that project. It introduces a number of statistical models for supervised and unsupervised learning using the R programming language. The goal is to understand the concepts, methods, and applications of general predictive modeling and unsupervised learning, and how they are implemented in the R language environment. A selection of important models (e.g. tree-based models, support vector machines) will be introduced in an intuitive manner to illustrate the process of training and evaluating models.
Time: 10:00am to 5:00pm
Instructor
Shu Yan obtained his Ph.D. in Physics at the University of South Carolina. As a physicist with strong analytical skills and a solid programming background, he brings coding, data science, and critical problem-solving skills together to tackle real-world problems. His physical intuition and mathematical reasoning bring additional insight to statistical models and machine learning.
Syllabus
Each class is 35 hours of classroom guidance, with an optional three-week showcase project on a topic of each student's choice and an optional presentation of that project.
Upon successful completion of the course, you will qualify for one of three certificates: Extraordinary Standing, Honorable Graduation, or Active Participation. Certificates are awarded according to your understanding, skill, and participation.
Week 1: Introducing Data Mining (7 hours)
What is data mining and how to do it
Steps to apply data mining to your data
Primary statistical methods and tests
Supervised versus unsupervised learning
Regression versus classification problems
Review of linear models
Simple linear regression
Logistic regression
Generalized linear models
Week 2: Performance Measures and Dimension Reduction (7 hours)
Evaluating model performance
Confusion matrices
Beyond accuracy
Estimating future performance
Extension of linear models
Subset selection
Shrinkage methods
Dimension reduction methods
Week 3: kNN and Naive Bayes Models (7 hours)
The k-Nearest Neighbors model
Understanding the kNN algorithm
Calculating distance
Choosing an appropriate k
Case study
Naive Bayes models
Understanding joint probability
The Naive Bayes algorithm
The Laplace estimator
Case study
Week 4: Tree Models and SVMs (7 hours)
Tree models
Regression trees and classification trees
Tree models with party
Tree models with rpart
Random Forest models
GBM models
Support Vector Machines
Maximal margin classifiers
Support vector classifiers
Support vector machines
Week 5: The Association Rule and More Models (7 hours)
Market Basket Analysis
Understanding association rules
The Apriori algorithm
Case study
Unsupervised learning
K-means clustering
Hierarchical clustering
Time series models
Stationary time series
The ARIMA model
The seasonal model
Intended Audience and Prerequisite
Practitioners who wish to learn how to carry out predictive analytics in the R language; anyone who wants to turn ideas into software, quickly and faithfully. The course is intended for students who have taken NYC Data Science Academy's Data Science with R: Data Analytics course, or for those who already have a firm understanding of R and want to extend those skills to machine learning and advanced statistical methods. The goal of this course is to bring students to a near-expert level in this field. Be sure to read the course syllabus to ensure your level is appropriate.