Please RSVP for this class at http://nycdatascience.com/course/r-programming-intensive-beginner/
Sign up for the newsletter for free Data Science learning material & upcoming classes at http://nycdatascience.com/
Sign up for NYC Open Data Meetup for free workshops twice/week at http://www.meetup.com/NYC-Open-Data/
June 1st, 8th, 15th, 22nd, 29th (five Sundays, 35 hours), Beginner Level (no programming background required). It familiarizes students with machine learning in R. Our 70-hour full training program covers cleaning data; getting data from different sources, such as web scraping and API fetching; reshaping data structures; publication-ready visualization with ggplot2; performance measures of models; dimension reduction; k-nearest neighbors modeling; Naive Bayes; Decision Trees; Support Vector Machines; Association rules;
and more. Check out our students' work: http://nycdatascience.com/blog/
Time: 10:00am - 5:00pm
Scott Kostyshak (Data Scientist at SupStat Inc, 5th-year Econ PhD at Princeton Univ)
Vivian Zhang (CTO at SupStat Inc, double Master's degree in Computer Science & Statistics)
NYC Data Science Academy is offering R Intensive Beginner: a five-week course that will introduce you to the wonderful world of R & give you an excellent understanding of the language, leaving you with a firm foundation to build upon.
Why R is important
R is a free, full, & dynamic programming language that, since its release in 1996, is on course to eclipse traditional statistical packages as the dominant interface in computational statistics, visualization, & data science. As an open-source platform, R has grown to become an incredibly flexible tool that can be applied to nearly every graphical & statistical problem, at virtually no cost to the user. The community of R users is continuing to build new functionality.
Project Demo Day & Certificates
From the rudimentary building blocks of programming basics, to data manipulation & use of advanced drawing packages, the course ends with a demonstration of a project of your choice on Project Demo Day. On Demo Day you will access & analyze real data, utilizing the tools & skillsets taught to you throughout the course. After the successful completion of the course, you will qualify for one of three certificates: Extraordinary Standing pass, Honorable Graduation pass, & Active Participation pass.
Certificates are awarded according to your understanding, skill, & participation.
1. Basics: 12 hours
● Abstract: This unit covers the basic operations of R. Students
will learn the characteristics of R, how to find resources, & the basics of programming.
● Case Study & Exercise: Use the R language to complete certain Project Euler problems
○ How to learn R
○ How to get help
○ R language resources & books
○ Customizing startup settings
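As a flavor of the Project Euler exercise mentioned above, here is a minimal sketch in R of Problem 1 (the sum of all natural numbers below 1000 that are multiples of 3 or 5). The choice of problem and the variable names are an illustration only; the problems used in class may differ.

```r
# Project Euler, Problem 1: sum the natural numbers below 1000
# that are divisible by 3 or by 5.
n <- 1:999

# Logical indexing keeps only the multiples of 3 or 5.
multiples <- n[n %% 3 == 0 | n %% 5 == 0]

total <- sum(multiples)
print(total)  # 233168
```

The vectorized form (no explicit loop) is typical of how R code is written.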
2. Getting Data: 6 hours
● Abstract: Covers the various ways the R language reads data: the basics of
web crawling, querying a database with SQL statements, & reading local files
such as Excel spreadsheets.
● Case & Exercise: Crawl watercress data on the site & write a custom function.
○ Web data capture
○ API data source
○ Connect to the database
○ Local files
○ Other data sources
○ Data Export
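To make the local-file and data-export topics above concrete, here is a small sketch in base R that writes a data frame to a CSV file and reads it back. The data frame `scores` and its columns are made up for illustration, and a temporary file stands in for a real path.

```r
# Hypothetical toy data, invented for this example.
scores <- data.frame(
  student = c("a", "b", "c"),
  score = c(90, 85, 72),
  stringsAsFactors = FALSE
)

# Data export: write the data frame to a temporary CSV file.
path <- tempfile(fileext = ".csv")
write.csv(scores, path, row.names = FALSE)

# Local files: read the same file back into a new data frame.
loaded <- read.csv(path, stringsAsFactors = FALSE)
print(loaded)
```

Web, API, and database sources follow the same pattern (fetch, then convert to a data frame), but need extra packages and a network or database connection, so they are left out of this sketch.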
3. Data Manipulation: 6 hours
● Abstract: How to manipulate data & use R for all kinds of data conversion,
especially string processing.
● Case Study & Exercise: Find a QQ (the most used instant messenger tool) group,
then discuss research options using text features.
○ Data sorting
○ Merge Data
○ Summarizing data
○ Reshaping data
○ Taking a subset of data
○ String manipulation
○ Date operations
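The operations listed above can be sketched in a few lines of base R. The two data frames (`sales` and `people`) and all of their values are hypothetical, chosen only to show sorting, merging, subsetting, string handling, date arithmetic, and summarizing.

```r
# Hypothetical toy data, invented for this example.
sales <- data.frame(
  id = c(3, 1, 2),
  amount = c(20, 50, 35),
  day = as.Date(c("2014-06-15", "2014-06-01", "2014-06-08"))
)
people <- data.frame(
  id = c(1, 2, 3),
  name = c("ann", "bob", "cat"),
  stringsAsFactors = FALSE
)

sorted <- sales[order(sales$id), ]           # sort rows by id
merged <- merge(sorted, people, by = "id")   # merge the two tables on id
merged$name <- toupper(merged$name)          # string manipulation
merged$days_since_start <-                   # date arithmetic, in days
  as.numeric(merged$day - min(merged$day))
big <- subset(merged, amount > 30)           # keep rows with amount above 30
avg <- mean(merged$amount)                   # simple summary statistic

print(merged)
print(avg)  # 35
```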
4. Data Visualization: 6 hours
● Abstract: Covers two advanced plotting packages (lattice & ggplot2) & the
various methods of visualization.
● Case & Exercise: Using graphics, text & other data
○ Box Plot
○ Matrix related
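As an example of the box plot topic above, here is a minimal sketch using the lattice package, which ships with standard R installations. It uses R's built-in `iris` data set as a stand-in; the data used in class may differ.

```r
library(lattice)

# Box-and-whisker plot of sepal length for each iris species.
p <- bwplot(Sepal.Length ~ Species, data = iris,
            ylab = "Sepal length (cm)")

# Lattice plots are objects; printing one draws it.
print(p)
```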
Note: If class finishes early, we will cover selected topics below based on your needs.
1. Elementary Statistical Methods:
● Abstract: An introduction to using R for statistical analysis & regression
analysis. Students will learn the basics of statistical significance & the role of statistical models.
● Case & Exercise: Using regression to predict commodity prices; simulated casino
○ Descriptive Statistics
○ Statistical Distributions
○ Frequency & contingency tables
○ Linear Regression
○ T Test
○ Non-parametric statistics
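The linear regression and t test topics above can be sketched on simulated data. Everything here (the true intercept of 2, the slope of 0.5, the sample size, the seed) is an assumption made for the example, not course material.

```r
set.seed(1)  # make the simulated noise reproducible

# Simulate a linear relationship with random noise.
x <- 1:50
y <- 2 + 0.5 * x + rnorm(50)

# Linear regression: estimate intercept and slope from the data.
fit <- lm(y ~ x)
print(coef(fit))

# Predict y for two new x values.
pred <- predict(fit, newdata = data.frame(x = c(51, 52)))
print(pred)

# Two-sample t test: do the first and second halves differ in mean?
tt <- t.test(y[x <= 25], y[x > 25])
print(tt$p.value)
```

Because `y` rises with `x`, the two halves have clearly different means, so the t test should report a very small p-value.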
2. Preliminary Data Mining:
● Abstract: Covers R's data mining extension packages & how to use their functions.
Students will learn two kinds of mining methods: supervised learning & unsupervised learning.
● Case & Exercise: Use R to participate in Kaggle Data Mining Competition
○ General Mining Process
○ Rattle package
○ Hierarchical clustering
○ K-means clustering
○ Decision Trees
○ Backpropagation (BP) neural network
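To illustrate the two clustering topics above, here is a short sketch using only functions from base R's stats package and the built-in `iris` data set. The choice of three clusters, complete linkage, and the seed are assumptions made for the example.

```r
# Use the four numeric iris measurements, scaled to comparable units.
features <- scale(iris[, 1:4])

# K-means clustering with 3 clusters; nstart tries several random starts.
set.seed(42)
km <- kmeans(features, centers = 3, nstart = 10)

# Hierarchical clustering on Euclidean distances, cut into 3 groups.
hc <- hclust(dist(features), method = "complete")
hc_groups <- cutree(hc, k = 3)

# Cross-tabulate the two clusterings to see where they agree.
print(table(kmeans = km$cluster, hclust = hc_groups))
```

Both methods are unsupervised: neither one sees the species labels, so cluster numbers are arbitrary and need not match between the two results.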