 
 
With Scott Kostyshak (Data Scientist @ SupStat) & Vivian Zhang (CTO @ SupStat)
Sun, Jul 27, 2014 @ 10:00 AM   $1490   AlleyNYC, 500 7th Ave, 17th Fl
 
     
 
 
              

  
 
EVENT DETAILS
For corporate or small-group training, contact vivian.zhang@supstat.com.

Dates: July 6th, 13th, 20th, 27th, and August 3rd

Time: 10:00am - 5:00pm

Instructors:
Scott Kostyshak (Data Scientist at SupStat Inc., 5th-year Economics PhD student at Princeton University)
Vivian Zhang (CTO at SupStat Inc., master's degrees in Computer Science and Statistics)

Venue: 500 7th Ave, 17th Fl., New York, NY (close to Times Square)


Course Overview
NYC Data Science Academy is offering R Intensive Beginner: a five-week course that will introduce you to the wonderful world of R and give you a firm foundation in the language to build upon.

Why R is important
R is a free, full-featured, dynamic programming language that, since its release in 1996, has been on course to eclipse traditional statistical packages as the dominant interface in computational statistics, visualization, and data science. As an open-source platform, R has grown into an incredibly flexible tool that can be applied to nearly every graphical and statistical problem at virtually no cost to the user, and its community of users continues to build new functionality.

Project Demo Day and Certificates
The course moves from programming basics through data manipulation and advanced plotting packages, and ends with a demonstration of a project of your choice on Project Demo Day. On Demo Day you will access and analyze real data using the tools and skills taught throughout the course. After successfully completing the course, you will qualify for one of three certificates: Extraordinary Standing, Honorable Graduation, or Active Participation.

Certificates are awarded according to your understanding, skill, and participation.

Syllabus

1. Basics: 12 hours
Abstract: This unit covers the basics of working with R. Students will learn the characteristics of R, how to find learning resources, and the fundamentals of programming.
Case Study and Exercises: Use the R language to solve selected Project Euler problems.

How to learn R
How to get help
R language resources and books
RStudio
Packages
Workspace
Custom Startup Items
Batch Mode
Data Objects
Custom Functions
Control Statements
Vectorized Operations

2. Getting Data: 6 hours
Abstract: Covers the various ways R reads data, including the basics of web crawling, querying a database with SQL statements, and reading local files such as Excel spreadsheets.
Case Study and Exercises: Crawl watercress data on the site and write a custom function.

Web data capture
API data source
Connect to the database
Local files
Other data sources
Data Export

3. Data Manipulation: 6 hours
Abstract: How to manipulate data and use R for all kinds of data conversion, especially string processing.
Case Study and Exercises: Find a QQ group (QQ is the most used instant messaging tool), then discuss research options using its text features.

Data sorting
Merge Data
Summarizing Data
Reshaping Data
Subsetting Data
String manipulation
Date Operations

4. Data Visualization: 6 hours
Abstract: Covers two advanced plotting packages (lattice and ggplot2) and the various methods of visualization.
Case Study and Exercises: Working with graphics, text, and other data.

Histogram
Point
Column
Line
Pie
Box Plot
Scatter
Matrix related
Map

Note: If class finishes early, we will cover selected topics from the list below, based on your needs.

1. Elementary Statistical Methods:
Abstract: An introduction to using R for statistical analysis and regression analysis. Students will learn the basics of statistical significance and the role of models.
Case Study and Exercises: Using regression to predict commodity prices; simulated casino game winner.

Descriptive Statistics
Statistical Distributions
Frequency and contingency tables
Linear Regression
Correlation
T Test
Non-parametric statistics

2. Preliminary Data Mining:
Abstract: Covers R's data mining packages and how to use their functions. Students will learn two kinds of mining methods: supervised learning and unsupervised learning.
Case Study and Exercises: Use R to participate in a Kaggle data mining competition.

General Mining Process
Rattle package
Hierarchical clustering
K-means clustering
Decision Trees
Backpropagation (BP) neural network
 
 
 
 