With Scott Kostyshak (Data Scientist @ SupStat) and Vivian Zhang (CTO @ SupStat)
Sunday, July 27, 2014 at 10:00 AM    Cost: $1490
AlleyNYC, 500 7th Ave, 17th Fl
DESCRIPTION
For corporate or small-group training, contact vivian.zhang@supstat.com.

Date: July 6th, 13th, 20th, 27th, & August 3rd

Time: 10:00am - 5:00pm

Instructors:
Scott Kostyshak (Data Scientist at SupStat Inc, 5th-year Econ PhD student at Princeton Univ)
Vivian Zhang (CTO at SupStat Inc, double Master's degrees in Computer Science & Statistics)

Venue: 500 7th Ave, 17th Fl., New York, NY (close to Times Square)


Course Overview
NYC Data Science Academy is offering R Intensive Beginner: a five-week course that will introduce you to the wonderful world of R & give you a firm foundation in the language to build upon.

Why R is important
R is a free, full-featured, & dynamic programming language that, since its release in 1996, has been on course to eclipse traditional statistical packages as the dominant interface in computational statistics, visualization, & data science. As an open-source platform, R has grown into an incredibly flexible tool that can be applied to nearly every graphical & statistical problem, at virtually no cost to the user. The community of R users continues to build new functionality.

Project Demo Day & Certificates
From the rudimentary building blocks of programming basics, to data manipulation & use of advanced drawing packages, the course ends with a demonstration of a project of your choice on Project Demo Day. On Demo Day you will access & analyze real data, utilizing the tools & skillsets taught to you throughout the course. After the successful completion of the course, you will qualify for one of three certificates: Extraordinary Standing, Honorable Graduation, & Active Participation.

Certificates are awarded according to your understanding, skill, & participation.

Syllabus

1. Basics: 12 hours
Abstract: This unit covers the fundamentals of working with R. Students will learn the characteristics of R, how to find resources, & the basics of programming.
Case Study & Exercises: Use the R language to solve selected Project Euler problems.

How to learn R
How to get help
R language resources & books
RStudio
Packages
Workspace
Custom Startup Items
Batch Mode
Data Objects
Custom Functions
Control Statements
Vectorized Operations

2. Getting Data: 6 hours
Abstract: Covers the various ways R reads data. Participants will learn the basics of web crawling, query databases with SQL statements, & read data from local files such as Excel spreadsheets.
Case Study & Exercises: Crawl watercress data from the site & write a custom function.

Web data capture
API data source
Connect to the database
Local files
Other data sources
Data Export

3. Data Manipulation: 6 hours
Abstract: How to manipulate data & use R for all kinds of data conversion, especially string processing.
Case Study & Exercises: Find a QQ group (QQ is the most used instant messenger tool), then discuss research options using text features.

Sorting data
Merging data
Summarizing data
Reshaping data
Subsetting data
String manipulation
Date operations

4. Data Visualization: 6 hours
Abstract: Covers two advanced graphics packages (lattice & ggplot2) & the various methods of visualization.
Case Study & Exercises: Work with graphics, text, & other data.

Histogram
Point
Column
Line
Pie
Box Plot
Scatter
Matrix related
Map

Note: If class finishes early, we will cover selected topics from the list below, based on your needs.

1. Elementary Statistical Methods:
Abstract: An introduction to using R for statistical analysis & regression analysis. Students will learn the basics of statistical significance & the role of models.
Case Study & Exercises: Use regression to predict commodity prices; simulate a casino game winner.

Descriptive Statistics
Statistical Distributions
Frequency & contingency tables
Linear Regression
Correlation
T Test
Non-parametric statistics

2. Preliminary Data Mining:
Abstract: Explains how to use R's data mining packages & functions. Students will learn two mining approaches: supervised learning & unsupervised learning.
Case Study & Exercises: Use R to participate in a Kaggle data mining competition.

General Mining Process
Rattle package
Hierarchical clustering
K-means clustering
Decision Trees
BP neural network