

 
Introduction to Data Science Overview
Data science has become the central approach to tackling data-heavy problems in both business & academia. In this course, students learn how data science is done in the wild, with a focus on data acquisition, cleaning, & aggregation; exploratory data analysis & visualization; feature engineering; & model creation & validation. Students use the Python scientific stack to work through real-world examples that illustrate these concepts. Concurrently, students learn some of the statistical & mathematical foundations that power the data-scientific approach to problem solving.
Who is this course for?
Introduction to Data Science is for anyone with a basic understanding of data analysis techniques & anyone interested in improving their ability to tackle problems involving multidimensional data in a systematic, principled way. Familiarity with a programming language is helpful, but it is not required if the prework for the course is completed (more on that below). No mathematical training beyond an introductory statistics course is necessary.
Prerequisites
Students should have some experience with Python & have some familiarity with basic statistical & linear algebraic concepts such as mean, median, mode, standard deviation, correlation, & the difference between a vector & a matrix. In Python, it will be helpful to know basic data structures such as lists, tuples, & dictionaries, & what distinguishes them (that is, when they should be used).
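As a rough illustration of what distinguishes the three data structures, the sketch below shows one typical use for each. The variable names and values are made up for this example and are not taken from the course materials.

```python
# A list is ordered and mutable: use it for a collection that grows or changes.
scores = [72, 88, 95]
scores.append(60)

# A tuple is ordered and immutable: use it for a fixed record, such as a coordinate.
point = (3.0, 4.0)

# A dictionary maps unique keys to values: use it for lookups by name.
ages = {"ana": 31, "ben": 27}
ages["cy"] = 45

# Tuples are hashable, so they can serve as dictionary keys; lists cannot.
grid = {point: "treasure"}
```

A reasonable rule of thumb: reach for a list when order matters and the contents change, a tuple when the contents are fixed, & a dictionary when items are looked up by key.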
Students should skip the prework if they can accomplish all of the following:
Write a program in Python that finds the most frequently occurring word in a given sentence.
Explain the difference between correlation & covariance, & why the difference between the two terms matters.
Multiply two small matrices together (e.g. 3×2 & 2×4 matrices).
Otherwise, students should complete the following prework (approximately 8 hours) before the first day of class:
Exercises 1-7, 13, 18-21, 27-35, 38, 39 of Learn Python The Hard Way.
Videos 1-6 of the Linear Algebra review from Andrew Ng's Machine Learning course (labeled as: III. Linear Algebra Review (Week 1, Optional)).
The exercises in Chapters 2 & 3 of OpenIntro Statistics.
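For a sense of the level the three self-check tasks listed earlier are aiming at, they can be sketched in plain Python as follows. This is one illustrative approach under simple assumptions (words are separated by whitespace, matrices are lists of rows), not an official solution; the function names most_frequent_word, covariance, correlation, & matmul are hypothetical and chosen only for this example.

```python
from collections import Counter
from math import sqrt


def most_frequent_word(sentence):
    """Return the most common word, ignoring case and surrounding punctuation.

    If several words tie, the one that appears first in the sentence wins.
    """
    words = [w.strip(".,;:!?\"'()").lower() for w in sentence.split()]
    words = [w for w in words if w]
    return Counter(words).most_common(1)[0][0]


def covariance(xs, ys):
    """Sample covariance: its size depends on the units of xs and ys."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance rescaled to be unit-free, in [-1, 1]."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))


def matmul(a, b):
    """Multiply an m x n matrix by an n x p matrix, both given as lists of rows."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions must match")
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


print(most_frequent_word("The cat saw the other cat chase the dog."))  # the

xs = [1, 2, 3]
print(covariance(xs, [2, 4, 6]))        # 2.0
print(covariance(xs, [200, 400, 600]))  # 200.0 (rescaling y rescales covariance)
print(correlation(xs, [200, 400, 600]))  # 1.0 (correlation ignores the rescaling)

# A 3x2 matrix times a 2x4 matrix gives a 3x4 matrix.
print(matmul([[1, 2], [3, 4], [5, 6]], [[1, 0, 2, 1], [0, 1, 1, 3]]))
```

The covariance & correlation calls show why the distinction matters: changing the units of one variable changes the covariance but leaves the correlation untouched, so only the correlation can be compared across datasets.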
Outcomes
Upon completing the course, students will have:
An understanding of problems solvable with data science & an ability to attack those problems from a statistical perspective.
An understanding of when to use supervised & unsupervised statistical learning methods on labeled & unlabeled data-rich problems.
The ability to create data analytical pipelines & applications in Python.
Familiarity with the Python data science ecosystem & the various tools one can use to continue developing as a data scientist.





