NYC Tech Events - GarysGuide | The #1 Resource for NYC Tech

DETAILS

Limited to 8 seats.

Dates:
Mondays & Wednesdays | October 5, 7, 14, 19, 21, 26, 28, November 2, 4, 9, 11, 16
(Twelve Classes, Monday and Wednesday Nights)

Time:
7:00-9:30pm

Length of class: 30 hours

Instructor:
Sam Kamin is Associate Professor Emeritus at the University of Illinois at Urbana-Champaign,
where he taught computer science. Most recently he was an engineer at Google before joining NYC Data Science
Academy as VP of Engineering.

Venue:
205 E 42nd Street, New York, NY 10017 (5 min from Grand Central)

Course Overview

An intensive, hands-on introduction to the Hadoop ecosystem of Big Data technologies.

The emphasis in this course is on learning several of the major components of Apache Hadoop (HDFS, MapReduce, Hive, Pig, and Streaming) by doing exercises of increasing complexity. Programming will be done in Python.

Students are expected to be familiar with using an operating system from the command line. Knowledge of Python is helpful; the material in "Learn Python the Hard Way" is sufficient background knowledge.

The course format is mixed lecture/lab. Students will need to bring their own laptops to connect to our server; instructions on how to install any required software will be provided ahead of time.

What is Hadoop?

Hadoop is an open-source framework for storing and processing large data sets in parallel across clusters of machines. Built on the MapReduce programming model, which originated at Google, and the Hadoop Distributed File System (HDFS), Hadoop offers scalability, flexibility, and fault tolerance. It is designed to handle massive quantities of data, whether structured, semi-structured, or unstructured.

Hadoop is well suited to Big Data workloads. As an Apache project, it is complemented by a host of related Apache tools, such as Hive, Pig, and ZooKeeper, that further extend Hadoop's applications and usability.
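
To give a feel for the MapReduce model described above, here is a minimal single-machine sketch in plain Python. It is not Hadoop code: the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration, and everything runs in memory, whereas Hadoop distributes the same three steps across a cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values seen for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data big ideas", "data at scale"]))
```

Because each call to map_phase looks at only one line and each call to reduce_phase looks at only one key, both steps can in principle run on many machines at once, which is where the scalability comes from.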

SYLLABUS

Week 1 Introduction: MapReduce

Overview of Big Data and the Hadoop ecosystem
The concept of MapReduce
HDFS (Hadoop Distributed File System)
MapReduce with Python streaming
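
As a rough idea of what "MapReduce with Python streaming" looks like, the sketch below shows a word-count mapper and reducer in the style Hadoop Streaming expects: each step reads lines from standard input and writes tab-separated key/value lines to standard output. The file name wordcount.py and the map/reduce command-line argument are assumptions made for this example, not course material.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Yield 'word<TAB>1' for each word in the input lines."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(lines):
    """Sum the counts per word.

    Assumes lines arrive sorted by key, which is how Streaming
    delivers mapper output to the reducer.
    """
    parsed = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


# Hypothetical convention: run as `wordcount.py map` or `wordcount.py reduce`.
if __name__ == "__main__" and len(sys.argv) > 1:
    step = mapper if sys.argv[1] == "map" else reducer
    for out in step(sys.stdin):
        print(out)
```

The pair can be tried locally without a cluster by piping a text file through `python wordcount.py map`, then `sort`, then `python wordcount.py reduce`; the sort stands in for Hadoop's shuffle. On a cluster, the same script would be passed to the Hadoop Streaming jar through its -mapper and -reducer options.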

Week 2 More on MapReduce

More on Big Data, the Hadoop ecosystem, and MapReduce.
Mixed case studies and exercises using MR with Python streaming

Week 3 Hive: A database for Big Data

Hive concepts
HiveQL
User-defined functions in the Hive language
User-defined functions in Python (using streaming)
Advanced topic: Hive queries in Python code
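
For the "User-defined functions in Python (using streaming)" topic, one common approach is Hive's TRANSFORM clause, which pipes rows to an external script as tab-separated lines. The sketch below assumes a hypothetical table with user_id and email columns and a script saved as extract_domain.py; after ADD FILE extract_domain.py, it could be invoked with a query along the lines of SELECT TRANSFORM (user_id, email) USING 'python extract_domain.py' AS (user_id, domain) FROM users.

```python
import sys


def transform(lines):
    """Map 'user_id<TAB>email' rows to 'user_id<TAB>domain' rows."""
    for line in lines:
        # Hive sends one row per line, columns separated by tabs.
        user_id, email = line.rstrip("\n").split("\t")
        domain = email.rsplit("@", 1)[-1].lower()
        yield f"{user_id}\t{domain}"


# Only read stdin when input is piped in, as it is when Hive runs the script.
if __name__ == "__main__" and not sys.stdin.isatty():
    for row in transform(sys.stdin):
        print(row)
```

This sketch does not handle NULL columns or malformed rows; a real script would need to decide how to treat them.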

Week 4 Pig: Simplified MapReduce

Basic concepts
Pig Latin
Pig functions and macros
User-defined functions

Week 5 Spark

Intro to Spark
Intro to Mahout

Week 6 Project day

The Hadoop ecosystem
Brief intro to Spark
Brief intro to Mahout
Case studies/Final projects