 
 
6 weeks of Mon/Wed.
Mon, Oct 05, 2015 @ 07:00 PM   $2990   NYC Data Science Academy, 205 E 42nd St, 19th Fl
 
   
 
 
              

    
 

DETAILS

Limited to 8 seats.

Dates:
Mondays & Wednesdays | October 5, 7, 14, 19, 21, 26, 28; November 2, 4, 9, 11, 16
(Twelve classes, Monday and Wednesday nights)

Time:
7:00-9:30pm

Length of class: 30 hours

Instructor:
Sam Kamin is Associate Professor Emeritus at the University of Illinois at Urbana-Champaign,
where he taught computer science. Most recently he was an engineer at Google before joining NYC Data Science
Academy as VP of Engineering.

Venue:
205 E 42nd Street, New York, NY 10017 (5 min from Grand Central)


Course Overview

An intensive, hands-on introduction to the Hadoop ecosystem of Big Data technologies.

The emphasis in this course is on learning several of the major components of Apache Hadoop (HDFS, MapReduce, Hive, Pig, and Streaming) by doing exercises of increasing complexity. Programming will be done in Python.

Students are expected to be familiar with using an operating system from the command line. Knowledge of Python is helpful; the material in "Learn Python the Hard Way" is sufficient background.

The course format is mixed lecture/lab. Students will need to bring their own laptops to connect to our server; instructions for installing any required software will be provided ahead of time.

What is Hadoop?

Hadoop is an open-source framework for storing and processing large data sets in parallel across clusters of computers. Building on the MapReduce model introduced by Google and on the Hadoop Distributed File System (HDFS), Hadoop offers scalability, flexibility, and fault tolerance. It is optimized to handle massive quantities of data, whether structured, semi-structured, or unstructured.

Hadoop is well suited to Big Data. As an Apache project, it is complemented by a host of related Apache tools, such as Hive, Pig, and ZooKeeper, that further extend Hadoop's applications and usability.
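
To make the MapReduce idea concrete, here is a minimal in-memory sketch in Python of the map, shuffle, and reduce phases for a word count. It is an illustration only, not Hadoop code: the function names (map_phase, shuffle, reduce_phase, word_count) and the sample lines are assumptions made for this example, and a real cluster would spread each phase across many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence on a small list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    sample = ["big data big clusters", "data everywhere"]  # hypothetical input
    print(word_count(sample))
```

The point of the split is that map_phase and reduce_phase never share state, which is what lets a framework run many copies of each in parallel.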

SYLLABUS

Week 1 Introduction: MapReduce

Overview of Big Data and the Hadoop ecosystem
The concept of MapReduce
HDFS: the Hadoop Distributed File System
MapReduce with Python streaming
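
As a rough idea of what MapReduce with Python streaming looks like, the sketch below follows the streaming convention of exchanging tab-separated key/value lines, with the reducer receiving its input sorted by key. Everything specific here is an assumption for illustration: the "year,temperature" record format, the function names, and the run_locally helper, which stands in for the framework's sort step. In an actual streaming job the mapper and reducer would be separate scripts reading sys.stdin and printing to standard output.

```python
from itertools import groupby


def mapper(lines):
    """Turn 'year,temperature' records into 'year<TAB>temperature' lines."""
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) != 2:
            continue  # skip malformed records
        year, temp = parts
        yield f"{year}\t{temp}"


def reducer(lines):
    """Given key-sorted 'year<TAB>temperature' lines, emit the max per year."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for year, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{year}\t{max(int(temp) for _, temp in group)}"


def run_locally(records):
    """Hypothetical helper: simulate map, sort, reduce on one machine."""
    return list(reducer(sorted(mapper(records))))


if __name__ == "__main__":
    sample = ["1949,11", "1950,22", "1949,7", "1950,-3", "bad record"]
    for line in run_locally(sample):
        print(line)
```

The reducer relies on groupby, which only merges adjacent items, so it is correct only because the input arrives sorted by key.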

Week 2 More on MapReduce

More on Big Data, the Hadoop ecosystem, and MapReduce.
Mixed case studies and exercises using MR with Python streaming


Week 3 Hive: A database for Big Data

Hive concepts
HiveQL
User-defined functions in the Hive language
User-defined functions in Python (using streaming)
Advanced topic: Hive queries in Python code
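
For a sense of how a Python function can be used from Hive via streaming, here is a small sketch of a script body suitable for Hive's TRANSFORM clause, which passes each row to an external script as one tab-separated line and reads tab-separated lines back. The table name (users), column names (user_id, email), script name (normalize.py), and function name are all hypothetical, chosen only for this example.

```python
# Hypothetical HiveQL that would invoke a script built around this function:
#   ADD FILE normalize.py;
#   SELECT TRANSFORM(user_id, email)
#     USING 'python normalize.py' AS (user_id, email)
#   FROM users;


def transform_rows(lines):
    """Lower-case and trim the second column of tab-separated rows."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2:
            continue  # ignore rows that do not have exactly two columns
        user_id, email = fields
        yield f"{user_id}\t{email.strip().lower()}"


if __name__ == "__main__":
    sample = ["1\t Alice@Example.COM \n", "2\tBOB@example.com\n"]  # made-up rows
    for row in transform_rows(sample):
        print(row)
```

In a real script the sample list would be replaced by sys.stdin, since Hive feeds rows to the script on standard input.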

Week 4 Pig: Simplified MapReduce


Basic concepts
Pig Latin
Pig functions and macros
User-defined functions


Week 5 Spark

Intro to Spark
Intro to Mahout


Week 6 Project day

The Hadoop ecosystem
Brief intro to Spark
Brief intro to Mahout
Case studies/Final projects

 
 
 
 