EVENT DETAILS
Dates: August 11, 13, 18, 21, 25, 27, September 1, 3, 8, 10, 15, 17 (Twelve Classes, Tuesday and Thursday Nights)
Time: 7:00-9:30pm
Length of class: 30 hours
Instructor:
Sam Kamin is Associate Professor Emeritus at the University of Illinois at Urbana-Champaign, where he taught computer science for over 20 years and was head of the undergraduate program. Most recently, he was an engineer at Google before joining NYC Data Science Academy.
Venue: 205 E 42nd Street, New York, NY 10017 (5 min from Grand Central)
Course Overview:
An intensive, hands-on introduction to the Hadoop ecosystem of Big Data technologies. The course emphasizes learning several major components of Apache Hadoop (HDFS, MapReduce, Hive, Pig, and Streaming) through exercises of increasing complexity. Programming will be done in Python. Students are expected to be comfortable working from the command line. Knowledge of Python is helpful; the material in Learn Python the Hard Way is sufficient background. The course format is mixed lecture and lab. Students will need to bring their own laptops to connect to our server; instructions for installing any required software will be provided ahead of time.
What is Hadoop?
Hadoop is an open-source framework for storing and processing large data sets in parallel across clusters of machines. It combines an implementation of the MapReduce programming model, which was introduced by Google, with the Hadoop Distributed File System (HDFS) to provide scalability, flexibility, and fault tolerance. Hadoop is designed to handle massive quantities of structured, semi-structured, or unstructured data, which makes it well suited to Big Data workloads. As an Apache project, Hadoop is complemented by a host of related Apache tools, such as Hive, Pig, and ZooKeeper, that further extend its applications and usability.
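As a rough illustration of the MapReduce model described above, the sketch below runs the map, shuffle, and reduce steps of a word count inside a single local Python process. It is not Hadoop code and calls no Hadoop API; the function names (map_words, shuffle, reduce_counts, word_count) and the sample input are hypothetical, chosen only to show the shape of the computation that Hadoop distributes across a cluster.

```python
from collections import defaultdict


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group emitted values by key (done by the framework in Hadoop)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(word, counts):
    """Reduce step: combine all values for one key into a single total."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_words(line)]
    grouped = shuffle(pairs)
    return dict(reduce_counts(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, standing in for lines read from a file in HDFS.
    sample = ["big data big clusters", "data in HDFS"]
    print(word_count(sample))
```

On a real cluster, the map and reduce functions would run on many machines at once, with the framework handling the grouping step and the movement of data between them.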