Big Data w/ Hadoop & Spark | NYC Tech Events - GarysGuide

6-wk evening program providing hands-on intro to Hadoop & Spark ecosystem of big data technologies.

	NYC Data Science Academy, 500 8th Ave, Ste 905
	Jan 16 (Tue), 2018 @ 07:00 PM
	$2990

DETAILS

This is a 6-week evening program providing a hands-on introduction to the Hadoop & Spark ecosystem of Big Data technologies. The course will cover these key components of the Apache Hadoop ecosystem: HDFS, MapReduce with Hadoop Streaming, Hive, & Spark. Programming will be done in Python, and the course will begin with a review of the Python concepts needed for our examples. The course format is interactive, and students will need to bring laptops to class. We will do our work on AWS (Amazon Web Services); instructions on obtaining an AWS account & connecting to it will be provided ahead of time.
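To give a sense of what "MapReduce with streaming" looks like in Python: Hadoop Streaming runs any executable that reads input lines on stdin and writes tab-separated key/value lines on stdout as a mapper or reducer, and it sorts the mapper output by key before the reducer sees it. The sketch below is a classic word count written as one script; the file name `wordcount.py` and the `map`/`reduce` arguments are hypothetical choices for illustration, not course material.

```python
#!/usr/bin/env python3
"""Hypothetical word-count script for Hadoop Streaming.

Run as `wordcount.py map` for the mapper and `wordcount.py reduce`
for the reducer. Both phases read lines from stdin and write
'key<TAB>value' lines to stdout.
"""
import sys
from itertools import groupby


def mapper(lines):
    """Emit 'word<TAB>1' for every whitespace-separated word."""
    for line in lines:
        for word in line.lower().split():
            yield f"{word}\t1"


def reducer(lines):
    """Sum the counts for each word.

    Assumes the input is already sorted by key, which Hadoop's
    shuffle/sort step guarantees (or `sort` when testing locally).
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


# Dispatch only when invoked as 'map' or 'reduce', so importing the
# module or running it without arguments does nothing.
if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1] in ("map", "reduce"):
    step = mapper if sys.argv[1] == "map" else reducer
    for out in step(sys.stdin):
        print(out)
```

Before touching a cluster, the pipeline can be checked locally with `cat input.txt | python3 wordcount.py map | sort | python3 wordcount.py reduce`, where `sort` stands in for Hadoop's shuffle/sort. On a cluster, the same commands would be passed to the streaming jar's `-mapper` and `-reducer` options; the jar's path and exact invocation depend on the Hadoop installation.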

Hadoop is a set of open-source programs that run on computer clusters and simplify the storage & processing of large amounts of data. Originally, Hadoop consisted of a distributed file system (HDFS) tuned for large data sets & an implementation of the MapReduce parallel programming model, but it has since expanded in many ways. It now includes database systems, languages for expressing parallel computations, libraries for machine learning, its own resource manager & job scheduler (YARN), & much more. Furthermore, MapReduce is no longer the only parallel processing framework; Spark is an increasingly popular alternative. In summary, Hadoop is a popular & rapidly growing set of cluster computing tools that is becoming essential for data scientists.
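The MapReduce model described above has three phases: a map step applied to each record independently, a shuffle that groups intermediate values by key, and a reduce step applied to each key's group independently. That independence is what lets Hadoop spread the work across a cluster. The single-process sketch below only simulates the data flow; `map_reduce`, `parse_reading`, `max_reading`, and the `station,temperature` input format are hypothetical names chosen for illustration, not part of Hadoop's API.

```python
from collections import defaultdict
from collections.abc import Callable, Iterable


def map_reduce(records: Iterable, map_fn: Callable, reduce_fn: Callable) -> dict:
    """Simulate MapReduce's map, shuffle, and reduce phases in one process."""
    # Map phase: apply map_fn to each record independently. On a cluster,
    # these calls would run in parallel on different nodes.
    mapped = [pair for record in records for pair in map_fn(record)]

    # Shuffle phase: group every value emitted under the same key.
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)

    # Reduce phase: reduce each key's group independently; keys are
    # visited in sorted order, mirroring Hadoop's sort before reduce.
    return {key: reduce_fn(key, groups[key]) for key in sorted(groups)}


def parse_reading(line: str):
    """Map function for a hypothetical 'station,temperature' record."""
    station, temp = line.split(",")
    yield station, float(temp)


def max_reading(station: str, temps: list) -> float:
    """Reduce function: highest temperature seen at one station."""
    return max(temps)


readings = ["NYC,3.5", "BOS,-1.0", "NYC,7.25", "BOS,2.0"]  # made-up sample data
print(map_reduce(readings, parse_reading, max_reading))  # {'BOS': 2.0, 'NYC': 7.25}
```

In a real Hadoop job, the framework (not user code) performs the shuffle, moving intermediate pairs over the network so that all values for one key arrive at the same reducer.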

Instructors:

Shu Yan obtained his Ph.D. in Physics from the University of South Carolina. As a physicist with strong analytical skills & a solid programming background, he brings coding, data science & critical problem-solving skills together to tackle real-world problems. His physical intuition & mathematical reasoning bring additional insight to his work with statistical models & machine learning.