Intended Audience: Architects and developers who wish to build and manage a Hadoop stack, or to write, build and maintain Apache Hadoop jobs.
Course Prerequisites: Participants should have a basic understanding of Java and Linux.
Course Content:
What is Big Data & Why Hadoop?
• Big Data Characteristics, Challenges with Traditional Systems
Hadoop Overview & Its Ecosystem
• Anatomy of Hadoop Cluster, Installing and Configuring Hadoop
• Hands-On Exercise
HDFS – Hadoop Distributed File System
• HDFS Architecture, NameNode, DataNodes and Secondary NameNode
• Hands-On Exercise
MapReduce Anatomy
• How MapReduce Works
• The Mapper & Reducer, Data Types, Input & Output Formats
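To give a flavour of the Mapper and Reducer roles covered in this module, the following is a minimal sketch in plain Java. It does not use the Hadoop API; the class name WordCountSketch and its methods are hypothetical and only imitate the map, shuffle and reduce steps in memory on a word-count example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the map -> shuffle -> reduce flow.
// No Hadoop classes are used; all names here are hypothetical.
class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.trim().toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by key, with keys kept sorted.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the values collected for one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs the three phases in sequence over a small in-memory input.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(pairs).entrySet()) {
            counts.put(e.getKey(), reduce(e.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {big=2, cluster=1, data=1}
        System.out.println(wordCount(List.of("big data", "big cluster")));
    }
}
```

In a real Hadoop job the framework performs the shuffle and runs many mappers and reducers in parallel across the cluster; the sketch only shows how the data changes shape between phases.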
Developing MapReduce Programs
• Setting up the Eclipse Development Environment, Creating MapReduce Projects, Debugging and Unit Testing
• Developing a MapReduce algorithm for a real-world scenario
• Hands-On Exercises
Advanced MapReduce Concepts
• Combiner, Partitioner
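As a rough illustration of these two concepts, the sketch below uses plain Java rather than the Hadoop API. The class PartitionCombineSketch and its method names are hypothetical. The partitioning formula follows the one used by Hadoop's default HashPartitioner, and the combiner is shown as a local pre-aggregation of one mapper's output for a word-count style job.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration of partitioning and combining.
// No Hadoop classes are used; all names here are hypothetical.
class PartitionCombineSketch {

    // Partitioner: mask the sign bit so the hash is never negative,
    // then take the remainder by the number of reducers.
    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Combiner: pre-aggregate one mapper's output locally so that
    // fewer (key, value) pairs have to be sent to the reducers.
    static Map<String, Integer> combine(List<String> mapperOutputKeys) {
        Map<String, Integer> partial = new LinkedHashMap<>();
        for (String key : mapperOutputKeys) {
            partial.merge(key, 1, Integer::sum);
        }
        return partial;
    }

    public static void main(String[] args) {
        // Three emitted keys collapse into two partial counts before partitioning.
        Map<String, Integer> partial = combine(List.of("big", "data", "big"));
        for (Map.Entry<String, Integer> e : partial.entrySet()) {
            System.out.println(e.getKey() + "=" + e.getValue()
                    + " -> reducer " + partitionFor(e.getKey(), 3));
        }
    }
}
```

Because the partition depends only on the key, every occurrence of a key reaches the same reducer, which is what allows the reducer to produce a complete total for that key.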
Monitoring & Management of Hadoop
• Managing HDFS
• Using HDFS & JobTracker
• Hands-On Exercises
Sqoop
• Importing and Exporting Data from and to an RDBMS
• Hands-On Exercises – Import and Export
Hive
• Hive Basics, Internal & External Tables, Partitioning, Buckets
• Writing queries – Joins, Union, Dynamic partitioning, Sampling
• Hands-On Exercise – Structured Data Analysis
Pig
• Pig Basics, Loading data files
• Writing queries – SPLIT, FILTER, JOIN, GROUP, SAMPLE, ILLUSTRATE etc.
• Hands-On Exercise – Semi-structured Data Analysis
HBase
Setting up a Hadoop Cluster
• Hands-On Session