Courses Offered: SCJP, SCWCD, Design Patterns, EJB, Core Java, AJAX, Advanced Java, XML, Struts, Web Services, Spring, Hibernate
     

HADOOP Course Details


Batch Date: Mar 14th @ 11:00AM

Faculty: Mr. Anil Kumar    

Location: Marthahalli, Bangalore

Address:
#8, Kriti Building, Behind Staples showroom,
SGR Dental College Road,
Munnekolala, Marthahalli.
Phone Number: 9686031777, 7204746644

Syllabus:


HADOOP DEVELOPMENT

• Hadoop and HDFS architecture

Hadoop Architecture and Eco System

Understanding distributed systems and parallel computing

HDFS daemons: NameNode, Secondary NameNode, and DataNode

MapReduce daemons: JobTracker and TaskTracker

Block Placement, Data Integrity, Rebalancer

HDFS user/admin commands.

Anatomy of a Hadoop Cluster

• Setting up Hadoop cluster

Install and configure Apache Hadoop

Set up a pseudo-distributed Hadoop cluster on a single laptop/desktop

Monitoring the cluster using the web UI

• MapReduce Programming

MapReduce framework and architecture

Hadoop Data Types

Developing MapReduce Programs in Local Mode, Pseudo-distributed Mode, and Fully Distributed Mode

Writing MapReduce Programs

Examining MapReduce Programming

ToolRunner

Basic API Concepts (Driver code, Mapper, Reducer)

• Delving Deeper Into the Hadoop API

The configure and close Methods

Input and Output Formats

Text Format

KeyValue Format

NLine Format

SequenceFile Format

Partitioners

• Tuning for Performance

Reducing network traffic with a combiner

Reducing the amount of input data

Running with speculative execution

• Advanced MapReduce Programming

A Recap of the MapReduce Flow

Custom Writables and WritableComparables

Map-Side Joins

Reduce-Side Joins

Using the Distributed Cache

• Monitoring and debugging on a Production Cluster

Counters

Skipping Bad Records

Rerunning failed tasks with IsolationRunner

Schedulers (FIFO, Capacity, and Fair)

• YARN Introduction & Architecture

• Pig - ETL

Introduction, Pig vs. Hive

Pig vs. MapReduce and SQL

Pig's Data Model

Pig Architecture

Pig Latin, Transformations

Installing and Running Pig in Local & Distributed modes

Advanced Pig concepts, Debugging

Hands-on Exercise

• Hive – Data warehousing platform

Architecture of Hive

Hive Services, Clients, Metastore

Hive Data Model and File Formats

Hive Query Language

DDL in Hive

Joins, Unions, Indexing, Views

Statistics & Archiving with Hive

Hive Partitions, Buckets

Hive UDF, UDAF, UDTF

Hive SerDe properties

Hive Optimizations and best practices

Hands-on Exercise

• HBase – NoSQL Database

HBase Overview & Architecture

HBase Installation

Usage Scenarios of HBase, CRUD

HBase Data Model

Table and Row

Column Family & Column Qualifier

Cell and its Versioning

Regions and Region Server

HBase operations (Get/Scan, Put, Delete, etc.)

HBase Admin - Create database, develop and run sample applications

HBase Clients

Thrift

Java API

REST

MapReduce & Hive Integration with HBase

• Sqoop

Overview of Sqoop import/export

Install and configure Sqoop on cluster

MySQL Installation and connection

Sqoop commands

Various Options to Import Data

Table Imports

Filtering Imports

Hive Imports

• Flume

Introduction and Architecture

Install and configure Flume

Flume Components

Flume Events

Hands-on Exercise

Gathering Twitter data using Flume