
HADOOP Course Details
 

Subscribe and Access: 4500+ FREE Videos and 21+ Subjects, including CRT, Soft Skills, JAVA, Hadoop, Microsoft .NET, Testing Tools, etc.

Batch Date: Mar 1st @ 8:00AM

Faculty: Mr. Suresh

Duration: 25 Days

Fee: 6000/- INR + Reg. Fee 100/- INR (last batch at the discounted price)

Location: Maitrivanam, Hyderabad.

Venue:
DURGA SOFTWARE SOLUTIONS at Maitrivanam
Plot No: 202,
2nd Floor,
HUDA Maitrivanam,
Ameerpet, Hyderabad-500038
Ph. No: 09246212143.

* Complete material will be provided by a real-time expert.

Syllabus:

  1. What is big data?
    1. Big data challenges
    2. How Hadoop relates to big data
    3. Problems with storing and processing big data
    4. Working with traditional large-scale systems
  2. What is Hadoop?
    1. Hadoop core components: HDFS & MapReduce (MR)
    2. Hadoop ecosystem: other tools
    3. Hadoop distributions and their differences: Cloudera, Hortonworks, MapR
    4. Real-time scenarios of Hadoop with various use cases

HDFS (Hadoop Distributed File System)

  1. DFS vs. HDFS, and clusters vs. Hadoop clusters
  2. Features of HDFS
  3. HDFS architecture
  4. HDFS storage
    1. Blocks, configuring blocks, default vs. custom block sizes
    2. Replication in HDFS
  5. Failover mechanism
  6. Custom replication and configuring replication factors
  7. Daemons of Hadoop 1.x:
    1. NameNode and its functionality
    2. DataNode and its functionality
    3. Secondary NameNode and its functionality
    4. JobTracker and its functionality
    5. TaskTracker and its functionality
  8. Daemons of Hadoop 2.x: NameNode, DataNode, Secondary NameNode, ResourceManager, NodeManager
  9. Hadoop cluster modes
  10. Single-node vs. multi-node
  11. HDFS federation
  12. High availability
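
The storage topics above (blocks, default vs. custom block sizes, replication) come down to simple arithmetic, which can be sketched in plain Python. This is an illustration, not HDFS code: the 128 MB block size is the Hadoop 2.x default, and the round-robin placement below is a deliberate simplification of HDFS's rack-aware replica policy.

```python
# Sketch: how HDFS splits a file into fixed-size blocks and assigns
# replicas to DataNodes. Node names are made up for illustration.

BLOCK_SIZE = 128 * 1024 * 1024   # 128 MB, the Hadoop 2.x default
REPLICATION = 3                  # default replication factor

def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the list of block sizes for a file of `file_size` bytes."""
    full, last = divmod(file_size, block_size)
    blocks = [block_size] * full
    if last:
        blocks.append(last)      # the last block may be smaller
    return blocks

def place_replicas(block_index, datanodes, replication=REPLICATION):
    """Naive round-robin placement; real HDFS is rack-aware."""
    n = len(datanodes)
    return [datanodes[(block_index + r) % n] for r in range(replication)]

# A 300 MB file needs three blocks: 128 MB + 128 MB + 44 MB.
blocks = split_into_blocks(300 * 1024 * 1024)
print(len(blocks))                       # 3
print(blocks[-1] // (1024 * 1024))       # 44

nodes = ["dn1", "dn2", "dn3", "dn4"]
print(place_replicas(0, nodes))          # ['dn1', 'dn2', 'dn3']
```

Changing BLOCK_SIZE or REPLICATION mirrors what configuring dfs.blocksize and dfs.replication does on a real cluster.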

MAP REDUCE

    • Map Reduce life cycle
    • Communication mechanism of processing daemons
    • Input format and Record reader classes
    • Success case vs Failure case scenarios
    • Retry mechanism in Map Reduce
    • Map Reduce programming
    • Different phases of Map Reduce algorithm
    • Different data types in Map Reduce
    • Primitive data types Vs Map Reduce data types
    • How to write map reduce programs
    • Driver Code
    • Importance of driver code in a Map Reduce program
    • How to identify the driver code in Map Reduce program
    • Different sections of driver code
    • Mapper Code
    • Importance of Mapper Phase in Map Reduce
    • How to write a Mapper class; methods in the Mapper class
    • Reducer Code
    • Importance of Reduce Phase in Map Reduce
    • How to write a Reducer class; methods in the Reducer class
    • Input split
    • Need of input split in Map reduce
    • Input split size vs. block size
    • Input split vs mappers
    • Identity Mapper & Identity Reducer
    • Input formats in Map Reduce
    • Text input format
    • Key value text input format
    • Sequence file input format
    • How to use the specific input format in Map Reduce
    • Custom input formats and their record readers
    • Output formats in Map Reduce
    • Text output format
    • Key value text output format
    • Sequence file output format
    • How to use the specific output format in Map Reduce
    • Custom output formats and their record writers
    • Map Reduce API
    • New API vs Deprecated API
    • Combiner in Map Reduce
    • Usage of combiner class in map reduce
    1. Performance trade-offs
    • Partitioner in map reduce
    • Importance of the partitioner class in Map Reduce
    • Writing custom partitioners
    • Compression techniques in map reduce
    • Importance of compression in map reduce
    • What is a codec?
    • Compression types
    1. GZipCodec
    2. BZip and BZip2 Codec
    3. LZOCodec
    4. Snappy Codec
    • Map Reduce streaming
    • Data localization
    • Secondary sorting using Map Reduce
    • Enabling and disabling compression for all jobs
    • Enabling and disabling compression for a particular job
    1. Joins in Map Reduce
    2. Map-side vs. reduce-side joins
    3. Performance trade-offs
    4. Distributed cache
    5. Counters
    6. Map Reduce schedulers
    7. Debugging Map Reduce jobs
    8. Chaining mappers and reducers
    9. Setting the number of reducers
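
The map, shuffle/sort, and reduce phases listed above can be simulated in a few lines of plain Python for a word-count job. In real Hadoop these would be Mapper and Reducer classes wired together by a driver; the function names here are illustrative, not the Hadoop API.

```python
# Sketch: the three phases of a Map Reduce job (map, shuffle/sort,
# reduce), simulated in plain Python for word count.
from collections import defaultdict

def mapper(record):
    # emit (word, 1) for every word in an input line
    for word in record.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # group values by key, as the framework does between map and reduce
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(key, values):
    # sum the counts for one word
    return (key, sum(values))

lines = ["Hadoop is fast", "hadoop is scalable"]
mapped = [pair for line in lines for pair in mapper(line)]
reduced = dict(reducer(k, v) for k, v in shuffle(mapped).items())
print(reduced)   # {'hadoop': 2, 'is': 2, 'fast': 1, 'scalable': 1}
```

A combiner would simply run the reducer logic on each mapper's local output before the shuffle, which is why it can cut network traffic but must be safe to apply more than once.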

    Apache Pig

      • Introduction to Pig
      • Installing and running Pig
      • Pig Latin scripts
      • Pig console: the Grunt shell
      • Data types
      • Writing evaluation and filter functions
      • Load and store functions
      • Relational operators in Pig
      • COGROUP
      • CROSS
      • DISTINCT
      • FILTER
      • FOREACH
      • GROUP
      • JOIN(INNER)
      • JOIN(OUTER)
      • LIMIT
      • LOAD
      • ORDER
      • SAMPLE
      • SPLIT
      • STORE
      • UNION
      • Diagnostic operators in Pig
      • describe
      • dump
      • explain
      • illustrate
      • Eval functions in Pig
      • AVG
      • CONCAT
      • COUNT
      • DIFF
      • IsEmpty
      • MAX
      • MIN
      • SIZE
      • SUM
      • TOKENIZE
      • MR vs. Pig
      • Different modes of execution
      • Comparison with RDBMS (SQL)
      • Pig user-defined functions (UDFs)
      • Need for UDFs
      • How to use UDFs
      • REGISTER keyword
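
As a rough picture of what Pig's GROUP and FOREACH ... GENERATE COUNT(...) compute, here is the same logic in plain Python. The records and field names are made up for illustration; in Pig this would be a two-line Pig Latin script.

```python
# Sketch: GROUP ... BY followed by FOREACH ... GENERATE COUNT, in Python.
from collections import defaultdict

records = [("alice", "IT"), ("bob", "HR"), ("carol", "IT")]

# GROUP records BY department (the second field)
grouped = defaultdict(list)
for name, dept in records:
    grouped[dept].append((name, dept))

# FOREACH grouped GENERATE group, COUNT(records)
counts = {dept: len(rows) for dept, rows in grouped.items()}
print(counts)   # {'IT': 2, 'HR': 1}
```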

    HIVE

    1. Hive introduction
    2. Comparison with traditional databases
    3. Need for Apache Hive in Hadoop
    4. SQL vs. HiveQL
    5. Map Reduce and local modes
    6. Hive architecture
    7. Driver
    8. Compiler
    9. Executor (semantic analyser)
    10. Metastore in Hive
    11. Importance of the Hive metastore
    12. External metastore configuration
    13. Communication mechanism with the metastore
    14. Hive Query Language (HiveQL)
    15. HiveQL data types
    16. Operators and functions
    17. Hive tables (managed tables and external tables)
    18. HiveQL data manipulation
    19. Loading data into tables
    20. Exporting data
    21. Different types of joins
    22. Hive scripting
    23. Indexing
    24. Views
    25. Appending data to an existing Hive table
    26. Data slicing mechanisms
    27. Partitions in Hive
    28. Buckets in Hive
    29. Partitioning vs. bucketing
    30. Real-time use cases
    31. User-defined functions (UDFs) in Hive
    32. UDFs
    33. How to write UDFs
    34. Importance of UDFs
    35. UDAFs
    36. How to use UDAFs
    37. Importance of UDAFs
    38. UDTFs
    39. How to use UDTFs
    40. Importance of UDTFs
    41. Need for UDFs in Hive
    42. Hive serializer/deserializer (SerDe)
    43. Hive-HBase integration
    44. How to store XML and JSON data in Hive using a SerDe
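
The partitioning vs. bucketing distinction above can be sketched as follows. This is plain Python rather than Hive: the warehouse paths and bucket count are made up, and crc32 stands in for Hive's own hash function, but the shape of the idea is the same, partitions map distinct column values to directories while buckets hash a column into a fixed number of files.

```python
# Sketch: data slicing in Hive - partitions vs. buckets.
import zlib

def partition_path(table, country):
    # A Hive partition is one directory per value, e.g. .../country=IN/
    return f"/warehouse/{table}/country={country}/"

def bucket_id(user_id, num_buckets=4):
    # A Hive bucket is chosen by hashing the CLUSTERED BY column modulo
    # the bucket count; crc32 stands in for Hive's hash here.
    return zlib.crc32(user_id.encode()) % num_buckets

print(partition_path("sales", "IN"))    # /warehouse/sales/country=IN/
b = bucket_id("user42")
print(0 <= b < 4)                        # True
```

Partitioning prunes whole directories at query time; bucketing spreads rows evenly, which helps sampling and bucketed map-side joins.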

    HBASE

    1. HBase introduction
    2. HDFS vs. HBase
    3. HBase use cases
    4. HBase basics
    5. Column families
    6. HTable
    7. HBase architecture
    8. Clients
    9. REST
    10. Thrift
    11. Java-based
    12. Avro
    13. HBase admin
    14. Schema definition
    15. Basic CRUD operations
    16. Map Reduce integration
    17. Schema design
    18. Advanced indexing
    19. HBase tables
    20. HBase storage handlers
    21. HBase usage
    22. Key design
    23. Bloom filters
    24. Versioning
    25. Coprocessors
    26. Filters
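
HBase's data model, a row key mapping to column-family:qualifier cells with timestamped versions, can be mimicked with a nested dict. This is a plain-Python sketch of the model, not the HBase client API; the row, family, and qualifier names are made up.

```python
# Sketch: HBase-style versioned cells keyed by row -> "cf:qualifier".
from collections import defaultdict

table = defaultdict(lambda: defaultdict(list))  # row -> cell -> versions

def put(row, cf, qualifier, value, ts):
    # keep the newest version first, as HBase returns them
    table[row][f"{cf}:{qualifier}"].insert(0, (ts, value))

def get(row, cf, qualifier, versions=1):
    # by default HBase returns only the latest version of a cell
    return table[row][f"{cf}:{qualifier}"][:versions]

put("row1", "info", "city", "Hyderabad", ts=1)
put("row1", "info", "city", "Secunderabad", ts=2)
print(get("row1", "info", "city"))              # [(2, 'Secunderabad')]
print(get("row1", "info", "city", versions=2))  # both versions, newest first
```

This is why key design matters so much in HBase: the row key is the only primary index, and everything else is a scan.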

    SQOOP

    1. Introduction to Sqoop
    2. MySQL client and server installation
    3. How to connect to a relational database using Sqoop
    4. Different Sqoop commands
    5. Different flavours of import
    6. Importing from an RDBMS to HDFS
    7. Importing from an RDBMS to Hive
    8. Importing from an RDBMS to HBase
    9. Different flavours of export
    10. Exporting from HDFS to an RDBMS
    11. Exporting from HBase to an RDBMS
    12. Exporting from Hive to an RDBMS
    13. Sqoop jobs
      1. Creating
      2. Deleting
      3. Executing
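
Conceptually, a `sqoop import` reads rows from a relational table and writes them as delimited part files, one per map task. The sketch below uses sqlite3 in place of MySQL and in-memory strings in place of HDFS files; the table name, data, and round-robin split are all illustrative assumptions.

```python
# Sketch: what an RDBMS-to-HDFS import does conceptually.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO emp VALUES (?, ?)",
                 [(1, "asha"), (2, "ravi"), (3, "meena")])

def sqoop_style_import(conn, table, num_mappers=2):
    """Split rows across `num_mappers` part files of comma-separated text."""
    rows = conn.execute(f"SELECT * FROM {table}").fetchall()
    parts = [[] for _ in range(num_mappers)]
    for i, row in enumerate(rows):
        parts[i % num_mappers].append(",".join(str(c) for c in row))
    return {f"part-m-{i:05d}": "\n".join(p) for i, p in enumerate(parts)}

files = sqoop_style_import(conn, "emp")
print(sorted(files))            # ['part-m-00000', 'part-m-00001']
print(files["part-m-00000"])    # two lines: 1,asha and 3,meena
```

Real Sqoop splits the table by a numeric key range (`--split-by`) rather than round-robin, but the parallel part-file output is the same shape.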

    FLUME

    1. Flume introduction
    2. Flume architecture
    3. Flume master
    4. Collector and Flume agent
    5. Flume configurations
    6. Review of the API
    7. Basic operations in Flume
    8. Real-time use-case executions
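
The agent pipeline above, a source pushing events into a channel and a sink draining the channel to a destination, can be pictured in plain Python. Real Flume agents are configured declaratively in a properties file; the names and batch size here are illustrative.

```python
# Sketch: Flume's source -> channel -> sink flow.
from collections import deque

channel = deque()        # the buffering channel between source and sink
sink_output = []         # stands in for HDFS / HBase / another agent

def source_emit(event):
    channel.append(event)            # source -> channel

def sink_drain(batch_size=2):
    # sink -> destination, in batches
    for _ in range(min(batch_size, len(channel))):
        sink_output.append(channel.popleft())

for line in ["log line 1", "log line 2", "log line 3"]:
    source_emit(line)

sink_drain()
print(sink_output)   # ['log line 1', 'log line 2']
print(len(channel))  # 1 event still buffered
```

The channel is what gives Flume its reliability: if the sink is slow or down, events wait in the channel instead of being lost.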

    OOZIE

      • OOZIE introduction
      • OOZIE architecture
      • OOZIE Execution
      • workflow.xml
      • coordinator.xml
      • job.properties
      • OOZIE as a scheduler
      • OOZIE as a workflow designer
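
A minimal workflow.xml of the kind listed above might look like the following; the workflow name, action name, transitions, and properties are illustrative only, and a real map-reduce action would also carry a configuration section.

```xml
<workflow-app name="demo-wf" xmlns="uri:oozie:workflow:0.5">
  <start to="mr-step"/>
  <action name="mr-step">
    <map-reduce>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
    </map-reduce>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Job failed</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

The ok/error transitions are what make Oozie a workflow designer as well as a scheduler: each action declares where control goes on success and on failure.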

    YARN (YET ANOTHER RESOURCE NEGOTIATOR)

    1. What is YARN?
    2. YARN architecture
    3. ResourceManager
    4. ApplicationMaster
    5. NodeManager
    6. When should we use YARN?
    7. Classic Map Reduce vs. YARN Map Reduce
    8. Different configuration files for YARN
    9. Schedulers: fair and capacity
    10. Data locality
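
One way to picture the fair scheduler listed above: divide the available containers evenly among running applications, capping each application at what it actually asked for. The sketch below is plain Python under those simplifying assumptions; real YARN schedulers also handle queues, weights, preemption, and locality, and the app names and numbers are made up.

```python
# Sketch: fair-share container allocation across applications.

def fair_share(total_containers, demands):
    """demands: {app: containers requested}. Returns {app: granted}."""
    grant = {app: 0 for app in demands}
    remaining = total_containers
    active = {app for app, d in demands.items() if d > 0}
    while remaining > 0 and active:
        share = max(1, remaining // len(active))   # equal slice per app
        for app in sorted(active):
            take = min(share, demands[app] - grant[app], remaining)
            grant[app] += take
            remaining -= take
            if grant[app] == demands[app]:
                active.discard(app)   # fully satisfied apps drop out
            if remaining == 0:
                break
    return grant

# appB is fully satisfied; appA absorbs the leftover capacity.
print(fair_share(10, {"appA": 8, "appB": 3}))   # {'appA': 7, 'appB': 3}
```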

    IMPALA

    1. What is Impala?
    2. How can we use Impala for query processing?
    3. When should we use Impala?
    4. Hive vs. Impala
    5. Real-time use cases with Impala

    MONGODB (AS PART OF NOSQL DATABASES)

    1. Need for NoSQL databases
    2. Relational vs. non-relational databases
    3. Introduction to MongoDB
    4. Features of MongoDB
    5. Installation of MongoDB
    6. MongoDB basic operations

     

    SPARK AND SCALA

    1. Introduction to Spark and Scala
    2. Spark architecture
    3. Review of the Java API
    4. Spark integration with Hadoop
    5. Spark SQL
    6. Spark Streaming
    7. Real-time use cases using Spark
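
Spark's defining behaviour, lazy transformations that only execute when an action is called, can be sketched with a tiny RDD-like class. This is a plain-Python analogy, not the pyspark API; the class and data are made up for illustration.

```python
# Sketch: lazy map/filter transformations triggered by a collect() action.

class MiniRDD:
    def __init__(self, data, ops=()):
        self.data, self.ops = data, tuple(ops)   # nothing computed yet

    def map(self, f):
        # record the transformation; do not run it
        return MiniRDD(self.data, self.ops + (("map", f),))

    def filter(self, f):
        return MiniRDD(self.data, self.ops + (("filter", f),))

    def collect(self):
        # the action: replay the recorded transformations in order
        out = self.data
        for kind, f in self.ops:
            out = [f(x) for x in out] if kind == "map" else [x for x in out if f(x)]
        return out

rdd = MiniRDD([1, 2, 3, 4]).map(lambda x: x * 10).filter(lambda x: x > 15)
print(rdd.collect())   # [20, 30, 40]
```

Because the plan is recorded rather than executed eagerly, real Spark can optimise it, pipeline stages in memory, and recompute lost partitions, which is the core of its advantage over disk-bound Map Reduce.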

    Hadoop ‘R’

    1. Introduction to Hadoop ‘R’
    2. Basic operations

    Hadoop Administration

    1. Hadoop single-node cluster setup
    2. Operating system installation
    3. JDK installation
    4. SSH configuration
    5. Dedicated group and user creation
    6. Hadoop installation
    7. Different configuration file settings
    8. NameNode formatting
    9. Starting the Hadoop daemons
    10. Pig installation (local mode, cluster mode)
    11. Sqoop installation
    12. Sqoop installation with the MySQL client
    13. Hive installation
    14. HBase installation (local mode and clustered mode)
    15. Oozie installation
    16. MongoDB installation

    Course Highlights:

    1. Dealing with real-time scenarios and live examples.
    2. The course curriculum is designed and explained to a standard that prepares you to clear any Hadoop certification easily.
    3. Delivering real-time proofs of concept (POCs) and the working structure of real-time projects.
    4. Providing the top 100 FAQs from Hadoop interviews.
    5. Both soft and hard copies of the Hadoop material will be given.
    6. Latest updates and discussions on data analytics and new trends in Hadoop technology and its stack.
    7. Conducting online/offline exams for students to validate themselves.
    8. Recorded video lectures and academic projects will be provided at a nominal cost for interested students.