Courses Offered: SCJP, SCWCD, Design Patterns, EJB, Core Java, AJAX, Adv. Java, XML, Struts, Web Services, Spring, Hibernate

       

HADOOP Course Details
 

Subscribe and access 5200+ free videos on 21+ subjects, such as CRT, Soft Skills, Java, Hadoop, Microsoft .NET and Testing Tools.

Batch Date: Nov 30th @ 10:00AM

Faculty: Mr. Nageswar Rao

Duration: 60 to 70 Days

Venue :
DURGA SOFTWARE SOLUTIONS at Maitrivanam
Plot No : 202, IInd Floor ,
HUDA Maitrivanam,
Ameerpet, Hyderabad-500038.

Ph.No: +91 - 9246212143, 80 96 96 96 96



Syllabus:

HADOOP

1. Hadoop Roles

  • Hadoop Developer
  • Hadoop Admin
  • Data Analyst

2. Big Data

  • What is Big Data?
  • What Comes Under Big Data?
  • 3 V's of Big Data

3. Types of Data

  • Structured data
  • Semi Structured data
  • Unstructured data

4. Big Data Technologies

  • Operational Big Data
  • Analytical Big Data
  • Operational vs. Analytical Systems

5. Big Data Challenges

  • Data Storage
  • Data Processing
  • Commodity Hardware
  • Large Files
  • Backup and Recovery
  • Disaster Management
  • Capturing data
  • Curation
  • Searching
  • Sharing
  • Transfer
  • Analysis
  • Presentation

6. Big Data Processing

  • Traditional Approach
  • Google’s Solution

7. What is Hadoop?

  • Open source
  • Framework
  • Massive storage
  • Processing power

8. What are the benefits of Hadoop?

  • Computing power
  • Flexibility
  • Fault tolerance
  • Low cost
  • Scalability

9. What is Hadoop used for?

  • Low-cost storage and active data archive
  • Staging area for a data warehouse and analytics store
  • Data lake
  • Sandbox
  • Recommendation systems

10. History of Hadoop and Versions

  • Hadoop 1.x
  • Hadoop 2.x
  • Hadoop 3.x
  • Cloudera Hadoop
  • Hadoop 1.x vs. Hadoop 2.x vs. Hadoop 3.x
  • Apache Hadoop vs. Cloudera Hadoop
  • Cloudera Hadoop vs. Hortonworks Hadoop

11. Who uses Hadoop?

12. Distributions of Hadoop

13. Hadoop Architecture

  • Hadoop Distributed File System (HDFS™)
  • Clustered storage
  • Hadoop MapReduce
  • Clustered processing

14. File System

  • Windows
  • Linux
  • Distributed file system (DFS)
  • Hadoop Distributed File System (HDFS)

15. Features of HDFS

  • Advantages
  • Disadvantages
  • factors in Hadoop

16. Hadoop Cluster Architecture

  • Apache Hadoop 1.x
  • Apache Hadoop 2.x
  • Cloudera Hadoop

17. Distributed Data Storage

  • NameNode
  • HA NameNode
  • SecondaryNameNode
  • DataNode

18. Distributed Data Processing

  • ResourceManager
  • HA ResourceManager
  • NodeManager
  • ApplicationMaster
  • ApplicationsManager
  • JobTracker
  • TaskTracker

19. Data Storage

  • Horizontal scaling
  • Vertical scaling
  • Hadoop Data Storage
  • Blocks in HDFS
  • Size of blocks in HDFS
  • Block size related to split size
  • Replication factor
  • Replication Strategy
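The block-size and replication topics above come down to simple arithmetic. The sketch below is plain Java with no Hadoop dependency; it assumes the Hadoop 2.x default block size of 128 MB (64 MB in Hadoop 1.x; set by `dfs.blocksize` in Hadoop 2.x) and the default replication factor of 3 (`dfs.replication`). The class and method names are hypothetical, chosen only for illustration.

```java
// Illustrative HDFS storage arithmetic only; no Hadoop classes are used.
// Assumed defaults: 128 MB block size (Hadoop 2.x), replication factor 3.
public class BlockMath {
    static final long MB = 1024L * 1024L;

    // Number of HDFS blocks a file occupies: ceil(fileSize / blockSize).
    static long blockCount(long fileSizeBytes, long blockSizeBytes) {
        return (fileSizeBytes + blockSizeBytes - 1) / blockSizeBytes;
    }

    // Total block replicas stored across the cluster.
    static long replicaCount(long fileSizeBytes, long blockSizeBytes, int replication) {
        return blockCount(fileSizeBytes, blockSizeBytes) * replication;
    }

    // Raw disk consumed = file size * replication factor.
    // A partially filled last block uses only as much disk as its data.
    static long rawStorage(long fileSizeBytes, int replication) {
        return fileSizeBytes * replication;
    }

    public static void main(String[] args) {
        long fileSize = 300 * MB;
        long blockSize = 128 * MB;
        int replication = 3;
        System.out.println("Blocks: " + blockCount(fileSize, blockSize));
        System.out.println("Replicas: " + replicaCount(fileSize, blockSize, replication));
        System.out.println("Raw storage (MB): " + rawStorage(fileSize, replication) / MB);
    }
}
```

For a 300 MB file this gives 3 blocks (128 MB + 128 MB + 44 MB), 9 block replicas and 900 MB of raw disk usage; a small last block does not waste a full 128 MB on disk.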

20. Communication Between Hadoop Daemons (RPC and HTTP)

21. Different modes of Hadoop

  • Local (Standalone) Mode
  • Pseudo-Distributed Mode
  • Fully-Distributed Mode

22. Pseudo Distributed Mode (Single Node Cluster) setup

  • Prerequisites for setup
  • Linux Commands used in setup
  • Steps in Setup
  • Troubleshooting the setup

23. Hadoop Configurations

  • core-site.xml
  • hdfs-site.xml
  • mapred-site.xml
  • yarn-site.xml
  • masters
  • slaves

24. Accessing HDFS

  • CLI
  • Web UI

25. Hadoop Commands

  • Hadoop basic commands
  • Hadoop file system and MapReduce commands
  • Hadoop HDFS and MR Admin commands

26. Hadoop MapReduce Programming

  • Working with Java in Eclipse
  • Development of Mapper
  • Development of Reducer
  • Development of Driver
  • Data conversion between Java and Hadoop

27. How Does Hadoop MapReduce Work?

28. Data Flow in MapReduce application

  • Input Data
  • Input Format
  • Mapper
  • Combiner
  • Partitioner
  • Shuffle & Sort
  • Reducer
  • Output Format
  • Output
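The data-flow stages above can be traced with an in-memory word count. This is a plain-Java simulation, not the Hadoop API: `map`, `partition`, `reduce` and `run` are hypothetical names, the optional Combiner stage is omitted, and a `TreeMap` per reducer stands in for the framework's shuffle-and-sort step.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory simulation of the MapReduce data flow for word count.
// It mirrors the stages only; it does not use any Hadoop classes.
public class WordCountFlow {

    // Mapper: one input record (a line) -> list of (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<>(token, 1));
            }
        }
        return out;
    }

    // Partitioner: same rule as Hadoop's default HashPartitioner.
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Reducer: (word, [1, 1, ...]) -> total count.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    static Map<String, Integer> run(List<String> input, int numReducers) {
        // Shuffle & sort: one sorted key -> values map per reducer.
        List<TreeMap<String, List<Integer>>> partitions = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            partitions.add(new TreeMap<>());
        }
        for (String line : input) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                int p = partition(kv.getKey(), numReducers);
                partitions.get(p)
                        .computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                        .add(kv.getValue());
            }
        }
        // Reduce each partition, then merge results into one sorted output.
        Map<String, Integer> output = new TreeMap<>();
        for (TreeMap<String, List<Integer>> part : partitions) {
            for (Map.Entry<String, List<Integer>> e : part.entrySet()) {
                output.put(e.getKey(), reduce(e.getKey(), e.getValue()));
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("Hadoop stores data", "Hadoop processes data");
        System.out.println(run(input, 2));
    }
}
```

Running `main` prints `{data=2, hadoop=2, processes=1, stores=1}`. In a real job the framework performs the partitioning, sorting and grouping between the Mapper and Reducer; only the map and reduce logic is written by the developer.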

29. Life Cycle methods of Mapper and Reducer

30. Job submission process

  • Job initialization
  • Task Assignment & heartbeat
  • Task Execution
  • Task Runner

31. Hadoop cluster setup

  • How to set up master, slave and client nodes and Hadoop ecosystems
  • Adding nodes to cluster
  • Removing nodes from the cluster
  • Commissioning nodes in the cluster
  • Decommissioning nodes in the cluster
  • How to handle failed nodes
  • How to verify dead nodes and live nodes

32. Understanding Hadoop Clusters and the Network

  • Hadoop Cluster
  • Server Roles in Hadoop
  • Hadoop Workflow
  • Writing Files to HDFS
  • Hadoop Rack Awareness
  • Preparing HDFS Writes
  • HDFS Write Pipeline
  • HDFS Pipeline Write Success
  • HDFS Multi-block Replication Pipeline
  • Client Writes Span Cluster
  • Name Node
  • Re-replicating Missing Replicas
  • Secondary Name Node
  • Client Read from HDFS
  • Data Node reads from HDFS
  • Data processing: Map
  • Data processing: Reduce
  • Unbalanced Hadoop Cluster
  • Hadoop Cluster Balancing

33. Handling failures in Hadoop

  • NameNode failure
  • SecondaryNameNode failure
  • DataNode failure

34. A client reading data from HDFS

35. A client writing data to HDFS

36. MapRed Input Formats

  • TextInputFormat
  • KeyValueTextInputFormat
  • SequenceFileInputFormat
  • NLineInputFormat
  • DBInputFormat
  • CustomInputFormat

37. MapRed Output Formats

  • TextOutputFormat
  • SequenceFileOutputFormat
  • NullOutputFormat
  • DBOutputFormat
  • CustomOutputFormat

38. Differences between a MapReduce Combiner and Reducer

39. MapReduce Partitioner Implementation
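A partitioner decides which reducer receives each intermediate key. The sketch below is plain Java rather than the Hadoop API: `hashPartition` reproduces the rule used by Hadoop's default `HashPartitioner`, while `alphabetPartition` is a hypothetical custom rule added only for illustration.

```java
// Plain-Java sketch of partitioning logic; no Hadoop dependency.
public class PartitionerSketch {

    // Default rule of Hadoop's HashPartitioner: mask the sign bit so the
    // result is never negative, then take it modulo the reducer count.
    static int hashPartition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Hypothetical custom rule: keys starting with a-m go to reducer 0,
    // all other keys to reducer 1. With one reducer, everything goes to 0.
    static int alphabetPartition(String key, int numReduceTasks) {
        if (numReduceTasks < 2 || key.isEmpty()) {
            return 0;
        }
        char first = Character.toLowerCase(key.charAt(0));
        return (first >= 'a' && first <= 'm') ? 0 : 1;
    }

    public static void main(String[] args) {
        String[] keys = {"apple", "hadoop", "mapreduce", "yarn", "zookeeper"};
        for (String k : keys) {
            System.out.printf("%-10s hash->%d  alphabet->%d%n",
                    k, hashPartition(k, 2), alphabetPartition(k, 2));
        }
    }
}
```

In a real job, custom logic like `alphabetPartition` would go in the `getPartition(key, value, numPartitions)` method of a subclass of `org.apache.hadoop.mapreduce.Partitioner`, registered with `job.setPartitionerClass(...)`. A custom rule like this one only uses two reducers, so it trades even load distribution for predictable key ranges.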

40. Compression Techniques in Hadoop

  • Compression Codecs
  • How to configure compression codecs
  • Enabling and disabling compression at the job level and globally

41. Processing different formats of data

  • Text data
  • Excel Data
  • XML Data
  • PDF Files
  • Image Files

42. Distributed Cache

  • Advantages and disadvantages
  • how to use
  • when to use

43. Hadoop Joins

  • Map-side join
  • Reduce-side join

44. Configuring Map and Reduce Tasks

45. MapReduce Counters

46. Hadoop Scheduling

  • FIFO
  • FAIR
  • CAPACITY

47. Hadoop Job Priorities

48. Hadoop Queues

49. Debugging Hadoop Applications

  • Local Debugging
  • Remote Debugging

50. Data Locality

51. Speculative Execution

52. MapReduce Commands

53. Hadoop Ecosystems

1) Hive

  • What is Hive
  • Where to use and where not to use Hive
  • Features of Hive
  • Hive Architecture
  • Working of Hive
  • Hive Installation
  • Hive Integration with Hadoop
  • Hive Query Language (HiveQL)
  • Hive DDL and DML Operations
  • Hive Internal and external metastore
  • Hive with MySQL Server
  • Apache Hive (Facebook)
  • Comparison of Hive and MySQL commands
  • Data types in Hive
  • Loading data into Hive tables: scenarios
  • Hive Partitions
  • Hive Joins
  • Hive Bucketing
  • Difference between Bucketing and Partitioning
  • Hive Built-In Functions
  • UDFs
  • UDTFs
  • UDAFs

2) Sqoop

  • Requirements
  • Sqoop Installation
  • Sqoop Commands
  • Sqoop with MySQL
  • Sqoop with Oracle
  • Importing data into HDFS
  • Exporting data into RDBMS
  • Sqoop with Hive

3) HBase

  • Installation
  • Architecture
  • HBase Usage
  • HBase Clients

4) Zookeeper

  • Pseudo mode Installation
  • Cluster mode Installation
  • Working of Zookeeper

5) Pig

  • What is Pig
  • Terminology used in pig
  • Local Mode Installation
  • MapReduce mode Installation
  • Load Functions
  • Store Functions

6) Apache Oozie

  • Oozie Installation
  • Oozie Workflow
  • Executing and Monitoring jobs

7) Apache Flume

  • Flume setup
  • Source, Channel and Sink Configurations
  • Getting data from social networking sites into HDFS

54. Hadoop 2.x version

  • Local (Standalone) Mode
  • Pseudo-Distributed Mode
  • Fully-Distributed Mode

55. HDFS2 and YARN

  • HDFS2 Architecture
  • HA NameNode
  • YARN Architecture
  • HA ResourceManager
  • YARN Concepts in real time
  • Difference between YARN and MapReduce

56. Hortonworks Distributions

  • Introduction to Hortonworks
  • Hortonworks Installation
  • Hortonworks with Hadoop
  • Comparison of Hortonworks and Hadoop

57. Cloudera Distributions

  • Introduction to Cloudera
  • Cloudera setup
  • Cloudera with Hadoop
  • Difference between Cloudera and Hadoop

58. Real-time challenges in Hadoop project development

Hadoop Administration

1) Hadoop Installation 1.x, 2.x and CDH5

  • Local (Standalone) Mode
  • Pseudo-Distributed Mode
  • Fully-Distributed Mode
  • How to set up Master and Slave Nodes in a cluster
  • Adding nodes to cluster
  • Removing nodes from the cluster
  • Commissioning nodes in the cluster
  • Decommissioning nodes in the cluster
  • How to handle failed nodes
  • How to verify dead nodes and live nodes

2) Hive Installation

  • Local mode (internal Derby)
  • Cluster mode (internal Derby or external MySQL)
  • Hive CLI
  • Hive Web UI

3) HBase Installation

  • Local Mode
  • Pseudo mode
  • Cluster mode

4) Zookeeper Installation

  • Local mode
  • Cluster Mode

5) Sqoop Installation

  • Sqoop installation with MySQL
  • Sqoop and Hadoop integration
  • Sqoop and Hive integration

6) Pig Installation

  • Local mode
  • MapReduce mode

7) Flume Installation

8) Oozie Installation

Prerequisites for the Course (covered as part of the course)

1. Core Java Concepts

  • Control Statements
  • Working of Methods
  • OOP Concepts
  • Constructors
  • Method Overloading
  • Inheritance
  • Interfaces and Abstract classes
  • Strings, StringTokenizer, StringBuffer and StringBuilder
  • Exception Handling
  • I/O Streams
  • JDBC Concepts
  • Networking Concepts

2. Linux Commands

  • User level commands
  • Admin level commands
  • Shell scripting

3. SQL Commands and Database Admin Concepts