Batch Date: Nov 30th @ 10:00 AM
Faculty: Mr. Nageswar Rao
Duration: 60 to 70 Days
Venue:
DURGA SOFTWARE SOLUTIONS at Maitrivanam
Plot No: 202,
IInd Floor,
HUDA Maitrivanam,
Ameerpet, Hyderabad-500038.
Ph.No: +91 - 9246212143, 80 96 96 96 96
Syllabus:
HADOOP
1. Hadoop Roles
- Hadoop Developer
- Hadoop Admin
- Data Analyst
2. Big Data
- What is Big Data?
- What Comes Under Big Data?
- 3 V's of Big Data
3. Types of Data
- Structured data
- Semi Structured data
- Unstructured data
4. Big Data Technologies
- Operational Big Data
- Analytical Big Data
- Operational vs. Analytical Systems
5. Big Data Challenges
- Data Storage
- Data Processing
- Commodity Hardware
- Large Files
- Backup and Recovery
- Disaster Management
- Capturing data
- Curation
- Searching
- Sharing
- Transfer
- Analysis
- Presentation
6. Big Data Processing
- Traditional Approach
- Google’s Solution
7. What is Hadoop
- Open source
- Framework
- Massive storage
- Processing power
8. What are the benefits of Hadoop?
- Computing power
- Flexibility
- Fault tolerance
- Low cost
- Scalability
9. What is Hadoop used for?
- Low cost storage and active data archive
- Staging area for a data warehouse and analytics store
- Data lake
- Sandbox
- Recommendation systems
10. History of Hadoop and Versions
- Hadoop 1.x
- Hadoop 2.x
- Hadoop 3.x
- Cloudera Hadoop
- Hadoop 1.x vs. Hadoop 2.x vs. Hadoop 3.x
- Apache Hadoop vs. Cloudera Hadoop
- Cloudera Hadoop vs. Hortonworks Hadoop
11. Who is using Hadoop
12. Distributions of Hadoop
13. Hadoop Architecture
- Hadoop Distributed File System (HDFS™)
- Clustered storage
- Hadoop MapReduce
- Clustered processing
14. File System
- Windows
- Linux
- Distributed file system (DFS)
- Hadoop Distributed File System (HDFS)
15. Features of HDFS
- Advantages
- Disadvantages
- Factors in Hadoop
16. Hadoop Cluster Architecture
- Apache Hadoop 1.x
- Apache Hadoop 2.x
- Cloudera Hadoop
17. Distributed Data Storage
- NameNode
- HA NameNode
- SecondaryNameNode
- DataNode
18. Distributed Data Processing
- ResourceManager
- HA ResourceManager
- NodeManager
- ApplicationMaster
- ApplicationManager
- JobTracker
- TaskTracker
19. Data Storage
- Horizontal scaling
- Vertical scaling
- Hadoop Data Storage
- Blocks in HDFS
- Size of blocks in HDFS
- Block size related to split size
- Replication factor
- Replication Strategy
20. Communication Between Hadoop Daemons (RPC and HTTP)
21. Different modes of Hadoop
- Local (Standalone) Mode
- Pseudo-Distributed Mode
- Fully-Distributed Mode
22. Pseudo Distributed Mode (Single Node Cluster) setup
- Prerequisites in setup
- Linux Commands used in setup
- Steps in Setup
- Troubleshooting in setup
23. Hadoop Configurations
- core-site.xml
- hdfs-site.xml
- mapred-site.xml
- yarn-site.xml
- masters
- slaves
24. Accessing HDFS
25. Hadoop Commands
- Hadoop basic commands
- Hadoop file system and MapReduce commands
- Hadoop HDFS and MR Admin commands
26. Hadoop MapReduce Programming
- Working with Java in Eclipse
- Development of Mapper
- Development of Reducer
- Development of Driver
- Data conversion between Java and Hadoop
27. How Does Hadoop MapReduce Work?
28. Data Flow in MapReduce application
- Input Data
- Input Format
- Mapper
- Combiner
- Partitioner
- Shuffle & Sort
- Reducer
- Output Format
- Output
29. Life Cycle methods of Mapper and Reducer
30. Job submission process
- Job initialization
- Task Assignment & heartbeat
- Task Execution
- Task Runner
31. Hadoop cluster setup
- How to set up master, slave, and client nodes and Hadoop ecosystems
- Adding nodes to cluster
- Removing nodes from the cluster
- Commissioning nodes in the cluster
- Decommissioning nodes in the cluster
- How to handle failed nodes
- How to verify dead nodes and live nodes
32. Understanding Hadoop Clusters and the Network
- Hadoop Cluster
- Server Roles in Hadoop
- Hadoop Workflow
- Writing Files to HDFS
- Hadoop Rack Awareness
- Preparing HDFS Writes
- HDFS Write Pipeline
- HDFS Pipeline Write Success
- HDFS Multi-block Replication Pipeline
- Client Writes Span Cluster
- Name Node
- Re-replicating Missing Replicas
- Secondary Name Node
- Client Read from HDFS
- Data Node reads from HDFS
- Data processing : Map
- Data processing : Reduce
- Unbalanced Hadoop Cluster
- Hadoop Cluster Balan
33. Handling failures in Hadoop
- NameNode failure
- SecondaryNameNode failure
- DataNode failure
34. A client reading data from HDFS
35. A client writing data to HDFS
36. MapRed Input Formats
- TextInputFormat
- KeyValueTextInputFormat
- SequenceFileInputFormat
- NLineInputFormat
- DBInputFormat
- CustomInputFormat
37. MapRed Output Formats
- TextOutputFormat
- SequenceFileOutputFormat
- NullOutputFormat
- DBOutputFormat
- CustomOutputFormat
38. Differences between a MapReduce Combiner and Reducer
39. MapReduce Partitioner Implementation
40. Compression Techniques in Hadoop
- Compression Codecs
- How to configure compression codecs
- Enabling and Disabling at job level and Global
41. Processing different formats of data
- Text data
- Excel Data
- XML Data
- PDF Files
- Image Files
42. Distributed Cache
- Advantages and disadvantages
- How to use
- When to use
43. Hadoop Joins
- Map-side Join
- Reduce-side Join
44. Configuring Map and Reduce Tasks
45. MapReduce Counters
46. Hadoop Scheduling
47. Hadoop Job Priorities
48. Hadoop Queues
49. Debugging Hadoop Applications
- Local Debugging
- Remote Debugging
50. Data Locality
51. Speculative Execution
52. MapReduce Commands
53. Hadoop Ecosystems
1) Hive
- What is Hive
- Where to use and where not to use Hive
- Features of Hive
- Hive Architecture
- Working of Hive
- Hive Installation
- Hive Integration with Hadoop
- Hive Query Language (HiveQL)
- Hive DDL and DML Operations
- Hive Internal and external metastore
- Hive with MySQL server
- Apache Hive (Facebook)
- Comparison of Hive and MySQL commands
- Data types in Hive
- Scenarios for loading data into Hive tables
- Hive Partitions
- Hive Joins
- Hive Bucketing
- Difference between Bucketing and Partitioning
- Hive Built-In Functions
- UDFs
- UDTFs
- UDAFs
2) Sqoop
- Requirements
- Sqoop Installation
- Sqoop Commands
- Sqoop with MySQL
- Sqoop with Oracle
- Importing data into HDFS
- Exporting data into RDBMS
- Sqoop with Hive
3) HBase
- Installation
- Architecture
- HBase Usage
- HBase Clients
4) ZooKeeper
- Pseudo mode Installation
- Cluster mode Installation
- Working of ZooKeeper
5) Pig
- What is Pig
- Terminology used in pig
- Local Mode Installation
- MapReduce mode Installation
- Load Functions
- Store Functions
6) Apache Oozie
- Oozie Installation
- Oozie Workflow
- Executing and Monitoring jobs
7) Apache Flume
- Flume setup
- Source, Stream and Sink Configurations
- Getting social network site data into HDFS
54. Hadoop 2.x version
- Local (Standalone) Mode
- Pseudo-Distributed Mode
- Fully-Distributed Mode
55. HDFS2 and YARN
- HDFS2 Architecture
- HA NameNode
- YARN Architecture
- HA ResourceManager
- YARN Concepts in real time
- Difference between YARN and MapReduce
56. Hortonworks Distributions
- Introduction to Hortonworks
- Hortonworks Installation
- Hortonworks with Hadoop
- Comparison of Hortonworks and Hadoop
57. Cloudera Distributions
- Introduction to Cloudera
- Cloudera setup
- Cloudera with Hadoop
- Difference between Cloudera and Hadoop
58. Real-time challenges in Hadoop project development
Hadoop Administration
1) Hadoop Installation 1.x, 2.x and CDH5
- Local (Standalone) Mode
- Pseudo-Distributed Mode
- Fully-Distributed Mode
- How to set up Master and Slave Nodes in a cluster
- Adding nodes to cluster
- Removing nodes from the cluster
- Commissioning nodes in the cluster
- Decommissioning nodes in the cluster
- How to handle failed nodes
- How to verify dead nodes and live nodes
2) Hive Installation
- Local mode
- Internal Derby
- Cluster mode
- Internal Derby
- External MySQL
- Hive CLI
- Hive Web UI
3) HBase Installation
- Local Mode
- Pseudo mode
- Cluster mode
4) ZooKeeper Installation
5) Sqoop Installation
- Sqoop installation with MySQL
- Sqoop and Hadoop integration
- Sqoop and Hive integration
6) Pig Installation
- Local mode
- MapReduce mode
7) Flume Installation
8) Oozie Installation
Prerequisites for the Course (covered as part of the course)
1. Core Java Concepts
- Control Statements
- Working of Methods
- OOP Concepts
- Constructors
- Method Overloading
- Inheritance
- Interfaces and Abstract classes
- Strings and StringTokenizer, StringBuffer and StringBuilder
- Exception Handling
- I/O Streams
- JDBC Concepts
- Networking Concepts
2. Linux Commands
- User level commands
- Admin level commands
- Shell scripting
3. SQL Commands and Database Admin Concepts