
HADOOP Course Details
 

Subscribe and Access: 4500+ FREE Videos and 21+ Subjects, including CRT, Soft Skills, JAVA, Hadoop, Microsoft .NET, Testing Tools, etc.

Batch Date: Mar 1st @ 8:00AM

Faculty: Mr. Suresh

Duration: 25 Days

Fee: 6000/- INR + Reg. Fee 100/- INR (last batch at the discounted price)

Location: Maitrivanam, Hyderabad.

Venue:
DURGA SOFTWARE SOLUTIONS at Maitrivanam
Plot No: 202,
2nd Floor,
HUDA Maitrivanam,
Ameerpet, Hyderabad-500038
Ph. No: 09246212143.

* Complete material will be provided by a real-time expert.

Syllabus:

  1. What is big data?
    1. Big data challenges
    2. How Hadoop relates to big data
    3. Problems with storing and processing big data
    4. Working with traditional large-scale systems
  2. What is Hadoop?
    1. Hadoop core components: HDFS & MapReduce (MR)
    2. Hadoop ecosystem: other tools
    3. Hadoop distributions and their differences: Cloudera, Hortonworks, MapR
    4. Real-time scenarios of Hadoop with various use cases

HDFS (Hadoop Distributed File System)

  1. DFS vs. HDFS, and clusters vs. Hadoop clusters
  2. Features of HDFS
  3. HDFS architecture
  4. HDFS storage
    1. Blocks, configuring blocks, default vs. custom block sizes
    2. Replication in HDFS
  5. Failover mechanism
  6. Custom replication and configuring replication factors
  7. Daemons of Hadoop 1.x:
    1. NameNode and its functionality
    2. DataNode and its functionality
    3. Secondary NameNode and its functionality
    4. JobTracker and its functionality
    5. TaskTracker and its functionality
  8. Daemons of Hadoop 2.x: NameNode, DataNode, Secondary NameNode, ResourceManager, NodeManager
  9. Hadoop cluster modes
  10. Single-node vs. multi-node
  11. HDFS federation
  12. High availability
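
The storage topics above (blocks, default vs. custom block sizes, replication) come down to simple arithmetic, which can be sketched in plain Python. This is an illustration, not HDFS code: the 128 MB block size is the Hadoop 2.x default, and the round-robin placement below is a deliberate simplification of HDFS's rack-aware replica policy.

```python
# Sketch: how HDFS splits a file into fixed-size blocks and assigns
# replicas to DataNodes. Node names are made up for illustration.

BLOCK_SIZE = 128 * 1024 * 1024   # 128 MB, the Hadoop 2.x default
REPLICATION = 3                  # default replication factor

def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the list of block sizes for a file of `file_size` bytes."""
    full, last = divmod(file_size, block_size)
    blocks = [block_size] * full
    if last:
        blocks.append(last)      # the last block may be smaller
    return blocks

def place_replicas(block_index, datanodes, replication=REPLICATION):
    """Naive round-robin placement; real HDFS is rack-aware."""
    n = len(datanodes)
    return [datanodes[(block_index + r) % n] for r in range(replication)]

# A 300 MB file needs three blocks: 128 MB + 128 MB + 44 MB.
blocks = split_into_blocks(300 * 1024 * 1024)
print(len(blocks))                       # 3
print(blocks[-1] // (1024 * 1024))       # 44

nodes = ["dn1", "dn2", "dn3", "dn4"]
print(place_replicas(0, nodes))          # ['dn1', 'dn2', 'dn3']
```

Changing BLOCK_SIZE or REPLICATION mirrors what configuring dfs.blocksize and dfs.replication does on a real cluster.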

MAP REDUCE

    • Map Reduce life cycle
    • Communication mechanism of processing daemons
    • Input format and Record reader classes
    • Success case vs Failure case scenarios
    • Retry mechanism in Map Reduce
    • Map Reduce programming
    • Different phases of Map Reduce algorithm
    • Different data types in Map Reduce
    • Primitive data types Vs Map Reduce data types
    • How to write map reduce programs
    • Driver Code
    • Importance of driver code in a Map Reduce program
    • How to identify the driver code in Map Reduce program
    • Different sections of driver code
    • Mapper Code
    • Importance of Mapper Phase in Map Reduce
    • How to write a Mapper class; methods in the Mapper class
    • Reducer Code
    • Importance of Reduce Phase in Map Reduce
    • How to write a Reducer class; methods in the Reducer class
    • Input split
    • Need of input split in Map reduce
    • Input split size vs. block size
    • Input split vs mappers
    • Identity Mapper & Identity Reducer
    • Input formats in Map Reduce
    • Text input format
    • Key value text input format
    • Sequence file input format
    • How to use the specific input format in Map Reduce
    • Custom input formats and their record readers
    • Output formats in Map Reduce
    • Text output format
    • Key value text output format
    • Sequence file output format
    • How to use the specific output format in Map Reduce
    • Custom output formats and their record writers
    • Map Reduce API
    • New API vs Deprecated API
    • Combiner in Map Reduce
    • Usage of combiner class in map reduce
    1. Performance trade-offs
    • Partitioner in map reduce
    • Importance of the partitioner class in Map Reduce
    • Writing custom partitioners
    • Compression techniques in map reduce
    • Importance of compression in map reduce
    • What is a codec?
    • Compression types
    1. GZipCodec
    2. BZip and BZip2 Codec
    3. LZOCodec
    4. Snappy Codec
    • Map Reduce streaming
    • Data localization
    • Secondary sorting using Map Reduce
    • Enabling and disabling compression for all jobs
    • Enabling and disabling compression for a particular job
    1. Joins in Map Reduce
    2. Map-side vs. reduce-side joins
    3. Performance trade-offs
    4. Distributed cache
    5. Counters
    6. Map Reduce schedulers
    7. Debugging Map Reduce jobs
    8. Chaining mappers and reducers
    9. Setting the number of reducers
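
The map, shuffle/sort, and reduce phases listed above can be simulated in a few lines of plain Python for a word-count job. In real Hadoop these would be Mapper and Reducer classes wired together by a driver; the function names here are illustrative, not the Hadoop API.

```python
# Sketch: the three phases of a Map Reduce job (map, shuffle/sort,
# reduce), simulated in plain Python for word count.
from collections import defaultdict

def mapper(record):
    # emit (word, 1) for every word in an input line
    for word in record.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # group values by key, as the framework does between map and reduce
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(key, values):
    # sum the counts for one word
    return (key, sum(values))

lines = ["Hadoop is fast", "hadoop is scalable"]
mapped = [pair for line in lines for pair in mapper(line)]
reduced = dict(reducer(k, v) for k, v in shuffle(mapped).items())
print(reduced)   # {'hadoop': 2, 'is': 2, 'fast': 1, 'scalable': 1}
```

A combiner would simply run the reducer logic on each mapper's local output before the shuffle, which is why it can cut network traffic but must be safe to apply more than once.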

    Apache Pig

      • Introduction to Pig
      • Installing and running Pig
      • Pig Latin scripts
      • Pig console: the Grunt shell
      • Data types
      • Writing evaluation and filter functions
      • Load and store functions
      • Relational operators in Pig
      • COGROUP
      • CROSS
      • DISTINCT
      • FILTER
      • FOREACH
      • GROUP
      • JOIN(INNER)
      • JOIN(OUTER)
      • LIMIT
      • LOAD
      • ORDER
      • SAMPLE
      • SPLIT
      • STORE
      • UNION
      • Diagnostic operators in Pig
      • describe
      • dump
      • explain
      • illustrate
      • Eval functions in Pig
      • AVG
      • CONCAT
      • COUNT
      • DIFF
      • IsEmpty
      • MAX
      • MIN
      • SIZE
      • SUM
      • TOKENIZE
      • MR vs. Pig
      • Different modes of execution
      • Comparison with RDBMS (SQL)
      • Pig user-defined functions (UDFs)
      • Need for UDFs
      • How to use UDFs
      • REGISTER keyword
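
As a rough picture of what Pig's GROUP and FOREACH ... GENERATE COUNT(...) compute, here is the same logic in plain Python. The records and field names are made up for illustration; in Pig this would be a two-line Pig Latin script.

```python
# Sketch: GROUP ... BY followed by FOREACH ... GENERATE COUNT, in Python.
from collections import defaultdict

records = [("alice", "IT"), ("bob", "HR"), ("carol", "IT")]

# GROUP records BY department (the second field)
grouped = defaultdict(list)
for name, dept in records:
    grouped[dept].append((name, dept))

# FOREACH grouped GENERATE group, COUNT(records)
counts = {dept: len(rows) for dept, rows in grouped.items()}
print(counts)   # {'IT': 2, 'HR': 1}
```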

    HIVE

    1. Hive introduction
    2. Comparison with traditional databases
    3. Need for Apache Hive in Hadoop
    4. SQL vs. HiveQL
    5. Map Reduce and local modes
    6. Hive architecture
    7. Driver
    8. Compiler
    9. Executor (semantic analyser)
    10. Metastore in Hive
    11. Importance of the Hive metastore
    12. External metastore configuration
    13. Communication mechanism with the metastore
    14. Hive Query Language (HiveQL)
    15. HiveQL data types
    16. Operators and functions
    17. Hive tables (managed tables and external tables)
    18. HiveQL data manipulation
    19. Loading data into tables
    20. Exporting data
    21. Different types of joins
    22. Hive scripting
    23. Indexing
    24. Views
    25. Appending data to an existing Hive table
    26. Data slicing mechanisms
    27. Partitions in Hive
    28. Buckets in Hive
    29. Partitioning vs. bucketing
    30. Real-time use cases
    31. User-defined functions (UDFs) in Hive
    32. UDFs
    33. How to write UDFs
    34. Importance of UDFs
    35. UDAFs
    36. How to use UDAFs
    37. Importance of UDAFs
    38. UDTFs
    39. How to use UDTFs
    40. Importance of UDTFs
    41. Need for UDFs in Hive
    42. Hive serializer/deserializer (SerDe)
    43. Hive-HBase integration
    44. How to store XML and JSON data in Hive using a SerDe
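
The partitioning vs. bucketing distinction above can be sketched as follows. This is plain Python rather than Hive: the warehouse paths and bucket count are made up, and crc32 stands in for Hive's own hash function, but the shape of the idea is the same, partitions map distinct column values to directories while buckets hash a column into a fixed number of files.

```python
# Sketch: data slicing in Hive - partitions vs. buckets.
import zlib

def partition_path(table, country):
    # A Hive partition is one directory per value, e.g. .../country=IN/
    return f"/warehouse/{table}/country={country}/"

def bucket_id(user_id, num_buckets=4):
    # A Hive bucket is chosen by hashing the CLUSTERED BY column modulo
    # the bucket count; crc32 stands in for Hive's hash here.
    return zlib.crc32(user_id.encode()) % num_buckets

print(partition_path("sales", "IN"))    # /warehouse/sales/country=IN/
b = bucket_id("user42")
print(0 <= b < 4)                        # True
```

Partitioning prunes whole directories at query time; bucketing spreads rows evenly, which helps sampling and bucketed map-side joins.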

    HBASE

    1. HBase introduction
    2. HDFS vs. HBase
    3. HBase use cases
    4. HBase basics
    5. Column families
    6. HTable
    7. HBase architecture
    8. Clients
    9. REST
    10. Thrift
    11. Java-based
    12. Avro
    13. HBase admin
    14. Schema definition
    15. Basic CRUD operations
    16. Map Reduce integration
    17. Schema design
    18. Advanced indexing
    19. HBase tables
    20. HBase storage handlers
    21. HBase usage
    22. Key design
    23. Bloom filters
    24. Versioning
    25. Coprocessors
    26. Filters
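
HBase's data model, a row key mapping to column-family:qualifier cells with timestamped versions, can be mimicked with a nested dict. This is a plain-Python sketch of the model, not the HBase client API; the row, family, and qualifier names are made up.

```python
# Sketch: HBase-style versioned cells keyed by row -> "cf:qualifier".
from collections import defaultdict

table = defaultdict(lambda: defaultdict(list))  # row -> cell -> versions

def put(row, cf, qualifier, value, ts):
    # keep the newest version first, as HBase returns them
    table[row][f"{cf}:{qualifier}"].insert(0, (ts, value))

def get(row, cf, qualifier, versions=1):
    # by default HBase returns only the latest version of a cell
    return table[row][f"{cf}:{qualifier}"][:versions]

put("row1", "info", "city", "Hyderabad", ts=1)
put("row1", "info", "city", "Secunderabad", ts=2)
print(get("row1", "info", "city"))              # [(2, 'Secunderabad')]
print(get("row1", "info", "city", versions=2))  # both versions, newest first
```

This is why key design matters so much in HBase: the row key is the only primary index, and everything else is a scan.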

    SQOOP

    1. Introduction to Sqoop
    2. MySQL client and server installation
    3. How to connect to a relational database using Sqoop
    4. Different Sqoop commands
    5. Different flavours of import
    6. Importing from an RDBMS to HDFS
    7. Importing from an RDBMS to Hive
    8. Importing from an RDBMS to HBase
    9. Different flavours of export
    10. Exporting from HDFS to an RDBMS
    11. Exporting from HBase to an RDBMS
    12. Exporting from Hive to an RDBMS
    13. Sqoop jobs
      1. Creating
      2. Deleting
      3. Executing
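
Conceptually, a `sqoop import` reads rows from a relational table and writes them as delimited part files, one per map task. The sketch below uses sqlite3 in place of MySQL and in-memory strings in place of HDFS files; the table name, data, and round-robin split are all illustrative assumptions.

```python
# Sketch: what an RDBMS-to-HDFS import does conceptually.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO emp VALUES (?, ?)",
                 [(1, "asha"), (2, "ravi"), (3, "meena")])

def sqoop_style_import(conn, table, num_mappers=2):
    """Split rows across `num_mappers` part files of comma-separated text."""
    rows = conn.execute(f"SELECT * FROM {table}").fetchall()
    parts = [[] for _ in range(num_mappers)]
    for i, row in enumerate(rows):
        parts[i % num_mappers].append(",".join(str(c) for c in row))
    return {f"part-m-{i:05d}": "\n".join(p) for i, p in enumerate(parts)}

files = sqoop_style_import(conn, "emp")
print(sorted(files))            # ['part-m-00000', 'part-m-00001']
print(files["part-m-00000"])    # two lines: 1,asha and 3,meena
```

Real Sqoop splits the table by a numeric key range (`--split-by`) rather than round-robin, but the parallel part-file output is the same shape.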

    FLUME

    1. Flume introduction
    2. Flume architecture
    3. Flume master
    4. Collector and Flume agent
    5. Flume configurations
    6. Review of the API
    7. Basic operations in Flume
    8. Real-time use-case executions
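
The agent pipeline above, a source pushing events into a channel and a sink draining the channel to a destination, can be pictured in plain Python. Real Flume agents are configured declaratively in a properties file; the names and batch size here are illustrative.

```python
# Sketch: Flume's source -> channel -> sink flow.
from collections import deque

channel = deque()        # the buffering channel between source and sink
sink_output = []         # stands in for HDFS / HBase / another agent

def source_emit(event):
    channel.append(event)            # source -> channel

def sink_drain(batch_size=2):
    # sink -> destination, in batches
    for _ in range(min(batch_size, len(channel))):
        sink_output.append(channel.popleft())

for line in ["log line 1", "log line 2", "log line 3"]:
    source_emit(line)

sink_drain()
print(sink_output)   # ['log line 1', 'log line 2']
print(len(channel))  # 1 event still buffered
```

The channel is what gives Flume its reliability: if the sink is slow or down, events wait in the channel instead of being lost.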

    OOZIE

      • OOZIE introduction
      • OOZIE architecture
      • OOZIE Execution
      • workflow.xml
      • coordinator.xml
      • job.properties
      • OOZIE as a scheduler
      • OOZIE as a workflow designer
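
A minimal workflow.xml of the kind listed above might look like the following; the workflow name, action name, transitions, and properties are illustrative only, and a real map-reduce action would also carry a configuration section.

```xml
<workflow-app name="demo-wf" xmlns="uri:oozie:workflow:0.5">
  <start to="mr-step"/>
  <action name="mr-step">
    <map-reduce>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
    </map-reduce>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Job failed</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

The ok/error transitions are what make Oozie a workflow designer as well as a scheduler: each action declares where control goes on success and on failure.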

    YARN (YET ANOTHER RESOURCE NEGOTIATOR)

    1. What is YARN?
    2. YARN architecture
    3. ResourceManager
    4. ApplicationMaster
    5. NodeManager
    6. When should we use YARN?
    7. Classic Map Reduce vs. YARN Map Reduce
    8. Different configuration files for YARN
    9. Schedulers: fair and capacity
    10. Data locality
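
One way to picture the fair scheduler listed above: divide the available containers evenly among running applications, capping each application at what it actually asked for. The sketch below is plain Python under those simplifying assumptions; real YARN schedulers also handle queues, weights, preemption, and locality, and the app names and numbers are made up.

```python
# Sketch: fair-share container allocation across applications.

def fair_share(total_containers, demands):
    """demands: {app: containers requested}. Returns {app: granted}."""
    grant = {app: 0 for app in demands}
    remaining = total_containers
    active = {app for app, d in demands.items() if d > 0}
    while remaining > 0 and active:
        share = max(1, remaining // len(active))   # equal slice per app
        for app in sorted(active):
            take = min(share, demands[app] - grant[app], remaining)
            grant[app] += take
            remaining -= take
            if grant[app] == demands[app]:
                active.discard(app)   # fully satisfied apps drop out
            if remaining == 0:
                break
    return grant

# appB is fully satisfied; appA absorbs the leftover capacity.
print(fair_share(10, {"appA": 8, "appB": 3}))   # {'appA': 7, 'appB': 3}
```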

    IMPALA

    1. What is Impala?
    2. How can we use Impala for query processing?
    3. When should we use Impala?
    4. Hive vs. Impala
    5. Real-time use cases with Impala

    MONGODB (AS PART OF NOSQL DATABASES)

    1. Need for NoSQL databases
    2. Relational vs. non-relational databases
    3. Introduction to MongoDB
    4. Features of MongoDB
    5. Installation of MongoDB
    6. MongoDB basic operations

     

    SPARK AND SCALA

    1. Introduction to Spark and Scala
    2. Spark architecture
    3. Review of the Java API
    4. Spark integration with Hadoop
    5. Spark SQL
    6. Spark Streaming
    7. Real-time use cases using Spark
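
Spark's defining behaviour, lazy transformations that only execute when an action is called, can be sketched with a tiny RDD-like class. This is a plain-Python analogy, not the pyspark API; the class and data are made up for illustration.

```python
# Sketch: lazy map/filter transformations triggered by a collect() action.

class MiniRDD:
    def __init__(self, data, ops=()):
        self.data, self.ops = data, tuple(ops)   # nothing computed yet

    def map(self, f):
        # record the transformation; do not run it
        return MiniRDD(self.data, self.ops + (("map", f),))

    def filter(self, f):
        return MiniRDD(self.data, self.ops + (("filter", f),))

    def collect(self):
        # the action: replay the recorded transformations in order
        out = self.data
        for kind, f in self.ops:
            out = [f(x) for x in out] if kind == "map" else [x for x in out if f(x)]
        return out

rdd = MiniRDD([1, 2, 3, 4]).map(lambda x: x * 10).filter(lambda x: x > 15)
print(rdd.collect())   # [20, 30, 40]
```

Because the plan is recorded rather than executed eagerly, real Spark can optimise it, pipeline stages in memory, and recompute lost partitions, which is the core of its advantage over disk-bound Map Reduce.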

    Hadoop ‘R’

    1. Introduction to Hadoop ‘R’
    2. Basic operations

    Hadoop Administration

    1. Hadoop single-node cluster setup
    2. Operating system installation
    3. JDK installation
    4. SSH configuration
    5. Dedicated group and user creation
    6. Hadoop installation
    7. Different configuration file settings
    8. NameNode formatting
    9. Starting the Hadoop daemons
    10. Pig installation (local mode, cluster mode)
    11. Sqoop installation
    12. Sqoop installation with the MySQL client
    13. Hive installation
    14. HBase installation (local mode and clustered mode)
    15. Oozie installation
    16. MongoDB installation

    Course Highlights:

    1. Dealing with real-time scenarios and live examples.
    2. The course curriculum is designed and explained to a standard that prepares you to clear any Hadoop certification easily.
    3. Delivering real-time proofs of concept (POCs) and the working structure of real-time projects.
    4. Providing the top 100 FAQs from Hadoop interviews.
    5. Both soft and hard copies of the Hadoop material will be given.
    6. Latest updates and discussions on data analytics and new trends in Hadoop technology and its stack.
    7. Conducting online/offline exams for students to validate themselves.
    8. Recorded video lectures and academic projects will be provided at a nominal cost for interested students.