
       

HADOOP + PYSPARK + PYTHON + LINUX Course Details
 


Batch Date: Dec 21st & 22nd @5:00PM

Faculty: Mr. N. Vijay Sunder Sagar (20+ Years of Experience)

Duration: 15 Weekends Batch

Venue :
DURGA SOFTWARE SOLUTIONS,
Flat No : 202, 2nd Floor,
HUDA Maitrivanam,
Ameerpet, Hyderabad - 500038

Ph. No: +91 - 9246212143, 80 96 96 96 96



Syllabus:

BIG DATA HADOOP

I: INTRODUCTION

  • What is Big Data?
  • What is Hadoop?
  • Need of Hadoop
  • Sources and Types of Data
  • Comparison with Other Technologies
  • Challenges with Big Data
    • i. Storage
    • ii. Processing
  • RDBMS vs Hadoop
  • Advantages of Hadoop
  • Hadoop Ecosystem Components

II: HDFS (Hadoop Distributed File System)

  • Features of HDFS
  • Name Node, Data Node, Blocks
  • Configuring Block Size
  • Hadoop 1.x Architecture (5 Daemons)
    • i. Name Node
    • ii. Data Node
    • iii. Secondary Name Node
    • iv. Job Tracker
    • v. Task Tracker
  • Metadata management
  • Storage and processing
  • Replication in Hadoop
  • Configuring Custom Replication
  • Fault Tolerance in Hadoop
  • HDFS Commands

III: MAP REDUCE

  • Map Reduce Architecture
  • Processing Daemons of Hadoop
    • Job Tracker (Roles and Responsibilities)
    • Task Tracker (Roles and Responsibilities)
  • Phases of Map Reduce
    • i) Mapper phase
    • ii) Reducer phase
  • Input split
  • Input split vs Block size
  • Partitioner in Map Reduce
  • Groupings and Aggregations
  • Data Types in Map Reduce
  • Map Reduce Programming Model
    • Driver Code
    • Mapper Code
    • Reducer Code
  • Programming examples
  • File input formats
  • File output formats
  • Merging in Map Reduce
  • Speculative Execution Model
  • Speculative Job
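The Mapper, shuffle and Reducer phases listed above can be traced without a cluster. The sketch below is a plain-Python simulation of a word count, not Hadoop's Java MapReduce API; the function names `mapper`, `shuffle`, `reducer` and `word_count` are hypothetical.

```python
from collections import defaultdict


def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle/sort phase: group all values emitted for the same key, sorted by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reducer(key, values):
    """Reduce phase: aggregate the grouped values of one key."""
    return key, sum(values)


def word_count(lines):
    mapped = (pair for line in lines for pair in mapper(line))
    grouped = shuffle(mapped)
    return dict(reducer(key, values) for key, values in grouped.items())


counts = word_count(["big data on hadoop", "hadoop stores big data"])
print(counts)  # {'big': 2, 'data': 2, 'hadoop': 2, 'on': 1, 'stores': 1}
```

On a real cluster the framework performs the shuffle between the two phases, and the partitioner decides which reducer receives each key.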

IV: SQOOP (SQL + HADOOP)

  • Introduction to Sqoop
  • SQOOP Import
  • SQOOP Export
  • Importing Data From RDBMS to HDFS
  • Importing Data From RDBMS to HIVE
  • Importing Data From RDBMS to HBASE
  • Exporting From HBASE to RDBMS
  • Exporting From HIVE to RDBMS
  • Exporting From HDFS to RDBMS
  • Transformations While Importing / Exporting
  • Filtering data while importing
  • Vertical and Horizontal merging while importing
  • Working with delimiters while importing
  • Groupings and Aggregations while importing
  • Incremental import
  • Examples and operations
  • Defining SQOOP Jobs

V: YARN

  • Introduction
  • Speculative Execution, Speculative Job and Speculative Task
  • Comparison of Hadoop 1.x with Hadoop 2.x
  • Comparison with Previous Versions
  • YARN Architecture Components
    • i. Resource Manager
    • ii. Application Master
    • iii. Node Manager
    • iv. Application Manager
    • v. Resource Scheduler
    • vi. Job History Server
    • vii. Container

VI: NOSQL

  • What is “Not only SQL”
  • NOSQL Advantages
  • Problems with RDBMS for Large-Scale Data Systems
  • Types of NOSQL & Purposes
    • Key-Value Store
    • Columnar Store
    • Document Store
    • Graph Store
  • Introduction to Cassandra – NOSQL Database
  • Introduction to MongoDB and CouchDB Databases
  • Integration of NOSQL Databases with Hadoop

VII: HBASE

  • Introduction to Bigtable
  • What is NOSQL and a Columnar Store Database
  • HBase Introduction
  • HBase Use Cases
  • HBase Basics
  • Column Families
  • Scans
  • HBase Architecture
  • Map Reduce over HBase
  • HBase Data Modeling
  • HBase Schema Design
  • HBase CRUD Operations
  • Hive & HBase Integration
  • HBase Storage Handlers

VIII: HIVE

  • Introduction
  • Hive Architecture
  • Hive Metastore
  • Hive Query Language
  • Difference between HQL and SQL
  • Hive Built in Functions
  • Loading Data From Local Files To Hive Tables
  • Loading Data From HDFS Files To Hive Tables
  • Table Types
    • Internal (Managed) Tables
    • External Tables
  • Hive Working with unstructured data
  • Hive Working With Xml Data
  • Hive Working With Json Data
  • Hive Working With Urls And Weblog Data
  • Hive Unions
  • Hive Joins
  • Multi Table / File Inserts
  • Inserting Into Local Files
  • Inserting Into HDFS Files
  • Hive UDF (User Defined Functions)
  • Hive UDAF (User Defined Aggregate Functions)
  • Hive UDTF (User Defined Table-Generating Functions)
  • Partitioned Tables
  • Non-Partitioned Tables
  • Multi-column Partitioning
  • Dynamic Partitions In Hive
  • Performance Tuning Mechanisms
  • Bucketing in Hive
  • Indexing in Hive
  • Hive Examples
  • Hive & HBase Integration

 

PYSPARK

I ) PYSPARK INTRODUCTION

  • What is Apache Spark?
  • Why PySpark?
  • Need for PySpark
  • Spark with Python vs Scala
  • PySpark Features
  • Real-life usage of PySpark
  • PySpark Web/Application
  • PySpark - SparkSession
  • PySpark – SparkContext
  • PySpark – RDD
  • PySpark – Parallelize
  • PySpark – repartition() vs coalesce()
  • PySpark – Broadcast Variables
  • PySpark – Accumulator

II) PYSPARK - RDD COMPUTATION

  • Operations on an RDD
  • Directed Acyclic Graph (DAG)
  • RDD Actions and Transformations
  • RDD computation
  • Steps in RDD computation
  • RDD persistence
  • Persistence features

PERSISTENCE OPTIONS (Storage Levels):

  • 1) MEMORY_ONLY
  • 2) MEMORY_ONLY_SER
  • 3) DISK_ONLY
  • 4) MEMORY_AND_DISK
  • 5) MEMORY_AND_DISK_SER

III) PYSPARK - CORE COMPUTING

  • Fault Tolerance Model in Spark
  • Different ways of creating a RDD
  • Word Count Example
  • Creating Spark Objects (RDDs) from Python Objects (Lists)
  • Increasing the Number of Partitions
  • Aggregations Over Structured Data:
  • reduceByKey()

IV) GROUPINGS AND AGGREGATIONS

  • i) Single Grouping and Single Aggregation
  • ii) Single Grouping and Multiple Aggregations
  • iii) Multi Grouping and Single Aggregation
  • iv) Multi Grouping and Multi Aggregation
  • Differences b/w reduceByKey() and groupByKey()
  • Process of groupByKey
  • Process of reduceByKey
  • reduce() Function
  • Various Transformations
  • Various Built-in Functions
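The difference between reduceByKey() and groupByKey() comes down to where values are combined. This plain-Python simulation (not the PySpark API) assumes two in-memory lists stand in for RDD partitions and counts how many records each approach would send through the shuffle; the helper names are hypothetical.

```python
from collections import defaultdict

# Two hypothetical "partitions" of (key, value) records.
partitions = [
    [("spark", 10), ("hive", 5), ("spark", 7)],
    [("spark", 3), ("hive", 2), ("hive", 1)],
]


def group_by_key(parts):
    """groupByKey-style: every record crosses the shuffle, then values are grouped."""
    shuffled = [record for part in parts for record in part]
    groups = defaultdict(list)
    for key, value in shuffled:
        groups[key].append(value)
    return dict(groups), len(shuffled)


def reduce_by_key(parts, func):
    """reduceByKey-style: combine inside each partition first, then merge partial results."""
    partials = []
    for part in parts:
        local = {}
        for key, value in part:
            local[key] = func(local[key], value) if key in local else value
        partials.extend(local.items())  # one record per key per partition is shipped
    merged = {}
    for key, value in partials:
        merged[key] = func(merged[key], value) if key in merged else value
    return merged, len(partials)


grouped, shipped_g = group_by_key(partitions)
totals, shipped_r = reduce_by_key(partitions, lambda a, b: a + b)
print({k: sum(v) for k, v in grouped.items()}, shipped_g)  # {'spark': 20, 'hive': 8} 6
print(totals, shipped_r)                                   # {'spark': 20, 'hive': 8} 4
```

Both give the same totals, but the reduceByKey-style path moves fewer records, which is the usual reason it is preferred for aggregations.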

V) Various Actions and Transformations:

  • countByKey()
  • countByValue()
  • sortByKey()
  • zip()
  • union()
  • distinct()
  • Various Count Aggregations
  • Joins
    • Inner Join
    • Outer Join
  • cartesian()
  • cogroup()
  • Other actions and transformations

VI) PySpark SQL - DataFrame

  • Introduction
  • Making data Structured
  • Case Classes
  • Ways to Extract Case Class Objects
    • 1) Using a Function
    • 2) Using map with Multiple Expressions
    • 3) Using map with a Single Expression
  • SQLContext
  • DataFrames API
  • Dataset API
  • RDD vs DataFrame vs Dataset
  • PySpark – Create a DataFrame
  • PySpark – Create an empty DataFrame
  • PySpark – Convert RDD to DataFrame
  • PySpark – Convert DataFrame to Pandas
  • PySpark – show()
  • PySpark – StructType & StructField
  • PySpark – Row Class
  • PySpark – Column Class
  • PySpark – select()
  • PySpark – collect()
  • PySpark – withColumn()
  • PySpark – withColumnRenamed()
  • PySpark – where() & filter()
  • PySpark – drop() & dropDuplicates()
  • PySpark – orderBy() and sort()
  • PySpark – groupBy()
  • PySpark – join()
  • PySpark – union() & unionAll()
  • PySpark – unionByName()
  • PySpark – UDF (User Defined Function)
  • PySpark – map()
  • PySpark – flatMap()
  • PySpark – foreach()
  • PySpark – sample() vs sampleBy()
  • PySpark – fillna() & fill()
  • PySpark – pivot() (Row to Column)
  • PySpark – partitionBy()
  • PySpark – ArrayType Column (Array)
  • PySpark – MapType (Map/Dict)

VII) PySpark SQL Functions

  • PySpark – Aggregate Functions
  • PySpark – Window Functions
  • PySpark – Date and Timestamp Functions
  • PySpark – JSON Functions
  • PySpark – Read & Write JSON file

VIII) PySpark Built-In Functions

  • PySpark – when()
  • PySpark – expr()
  • PySpark – lit()
  • PySpark – split()
  • PySpark – concat_ws()
  • PySpark – substring()
  • PySpark – translate()
  • PySpark – regexp_replace()
  • PySpark – overlay()
  • PySpark – to_timestamp()
  • PySpark – to_date()
  • PySpark – date_format()
  • PySpark – datediff()
  • PySpark – months_between()
  • PySpark – explode()
  • PySpark – array_contains()
  • PySpark – array()
  • PySpark – collect_list()
  • PySpark – collect_set()
  • PySpark – create_map()
  • PySpark – map_keys()
  • PySpark – map_values()
  • PySpark – struct()
  • PySpark – countDistinct()
  • PySpark – sum(), avg()
  • PySpark – row_number()
  • PySpark – rank()
  • PySpark – dense_rank()
  • PySpark – percent_rank()
  • PySpark – typedLit()
  • PySpark – from_json()
  • PySpark – to_json()
  • PySpark – json_tuple()
  • PySpark – get_json_object()
  • PySpark – schema_of_json()
  • Working Examples

IX) PySpark External Sources

  • Working with SQL Statements
  • Spark and Hive Integration
  • Spark and MySQL Integration
  • Working with CSV
  • Working with JSON
  • Transformations and actions on dataframes
  • Narrow and Wide Transformations
  • Adding, Dropping and Renaming Columns
  • Adding and Dropping Rows
  • Handling nulls
  • Joins
  • Window Functions
  • Writing data back to External sources
  • Creating Tables from DataFrames (Internal Tables, Temporary Tables)

X) DEPLOYMENT MODES

  • Local Mode
  • Cluster Modes (Standalone, YARN)

XI) PYSPARK APPLICATION

  • Stages and Tasks
  • Driver and Executor
  • Building Spark Applications/Pipelines
  • Deploying Spark Apps to a Cluster and Tuning
  • Performance tuning

PySpark Streaming Concepts

Integration with Kafka

PySpark MLlib

PYTHON

1. Python Basics

  • What is Python
  • Why Python?
  • History of python
  • Applications of Python
  • Features of Python
  • Advantages of Python
  • Versions of Python
  • Installation of Python
  • Flavors of Python
  • Comparison b/w Programming Languages: C, Java and Python

2. Python Operations

  • Python Modes of Execution
  • Interactive mode of Execution
  • Batch mode of Execution
  • Python Editors and IDEs
  • Python Data Types
  • Python Constants
  • Python Variables
  • Comments in python
  • Output: print() Function
  • Input: input() Function, Accepting Input
  • Type Conversion
  • type() and id() Functions
  • Escape Sequences in Python
  • Strings in Python
  • String indices and slicing
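A short sketch of type conversion, type(), id() and escape sequences using only built-ins; the variable names are illustrative.

```python
value = "2024"
number = int(value)                 # type conversion: str -> int
ratio = float(number) / 8           # int -> float
print(type(value), type(number), type(ratio))  # <class 'str'> <class 'int'> <class 'float'>

a = [1, 2]
b = a                               # b refers to the same list object as a
c = [1, 2]                          # c is an equal but separate list object
print(id(a) == id(b), id(a) == id(c))  # True False

print("Name:\tPython\nLevel:\tBasics")  # escape sequences: \t is a tab, \n a newline
```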

3. Operators in Python

  • Arithmetic Operators
  • Comparison Operators
  • Logical Operators
  • Assignment Operators
  • Shorthand Assignment Operators
  • Bitwise Operators
  • Membership Operators
  • Identity Operators
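The operator groups above behave as follows in standard Python; the values are illustrative.

```python
a = [1, 2, 3]
b = [1, 2, 3]
c = a

# Comparison (==) checks for equal values; identity (is) checks for the same object.
print(a == b, a is b, a is c)        # True False True

# Membership operators test whether a value occurs in a collection.
print(2 in a, 5 not in a)            # True True

# Bitwise operators work on the binary form of integers.
x, y = 0b1100, 0b1010                # 12 and 10
print(x & y, x | y, x ^ y, x << 1)   # 8 14 6 24

# Shorthand assignment: n += 5 is equivalent to n = n + 5.
n = 10
n += 5
n //= 4
print(n)                             # 3
```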

4. Python IDEs

  • PyCharm IDE Installation
  • Working with PyCharm
  • PyCharm Components
  • Installing Anaconda
  • What is Conda?
  • Anaconda Prompt
  • Anaconda Navigator
  • Jupyter Notebook
  • Jupyter Features
  • Spyder IDE
  • Spyder Features
  • Conda and PIP

5. Flow Control statements

  • Block/clause
  • Indentation in Python
  • Conditional Statements
    • if statement
    • if…else statement
    • if…elif…else statement

6. Looping Statements

  • while loop
  • while … else
  • for loop
  • range() in for loop
  • Nested for loop
  • Break statement
  • Continue statement
  • Pass statement
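A sketch of the loop-control statements listed above; `first_divisor` and `not_written_yet` are hypothetical helpers chosen to show for…else, break, continue, while…else and pass.

```python
def first_divisor(n):
    """Return the smallest divisor of n that is greater than 1 (assumes n >= 2)."""
    for d in range(2, n):
        if n % d == 0:
            break              # leave the loop early; the else block is skipped
    else:
        return n               # runs only if the loop ended without break, so n is prime
    return d


odd_numbers = []
for i in range(10):
    if i % 2 == 0:
        continue               # skip even numbers and go to the next iteration
    odd_numbers.append(i)

attempts = 0
while attempts < 3:
    attempts += 1
else:
    status = "finished"        # while...else runs when the condition becomes False


def not_written_yet():
    pass                       # pass is a placeholder for an empty block


print(first_divisor(91), first_divisor(13), odd_numbers, status)
# 7 13 [1, 3, 5, 7, 9] finished
```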

7. Strings in Python

  • Creating Strings
  • String indexing
  • String slicing
  • String Concatenation
  • String Comparison
  • String splitting and joining
  • Finding Sub Strings
  • String Case Change
  • String methods
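Common string operations from this section, shown on an illustrative comma-separated string.

```python
line = "hadoop,pyspark,python,linux"

parts = line.split(",")                          # splitting: ['hadoop', 'pyspark', 'python', 'linux']
joined = " + ".join(p.upper() for p in parts)    # joining and case change
print(joined)                                    # HADOOP + PYSPARK + PYTHON + LINUX
print(line[:6], line[-5:], line[::-1][:5])       # hadoop linux xunil  (slicing)
print(line.find("python"), line.find("java"))    # 15 -1  (finding substrings)
print("Big Data".lower() == "big data")          # True  (case-insensitive comparison)
```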

8. Collections in Python

  • Introduction
  • Lists
  • Tuples
  • Sets
  • Dictionaries
  • Operations on collections
  • Functions for collections
  • Methods of collection
  • Nested collections
  • Differences b/w List, Tuple, Set and Dictionary

9. Python Lists

  • List properties
  • List Creation
  • List indexing and slicing
  • List Operations
  • List addresses
  • List functions
  • Different ways of creating lists
  • Nested Lists
  • List modification
  • List insertion and deletion
  • List Methods

10. Python Tuples

  • Tuple properties
  • Tuple Creation
  • Tuple indexing and slicing
  • Different ways of creating tuples
  • Tuple Operations
  • Tuple Addresses
  • Tuple Functions
  • Nested Tuples
  • Tuple Methods
  • Differences b/w List and Tuple

11. Python Sets

  • Set properties
  • Set Creation
  • Set Operations
  • Set Functions
  • Set Addresses
  • Set Mathematical Operations
  • Set Methods
  • Insertion and Deletion operation
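The set mathematical operations and insertion/deletion methods, sketched with two illustrative sets of student names.

```python
python_batch = {"ravi", "sita", "arjun"}
hadoop_batch = {"sita", "kiran", "arjun", "meena"}

union = python_batch | hadoop_batch          # in either set
common = python_batch & hadoop_batch         # intersection: in both sets
only_python = python_batch - hadoop_batch    # difference
in_one_only = python_batch ^ hadoop_batch    # symmetric difference

python_batch.add("kiran")      # insertion; adding an existing element has no effect
python_batch.discard("ravi")   # deletion; discard() does not fail if the element is missing
python_batch.discard("nobody")

print(sorted(common), sorted(in_one_only), python_batch <= hadoop_batch)
# ['arjun', 'sita'] ['kiran', 'meena', 'ravi'] True
```

Sets are unordered, so the results are passed through sorted() before printing to get a predictable order.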

12. Python Dictionary

  • Dictionary properties
  • Dictionary Creation
  • Dictionary Operations
  • Dictionary Addresses
  • Nested Dictionaries
  • Dictionary Methods
  • Insertion and Deletion of elements
  • Differences b/w List, Tuple, Set and Dictionary

13. Functions in Python

  • Defining a function
  • Calling a function
  • Properties of Function
  • Examples of Functions
  • Categories of Functions
  • Argument types
    • default arguments
    • non-default arguments
    • keyword arguments
    • non keyword arguments
  • Variable Length Arguments
  • Variables scope
  • Call by value and Call by Reference
  • Passing collections to function
  • Local and Global variables
  • Recursive Function
  • Boolean Function
  • Passing functions to function
  • Anonymous or Lambda Function
  • filter() and map() Functions
  • reduce() Function
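A sketch covering several of the argument types and functional tools listed above; `enroll`, `factorial` and `apply_twice` are hypothetical examples, and reduce() is imported from functools because it is not a built-in in Python 3.

```python
from functools import reduce


def enroll(name, course="Python", *topics, **details):
    """name: non-default (required) argument; course: default argument;
    *topics: variable-length positional arguments;
    **details: variable-length keyword arguments."""
    return {"name": name, "course": course, "topics": list(topics), **details}


print(enroll("Asha"))
print(enroll("Ravi", "PySpark", "RDD", "DataFrame", batch="weekend"))
print(enroll(course="Hadoop", name="Kiran"))        # keyword arguments in any order


def factorial(n):
    """Recursive function: calls itself until the base case n <= 1."""
    return 1 if n <= 1 else n * factorial(n - 1)


def apply_twice(func, value):
    """Passing a function to a function."""
    return func(func(value))


nums = [1, 2, 3, 4, 5, 6]
squares = list(map(lambda n: n * n, nums))          # [1, 4, 9, 16, 25, 36]
evens = list(filter(lambda n: n % 2 == 0, nums))    # [2, 4, 6]
product = reduce(lambda acc, n: acc * n, nums, 1)   # 720
print(factorial(5), apply_twice(lambda x: x + 10, 1))  # 120 21
```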

14. Modules in Python

  • What is a module?
  • Different types of module
  • Creating user defined module
  • Setting path
  • The import statement
  • Normal Import
  • From … Import
  • Module Aliases
  • Reloading a module
  • dir() Function
  • Working with Standard Modules: math, random, datetime and os

15. Packages

  • Introduction to packages
  • Defining packages
  • Importing from packages
  • __init__.py File
  • Defining sub packages
  • Importing from sub packages

16. Errors and Exception Handling

  • Types of errors
  • Compile-Time Errors
  • Run-Time Errors
  • What is Exception?
  • Need of Exception handling
  • Predefined Exceptions
  • try, except and finally Blocks
  • Nested blocks
  • Handling Multiple Exceptions
  • User defined Exceptions
  • raise Statement
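A minimal sketch of a user-defined exception together with raise, multiple except types and finally; `InvalidAgeError` and `parse_age` are hypothetical names, and the age range is an assumption for illustration.

```python
class InvalidAgeError(ValueError):
    """User-defined exception for an age that cannot be accepted."""


def parse_age(text):
    try:
        age = int(text)  # may raise the predefined ValueError (or TypeError for None)
    except ValueError:
        raise InvalidAgeError(f"not a number: {text!r}") from None
    if not 0 < age < 130:
        raise InvalidAgeError(f"out of range: {age}")
    return age


log = []
for value in ["25", "abc", "-4", None]:
    try:
        log.append(parse_age(value))
    except (InvalidAgeError, TypeError) as exc:  # handling multiple exception types
        log.append(f"error: {exc}")
    finally:
        log.append("checked")                    # runs whether or not an exception occurred
print(log)
```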

17. File Handling

  • Introduction
  • Types of Files in Python
  • Opening a file
  • Closing a file
  • Writing data to files
  • tell() and seek() methods
  • Reading data from files
  • Appending data to files
  • with open() statement
  • Various functions
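A sketch of writing, appending and reading with with open(), plus tell() and seek(). It assumes a temporary directory so no existing file is touched; in text mode tell() returns an opaque position that is only meant to be passed back to seek().

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:   # scratch folder, deleted afterwards
    path = os.path.join(folder, "notes.txt")

    with open(path, "w") as f:                  # "w" creates or overwrites the file
        f.write("line one\nline two\n")

    with open(path, "a") as f:                  # "a" appends to the end of the file
        f.write("line three\n")

    with open(path, "r") as f:                  # the file is closed automatically on exit
        first = f.readline()                    # 'line one\n'
        position = f.tell()                     # opaque position after the first line
        f.seek(0)                               # go back to the start of the file
        lines = f.read().splitlines()

print(f.closed, lines)  # True ['line one', 'line two', 'line three']
```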

18. OOPs Concepts

  • OOPS Features
  • Encapsulation
  • Abstraction
  • Class
  • Object
  • Static and Non-Static Variables
  • Defining methods
  • Diff b/w functions & methods
  • Constructors
  • Parameterized Constructors
  • Built-in attributes
  • Object Reference count
  • Destructor
  • Garbage Collection
  • Inheritance
  • Types of Inheritances
  • Object class
  • Polymorphism
  • Overriding
  • super() Function
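A sketch of a small class hierarchy showing a parameterized constructor, static versus instance variables, inheritance, overriding and super(); `Course` and `OnlineCourse` are hypothetical classes.

```python
class Course:
    institute = "Training Institute"        # class (static) variable shared by all objects

    def __init__(self, title, weeks):       # parameterized constructor
        self.title = title                   # instance (non-static) variables
        self.weeks = weeks

    def describe(self):
        return f"{self.title}: {self.weeks} weeks"


class OnlineCourse(Course):                 # single inheritance
    def __init__(self, title, weeks, platform):
        super().__init__(title, weeks)       # reuse the parent constructor
        self.platform = platform

    def describe(self):                      # overriding the parent method
        return super().describe() + f" (online, {self.platform})"


courses = [Course("Hadoop", 15), OnlineCourse("Python", 10, "live video")]
for course in courses:
    print(course.describe())  # polymorphism: each object's own describe() runs
# Hadoop: 15 weeks
# Python: 10 weeks (online, live video)
```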

19. Regular Expressions

  • What is regular expression?
  • Special characters
  • Forming regular expression
  • Compiling regular expressions
  • Grouping
  • findall() function
  • finditer() function
  • sub() function
  • match() function
  • search() function
  • Matching vs searching
  • Splitting a string
  • Replacing text
  • validations
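A sketch of the re module functions listed above. The sample numbers are made up, and the mobile-number rule in `is_valid_mobile` is an assumption for illustration only.

```python
import re

text = "Call 9876543210 or 8123456789 today"

print(re.match(r"\d{10}", text))             # None: match() only tries the start of the string
print(re.search(r"\d{10}", text).group())    # 9876543210: search() scans the whole string

phone = re.compile(r"(\d{3})(\d{3})(\d{4})")  # compiled pattern with three groups
print(phone.findall(text))                    # [('987', '654', '3210'), ('812', '345', '6789')]
print([m.start() for m in phone.finditer(text)])  # [5, 19]
print(phone.sub(r"\1-\2-\3", text))           # Call 987-654-3210 or 812-345-6789 today
print(re.split(r"\s+or\s+", text))            # ['Call 9876543210', '8123456789 today']


def is_valid_mobile(number):
    """Validation sketch: ten digits starting with 6-9 (an assumed rule)."""
    return re.fullmatch(r"[6-9]\d{9}", number) is not None
```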

20. Database Access

  • Introduction
  • Installing MySQL database
  • Creating database users
  • Installing Oracle Python modules
  • Establishing connection with mysql
  • Closing database connections
  • Connection object
  • Cursor object
  • Executing SQL queries
  • Retrieving data from the database
  • Executing SQL queries using bind variables
  • Transaction Management
  • Handling errors
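The connection, cursor, query, commit and rollback steps above follow Python's DB-API (PEP 249). To keep the sketch self-contained it swaps in the standard-library sqlite3 module for MySQL; with a MySQL driver the connect() arguments and the placeholder style (commonly %s instead of ?) would differ. The table and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")       # connection object (throwaway in-memory database)
try:
    cur = conn.cursor()                   # cursor object used to execute SQL
    cur.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, course TEXT)")

    rows = [("Asha", "Python"), ("Ravi", "PySpark"), ("Kiran", "Hadoop")]
    cur.executemany("INSERT INTO students (name, course) VALUES (?, ?)", rows)  # bind variables
    conn.commit()                         # make the inserts permanent

    cur.execute("SELECT name FROM students WHERE course = ?", ("PySpark",))
    result = cur.fetchall()               # [('Ravi',)]

    try:
        # id 1 already exists, so this violates the primary key constraint
        cur.execute("INSERT INTO students (id, name, course) VALUES (1, 'Dup', 'X')")
        conn.commit()
    except sqlite3.IntegrityError:
        conn.rollback()                   # undo the failed transaction

    count = cur.execute("SELECT COUNT(*) FROM students").fetchone()[0]  # 3
finally:
    conn.close()                          # always close the connection
```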

21. Python Date and Time

  • How to Use the date and datetime Classes
  • Time and date Objects
  • Calendar in Python
  • The Time Module
  • Python Calendar Module

22. Operating System Module

  • Introduction
  • getcwd
  • listdir
  • chdir
  • mkdir
  • rename file/dir
  • remove file/dir
  • rmtree() (shutil module)
  • os help
  • os operations
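A sketch of the os functions listed above, run inside a temporary directory so nothing real is modified; the folder and file names are hypothetical. Note that rmtree() is provided by the shutil module rather than os.

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()                 # scratch directory
original = os.getcwd()                    # current working directory
os.chdir(base)                            # change directory
try:
    os.mkdir("reports")                   # create a directory
    with open(os.path.join("reports", "jan.txt"), "w") as f:
        f.write("sales")
    os.rename(os.path.join("reports", "jan.txt"),
              os.path.join("reports", "january.txt"))    # rename a file
    listing = os.listdir("reports")       # ['january.txt']
    os.remove(os.path.join("reports", "january.txt"))    # remove a file
    os.mkdir(os.path.join("reports", "old"))
    shutil.rmtree("reports")              # remove a whole directory tree
    exists_after = os.path.exists("reports")             # False
finally:
    os.chdir(original)                    # restore the original working directory
    shutil.rmtree(base)
```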

23. Advanced concepts

  • Python Iterator
  • Python Generator
  • Python closure
  • Python Decorators
  • Web Scraping
  • PIP
  • Working with CSV files
  • Working with XML files
  • Working with JSON files
  • Debugging
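Minimal sketches of an iterator, a generator, a closure and a decorator; `Countdown`, `squares`, `make_multiplier` and `count_calls` are hypothetical names chosen for illustration.

```python
import functools


class Countdown:
    """Iterator protocol: __iter__ returns the iterator, __next__ raises StopIteration at the end."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1


def squares(n):
    """Generator: produces values lazily with yield."""
    for i in range(n):
        yield i * i


def make_multiplier(factor):
    """Closure: the inner function remembers `factor` after make_multiplier returns."""
    def multiply(x):
        return x * factor
    return multiply


def count_calls(func):
    """Decorator: wraps a function and counts how many times it is called."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper


@count_calls
def greet(name):
    return f"Hello, {name}"


print(list(Countdown(3)))    # [3, 2, 1]
print(list(squares(4)))      # [0, 1, 4, 9]
triple = make_multiplier(3)
print(triple(7))             # 21
greet("A")
greet("B")
print(greet.calls, greet.__name__)  # 2 greet
```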

24. GUI Programming (tkinter)

  • Introduction
  • Components and events
  • Root window
  • Labels
  • Fonts and colors
  • Buttons, checkboxes
  • Label widget
  • Message widget
  • Text widget
  • Radio button
  • Images

25. Excel Workbook

  • Installing and working with XlsxWriter
  • Creating Excel Work book
  • Inserting into excel sheet
  • Inserting data into multiple Excel sheets
  • Creating headers
  • Installing and working with xlrd module
  • Reading a specific cell or row or column
  • Reading specific rows and columns

26. Data Analytics

  • Introduction
  • pandas module
  • Numpy module
  • Matplotlib module
  • Working Examples

27. Introduction to Datascience

  • Machine Learning Introduction
  • Datasets
  • Supervised /Unsupervised Learning
  • Statistical Analysis
  • Data Analysis
  • Uni-variate/multi-variate analysis
  • Correlation Analysis
  • Algorithm types
  • Applications

28. Python Pandas

  • Introduction to Pandas
  • Creating Pandas Series
  • Creating Data Frames
  • Pandas Data Frames from dictionaries
  • Pandas Data Frames from list
  • Pandas Data Frames from series
  • Pandas Data Frames from CSV, Excel
  • Pandas Data Frames from JSON
  • Pandas Data Frames from Databases
  • Pandas Data Functionality
  • Pandas Timedelta
  • Creating Data Frames from Timedelta
  • Pandas Groupings and Aggregations
  • Converting Data Frames from list
  • Creating Functions
  • Converting Different Formats
  • Pandas and Matplotlib
  • Pandas Use Cases

29. Python Numpy

  • Introduction to Numpy
  • Numpy Arrays
  • Numpy Array Indexing
  • 2-D and 3-D Arrays
  • Numpy Mathematical operations
  • Numpy Flattening and reshaping
  • Numpy Horizontal and Vertical Stack
  • Numpy linspace() and arange()
  • Numpy asarray and Random numbers
  • Numpy iterations and Transpose
  • Numpy Array Manipulation
  • Numpy and matplotlib
  • Numpy Linear Algebra
  • Numpy String Functions
  • Numpy Operations and Use Cases
  • Numpy Working Examples

30. Python Matplotlib

  • Introduction to matplotlib
  • Installing matplotlib
  • Generating graphs
  • Normal Plots
  • Generating Bar Graphs
  • Histograms
  • Scatter plots
  • Stack plots
  • Pie plots
  • Matplotlib working examples