HADOOP + PYSPARK + PYTHON + LINUX Course Details
Batch Date: Nov 30th & Dec 1st @ 6:00 PM
Faculty: Mr. N. Vijay Sunder Sagar (20+ Yrs Of Exp,..)
Duration: 15 Weekends Batch
Venue:
DURGA SOFTWARE SOLUTIONS,
Flat No : 202,
2nd Floor,
HUDA Maitrivanam,
Ameerpet, Hyderabad - 500038
Ph. No: +91 - 9246212143, 80 96 96 96 96
Syllabus:
BIG DATA HADOOP
I: INTRODUCTION
- What is Big Data?
- What is Hadoop?
- Need of Hadoop
- Sources and Types of Data
- Comparison with Other Technologies
- Challenges with Big Data
- i. Storage
- ii. Processing
- RDBMS vs Hadoop
- Advantages of Hadoop
- Hadoop Ecosystem components
II: HDFS (Hadoop Distributed File System)
- Features of HDFS
- Name node, Data node, Blocks
- Configuring Block size
- HDFS Architecture (5 Daemons)
- i. Name Node
- ii. Data Node
- iii. Secondary Name node
- iv. Job Tracker
- v. Task Tracker
- Metadata management
- Storage and processing
- Replication in Hadoop
- Configuring Custom Replication
- Fault Tolerance in Hadoop
- HDFS Commands
III: MAP REDUCE
- Map Reduce Architecture
- Processing Daemons of Hadoop
- Job Tracker (Roles and Responsibilities)
- Task Tracker (Roles and Responsibilities)
- Phases of Map Reduce
- i) Mapper phase
- ii) Reducer phase
- Input split
- Input split vs Block size
- Partitioner in Map Reduce
- Groupings and Aggregations
- Data Types in Map Reduce
- Map Reduce Programming Model
- Driver Code
- Mapper Code
- Reducer Code
- Programming examples
- File input formats
- File output formats
- Merging in Map Reduce
- Speculative Execution Model
- Speculative Job
IV: SQOOP (SQL + HADOOP)
- Introduction to Sqoop
- SQOOP Import
- SQOOP Export
- Importing Data From RDBMS to HDFS
- Importing Data From RDBMS to HIVE
- Importing Data From RDBMS to HBASE
- Exporting From HBASE to RDBMS
- Exporting From HIVE to RDBMS
- Exporting From HDFS to RDBMS
- Transformations While Importing / Exporting
- Filtering data while importing
- Vertical and Horizontal merging while import
- Working with delimiters while importing
- Groupings and Aggregations while import
- Incremental import
- Examples and operations
- Defining SQOOP Jobs
V: YARN
- Introduction
- Speculative Execution, Speculative Job and Speculative Task
- Comparison of Hadoop 1.xx with Hadoop 2.xx
- Comparison with previous versions
- YARN Architecture Components
- i. Resource Manager
- ii. Application Master
- iii. Node Manager
- iv. Application Manager
- v. Resource Scheduler
- vi. Job History Server
- vii. Container
VI: NOSQL
- What is “Not only SQL”
- NOSQL Advantages
- What is the problem with RDBMS for Large Data Scaling Systems
- Types of NOSQL & Purposes
- Key Value Store
- Columnar Store
- Document Store
- Graph Store
- Introduction to Cassandra – NOSQL Database
- Introduction to MongoDB and CouchDB Database
- Integration of NOSQL Databases with Hadoop
VII: HBASE
- Introduction to Bigtable
- What is NOSQL and columnar store Database
- HBASE Introduction
- Hbase use cases
- Hbase basics
- Column families
- Scans
- Hbase Architecture
- Map Reduce Over Hbase
- Hbase data Modeling
- Hbase Schema design
- Hbase CRUD operations
- Hive & Hbase integration
- Hbase storage handlers
VIII: HIVE
- Introduction
- Hive Architecture
- Hive Metastore
- Hive Query Language
- Difference between HQL and SQL
- Hive Built in Functions
- Loading Data From Local Files To Hive Tables
- Loading Data From Hdfs Files To Hive Tables
- Table Types
- Inner Tables
- External Tables
- Hive Working with unstructured data
- Hive Working With Xml Data
- Hive Working With Json Data
- Hive Working With Urls And Weblog Data
- Hive Unions
- Hive Joins
- Multi Table / File Inserts
- Inserting Into Local Files
- Inserting Into Hdfs Files
- Hive UDF (user defined functions)
- Hive UDAF (user defined aggregate functions)
- Hive UDTF (user defined table generating functions)
- Partitioned Tables
- Non – Partitioned Tables
- Multi-column Partitioning
- Dynamic Partitions In Hive
- Performance Tuning mechanism
- Bucketing in hive
- Indexing in Hive
- Hive Examples
- Hive & Hbase Integration
PYSPARK
I ) PYSPARK INTRODUCTION
- What is Apache Spark?
- Why PySpark?
- Need for PySpark
- Spark: Python vs Scala
- PySpark features
- Real-life usage of PySpark
- PySpark Web/Application
- PySpark - SparkSession
- PySpark – SparkContext
- PySpark – RDD
- PySpark – Parallelize
- PySpark – repartition() vs coalesce()
- PySpark – Broadcast Variables
- PySpark – Accumulator
II) PYSPARK - RDD COMPUTATION
- Operations on a RDD
- Directed Acyclic Graph (DAG)
- RDD Actions and Transformations
- RDD computation
- Steps in RDD computation
- RDD persistence
- Persistence features
- Persistence options:
- 1) MEMORY_ONLY
- 2) MEMORY_ONLY_SER
- 3) DISK_ONLY
- 4) MEMORY_AND_DISK_SER
- 5) MEMORY_AND_DISK
III) PYSPARK - CORE COMPUTING
- Fault Tolerance model in Spark
- Different ways of creating a RDD
- Word Count Example
- Creating Spark objects (RDDs) from Scala objects (lists)
- Increasing the number of partitions
- Aggregations Over Structured Data:
- reduceByKey()
IV) GROUPINGS AND AGGREGATIONS
- i) Single Grouping and Single Aggregation
- ii) Single Grouping and Multiple Aggregation
- iii) Multi Grouping and Single Aggregation
- iv) Multi Grouping and Multi Aggregation
- Differences b/w reduceByKey() and groupByKey()
- Process of groupByKey
- Process of reduceByKey
- reduce() function
- Various Transformations
- Various Built-in Functions
V) Various Actions and Transformations:
- countByKey()
- countByValue()
- sortByKey()
- zip()
- union()
- distinct()
- Various count aggregation
- Joins
- -inner join
- -outer join
- cartesian()
- cogroup()
- Other actions and transformations
VI) PySpark SQL - DataFrame
- Introduction
- Making data Structured
- Case Classes
- ways to extract case class objects
- 1) using function
- 2) using map with multiple expressions
- 3) using map with single expression
- SQL Context
- Data Frames API
- DataSet API
- RDD vs DataFrame vs DataSet
- PySpark – Create a DataFrame
- PySpark – Create an empty DataFrame
- PySpark – Convert RDD to DataFrame
- PySpark – Convert DataFrame to Pandas
- PySpark – show()
- PySpark – StructType & StructField
- PySpark – Row Class
- PySpark – Column Class
- PySpark – select()
- PySpark – collect()
- PySpark – withColumn()
- PySpark – withColumnRenamed()
- PySpark – where() & filter()
- PySpark – drop() & dropDuplicates()
- PySpark – orderBy() and sort()
- PySpark – groupBy()
- PySpark – join()
- PySpark – union() & unionAll()
- PySpark – unionByName()
- PySpark – UDF (User Defined Function)
- PySpark – map()
- PySpark – flatMap()
- PySpark – foreach()
- PySpark – sample() vs sampleBy()
- PySpark – fillna() & fill()
- PySpark – pivot() (Row to Column)
- PySpark – partitionBy()
- PySpark – ArrayType Column (Array)
- PySpark – MapType (Map/Dict)
VII) PySpark SQL Functions
- PySpark – Aggregate Functions
- PySpark – Window Functions
- PySpark – Date and Timestamp Functions
- PySpark – JSON Functions
- PySpark – Read & Write JSON file
VIII) PySpark Built-In Functions
- PySpark – when()
- PySpark – expr()
- PySpark – lit()
- PySpark – split()
- PySpark – concat_ws()
- PySpark – substring()
- PySpark – translate()
- PySpark – regexp_replace()
- PySpark – overlay()
- PySpark – to_timestamp()
- PySpark – to_date()
- PySpark – date_format()
- PySpark – datediff()
- PySpark – months_between()
- PySpark – explode()
- PySpark – array_contains()
- PySpark – array()
- PySpark – collect_list()
- PySpark – collect_set()
- PySpark – create_map()
- PySpark – map_keys()
- PySpark – map_values()
- PySpark – struct()
- PySpark – countDistinct()
- PySpark – sum(), avg()
- PySpark – row_number()
- PySpark – rank()
- PySpark – dense_rank()
- PySpark – percent_rank()
- PySpark – typedLit()
- PySpark – from_json()
- PySpark – to_json()
- PySpark – json_tuple()
- PySpark – get_json_object()
- PySpark – schema_of_json()
- Working Examples
IX) PySpark External Sources
- Working with SQL statements
- Spark and Hive Integration
- Spark and MySQL Integration
- Working with CSV
- Working with JSON
- Transformations and actions on dataframes
- Narrow, wide transformations
- Addition of new columns, dropping of columns, renaming columns
- Addition of new rows, dropping rows
- Handling nulls
- Joins
- Window function
- Writing data back to External sources
- Creation of tables from DataFrames (Internal tables, Temporary tables)
X) DEPLOYMENT MODES
- Local Mode
- Cluster Modes (Standalone, YARN)
XI) PYSPARK APPLICATION
- Stages and Tasks
- Driver and Executor
- Building spark applications/pipelines
- Deploying spark apps to cluster and tuning
- Performance tuning
PySpark Streaming Concepts
Integration with Kafka
PySpark-mllib
PYTHON
1. Python Basics
- What is Python
- Why Python?
- History of python
- Applications of Python
- Features of Python
- Advantages of Python
- Versions of Python
- Installation of Python
- Flavors of Python
- Comparison b/w various programming languages: C, Java and Python
2. Python Operations
- Python Modes of Execution
- Interactive mode of Execution
- Batch mode of Execution
- Python Editors and IDEs
- Python Data Types
- Python Constants
- Python Variables
- Comments in python
- Output: print() function
- input() function: Accepting input
- Type Conversion
- type(), id() functions
- Escape Sequences in Python
- Strings in Python
- String indices and slicing
3. Operators in Python
- Arithmetic Operators
- Comparison Operators
- Logical Operators
- Assignment Operators
- Short Hand Assignment Operators
- Bitwise Operators
- Membership Operators
- Identity Operators
4. Python IDEs
- Pycharm IDE Installation
- Working with Pycharm
- Pycharm components
- Installing Anaconda
- What is Conda?
- Anaconda Prompt
- Anaconda Navigator
- Jupyter Notebook
- Jupyter Features
- Spyder IDE
- Spyder Features
- Conda and PIP
5. Flow Control statements
- Block/clause
- Indentation in Python
- Conditional Statements
- if statement
- if…else statement
- if…elif… statement
6. Looping Statements
- while loop
- while … else
- for loop
- range() in for loop
- Nested for loop
- Break statement
- Continue statement
- Pass statement
7. Strings in Python
- Creating Strings
- String indexing
- String slicing
- String Concatenation
- String Comparison
- String splitting and joining
- Finding Sub Strings
- String Case Change
- Split strings
- String methods
8. Collections in Python
- Introduction
- Lists
- Tuples
- Sets
- Dictionaries
- Operations on collections
- Functions for collections
- Methods of collection
- Nested collections
- Differences b/w list, tuple, set and dictionary
9. Python Lists
- List properties
- List Creation
- List indexing and slicing
- List Operations
- List addresses
- List functions
- Different ways of creating lists
- Nested Lists
- List modification
- List insertion and deletion
- List Methods
10. Python Tuples
- Tuple properties
- Tuple Creation
- Tuple indexing and slicing
- Different ways of creating tuples
- Tuple Operations
- Tuple Addresses
- Tuple Functions
- Nested Tuples
- Tuple Methods
- Differences b/w List and Tuple
11. Python Sets
- Set properties
- Set Creation
- Set Operations
- Set Functions
- Set Addresses
- Set Mathematical Operations
- Set Methods
- Insertion and Deletion operation
12. Python Dictionary
- Dictionary properties
- Dictionary Creation
- Dictionary Operations
- Dictionary Addresses
- Nested Dictionaries
- Dictionary Methods
- Insertion and Deletion of elements
- Differences b/w list, tuple, set and dictionary
13. Functions in Python
- Defining a function
- Calling a function
- Properties of Function
- Examples of Functions
- Categories of Functions
- Argument types
- default arguments
- non-default arguments
- keyword arguments
- non keyword arguments
- Variable Length Arguments
- Variables scope
- Call by value and Call by Reference
- Passing collections to function
- Local and Global variables
- Recursive Function
- Boolean Function
- Passing functions to function
- Anonymous or Lambda function
- filter() and map() functions
- reduce() function
14. Modules in Python
- What is a module?
- Different types of module
- Creating user defined module
- Setting path
- The import statement
- Normal Import
- From … Import
- Module Aliases
- Reloading a module
- dir() function
- Working with standard modules: math, random, datetime and os
15. Packages
- Introduction to packages
- Defining packages
- Importing from packages
- __init__.py file
- Defining sub packages
- Importing from sub packages
16. Errors and Exception Handling
- Types of errors
- Compile-Time Errors
- Run-Time Errors
- What is Exception?
- Need of Exception handling
- Predefined Exceptions
- try, except, finally blocks
- Nested blocks
- Handling Multiple Exceptions
- User defined Exceptions
- Raise statement
17. File Handling
- Introduction
- Types of Files in Python
- Opening a file
- Closing a file
- Writing data to files
- tell() and seek() methods
- Reading data from files
- Appending data to files
- with open statement
- Various functions
18. OOPs Concepts
- OOPS Features
- Encapsulation
- Abstraction
- Class
- Object
- Static and non static variables
- Defining methods
- Diff b/w functions & methods
- Constructors
- Parameterized Constructors
- Built-in attributes
- Object Reference count
- Destructor
- Garbage Collection
- Inheritance
- Types of Inheritances
- Object class
- Polymorphism
- Overriding
- super() function
19. Regular Expressions
- What is regular expression?
- Special characters
- Forming regular expression
- Compiling regular expressions
- Grouping
- findall() function
- finditer() function
- sub() function
- match() function
- search() function
- Matching vs searching
- Splitting a string
- Replacing text
- validations
20. Database Access
- Introduction
- Installing MySQL database
- Creating database users
- Installing Oracle Python modules
- Establishing connection with mysql
- Closing database connections
- Connection object
- Cursor object
- Executing SQL queries
- Retrieving data from Database
- Using bind variables while executing SQL queries
- Transaction Management
- Handling errors
21. Python Date and Time
- How to Use Date & DateTime Class
- Time and date Objects
- Calendar in Python
- The Time Module
- Python Calendar Module
22. Operating System Module
- Introduction
- getcwd
- listdir
- chdir
- mkdir
- rename file/dir
- remove file/dir
- rmtree()
- Os help
- Os operations
23. Advanced concepts
- Python Iterator
- Python Generator
- Python closure
- Python Decorators
- Web Scraping
- PIP
- Working with CSV files
- Working with XML files
- Working with JSON files
- Debugging
24. GUI Programming (tkinter)
- Introduction
- Components and events
- Root window
- Labels
- Fonts and colors
- Buttons, checkbox
- Label widget
- Message widget
- Text widget
- Radio button
- image
25. Excel Workbook
- Installing and working with XlsxWriter
- Creating Excel Workbook
- Inserting into excel sheet
- Inserting data into multiple excel sheets
- Creating headers
- Installing and working with xlrd module
- Reading a specific cell or row or column
- Reading specific rows and columns
26. Data Analytics
- Introduction
- pandas module
- Numpy module
- Matplotlib module
- Working Examples
27. Introduction to Data Science
- Machine Learning Introduction
- Datasets
- Supervised/Unsupervised Learning
- Statistical Analysis
- Data Analysis
- Uni-variate/multi-variate analysis
- Correlation Analysis
- Algorithm types
- Applications
28. Python Pandas
- Introduction to Pandas
- Creating Pandas Series
- Creating Data Frames
- Pandas Data Frames from dictionaries
- Pandas Data Frames from list
- Pandas Data Frames from series
- Pandas Data Frames from CSV, Excel
- Pandas Data Frames from JSON
- Pandas Data Frames from Databases
- Pandas Data Functionality
- Pandas Timedelta
- Creating Data Frames from Timedelta
- Pandas Groupings and Aggregations
- Converting Data Frames from list
- Creating Functions
- Converting Different Formats
- Pandas and Matplotlib
- Pandas use cases
29. Python Numpy
- Introduction to Numpy
- Numpy Arrays
- Numpy Array Indexing
- 2-D and 3-D Arrays
- Numpy Mathematical operations
- Numpy Flattening and reshaping
- Numpy Horizontal and Vertical Stack
- Numpy linspace and arange
- Numpy asarray and Random numbers
- Numpy iterations and Transpose
- Numpy Array Manipulation
- Numpy and matplotlib
- Numpy Linear Algebra
- Numpy String Functions
- Numpy operations and use cases
- Numpy Working Examples
30. Python Matplotlib
- Introduction to matplotlib
- Installing matplotlib
- Generating graphs
- Normal plottings
- Generating Bar graphs
- Histograms
- Scatter plots
- Stack plots
- Pie plots
- Matplotlib working examples