Subscribe and Access: 5200+ FREE Videos and 21+ Subjects Like CRT, Soft Skills, JAVA, Hadoop, Microsoft .NET, Testing Tools etc.
Batch Date: Nov 30th & Dec 1st @ 7:00 PM
Faculty: Mr. N. Vijay Sunder Sagar (20+ Yrs of Exp.)
Duration: 10 Weekends Batch
Venue:
DURGA SOFTWARE SOLUTIONS,
Flat No: 202, 2nd Floor,
HUDA Maitrivanam,
Ameerpet, Hyderabad - 500038
Ph. No: +91-9246212143, 80 96 96 96 96
Syllabus:
PYSPARK
I) PYSPARK INTRODUCTION
- What is Apache Spark?
- Why PySpark?
- Need for PySpark
- Spark: Python vs Scala
- PySpark Features
- Real-life usage of PySpark
- PySpark Web/Application
- PySpark – SparkSession
- PySpark – SparkContext
- PySpark – RDD
- PySpark – Parallelize
- PySpark – repartition() vs coalesce()
- PySpark – Broadcast Variables
- PySpark – Accumulator
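A minimal sketch tying the Section I topics together (the app name and sample data are illustrative):

    from pyspark.sql import SparkSession

    # SparkSession is the unified entry point; SparkContext hangs off it
    spark = SparkSession.builder.appName("IntroDemo").getOrCreate()
    sc = spark.sparkContext

    # parallelize(): distribute a local Python list as an RDD with 4 partitions
    rdd = sc.parallelize(range(10), 4)

    # repartition() does a full shuffle; coalesce() only narrows partitions
    print(rdd.repartition(8).getNumPartitions())   # 8
    print(rdd.coalesce(2).getNumPartitions())      # 2

    # Broadcast variable: read-only lookup cached once per executor
    parity = sc.broadcast({0: "even", 1: "odd"})

    # Accumulator: counter the executors add to, read back on the driver
    seen = sc.accumulator(0)

    def tag(n):
        seen.add(1)
        return (n, parity.value[n % 2])

    print(rdd.map(tag).collect())
    print(seen.value)   # 10, once the action above has run
    spark.stop()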
II) PYSPARK - RDD COMPUTATION
- Operations on an RDD
- Directed Acyclic Graph (DAG)
- RDD Actions and Transformations
- RDD computation
- Steps in RDD computation
- RDD persistence
- Persistence features
Persistence Options (StorageLevel):
- 1) MEMORY_ONLY
- 2) MEMORY_ONLY_SER
- 3) DISK_ONLY
- 4) MEMORY_AND_DISK
- 5) MEMORY_AND_DISK_SER
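A short persistence sketch using storage levels like those above, as exposed by pyspark.StorageLevel (data is illustrative):

    from pyspark import StorageLevel
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("PersistDemo").getOrCreate()
    sc = spark.sparkContext

    squares = sc.parallelize(range(100000)).map(lambda x: x * x)

    # cache() is shorthand for persist(StorageLevel.MEMORY_ONLY)
    squares.persist(StorageLevel.MEMORY_AND_DISK)

    print(squares.count())   # first action materializes and stores the partitions
    print(squares.sum())     # reuses the persisted data instead of recomputing the DAG

    squares.unpersist()
    spark.stop()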
III) PYSPARK - CORE COMPUTING
- Fault Tolerance model in Spark
- Different ways of creating a RDD
- Word Count Example
- Creating Spark objects (RDDs) from Python objects (lists)
- Increasing the number of partitions
- Aggregations Over Structured Data:
- reduceByKey()
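The classic word-count flow from the topics above, assuming an illustrative local text file:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("WordCount").getOrCreate()
    sc = spark.sparkContext

    # 'input.txt' is an illustrative path
    lines = sc.textFile("input.txt")

    counts = (lines.flatMap(lambda line: line.split())   # one record per word
                   .map(lambda word: (word, 1))          # pair RDD of (word, 1)
                   .reduceByKey(lambda a, b: a + b))     # aggregate counts per key

    for word, n in counts.take(10):
        print(word, n)
    spark.stop()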
IV) GROUPINGS AND AGGREGATIONS
- i) Single Grouping and Single Aggregation
- ii) Single Grouping and Multiple Aggregations
- iii) Multiple Grouping and Single Aggregation
- iv) Multiple Grouping and Multiple Aggregations
- Differences between reduceByKey() and groupByKey() (sketched below)
- Process of groupByKey
- Process of reduceByKey
- reduce() function
- Various Transformations
- Various Built-in Functions
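A sketch contrasting the two grouping styles above: reduceByKey() pre-combines values map-side, while groupByKey() shuffles every value (data is illustrative):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("GroupVsReduce").getOrCreate()
    sc = spark.sparkContext

    sales = sc.parallelize([("dept1", 100), ("dept2", 250), ("dept1", 75)])

    # reduceByKey(): partial sums are computed inside each partition,
    # so only the partials cross the network in the shuffle
    print(sales.reduceByKey(lambda a, b: a + b).collect())
    # [('dept1', 175), ('dept2', 250)]  (order may vary)

    # groupByKey(): all values are shuffled, then aggregated afterwards
    print(sales.groupByKey().mapValues(sum).collect())

    # reduce() collapses the whole RDD to a single value on the driver
    print(sales.values().reduce(lambda a, b: a + b))   # 425
    spark.stop()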
V) Various Actions and Transformations:
- countByKey()
- countByValue()
- sortByKey()
- zip()
- union()
- distinct()
- Various count aggregations
- Joins
    - inner join
    - outer join
- cartesian()
- cogroup()
- Other actions and transformations
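A quick tour of the actions and transformations listed above, over two small illustrative pair RDDs:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("ActionsTour").getOrCreate()
    sc = spark.sparkContext

    a = sc.parallelize([("k1", 1), ("k2", 2), ("k1", 3)], 2)
    b = sc.parallelize([("k1", "x"), ("k3", "y")], 2)

    print(a.countByKey())                # {'k1': 2, 'k2': 1}
    print(a.values().countByValue())     # {1: 1, 2: 1, 3: 1}
    print(a.sortByKey().collect())
    print(a.keys().distinct().collect())
    print(a.union(b).count())            # 5

    # zip() pairs elements positionally (same partitioning required)
    nums  = sc.parallelize([1, 2, 3], 2)
    chars = sc.parallelize(["a", "b", "c"], 2)
    print(nums.zip(chars).collect())     # [(1, 'a'), (2, 'b'), (3, 'c')]

    print(a.join(b).collect())           # inner join on key
    print(a.leftOuterJoin(b).collect())  # unmatched keys pair with None
    print(nums.cartesian(chars).count()) # 9 = every possible pairing
    print(a.cogroup(b)
           .mapValues(lambda v: (list(v[0]), list(v[1])))
           .collect())
    spark.stop()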
VI) PySpark SQL - DataFrame
- Introduction
- Making data Structured
- Case Classes
- Ways to extract case class objects
- 1) using a function
- 2) using map with multiple expressions
- 3) using map with a single expression
- SQLContext
- DataFrame API
- Dataset API
- RDD vs DataFrame vs Dataset
- PySpark – Create a DataFrame
- PySpark – Create an empty DataFrame
- PySpark – Convert RDD to DataFrame
- PySpark – Convert DataFrame to Pandas
- PySpark – show()
- PySpark – StructType & StructField
- PySpark – Row Class
- PySpark – Column Class
- PySpark – select()
- PySpark – collect()
- PySpark – withColumn()
- PySpark – withColumnRenamed()
- PySpark – where() & filter()
- PySpark – drop() & dropDuplicates()
- PySpark – orderBy() and sort()
- PySpark – groupBy()
- PySpark – join()
- PySpark – union() & unionAll()
- PySpark – unionByName()
- PySpark – UDF (User Defined Function)
- PySpark – map()
- PySpark – flatMap()
- PySpark – foreach()
- PySpark – sample() vs sampleBy()
- PySpark – fillna() & fill()
- PySpark – pivot() (Row to Column)
- PySpark – partitionBy()
- PySpark – ArrayType Column (Array)
- PySpark – MapType (Map/Dict)
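A condensed sketch of the DataFrame topics above (schema, data, and column names are illustrative; toPandas() assumes pandas is installed):

    from pyspark.sql import SparkSession, Row
    from pyspark.sql.functions import col
    from pyspark.sql.types import StructType, StructField, StringType, IntegerType

    spark = SparkSession.builder.appName("DataFrameDemo").getOrCreate()

    # Explicit schema with StructType/StructField
    schema = StructType([
        StructField("name", StringType(), True),
        StructField("dept", StringType(), True),
        StructField("salary", IntegerType(), True),
    ])
    df = spark.createDataFrame(
        [Row(name="Anil", dept="IT", salary=45000),
         Row(name="Bina", dept="HR", salary=38000)],
        schema=schema)

    df.show()
    df.select("name", "salary").where(col("salary") > 40000).show()
    (df.withColumn("bonus", col("salary") * 0.1)       # derive a column
       .withColumnRenamed("dept", "department")
       .orderBy(col("salary").desc())
       .show())
    df.groupBy("dept").count().show()
    df.dropDuplicates(["dept"]).show()

    # Conversions covered in the syllabus
    rdd = df.rdd            # DataFrame -> RDD of Row objects
    df2 = rdd.toDF()        # RDD -> DataFrame
    pdf = df.toPandas()     # DataFrame -> pandas (needs pandas installed)
    spark.stop()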
VII) PySpark SQL Functions
- PySpark – Aggregate Functions
- PySpark – Window Functions
- PySpark – Date and Timestamp Functions
- PySpark – JSON Functions
- PySpark – Read & Write JSON file
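An illustrative sketch of the function families above: one aggregate, one window ranking, a date calculation, and a JSON round-trip (data and paths are invented for the example):

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import avg, row_number, to_date, datediff, current_date
    from pyspark.sql.window import Window

    spark = SparkSession.builder.appName("SQLFunctions").getOrCreate()

    df = spark.createDataFrame(
        [("IT", "Anil", 45000, "2020-01-15"),
         ("IT", "Chad", 52000, "2019-06-01"),
         ("HR", "Bina", 38000, "2021-03-10")],
        ["dept", "name", "salary", "joined"])

    # Aggregate function
    df.groupBy("dept").agg(avg("salary").alias("avg_salary")).show()

    # Window function: rank by salary within each department
    w = Window.partitionBy("dept").orderBy(df.salary.desc())
    df.withColumn("rn", row_number().over(w)).show()

    # Date and timestamp functions
    df.select("name",
              datediff(current_date(), to_date("joined")).alias("days_in")).show()

    # JSON read & write ('people_json' is an illustrative output path)
    df.write.mode("overwrite").json("people_json")
    spark.read.json("people_json").show()
    spark.stop()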
VIII) PySpark Built-In Functions
- PySpark – when()
- PySpark – expr()
- PySpark – lit()
- PySpark – split()
- PySpark – concat_ws()
- PySpark – substring()
- PySpark – translate()
- PySpark – regexp_replace()
- PySpark – overlay()
- PySpark – to_timestamp()
- PySpark – to_date()
- PySpark – date_format()
- PySpark – datediff()
- PySpark – months_between()
- PySpark – explode()
- PySpark – array_contains()
- PySpark – array()
- PySpark – collect_list()
- PySpark – collect_set()
- PySpark – create_map()
- PySpark – map_keys()
- PySpark – map_values()
- PySpark – struct()
- PySpark – countDistinct()
- PySpark – sum(), avg()
- PySpark – row_number()
- PySpark – rank()
- PySpark – dense_rank()
- PySpark – percent_rank()
- PySpark – typedLit()
- PySpark – from_json()
- PySpark – to_json()
- PySpark – json_tuple()
- PySpark – get_json_object()
- PySpark – schema_of_json()
- Working Examples
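One compact working example over a handful of the built-ins above (data and column names are illustrative):

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import (when, lit, expr, split, concat_ws,
                                       explode, collect_list, from_json)
    from pyspark.sql.types import MapType, StringType

    spark = SparkSession.builder.appName("BuiltIns").getOrCreate()

    df = spark.createDataFrame(
        [("Anil Kumar", "M", '{"city":"Hyderabad"}'),
         ("Bina Rao",   "F", '{"city":"Chennai"}')],
        ["full_name", "gender", "props"])

    df2 = (df
        .withColumn("gender_full",
                    when(df.gender == "M", "Male")
                    .when(df.gender == "F", "Female")
                    .otherwise(lit("Unknown")))
        .withColumn("parts", split(df.full_name, " "))        # string -> array
        .withColumn("first_name", expr("parts[0]"))
        .withColumn("dotted_name", concat_ws(".", expr("parts[0]"), expr("parts[1]")))
        .withColumn("props_map",
                    from_json(df.props, MapType(StringType(), StringType()))))

    df2.select("first_name", "gender_full", "props_map").show(truncate=False)

    # explode(): one output row per array element
    df2.select("full_name", explode("parts").alias("part")).show()

    # collect_list(): gather values back into an array per group
    df2.groupBy("gender").agg(collect_list("first_name").alias("names")).show()
    spark.stop()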
IX) PySpark External Sources
- Working with SQL statements
- Spark and Hive Integration
- Spark and MySQL Integration
- Working with CSV
- Working with JSON
- Transformations and actions on dataframes
- Narrow, wide transformations
- Addition of new columns, dropping of columns, renaming of columns
- Addition of new rows, dropping rows
- Handling nulls
- Joins
- Window function
- Writing data back to External sources
- Creation of tables from DataFrames (internal tables, temporary tables)
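A sketch of the external-source patterns above (all hosts, credentials, paths, and table/column names are illustrative; the JDBC read needs the MySQL driver jar on the classpath):

    from pyspark.sql import SparkSession

    # enableHiveSupport() backs the Hive-integration topics
    spark = (SparkSession.builder.appName("ExternalSources")
             .enableHiveSupport().getOrCreate())

    # CSV and JSON sources
    emp = spark.read.option("header", True).option("inferSchema", True).csv("emp.csv")
    spark.read.json("emp.json")

    # MySQL over JDBC
    orders = (spark.read.format("jdbc")
              .option("url", "jdbc:mysql://localhost:3306/testdb")
              .option("dbtable", "orders")
              .option("user", "root").option("password", "secret")
              .load())

    # SQL statements against a temporary table created from a DataFrame
    emp.createOrReplaceTempView("emp")
    spark.sql("SELECT dept, COUNT(*) AS cnt FROM emp GROUP BY dept").show()

    # Writing back to external storage, and internal (managed) tables
    emp.write.mode("overwrite").csv("emp_out")
    emp.write.mode("overwrite").saveAsTable("emp_managed")
    spark.stop()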
X) DEPLOYMENT MODES
- Local Mode
- Cluster Modes (Standalone, YARN)
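Typical spark-submit invocations for each mode (the application file, host, and resource numbers are illustrative):

    spark-submit --master local[4] app.py
    spark-submit --master spark://master-host:7077 --deploy-mode cluster app.py
    spark-submit --master yarn --deploy-mode cluster --num-executors 4 app.py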
XI) PYSPARK APPLICATION
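This final section builds toward a complete script; a minimal skeleton of such an application, with an illustrative input path, might look like:

    # app.py - minimal PySpark application (illustrative)
    from pyspark.sql import SparkSession

    def main():
        spark = SparkSession.builder.appName("MyPySparkApp").getOrCreate()
        df = spark.read.option("header", True).csv("input.csv")
        df.groupBy(df.columns[0]).count().show()
        spark.stop()

    if __name__ == "__main__":
        main()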