Subscribe and Access: 5200+ FREE Videos and 21+ Subjects Like CRT, Soft Skills, JAVA, Hadoop, Microsoft .NET, Testing Tools etc.
Batch Date: Nov 8th & 9th @ 6:00 PM
Faculty: Mr. N. Vijay Sunder Sagar (20+ Yrs of Exp.)
Duration: 10 Weekends Batch
Venue: DURGA SOFTWARE SOLUTIONS, Flat No: 202, 2nd Floor, HUDA Maitrivanam, Ameerpet, Hyderabad - 500038
Ph.No: +91 - 8885252627, 9246212143, 80 96 96 96 96
                Syllabus:
            GCP Data Engineering 
Module 1: GCP Data Engineering Fundamentals
Module 2: Google Cloud Storage (GCS)
Module 3: Cloud SQL – Setting up a Database
Module 4: BigQuery – Building a Data Warehouse
Module 5: Dataproc – Big Data Processing
Module 6: Databricks – Pyspark Processing
Module 7: Dataflow – Apache Beam Development
Module 8: Google Cloud Composer – Orchestration
Module 9: Data Fusion – Data Integration Service
Module 10: DBT, Airflow and Terraform
1. Introduction to Google Cloud Platform
-  Overview of cloud platforms
-  GCP services and regions
-  IAM (Identity & Access Management) basics
-  Resource hierarchy (organization, folders, projects)
-  Billing & cost management
2. GCP Data Engineering Fundamentals
-  Data engineering roles & responsibilities
-  Batch vs real-time data processing
-  Data lake vs data warehouse vs data marts
-  ETL vs ELT
3. Getting Started with GCP
-  Signing up for a GCP Account
-  Create a New Google Account using a Non-Gmail Id
-  Sign up for GCP using Google Account
-  Overview of GCP Credits
-  Overview of GCP Project and Billing
-  Overview of Google Cloud Shell
-  Install Google Cloud SDK on Windows
-  Initialize gcloud CLI using GCP Project
-  Reinitialize Google Cloud Shell with Project id
-  Overview of Analytics Services on GCP
4. Storage & Databases in GCP
-  Cloud Storage – buckets, lifecycle, versioning
-  BigQuery – datasets, tables, partitioning, clustering, query optimization
-  Cloud SQL & Cloud Spanner – relational databases
-  Firestore & Bigtable – NoSQL databases
-  Data modeling best practices
5. Google Cloud Storage (GCS): Setting up a Data Lake using GCS
-  Getting Started with Google Cloud Storage or GCS
-  Overview of Google Cloud Storage or GCS Web UI
-  Create GCS Bucket using GCP Web UI
-  Upload Folders and Files into GCS Bucket using GCP Web UI
-  Review GCS Buckets and Objects using gsutil commands
-  Delete GCS Bucket using Web UI
-  Setup Data Repository in Google Cloud Shell
-  Overview of Data Sets
-  Managing Buckets in GCS using gsutil
-  Copy Data Sets into GCS using gsutil
-  Cleanup Buckets in GCS using gsutil
-  Exercise to Manage Buckets and Files in GCS using gsutil
-  Overview of Setting up Data Lake using GCS
-  Setup Google Cloud Libraries in Python Virtual Environment
-  Setup Bucket and Files in GCS using gsutil
-  Getting Started to manage files in GCS using Python
- Setup Credentials for Python and GCS Integration
-  Review Methods in Google Cloud Storage Python library
-  Get GCS Bucket Details using Python
-  Manage Blobs or Files in GCS using Python
-  Project Problem Statement to Manage Files in GCS using Python
-  Design to Upload multiple files into GCS using Python
-  Get File Names to upload into GCS using Python glob and os
-  Upload all Files to GCS as blobs using Python
-  Validate Files or Blobs in GCS using Python
-  Overview of Processing Data in GCS using Pandas
-  Convert Data to Parquet and Write to GCS using Pandas
-  Design to Upload multiple files into GCS using Pandas
-  Get File Names to upload into GCS using Python glob and os
-  Overview of Parquet File Format and Schemas JSON File
-  Get Column Names for Dataset using Schemas JSON File
-  Upload all Files to GCS as Parquet using Pandas
-  Perform Validation of Files Copied using Pandas
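
For reference, a minimal sketch of the kind of GCS and Pandas integration covered in this module, assuming the google-cloud-storage, pandas, pyarrow and gcsfs libraries are installed and gcloud credentials are active; the bucket name, paths and column names below are placeholders, not the course data set.

    # Illustrative only: upload a local CSV into a GCS bucket as a blob, then
    # rewrite the same data to GCS as Parquet using pandas.
    import pandas as pd
    from google.cloud import storage

    BUCKET_NAME = "my-datalake-bucket"                 # hypothetical bucket

    client = storage.Client()                          # uses active gcloud credentials
    bucket = client.bucket(BUCKET_NAME)

    # Upload a local file as a blob
    blob = bucket.blob("landing/orders/part-00000.csv")
    blob.upload_from_filename("data/retail_db/orders/part-00000")

    # Read the file with pandas and write it back to GCS as Parquet (needs gcsfs/pyarrow)
    columns = ["order_id", "order_date", "order_customer_id", "order_status"]
    df = pd.read_csv("data/retail_db/orders/part-00000", names=columns)
    df.to_parquet(f"gs://{BUCKET_NAME}/bronze/orders/orders.parquet", index=False)

    # Validate: list the blobs that were just written
    for b in client.list_blobs(BUCKET_NAME, prefix="bronze/orders/"):
        print(b.name, b.size)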
6. Cloud SQL: Set Up a Postgres Database using Cloud SQL
-  Overview of GCP Cloud SQL
-  Setup Postgres Database Server using GCP Cloud SQL
-  Configure Network for Cloud SQL Postgres Database
-  Install Postgres 14 on Windows 11
-  Getting Started with pgAdmin on Windows
-  Getting Started with pgAdmin on Mac
-  Validate Client Tools for Postgres on Mac or PC
-  Setup Database in GCP Cloud SQL Postgres Database Server
-  Setup Tables in GCP Cloud SQL Postgres Database
-  Validate Data in GCP Cloud SQL Postgres Database Table
-  Integration of GCP Cloud SQL Postgres with Python
-  Overview of Integration of GCP Cloud SQL Postgres with Pandas
-  Read Data From Files to Pandas Data Frame
-  Process Data using Pandas Dataframe APIs
-  Write Pandas Dataframe into Postgres Database Table
-  Validate Data in Postgres Database Tables using Pandas
-  Getting Started with Secrets using GCP Secret Manager
-  Configure Access to GCP Secret Manager via IAM Roles
-  Install Google Cloud Secret Manager Python Library
-  Get Secret Details from GCP Secret Manager using Python
-  Connect to Database using Credentials from Secret Manager
-  Stop GCP Cloud SQL Postgres Database Server
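
A minimal sketch of the Pandas-to-Cloud SQL flow in this module, assuming a Cloud SQL Postgres instance reachable from the client, a Secret Manager secret holding the database password, and the pandas, SQLAlchemy, psycopg2 and google-cloud-secret-manager libraries; the project id, secret name, host, user and table names are placeholders.

    import pandas as pd
    from google.cloud import secretmanager
    from sqlalchemy import create_engine

    PROJECT_ID = "my-gcp-project"          # placeholder project id
    SECRET_NAME = "cloudsql-postgres"      # placeholder secret holding the DB password

    # Fetch the database password from GCP Secret Manager
    sm = secretmanager.SecretManagerServiceClient()
    secret_path = f"projects/{PROJECT_ID}/secrets/{SECRET_NAME}/versions/latest"
    password = sm.access_secret_version(name=secret_path).payload.data.decode("utf-8")

    # Connect to the Cloud SQL Postgres database (host is the instance public IP)
    engine = create_engine(
        f"postgresql+psycopg2://retail_user:{password}@34.0.0.10:5432/retail_db"
    )

    # Read file data into a pandas dataframe and write it to a Postgres table
    orders = pd.read_csv(
        "data/retail_db/orders/part-00000",
        names=["order_id", "order_date", "order_customer_id", "order_status"],
    )
    orders.to_sql("orders", engine, if_exists="append", index=False)

    # Validate the load by reading the row count back
    print(pd.read_sql("SELECT count(*) AS order_count FROM orders", engine))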
7. Data Ingestion & Integration
-  Pub/Sub – message queues, subscriptions, streaming data ingestion
-  Dataflow (Apache Beam) – batch & stream data pipelines
-  Dataproc (Hadoop/Spark on GCP) – ETL with Spark/Hive
-  Transfer Service & Storage Transfer – on-prem to cloud data movement
-  APIs, connectors, and partner ETL tools (Informatica, Fivetran, etc.)
8. Data Processing & Transformation
-  Designing pipelines with Dataflow
-  Streaming analytics with Pub/Sub + Dataflow
-  Data transformations using Dataproc (Spark/Presto/Hive)
-  BigQuery transformations (SQL-based ELT)
-  Using Dataprep (Trifacta) for no-code data wrangling
9. Data Warehousing & Analytics
-  BigQuery:
    - Schemas, partitioning, clustering
    - Optimization (slots, caching, materialized views)
    - Federated queries (Cloud Storage, Bigtable, Sheets)
    - BigQuery ML basics
-  Designing star/snowflake schemas
-  Analytics & BI integration (Looker, Data Studio)
10. BigQuery: Building a Data Warehouse
-  Overview of Google BigQuery
-  Getting Started with Google BigQuery
-  Overview of CRUD Operations in Google BigQuery
-  Merge or Upsert into Google BigQuery Tables
-  Create Dataset and Tables in Google BigQuery using UI
-  Create Table in Google BigQuery using Command
-  Exercise to create tables in Google BigQuery
-  Overview of Loading Data from Files into BigQuery Tables
-  Getting Started with Integration between Google BigQuery and Python
-  Load Data from GCS Files into an Empty Table in Google BigQuery
-  Run Queries in Google BigQuery using Python Applications
-  Exercise to Load Data into BigQuery Tables
-  Drop Tables from Google BigQuery
-  Overview of External Tables in BigQuery
-  Create Google BigQuery External Table on GCS Files using Web UI
-  Create Google BigQuery External Table on GCS Files using Command
-  Google BigQuery External Tables using AWS S3, Azure Blob or Google Drive
-  Exercise to Create Google BigQuery External Tables
-  Overview of SQL Capabilities of Google BigQuery
-  Basic SQL Queries using Google BigQuery
-  Cumulative Aggregations using Google BigQuery
-  Compute Ranks using Google BigQuery
-  Filter based on Ranks using Google BigQuery
-  Overview of Key Integrations with Google BigQuery
-  Python Pandas Integration with Google BigQuery
-  Overview of Integration between BigQuery and RDBMS Databases
-  Validate Cloud SQL Postgres Database for BigQuery Integration
-  Create External Connections and Run External Queries from Google BigQuery
-  Running External Queries using External Connections in Google BigQuery
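
A short sketch (not the exact course code) of BigQuery integration with Python: loading Parquet files from GCS into a native table and running a query, assuming the google-cloud-bigquery library and active gcloud credentials; project, dataset, table and bucket names are placeholders.

    from google.cloud import bigquery

    client = bigquery.Client()
    table_id = "my-gcp-project.retail.orders"          # hypothetical dataset.table

    # Load Parquet files from GCS into a native table
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.PARQUET,
        write_disposition="WRITE_TRUNCATE",
    )
    load_job = client.load_table_from_uri(
        "gs://my-datalake-bucket/bronze/orders/*.parquet", table_id, job_config=job_config
    )
    load_job.result()                                   # wait for the load job to finish
    print(client.get_table(table_id).num_rows, "rows loaded")

    # Run a query from a Python application
    query = """
        SELECT order_date, count(*) AS order_count
        FROM `my-gcp-project.retail.orders`
        GROUP BY order_date
        ORDER BY order_date
    """
    for row in client.query(query).result():
        print(row.order_date, row.order_count)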
11. Dataproc: Big Data Processing
-  Getting Started with GCP Dataproc
-  Setup Single Node Dataproc Cluster for Development
-  Validate SSH Connectivity to Master Node of Dataproc Cluster
-  Allocate Static IP to the Master Node VM of Dataproc Cluster
-  Setup VS Code Remote Window for Dataproc VM
-  Setup Workspace using VS Code on Dataproc
-  Getting Started with HDFS Commands on Dataproc
-  Recap of gsutil to manage files and folders in GCS
-  Review Data Sets setup on Dataproc Master Node VM
-  Copy Local Files into HDFS on Dataproc
-  Copy GCS Files into HDFS on Dataproc
-  Validate Pyspark CLI in Dataproc Cluster
-  Validate Spark Scala CLI in Dataproc Cluster
-  Validate Spark SQL CLI in Dataproc Cluster
12. ELT Data Pipelines using Dataproc
-  Overview of GCP Dataproc Jobs and Workflows
-  Setup JSON Dataset in GCS for Dataproc Jobs
-  Review Spark SQL Commands used for Dataproc Jobs
-  Run Dataproc Job using Spark SQL
-  Overview of Modularizing Spark SQL Applications for Dataproc
-  Review Spark SQL Scripts for Dataproc Jobs and Workflows
-  Validate Spark SQL Script for File Format Conversion
-  Exercise to convert file format using Spark SQL Script
-  Validate Spark SQL Script for Daily Product Revenue
-  Develop Spark SQL Script to Cleanup Database
-  Copy Spark SQL Scripts to GCS
-  Run and Validate Spark SQL Scripts in GCS
-  Limitations of Running Spark SQL Scripts using Dataproc Jobs
-  Manage Dataproc Clusters using gcloud Commands
-  Run Dataproc Jobs using Spark SQL Command or Query
-  Run Dataproc Jobs using Spark SQL Scripts
-  Exercises to Run Spark SQL Scripts as Dataproc Jobs using gcloud
-  Delete Dataproc Jobs using gcloud commands
-  Importance of using gcloud commands to manage dataproc jobs
-  Getting Started with Dataproc Workflow Templates using Web UI
-  Review Steps and Design to create Dataproc Workflow Template
-  Create Dataproc Workflow Template and Add Cluster using gcloud Commands
-  Review gcloud Commands to Add Jobs to Dataproc Workflow Templates
-  Add Jobs to Dataproc Workflow Template using Commands
-  Instantiate Dataproc Workflow Template to run the Data Pipeline
-  Overview of Dataproc Operations and Deleting Workflow Runs
-  Run and Validate ELT Data Pipeline using Dataproc
-  Stop Dataproc Cluster
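
As an illustration of the Spark SQL logic behind the daily product revenue pipeline in this module, here is a hypothetical PySpark script of the kind submitted as a Dataproc job (for example with gcloud dataproc jobs submit pyspark); the bucket paths, dataset layout and column names are assumptions, not the exact course files.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("daily_product_revenue").getOrCreate()

    # Register the JSON data sets in GCS as temporary views
    spark.read.json("gs://my-datalake-bucket/retail_db_json/orders") \
        .createOrReplaceTempView("orders")
    spark.read.json("gs://my-datalake-bucket/retail_db_json/order_items") \
        .createOrReplaceTempView("order_items")

    # Compute daily product revenue using Spark SQL
    daily_product_revenue = spark.sql("""
        SELECT o.order_date,
               oi.order_item_product_id,
               round(sum(oi.order_item_subtotal), 2) AS revenue
        FROM orders o
        JOIN order_items oi ON o.order_id = oi.order_item_order_id
        WHERE o.order_status IN ('COMPLETE', 'CLOSED')
        GROUP BY o.order_date, oi.order_item_product_id
    """)

    # Write the result back to GCS as Parquet
    daily_product_revenue.write.mode("overwrite") \
        .parquet("gs://my-datalake-bucket/gold/daily_product_revenue")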
13. Databricks: Pyspark Processing in GCP
-  Overview of Databricks on GCP
-  Signing up for Databricks on GCP
-  Create Databricks Workspace on GCP
-  Getting Started with Databricks Clusters on GCP
-  Getting Started with Databricks Notebook
-  High level architecture of Databricks
-  Setup Databricks CLI on Mac or Windows
-  Overview of Databricks CLI and other clients
-  Configure Databricks CLI on Mac or Windows
-  Troubleshoot issues to configure Databricks CLI
-  Overview of Databricks CLI Commands
-  Setup Data Repository for Data Sets
-  Setup Data Sets in DBFS using Databricks CLI Commands
-  Process Data in DBFS using Databricks Spark SQL
-  Getting Started with Spark SQL Example using Databricks
-  Create Temporary Views using Spark SQL
-  Exercise to create temporary views using Spark SQL
-  Spark SQL Query to compute Daily Product Revenue
-  Save Query Result to DBFS using Spark SQL
-  Overview of Pyspark Examples on Databricks
-  Process Schema Details in JSON using Pyspark
-  Create Dataframe with Schema from JSON File using Pyspark
-  Transform Data using Spark APIs
-  Get Schema Details for all Data Sets using Pyspark
-  Convert CSV to Parquet with Schema using Pyspark
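
A notebook-style sketch of the CSV-to-Parquet conversion idea at the end of this module, assuming a schemas.json file that maps each data set to its column definitions and that the data sets are already staged in DBFS; the paths and the JSON layout are assumptions, not the exact course files.

    import json
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()   # already provided in a Databricks notebook

    # DBFS paths are exposed on the driver under /dbfs via the FUSE mount
    with open("/dbfs/public/retail_db/schemas.json") as fp:
        schemas = json.load(fp)

    def get_columns(ds_name):
        # assumes each entry is a list of {"column_name": ..., "column_position": ...}
        details = sorted(schemas[ds_name], key=lambda col: col["column_position"])
        return [col["column_name"] for col in details]

    # Convert each CSV data set to Parquet with proper column names
    for ds_name in ["orders", "order_items", "customers"]:
        df = spark.read.csv(f"dbfs:/public/retail_db/{ds_name}", inferSchema=True) \
            .toDF(*get_columns(ds_name))
        df.write.mode("overwrite").parquet(f"dbfs:/public/retail_db_parquet/{ds_name}")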
14. ELT Pipeline using Databricks
-  Overview of Databricks Workflows
-  Pass Arguments to Databricks Python Notebooks
-  Pass Arguments to Databricks SQL Notebooks
-  Create and Run First Databricks Job
-  Run Databricks Jobs and Tasks with Parameters
-  Create and Run Orchestrated Pipeline using Databricks Job
-  Import ELT Data Pipeline Applications into Databricks Environment
-  Spark SQL Application to Cleanup Database and Datasets
-  Review File Format Converter Pyspark Code
-  Review Databricks SQL Notebooks for Tables and Final Results
-  Validate Applications for ELT Pipeline using Databricks
-  Build ELT Pipeline using Databricks Job in Workflows
-  Run and Review Execution details of ELT Data Pipeline using Databricks Job
-  Cleanup Databricks Environment on GCP
15. Integration of Spark on Dataproc and BigQuery
-  Review Development Environment with VS Code using Dataproc Cluster
-  Validate Google BigQuery Integration with Python on Dataproc
-  Setup Native Tables in Google BigQuery
-  Review Spark Google BigQuery Connector
-  Integration of Spark on Dataproc and BigQuery using Pyspark CLI
-  Integration of Spark on Dataproc and BigQuery using Notebook
-  Review Design of Data Pipeline using Spark and BigQuery
-  Review Spark Applications to compute daily product revenue
-  Create Table for Daily Product Revenue in Google BigQuery
-  Validate Parquet Files for Daily Product Revenue in GCS
-  Develop Logic to Save Daily Product Revenue to BigQuery Table
-  Reset Daily Product Revenue Table in Google BigQuery
-  Review Spark Application Code to Write to BigQuery Table
-  Submit Spark Application with BigQuery Integration using Client Mode
-  Submit Spark Application with BigQuery Integration using Cluster Mode
-  Deploy Spark Application with BigQuery Integration in GCS
-  Switching to Local Development Environment from Dataproc
-  Run Spark Application as Dataproc Job using Web UI
-  Run Spark Application as Dataproc Job using Command
-  Review Dataproc Jobs and Spark Application using Dataproc UI
-  Overview of Orchestration using Dataproc Commands for Spark Applications
-  Overview of ELT Pipeline using Dataproc Workflows
-  Create Workflow Template with Spark SQL Applications
-  Add Pyspark Application to Dataproc Workflow Template
-  Run Dataproc Workflow Template using Dataproc Command
-  Update Job Properties in Dataproc Workflow Template
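
A minimal sketch of writing Spark output to BigQuery with the Spark BigQuery connector discussed in this module (the connector jar ships with Dataproc); the project, dataset, table and bucket names are placeholders.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("daily_product_revenue_to_bq").getOrCreate()

    # Read the Parquet output produced by the earlier pipeline step
    daily_product_revenue = spark.read.parquet(
        "gs://my-datalake-bucket/gold/daily_product_revenue"
    )

    # Write the dataframe to a BigQuery table via the Spark BigQuery connector
    daily_product_revenue.write \
        .format("bigquery") \
        .option("table", "my-gcp-project.retail.daily_product_revenue") \
        .option("temporaryGcsBucket", "my-datalake-bucket") \
        .mode("overwrite") \
        .save()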
16. Dataflow (Apache Beam Development)
-  Introduction to Dataflow
-  Use cases for Dataflow in real-time analytics and ETL
-  Understanding the difference between Apache Spark and Apache Beam
-  How Dataflow is different from Dataproc
-  Building Data Pipelines with Apache Beam
    - Writing Apache Beam pipelines for batch and stream processing
    - Custom pipelines and pre-defined pipelines
    - Transformations and windowing concepts
-  Integration with Other GCP Services
    - Integrating Dataflow with BigQuery, Pub/Sub, and other GCP services
    - Real-time analytics and visualization using Dataflow and BigQuery
    - Workflow orchestration with Composer
-  End to End Streaming Pipeline using Apache Beam with Dataflow, Python app, Pub/Sub, BigQuery, GCS
-  Template method of creating pipelines
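
To give a flavour of the Apache Beam development covered here, a minimal batch pipeline sketch: read JSON order records from GCS, filter them, and write to BigQuery. The bucket, table and schema string are placeholders; switching the runner to DataflowRunner executes the same pipeline on Dataflow.

    import json
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(
        runner="DirectRunner",               # use "DataflowRunner" to run on Dataflow
        project="my-gcp-project",
        temp_location="gs://my-datalake-bucket/tmp",
    )

    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadOrders" >> beam.io.ReadFromText("gs://my-datalake-bucket/retail_db_json/orders/*")
            | "ParseJson" >> beam.Map(json.loads)
            | "CompletedOnly" >> beam.Filter(lambda o: o["order_status"] == "COMPLETE")
            | "WriteToBQ" >> beam.io.WriteToBigQuery(
                "my-gcp-project:retail.orders_complete",
                schema="order_id:INTEGER,order_date:STRING,"
                       "order_customer_id:INTEGER,order_status:STRING",
                write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE,
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
            )
        )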
17. Cloud Pub/Sub
-  Introduction to Pub/Sub
-  Understanding the role of Pub/Sub in event-driven architectures
-  Key Pub/Sub concepts: topics, subscriptions, messages, and acknowledgments
-  Creating and Managing Topics and Subscriptions
    - Using the GCP Console to create Pub/Sub topics and subscriptions
    - Configuring message retention policies and acknowledgment settings
-  Publishing and Consuming Messages
    - Writing and deploying code to publish messages to a topic
    - Implementing subscribers to consume and process messages from subscriptions
-  Integration with Other GCP Services
    - Connecting Pub/Sub with Cloud Functions for serverless event-driven computing
    - Integrating Pub/Sub with Dataflow for real-time stream processing
-  Streaming use case using Dataflow
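
A minimal publish-and-pull sketch using the google-cloud-pubsub client, assuming the topic and subscription already exist; the project, topic and subscription names are placeholders.

    import json
    from google.cloud import pubsub_v1

    PROJECT_ID = "my-gcp-project"

    # Publish a message to a topic
    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(PROJECT_ID, "orders-topic")
    future = publisher.publish(
        topic_path, json.dumps({"order_id": 1, "order_status": "COMPLETE"}).encode("utf-8")
    )
    print("published message id:", future.result())

    # Pull messages from a subscription and acknowledge them
    subscriber = pubsub_v1.SubscriberClient()
    subscription_path = subscriber.subscription_path(PROJECT_ID, "orders-sub")
    response = subscriber.pull(subscription=subscription_path, max_messages=10)
    for msg in response.received_messages:
        print(msg.message.data.decode("utf-8"))
    if response.received_messages:
        subscriber.acknowledge(
            subscription=subscription_path,
            ack_ids=[m.ack_id for m in response.received_messages],
        )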
 
18. Google Cloud Composer: For Data Pipeline Orchestration
-  Orchestration & Workflow Management and DAG Creation
-  Cloud Composer (Airflow on GCP) – DAGs, operators, scheduling pipelines
-  Integration with Dataflow, Dataproc, BigQuery
-  Workflow automation with Cloud Functions & Workflows
-  Create Airflow or Cloud Composer Environment
-  Review Google Cloud Composer Environment
-  Development Process of Airflow DAGs for Cloud Composer
-  Install Required Dependencies for Development of Airflow DAGs
-  Run Airflow Commands in Cloud Composer using gcloud
-  Overview of Airflow Architecture
-  Deploy and Run First Airflow DAG in Google Cloud Composer Environment
-  Understand Relationship between Python Scripts and Airflow DAGs
-  Code Review of Airflow DAGs and Tasks
-  Overview of Airflow Dataproc Operators
-  Review Airflow DAG with GCP Dataproc Workflow Template Operator
-  Deploy and Run GCP Dataproc Workflow using Airflow
-  Using Variables in Airflow DAGs
-  Deploy and Run Airflow DAGs with Variables
-  Overview of Data Pipeline using Cloud Composer and Dataproc Jobs
-  Review the Spark Applications related to the Data Pipeline
-  Review Airflow DAG for Orchestrated Pipeline using Dataproc Jobs
-  Deploy Data Pipeline or Airflow DAG using Dataproc Jobs
-  Review Source and Target before Deployment of Airflow DAG
-  Deploy and Run Airflow DAG with Dataproc Jobs
-  Differences Between Dataproc Workflows and Airflow DAGs
-  Cleanup Cloud Composer Environment and Dataproc Cluster
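
For illustration, a minimal Airflow DAG of the kind deployed to Cloud Composer in this module: it instantiates an existing Dataproc workflow template via the Google provider operator. The DAG id, template id, project and region are placeholders.

    from datetime import datetime

    from airflow import DAG
    from airflow.providers.google.cloud.operators.dataproc import (
        DataprocInstantiateWorkflowTemplateOperator,
    )

    with DAG(
        dag_id="daily_product_revenue_pipeline",
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        # Trigger an existing Dataproc workflow template (the ELT pipeline)
        run_workflow = DataprocInstantiateWorkflowTemplateOperator(
            task_id="run_dataproc_workflow",
            template_id="wf-daily-product-revenue",    # hypothetical template id
            project_id="my-gcp-project",
            region="us-central1",
        )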
19. Data Fusion
-  Introduction to Data Fusion
    - Overview of Data Fusion as a fully managed data integration service
    - Use cases for Data Fusion in ETL and data migration
-  Building Data Integration Pipelines
    - Creating ETL pipelines using the visual interface
    - Configuring data sources, transformations, and sinks
    - Using pre-built templates for common integration scenarios
-  Integration with GCP and External Services
    - Integrating Data Fusion with BigQuery, Cloud Storage, and other GCP services
-  End to End pipeline using Data Fusion with Wrangler, GCS, BigQuery
20. Cloud Functions
-  Cloud Functions Introduction
-  Setting up Cloud Functions in GCP
-  Event-driven architecture and use cases
-  Writing and deploying Cloud Functions
-  Triggering Cloud Functions:
    - HTTP triggers
    - Pub/Sub triggers
    - Cloud Storage triggers
-  Monitoring and logging Cloud Functions
-  Use case 1: Loading files from GCS into BigQuery as soon as they are uploaded
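
A sketch of the idea behind Use case 1, written as a first-generation background Cloud Function triggered by a Cloud Storage finalize event; the destination table and the CSV layout assumptions are placeholders.

    from google.cloud import bigquery

    TABLE_ID = "my-gcp-project.retail.orders"          # hypothetical destination table

    def gcs_to_bigquery(event, context):
        """Triggered by a google.storage.object.finalize event on a bucket."""
        uri = f"gs://{event['bucket']}/{event['name']}"
        client = bigquery.Client()
        job_config = bigquery.LoadJobConfig(
            source_format=bigquery.SourceFormat.CSV,
            skip_leading_rows=1,
            autodetect=True,
            write_disposition="WRITE_APPEND",
        )
        # Load the newly uploaded file straight into BigQuery
        client.load_table_from_uri(uri, TABLE_ID, job_config=job_config).result()
        print(f"Loaded {uri} into {TABLE_ID}")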
21. Terraform
-  Terraform Introduction
-  Installing and configuring Terraform
-  Infrastructure Provisioning
-  Terraform basic commands
    - init, plan, apply, destroy
-  Create Resources in Google Cloud Platform
    - GCS buckets
    - Dataproc cluster
    - BigQuery datasets and tables
    - And more resources as needed
22. Data Pipelines using DBT, Airflow and BigQuery
-  Overview of the Data Landscape of a Large Enterprise
-  DBT High Level Architecture
-  Overview of DBT Cloud Features and DBT Adapters
-  Airflow and DBT Pipeline Patterns
-  Pre-requisites for Dev Environment using Airflow and DBT
-  Setup Astro CLI on Windows or Mac
-  Setup Workspace using VSCode
-  Setup Local Airflow Environment using Astro CLI
-  Setup Python Virtual Environment with Airflow
-  Overview of Airflow Providers
-  Manage Local Airflow Containers using Astro CLI
-  Connect to Airflow Containers and Review Logs using Astro CLI
-  Setup Datasets for Airflow Pipelines or DAGs
-  Setup GCS Bucket and Upload Data Set
-  Getting Started with Google BigQuery
-  Create External Table using Google BigQuery
-  Create GCP Service Account and Download Credentials
-  Getting Started with DBT Cloud
-  Setup DBT Cloud Project for Google BigQuery
-  Review and Run Example DBT Pipeline using DBT Cloud
-  Validate Google BigQuery Objects created by DBT Pipeline
-  Overview of ELT Pipeline using DBT and Google BigQuery
-  Change the DBT Project Structure from example
-  Create Models for Orders and Order Items
-  Define Denormalized Model for Order Details
-  Query to compute daily product revenue
-  Add Model for Daily Product Revenue
-  Create and Run DBT Cloud Job
-  Validate Airflow and Review DBT Cloud Provider
-  Install Airflow DBT Cloud Provider
-  Overview of End to End Orchestrated Data Pipeline using Airflow
-  Create DBT Cloud Connection in Airflow
-  Create DBT Job Variables in Airflow
-  Develop Airflow DAG to trigger DBT Cloud Job
-  Deploy Airflow DAG with DBT Cloud
-  Run Airflow DAG with DBT Cloud Job
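
A minimal sketch of the final orchestration step: an Airflow DAG that triggers a DBT Cloud job through the dbt Cloud provider's operator, assuming the Airflow connection and DBT Cloud job described above have been configured; the connection id and job id here are placeholders.

    from datetime import datetime

    from airflow import DAG
    from airflow.providers.dbt.cloud.operators.dbt import DbtCloudRunJobOperator

    with DAG(
        dag_id="dbt_daily_product_revenue",
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        # Trigger the DBT Cloud job and wait for it to finish
        run_dbt_job = DbtCloudRunJobOperator(
            task_id="trigger_dbt_cloud_job",
            dbt_cloud_conn_id="dbt_cloud_default",     # Airflow connection created earlier
            job_id=12345,                              # hypothetical DBT Cloud job id
            wait_for_termination=True,
        )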
23. Machine Learning in GCP (for Data Engineers)
-  Overview of AI/ML on GCP
-  BigQuery ML – building ML models directly in SQL
-  Vertex AI basics – training & deploying ML models
-  Pipelines for ML (Vertex AI Pipelines, Kubeflow)
24. Security, Monitoring & Governance
-  Data encryption (at rest, in transit, CMEK vs Google-managed keys)
-  IAM roles for data services
-  VPC Service Controls for data security
-  Cloud Logging, Cloud Monitoring, and Cloud Trace
-  Data Catalog for metadata management & lineage
-  DLP (Data Loss Prevention) for sensitive data
25. Real-World GCP Data Engineering Scenarios
-  Building a streaming pipeline (Pub/Sub → Dataflow → BigQuery → Looker)
-  Building a batch pipeline (Cloud Storage → Dataproc → BigQuery)
-  Data migration from on-prem to GCP
-  Designing a hybrid data lakehouse (BigQuery + Dataplex + GCS)
-  Project flow