DURGA SOFTWARE SOLUTIONS

Courses Offered:

SCJP, SCWCD, Design patterns, EJB, CORE JAVA, AJAX, Adv. Java, XML, STRUTS, Web services, SPRING, HIBERNATE

GCP DATA ENGINEERING Course Details

Subscribe and access 5200+ FREE videos across 21+ subjects such as CRT, Soft Skills, Java, Hadoop, Microsoft .NET, Testing Tools, etc.

Batch Date: July 11th @7:00AM

Faculty: Mr. Shaik Saidhul (7+ years of experience, real-time expert)

(Google Certified Professional Data Engineer)

Duration: 3 Months

Venue :
DURGA SOFTWARE SOLUTIONS,
Flat No : 202, 2nd Floor,
HUDA Maitrivanam,
Ameerpet, Hyderabad - 500038

Ph.No: +91 - 8885252627, 9246212143, 80 96 96 96 96

Syllabus:

Google Cloud Data Engineering + AI Fundamentals
Industry-Focused Training with Real-World Projects

GCP Cloud Basics

GCP Introduction

The need for cloud computing in modern businesses.

Key features and offerings of Google Cloud Platform (GCP).

Overview of core GCP services and products.

Benefits and advantages of using cloud infrastructure.

Step-by-step guide to creating a free-tier account on GCP.

GCP Interfaces

Console

Navigating the GCP Console

Configuring the GCP Console for Efficiency

Using the GCP Console for Service Management

Shell

Introduction to Cloud Shell

Command-line Interface (CLI) Basics

Cloud Shell Commands for Service Deployment and Management

SDK

Overview of GCP Software Development Kits (SDKs)

Installing and Configuring SDKs

Writing and Executing GCP SDK Commands

GCP Locations

Regions

Understanding GCP Regions

Selecting Regions for Service Deployment

Impact of Region on Service Performance

Zones

Exploring GCP Zones

Distributing Resources Across Zones

High Availability and Disaster Recovery Considerations

Importance

Significance of Choosing the Right Location

Global vs. Regional Resources

Factors Influencing Location Decisions

GCP IAM & Admin

Identities

Introduction to Identity and Access Management (IAM)

Users, Groups, and Service Accounts

Best Practices for Identity Management

Roles

GCP IAM Roles Overview

Defining Custom Roles

Role-Based Access Control (RBAC) Implementation

Policy

Resource-based Policies

Understanding and Implementing Organization Policies

Auditing and Monitoring Policies

Resource Hierarchy

GCP Resource Hierarchy Structure

Managing Resources in a Hierarchy

Organizational Structure Best Practices

Linux Basics on Cloud Shell

Getting started with Linux

Linux Installation

Basic Linux Commands

Cloud Shell Tips

File and Directory Operations (ls, cd, pwd, mkdir, rmdir, cp, mv, touch, rm, nano)

File Content Manipulation (cat, less, head, tail, grep)

Text Processing (awk, sed, cut, sort, uniq)

User and Permission Commands (whoami, id, su, sudo, chmod, chown)

Python for Data Engineers

Data Types

Strings

Operators

Numbers (Int, Float)

Booleans

Data Structures

Lists

Tuples

Dictionaries

Sets

Python Programming Constructs

if, elif, else statements

for loops, while loops

Exception Handling

File I/O operations

Modular Programming in Python

Functions & Lambda Functions

Classes
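As a small preview of how the Python topics above fit together in everyday data-engineering work, the sketch below reads a CSV-like file, skips malformed lines with exception handling, and summarizes the records using lists, tuples, sets, dictionaries, if/elif/else, a lambda, and a class. It is a standard-library-only illustration; the `TravelRecord` class, the `summarize` and `cost_band` functions, the file layout, and the sample data are all hypothetical and not taken from the course material.

```python
import tempfile
from pathlib import Path


class TravelRecord:
    """Hypothetical record type used only for this illustration."""

    def __init__(self, employee: str, city: str, cost: float) -> None:
        self.employee = employee
        self.city = city
        self.cost = cost


def parse_line(line: str) -> TravelRecord:
    """Parse one 'employee,city,cost' line; raises ValueError on bad input."""
    employee, city, cost = line.strip().split(",")
    return TravelRecord(employee, city, float(cost))


def cost_band(cost: float) -> str:
    """Classify a cost with if/elif/else (thresholds are arbitrary)."""
    if cost >= 100:
        return "high"
    elif cost >= 50:
        return "medium"
    else:
        return "low"


def summarize(path: Path) -> dict:
    records = []                            # list: ordered and mutable
    skipped = 0
    with path.open() as f:                  # file I/O with a context manager
        for line in f:                      # for loop over lines
            try:
                records.append(parse_line(line))
            except ValueError:              # malformed line: count it and move on
                skipped += 1
    cities = {r.city for r in records}      # set: unique city names
    totals = {}                             # dict: city -> total cost
    for r in records:
        totals[r.city] = totals.get(r.city, 0.0) + r.cost
    # lambda as a sort key; the result is a (city, total) tuple
    top = max(totals.items(), key=lambda kv: kv[1]) if totals else None
    return {
        "cities": sorted(cities),
        "totals": totals,
        "top": top,
        "bands": [cost_band(r.cost) for r in records],
        "skipped": skipped,
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        sample = Path(d) / "travel.csv"
        sample.write_text("asha,Pune,120.5\nravi,Delhi,80\nbad line\nasha,Delhi,40\n")
        print(summarize(sample))
```

Counting and skipping bad lines, rather than failing the whole run, is one common design choice for cleaning jobs; a stricter pipeline might instead write rejected lines to a separate file for review.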

GCP Data Engineering Tools

Google Cloud Storage

Overview of Cloud Storage as a scalable and durable object storage service.

Understanding buckets and objects in Cloud Storage.

Use cases for Cloud Storage, such as data backup, multimedia storage, and website content.

Labs: using the Console and CLI to perform the following tasks:

Creating and managing Cloud Storage buckets.

Uploading and downloading objects to and from Cloud Storage.

Setting access controls and permissions for buckets and objects.

Data Transfer and Lifecycle Management

Object Versioning

Integration with Other GCP Services

Monitoring and logging for Cloud Storage operations.

Cloud SQL

Introduction to Cloud SQL

Creating and Managing Cloud SQL Instances

Configuring database settings, users, and access controls.

Connecting to Cloud SQL instances using Cloud SQL Studio, Cloud Shell, and SQL workbench clients.

Importing and exporting data in Cloud SQL.

Backups and High Availability

Integration with Other GCP Services

Managing database user roles and permissions.

Introduction to Database Migration Service (DMS)

End-to-End Database Migration Project

Manual: Export and Import method

Automation: Cloud SQL DMS method

BigQuery (SQL Development)

Introduction to BigQuery

BigQuery Architecture

Use cases for BigQuery in business intelligence and analytics.

Various methods of creating tables in BigQuery

BigQuery Data Sources and File Formats

Native Tables and External Tables

Working with Complex Data Types

Working with JSON, nested, repeated, and array data

Data Integration and Export

Loading data into BigQuery from Cloud Storage, Cloud SQL, and other sources.

Exporting data from BigQuery to various formats.

Real-time data streaming into BigQuery.

Configuring access controls and permissions in BigQuery.

BigQuery Views:

Views

Materialized Views

Authorized Views

Optimization techniques in BigQuery

BigQuery Slots – on-demand pricing, capacity-based pricing (BigQuery editions), and legacy flat-rate and Flex Slots

Case Study-1: Implementing a real-world analytics data platform for Spotify

Case Study-2: Enterprise Social Media analytics platform

Dataproc (PySpark Development)

Introduction to Hadoop and Apache Spark

Understanding the difference between Spark and MapReduce

What are Spark and PySpark?

Understanding the Spark framework and its functionality

Overview of Dataproc as a fully managed Apache Spark and Hadoop service.

Cluster Creation and Configuration

Creating and managing Dataproc clusters.

Configuring cluster properties for performance and scalability.

Preemptible instances and cost optimization.

Learning PySpark:

How to read from multiple data sources – CSV, text, JSON, Parquet, database tables, BigQuery tables

How to perform multiple transformations

How to write to multiple targets – CSV, text, JSON, Parquet, database tables, BigQuery tables

Running Jobs on Dataproc

Submitting and monitoring Spark and Hadoop jobs on Dataproc.

Use of initialization actions and custom scripts.

Job debugging and troubleshooting.

Case study-1: Data Cleaning of Employee Travel Records

Case study-2: Processing real-time patient health data

Case study-3: Creating a PySpark job to support ML model creation

Dataflow (Apache Beam Development)

Introduction to Dataflow

Use cases for Dataflow in real-time analytics and ETL.

Understanding the difference between Apache Spark and Apache Beam

How Dataflow is different from Dataproc

Learning Apache Beam

How to read from multiple data sources – CSV, text, JSON, Parquet, database tables, BigQuery tables

How to perform multiple transformations

How to write to multiple targets – CSV, text, JSON, Parquet, database tables, BigQuery tables

Case study-1: Creating pipelines using Dataflow templates

Case study-2: E-commerce Transaction Processing

Case study-3: End-to-End Streaming Pipeline using Apache Beam with Dataflow, a Python app, Pub/Sub, BigQuery, and GCS

Cloud Pub/Sub (Streaming)

Introduction to Pub/Sub

Understanding the role of Pub/Sub in event-driven architectures.

Key Pub/Sub concepts: topics, subscriptions, messages, and acknowledgments.

Creating and Managing Topics and Subscriptions

Using the GCP Console to create Pub/Sub topics and subscriptions.

Configuring message retention policies and acknowledgment settings.

Publishing and Consuming Messages

Writing and deploying code to publish messages to a topic.

Implementing subscribers to consume and process messages from subscriptions.

Case study-1: Streaming use case using Dataflow
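To make the core Pub/Sub concepts listed above (topics, subscriptions, messages, acknowledgments) concrete, here is a toy, in-memory model. It is only a teaching sketch, not the `google-cloud-pubsub` client library, and every class and method name in it is hypothetical. It shows two behaviors of the real service: each subscription attached to a topic receives its own copy of every message published after it was created, and a message that is delivered but not acknowledged before its ack deadline becomes eligible for redelivery (at-least-once delivery).

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Message:
    message_id: int
    data: bytes


@dataclass
class Subscription:
    """Toy subscription: keeps each message until it is acknowledged."""
    name: str
    pending: dict = field(default_factory=dict)    # message_id -> Message, not yet acked
    outstanding: set = field(default_factory=set)  # ids delivered but not yet acked

    def pull(self, max_messages: int = 10) -> list:
        """Return messages that are pending and not currently leased to a subscriber."""
        batch = [m for mid, m in self.pending.items() if mid not in self.outstanding]
        batch = batch[:max_messages]
        self.outstanding.update(m.message_id for m in batch)
        return batch

    def ack(self, message_id: int) -> None:
        """Acknowledge a message so it is never delivered again."""
        self.pending.pop(message_id, None)
        self.outstanding.discard(message_id)

    def expire_deadlines(self) -> None:
        """Simulate ack deadlines passing: unacked messages become deliverable again."""
        self.outstanding.clear()


class Topic:
    def __init__(self, name: str) -> None:
        self.name = name
        self.subscriptions = []
        self._ids = itertools.count(1)

    def subscribe(self, sub_name: str) -> Subscription:
        sub = Subscription(sub_name)
        self.subscriptions.append(sub)
        return sub

    def publish(self, data: bytes) -> int:
        msg = Message(next(self._ids), data)
        for sub in self.subscriptions:  # fan-out: every subscription gets the message
            sub.pending[msg.message_id] = msg
        return msg.message_id


if __name__ == "__main__":
    orders = Topic("orders")
    billing = orders.subscribe("billing")
    audit = orders.subscribe("audit")
    orders.publish(b"order-1")
    orders.publish(b"order-2")
    print([m.data for m in billing.pull()])  # both messages delivered
    billing.ack(1)                           # only message 1 is acknowledged
    billing.expire_deadlines()
    print([m.data for m in billing.pull()])  # message 2 is redelivered
    print([m.data for m in audit.pull()])    # audit has its own independent copy
```

Because redelivery is possible, subscriber code in real pipelines is usually written to be idempotent, so that processing the same message twice does no harm.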

Cloud Composer (DAG Creation)

Introduction to Composer/Airflow

Overview of Airflow Architecture

Use cases for Composer in managing and scheduling workflows.

Creating and Managing Workflows

Creating and configuring Composer environments.

Defining and scheduling workflows using Apache Airflow.

Monitoring and managing workflow executions.

Integration with Data Engineering Services

Orchestrating workflows involving BigQuery, Dataflow, and other services.

Coordinating ETL processes with Composer.

Integrating with external systems and APIs.

Error Handling and Troubleshooting

Handling errors and retries in Composer workflows.

Debugging and troubleshooting failed workflow executions.

Logging and monitoring for Composer workflows.

Level-1-DAG: Orchestrating the BigQuery pipelines

Level-2-DAG: Orchestrating the Dataproc pipelines

Level-3-DAG: Orchestrating the Dataflow pipelines

Deploy DAGs: Implementing CI/CD in Composer Using Cloud Build and GitHub
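The central idea behind the Composer/Airflow topics above is that a workflow is a directed acyclic graph: each task runs only after its upstream tasks finish, and tasks with no dependency between them can run in parallel. The sketch below illustrates that idea with Python's standard `graphlib` module (Python 3.9+), not with the Airflow API; the task names are hypothetical placeholders loosely modeled on the BigQuery, Dataproc, and Dataflow DAG levels listed above.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each key maps a task to the set of its upstream tasks.
PIPELINE = {
    "land_files_in_gcs": set(),
    "dataproc_clean": {"land_files_in_gcs"},
    "dataflow_enrich": {"land_files_in_gcs"},
    "bigquery_load": {"dataproc_clean", "dataflow_enrich"},
    "bigquery_report": {"bigquery_load"},
}


def execution_waves(graph: dict) -> list:
    """Group tasks into waves; all tasks in one wave could run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises CycleError if the graph contains a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose upstream tasks are all done
        waves.append(ready)
        sorter.done(*ready)                 # mark them finished to unlock downstream tasks
    return waves


if __name__ == "__main__":
    for i, wave in enumerate(execution_waves(PIPELINE), start=1):
        print(f"wave {i}: {wave}")
    try:
        execution_waves({"a": {"b"}, "b": {"a"}})
    except CycleError:
        print("cycle detected: not a valid DAG")
```

A real Airflow scheduler adds retries, schedules, sensors, and per-task state on top of this ordering, but rejecting cycles and running independent tasks side by side follow the same principle.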

Databricks on GCP

What is Lakehouse Architecture?

Difference Between:

Data Lake

Data Warehouse

Data Lakehouse

Introduction to Databricks

Overview of Databricks

Databricks Architecture

Setting up a Databricks Workspace

Unity Catalog - Unified Data Governance

Introduction to Unity Catalog

Core Concepts: Metastore, Catalog, Schema, Tables, Volumes, Functions, Models

External Data Access

Storage Credentials

External Locations (S3, GCS, ADLS)

Lakehouse Federation - Foreign Catalog, Connection

Delta Sharing – Share, Recipient, Provider

Databricks Clusters

Introduction to Clusters

Types of Clusters

Cluster Modes

dbutils Commands - File Handling, Notebook Widgets, Secrets

Introduction to Delta Lake

What is Delta Lake?

Creating Delta Lake Tables - Managed Tables, External Tables

Understanding Delta Lake Table Creation - Delta Log, Parquet Data Files

Advanced Delta Lake Features

Versioning

Time Travel

Compaction

Liquid Clustering

Vacuum

Databricks Jobs & Pipelines

Creating a Job

Scheduling Jobs

Setting Parameters

Managing Dependencies

Setting up Alerts

Case Study-1: Structured Streaming with Databricks

Case Study-2: Incremental Data Loading with Auto Loader

Data Fusion (Complementary)

Introduction to Data Fusion

Overview of Data Fusion as a fully managed data integration service.

Use cases for Data Fusion in ETL and data migration.

Building Data Integration Pipelines

Creating ETL pipelines using the visual interface.

Configuring data sources, transformations, and sinks.

Using pre-built templates for common integration scenarios.

Integration with GCP and External Services

Integrating Data Fusion with BigQuery, Cloud Storage, and other GCP services.

Case Study-1: End-to-End pipeline using Data Fusion with Wrangler, GCS, and BigQuery

Cloud Functions (Complementary)

Cloud Functions Introduction

Setting up Cloud Functions in GCP

Event-driven architecture and use cases

Writing and deploying Cloud Functions

Triggering Cloud Functions:

HTTP triggers

Pub/Sub triggers

Cloud Storage triggers

Monitoring and logging Cloud Functions

Use case-1: Loading files from GCS into BigQuery as soon as they are uploaded.

Terraform (Complementary)

Terraform Introduction

Installing and configuring Terraform.

Infrastructure Provisioning

Terraform basic commands

Init, plan, apply, destroy

Labs: Create Resources in Google Cloud Platform

GCS buckets

Dataproc cluster

BigQuery Datasets and tables

And more resources as needed

AI Fundamentals for GCP Data Engineers (Current Market Demand)

Introduction to Artificial Intelligence & Generative AI

Understanding AI, ML, Deep Learning, Generative AI, and LLMs.

Real-world AI use cases in Data Engineering and Analytics.

Machine Learning Fundamentals

Supervised, Unsupervised, and Reinforcement Learning.

Training, validation, testing, and model evaluation concepts.

Prompt Engineering for Data Engineers

Designing effective prompts for Gemini, ChatGPT, and Vertex AI.

Using AI for SQL generation, code generation, documentation, and data analysis.

Google Vertex AI Fundamentals

Overview of Vertex AI ecosystem.

Working with foundation models, Model Garden, and Gemini APIs.

AI-Powered Data Pipelines

Integrating AI services with BigQuery, GCS, Dataflow, and Dataproc.

Building intelligent ETL/ELT pipelines using AI-driven transformations.

Complementary End-to-End Projects:

Healthcare project on GCP (8+ hours)

Road Traffic project on Databricks (5+ hours)

What Students Can Expect by the End of the Course

Proficient in SQL Development:

Mastering SQL for querying and manipulating data within Google BigQuery and Cloud SQL.

Writing complex queries and optimizing performance for large-scale datasets.

Understanding schema design and best practices for efficient data storage.

Pyspark Development Skills:

Proficiency in using PySpark for large-scale data processing on Google Cloud.

Developing and optimizing Spark jobs for distributed data processing.

Understanding Spark's RDDs, DataFrames, and transformations for data manipulation.

Apache Beam Development Mastery:

Creating data processing pipelines using Apache Beam.

Understanding the concepts of parallel processing and data parallelism.

Implementing transformations and integrating with other GCP services.

DAG Creation with Cloud Composer:

Designing and implementing Directed Acyclic Graphs (DAGs) for orchestrating workflows.

Using Cloud Composer for workflow automation and managing dependencies.

Developing DAGs that integrate various GCP services for end-to-end data processing.

Notebooks and Workflows with Databricks:

Understand how to build and manage data pipelines using Databricks and Delta Lake.

Efficiently query and analyze large datasets with Databricks SQL and Apache Spark.

Implement scalable workflows and optimize performance within Databricks.

Architecture Planning:

Proficient in architecting end-to-end data solutions on GCP.

Understanding the principles of designing scalable, reliable, and cost-effective data architectures.

Certification Readiness:

Prepare for the Google Cloud Professional Data Engineer (PDE) and Associate Cloud Engineer (ACE) certifications through a combination of theoretical knowledge and hands-on experience.

ML & AI Readiness:

Students will understand how modern AI and Generative AI integrate with Google Cloud Data Engineering solutions and will be able to build AI-enabled data platforms using BigQuery, Vertex AI, and Gemini.

Course Highlights:

Hands-on Labs & Real-world Use Cases

Live Q&A Sessions

Course Completion Certificate

Access to Study Materials & Code Repository

Industry-Standard Best Practices

The course will empower students with practical skills in SQL, PySpark, Apache Beam, DAG creation, and architecture planning, ensuring they are well prepared to tackle real-world data engineering challenges and successfully obtain GCP certifications.