CCA 175 - Spark and Hadoop Developer - Python (pyspark)

Cloudera Certified Associate Spark and Hadoop Developer using Python as Programming Language
Bestseller
4.3 (1,059 ratings)
4,750 students enrolled
Last updated 5/2020
English
Italian [Auto]
This course includes
  • 23 hours on-demand video
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What you'll learn
  • Entire curriculum of CCA Spark and Hadoop Developer
  • Apache Sqoop
  • HDFS Commands
  • Python Fundamentals
  • Core Spark - Transformations and Actions
  • Spark SQL and Data Frames
  • Streaming analytics using Kafka, Flume and Spark Streaming
Requirements
  • Basic programming skills using any programming language
  • Cloudera QuickStart VM, a valid account for ITVersity Big Data labs, or any Hadoop cluster where Hadoop, Hive, and Spark are well integrated
  • A 64-bit operating system with minimum memory depending on the environment: 4 GB RAM with access to a proper cluster, or 16 GB RAM to run virtual machines such as the Cloudera QuickStart VM
Description

CCA 175 Spark and Hadoop Developer is one of the well-recognized Big Data certifications. This scenario-based certification exam demands basic programming using Python or Scala along with Spark and other Big Data technologies.

This comprehensive course covers all aspects of the certification using Python as the programming language:

  • Python Fundamentals

  • Spark SQL and Data Frames

  • File formats

Please note that the syllabus has recently changed; the exam is now primarily focused on Spark Data Frames and/or Spark SQL.
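
Given that emphasis, the following minimal pyspark sketch contrasts the Data Frame API with the equivalent Spark SQL query. The sample orders data and its column names are illustrative assumptions based on the retail data set used in the course demos.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[2]").appName("DataFramesVsSQL").getOrCreate()

# Illustrative sample of the orders data set used throughout the course demos
orders = spark.createDataFrame(
    [(1, "2013-07-25", 11599, "CLOSED"), (2, "2013-07-25", 256, "PENDING_PAYMENT")],
    ["order_id", "order_date", "order_customer_id", "order_status"]
)

# Data Frame API: count orders by status
orders.groupBy("order_status").count().show()

# Equivalent Spark SQL: register a temp view and query it
orders.createOrReplaceTempView("orders")
spark.sql("SELECT order_status, count(1) AS order_count FROM orders GROUP BY order_status").show()
```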

Exercises are provided so that you can prepare before attempting the certification. The intention of the course is to boost your confidence for the exam.

All the demos are given on our state-of-the-art Big Data cluster. You can avail yourself of one week of complimentary lab access by filling in the form provided as part of the welcome message.

Who this course is for:
  • Any IT aspirant/professional willing to learn Big Data and take the CCA 175 certification exam
Course content
158 lectures 22:46:43
+ Introduction
15 lectures 01:05:41
Using labs for preparation
08:55
Setup Development Environment (Windows 10) - Introduction
02:25
Setup Development Environment - Python and Spark - Pre-requisites
04:12
Setup Development Environment - Python Setup on Windows
03:07
Setup Development Environment - Configure Environment Variables
02:31
Setup Development Environment - Setup PyCharm for developing Python applications
05:28
Setup Development Environment - Pass run time arguments or parameters
02:31
Setup Development Environment - Download Spark compressed tarball
01:38
Setup Development Environment - Install 7z to uncompress and untar on Windows
01:00
Setup Development Environment - Setup Spark
02:26
Setup Development Environment - Install JDK
06:05
Setup Development Environment - Configure environment variables for Spark
03:46
Setup Development Environment - Install WinUtils - integrate Windows and HDFS
06:30
Setup Development Environment - Integrate PyCharm and Spark on Windows 10
07:06
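
After these setup lectures, a quick sanity check like the sketch below can confirm that Python, Spark, and PyCharm are wired together on Windows. The SPARK_HOME path is an assumption; substitute wherever you extracted the Spark tarball.

```python
import os

# Assumption: Spark was extracted to C:\spark and winutils.exe is in place
os.environ.setdefault("SPARK_HOME", r"C:\spark")

from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName("SetupCheck").setMaster("local[2]")
sc = SparkContext(conf=conf)

# If this prints 45, the end-to-end setup is working
print(sc.parallelize(range(10)).sum())
sc.stop()
```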
+ Python Fundamentals
8 lectures 01:39:26
Introduction and Setting up Python
09:43
Basic Programming Constructs
13:16
Functions in Python
14:05
Map Reduce operations on Python Collections
12:52
Setting up Data Sets for Basic I/O Operations
04:23
Basic I/O operations and processing data using Collections
16:35
Get revenue for given order id - as application
12:20
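
The section above ends with computing revenue for a given order id as an application; a minimal plain-Python sketch of that map/filter/reduce style on collections follows. The comma-separated record layout is an assumption based on the course's retail order_items data set.

```python
# Each record: order_item_id,order_item_order_id,order_item_product_id,
# quantity,subtotal,product_price (assumed retail_db layout)
order_items = [
    "1,1,957,1,299.98,299.98",
    "2,2,1073,1,199.99,199.99",
    "3,2,502,5,250.0,50.0",
]

def get_order_revenue(items, order_id):
    # Filter the items for the order, map each to its subtotal, then reduce by summing
    subtotals = [float(i.split(",")[4]) for i in items if int(i.split(",")[1]) == order_id]
    return sum(subtotals)

print(get_order_revenue(order_items, 2))  # 449.99
```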
+ Getting Started
10 lectures 01:32:50
Setup Environment - Options
01:45
Setup Environment - Locally
02:03
Setup Environment - using Cloudera Quickstart VM
07:22
Using ITVersity platforms - Big Data Developer labs and forum
08:55
Using ITVersity's big data labs
09:00
Using Windows - Putty and WinSCP
10:33
Using Windows - Cygwin
14:46
HDFS Quick Preview
20:24
YARN Quick Preview
09:53
Setup Data Sets
08:09
+ Apache Spark 1.6 - Transform, Stage and Store
44 lectures 05:47:01
Introduction
06:05
Setup Spark on Windows
23:15
Quick overview about Spark documentation
04:38
Connecting to the environment
03:48
Initializing Spark job using pyspark
04:54
Create RDD from collection - using parallelize
04:53
Read data from different file formats - using sqlContext
08:05
Row level transformations - String Manipulation
11:00
Row Level Transformations - map
12:24
Row Level Transformations - flatMap
05:50
Filtering data using filter
10:09
Joining Data Sets - Introduction
05:16
Joining Data Sets - Inner Join
10:34
Joining Data Sets - Outer Join
14:38
Aggregations - Introduction
03:00
Aggregations - count and reduce - Get revenue for order id
12:52
Aggregations - reduce - Get order item with minimum subtotal for order id
05:47
Aggregations - countByKey - Get order count by status
05:57
Aggregations - understanding combiner
06:50
Aggregations - groupByKey - Get revenue for each order id
08:17
groupByKey - Get order items sorted by order_item_subtotal for each order id
11:59
Aggregations - reduceByKey - Get revenue for each order id
10:26
Aggregations - aggregateByKey - Get revenue and count of items for each order id
14:29
Sorting - sortByKey - Sort data by product price
09:59
Sorting - sortByKey - Sort data by category id and then by price descending
10:48
Ranking - Introduction
01:18
Ranking - Global Ranking using sortByKey and take
02:48
Ranking - Global using takeOrdered or top
07:28
Ranking - By Key - Get top N products by price per category - Introduction
03:53
Ranking - By Key - Get top N products by price per category - Python collections
04:41
Ranking - By Key - Get top N products by price per category - using flatMap
03:06
Ranking - By Key - Get top N priced products - Introduction
03:00
Ranking - By Key - Get top N priced products - using Python collections API
13:06
Ranking - By Key - Get top N priced products - Create Function
05:03
Ranking - By Key - Get top N priced products - integrate with flatMap
04:16
Set Operations - Introduction
01:05
Set Operations - Prepare data
08:22
Set Operations - union and distinct
05:14
Set Operations - intersect and minus
08:04
Saving data into HDFS - text file format
11:46
Saving data into HDFS - text file format with compression
05:51
Saving data into HDFS using Data Frames - json
11:17
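
As a flavor of the core RDD transformations and aggregations in the section above, here is a minimal sketch of the recurring "revenue for each order id" example using map and reduceByKey. On the labs the data would come from HDFS via sc.textFile, so the in-memory sample and record layout are assumptions.

```python
from pyspark import SparkContext

sc = SparkContext(master="local[2]", appName="RevenuePerOrder")

# On the cluster this would be sc.textFile("/public/retail_db/order_items") (assumed path);
# a small in-memory sample keeps the sketch self-contained
order_items = sc.parallelize([
    "1,1,957,1,299.98,299.98",
    "2,2,1073,1,199.99,199.99",
    "3,2,502,5,250.0,50.0",
])

# Row-level transformation to (order_id, subtotal) pairs, then aggregate by key
revenue_per_order = order_items \
    .map(lambda rec: (int(rec.split(",")[1]), float(rec.split(",")[4]))) \
    .reduceByKey(lambda total, subtotal: total + subtotal)

for order_id, revenue in revenue_per_order.sortByKey().collect():
    print(order_id, revenue)
```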
+ Apache Spark 1.6 - Core Spark APIs - Get Daily Revenue Per Product
12 lectures 01:34:55
Problem Statement
01:53
Launching pyspark
11:45
Reading data from HDFS and filtering
08:14
Joining orders and order_items
07:44
Aggregate to get daily revenue per product id
06:53
Load products and convert into RDD
10:01
Join and sort the data
11:38
Save to HDFS and validate in text file format
07:24
Saving data in avro file format
11:58
Get data to local file system using get or copyToLocal
04:51
Develop as application to get daily revenue per product
07:27
Run as application on the cluster
05:07
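
Condensing the module above, this is a sketch of the end-to-end daily-revenue-per-product flow with the core RDD API. The HDFS paths and retail_db column layout are assumptions based on the lab environment.

```python
from pyspark import SparkContext

sc = SparkContext(appName="DailyRevenuePerProduct")

# Assumed lab paths for the retail_db data sets
orders = sc.textFile("/public/retail_db/orders")
order_items = sc.textFile("/public/retail_db/order_items")
products = sc.textFile("/public/retail_db/products")

# (order_id, order_date) for completed orders only
orders_map = orders \
    .filter(lambda o: o.split(",")[3] in ("COMPLETE", "CLOSED")) \
    .map(lambda o: (int(o.split(",")[0]), o.split(",")[1]))

# (order_id, (product_id, subtotal))
items_map = order_items.map(
    lambda oi: (int(oi.split(",")[1]), (int(oi.split(",")[2]), float(oi.split(",")[4])))
)

# Join, re-key by (order_date, product_id), and total the revenue
daily_revenue = orders_map.join(items_map) \
    .map(lambda rec: ((rec[1][0], rec[1][1][0]), rec[1][1][1])) \
    .reduceByKey(lambda total, subtotal: total + subtotal)

# Translate product_id to product_name, then save as text (assumed output path)
products_map = products.map(lambda p: (int(p.split(",")[0]), p.split(",")[2]))
daily_revenue \
    .map(lambda rec: (rec[0][1], (rec[0][0], rec[1]))) \
    .join(products_map) \
    .map(lambda rec: "{0},{1},{2}".format(rec[1][0][0], rec[1][1], round(rec[1][0][1], 2))) \
    .saveAsTextFile("/user/training/daily_revenue")
```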
+ Apache Spark 1.6 - Data Analysis - Spark SQL or HiveQL using Spark Context
21 lectures 04:06:18
Different interfaces to run SQL - Hive, Spark SQL
09:25
Create database and tables of text file format - orders and order_items
25:00
Create database and tables of ORC file format - orders and order_items
10:18
Running SQL/Hive Commands using pyspark
05:16
Functions - Getting Started
05:11
Functions - String Manipulation
22:23
Functions - Date Manipulation
13:44
Functions - Aggregate Functions in brief
05:49
Functions - case and nvl
14:10
Row level transformations
08:30
Joining data between multiple tables
18:10
Group by and aggregations
11:41
Sorting the data
07:27
Set operations - union and union all
05:39
Analytics functions - aggregations
15:53
Analytics functions - ranking
08:39
Windowing functions
07:48
Creating Data Frames and register as temp tables
18:46
Write Spark Application - Processing Data using Spark SQL
09:13
Write Spark Application - Saving Data Frame to Hive tables
09:35
Data Frame Operations
13:41
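
A minimal sketch of the Spark SQL/HiveQL workflow from the section above, using the Spark 1.6-era HiveContext; it assumes a cluster where the retail_db Hive tables were created as in the lectures.

```python
from pyspark import SparkContext
from pyspark.sql import HiveContext

# In the pyspark shell sc and sqlContext already exist; created here for completeness
sc = SparkContext(appName="SparkSQLDemo")
sqlContext = HiveContext(sc)

# Assumes the retail_db database and tables exist in Hive, as built in the lectures
sqlContext.sql("USE retail_db")
daily_revenue = sqlContext.sql("""
    SELECT o.order_date, round(sum(oi.order_item_subtotal), 2) AS revenue
    FROM orders o JOIN order_items oi
      ON o.order_id = oi.order_item_order_id
    WHERE o.order_status IN ('COMPLETE', 'CLOSED')
    GROUP BY o.order_date
    ORDER BY revenue DESC
    LIMIT 10
""")
daily_revenue.show()

# A Data Frame can also be registered as a temp table and queried the same way
daily_revenue.registerTempTable("daily_revenue")
sqlContext.sql("SELECT count(1) FROM daily_revenue").show()
```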
+ Setup Hadoop and Spark Environment for Practice
8 lectures 42:59
Introduction to Setting up Environment for Practice
03:09
Overview of ITVersity Boxes GitHub Repository
03:11
Creating Virtual Machine
10:31
Starting HDFS and YARN
04:28
Gracefully Stopping Virtual Machine
05:41
Understanding Datasets provided in Virtual Machine
05:38
Using GitHub Content for the practice
05:11
Using Resources for Practice
05:10
+ Apache Spark 2.x - Data processing - Getting Started
9 lectures 01:33:52
Introduction
02:16
Review of Setup Steps for Spark Environment
08:39
Using ITVersity labs
06:32
Apache Spark Official Documentation (Very Important)
07:20
Quick Review of Spark APIs
12:30
Spark Modules
05:01
Spark Data Structures - RDDs and Data Frames
14:49
Develop Simple Application
14:26
Apache Spark - Framework
22:19
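
In Spark 2.x the SparkSession becomes the single entry point, replacing the separate contexts; here is a minimal sketch of the kind of simple application developed in this section. The master setting is an assumption and would normally come from spark-submit.

```python
from pyspark.sql import SparkSession

# Single entry point for Spark 2.x; replaces SparkContext/SQLContext/HiveContext
spark = SparkSession \
    .builder \
    .master("local[2]") \
    .appName("SimpleApp") \
    .getOrCreate()

# RDDs remain available through the underlying SparkContext
even_count = spark.sparkContext \
    .parallelize(range(100)) \
    .filter(lambda i: i % 2 == 0) \
    .count()
print("Even numbers: {0}".format(even_count))

spark.stop()
```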
+ Apache Spark 2.x - Data Frames and Pre-Defined Functions
8 lectures 01:12:20
Introduction
01:43
Data Frames - Overview
12:22
Create Data Frames from Text Files
16:18
Create Data Frames from Hive Tables
05:49
Create Data Frames using JDBC
17:14
Data Frame Operations - Overview
09:02
Spark SQL - Overview
04:00
Overview of Functions to manipulate data in Data Frame fields or columns
05:52
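
A minimal sketch of creating a Data Frame from comma-separated text files and applying pre-defined column functions, as covered above; the path and schema are assumptions based on the course's retail orders data set.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import substring, count
from pyspark.sql.types import StructType, StructField, IntegerType, StringType

spark = SparkSession.builder.master("local[2]").appName("DataFrameFunctions").getOrCreate()

# Assumed schema for the retail_db orders files (comma-separated, no header)
schema = StructType([
    StructField("order_id", IntegerType()),
    StructField("order_date", StringType()),
    StructField("order_customer_id", IntegerType()),
    StructField("order_status", StringType()),
])
orders = spark.read.csv("/public/retail_db/orders", schema=schema)  # assumed path

# Pre-defined functions manipulate columns, e.g. derive year-month from the date
orders \
    .groupBy(substring("order_date", 1, 7).alias("order_month")) \
    .agg(count("*").alias("order_count")) \
    .show()
```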
+ Apache Spark 2.x - Processing Data using Data Frames - Basic Transformations
8 lectures 01:38:08
Define Problem Statement - Get Daily Product Revenue
06:54
Selection or Projection of Data in Data Frames
10:27
Filtering Data from Data Frames
16:32
Joining multiple Data Frames
17:55
Perform Aggregations using Data Frames
12:24
Sorting Data in Data Frames
10:23
Development Life Cycle using Data Frames
14:36
Run applications using Spark Submit
08:57
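
Tying the section together, here is a condensed sketch of the Daily Product Revenue application as it might be packaged for spark-submit. Paths, column names, and the submit command are assumptions based on the course's retail data set.

```python
# daily_product_revenue.py (hypothetical file name)
# Assumed usage:
#   spark-submit --master yarn daily_product_revenue.py /public/retail_db /user/training/output
import sys

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, round as round_, sum as sum_

input_dir, output_dir = sys.argv[1], sys.argv[2]

spark = SparkSession.builder.appName("DailyProductRevenue").getOrCreate()

# Read the comma-separated files and name the columns (assumed retail_db layout)
orders = spark.read.csv(input_dir + "/orders", inferSchema=True) \
    .toDF("order_id", "order_date", "order_customer_id", "order_status")
order_items = spark.read.csv(input_dir + "/order_items", inferSchema=True) \
    .toDF("order_item_id", "order_item_order_id", "order_item_product_id",
          "order_item_quantity", "order_item_subtotal", "order_item_product_price")

# Filter, join, aggregate, and sort - the basic transformations from this section
daily_product_revenue = orders \
    .filter(col("order_status").isin("COMPLETE", "CLOSED")) \
    .join(order_items, orders.order_id == order_items.order_item_order_id) \
    .groupBy("order_date", "order_item_product_id") \
    .agg(round_(sum_("order_item_subtotal"), 2).alias("revenue")) \
    .orderBy("order_date", col("revenue").desc())

daily_product_revenue.write.csv(output_dir)
spark.stop()
```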