CCA 175 - Spark and Hadoop Developer Certification - Scala

Cloudera Certified Associate Spark and Hadoop Developer using Scala as Programming Language
4.1 (1,874 ratings)
14,918 students enrolled
Last updated 5/2020
English
Italian [Auto-generated]
30-Day Money-Back Guarantee
This course includes
  • 28 hours on-demand video
  • 5 downloadable resources
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What you'll learn
  • Entire curriculum of CCA Spark and Hadoop Developer
  • Apache Sqoop
  • HDFS Commands
  • Scala Fundamentals
  • Core Spark - Transformations and Actions
  • Spark SQL and Data Frames
  • Streaming analytics using Kafka, Flume and Spark Streaming
Requirements
  • Basic programming skills
  • Cloudera Quickstart VM, a valid account for ITVersity Big Data labs, or any Hadoop cluster where Hadoop, Hive and Spark are well integrated.
  • Minimum memory requirements depend on the environment you are using; a 64-bit operating system is required.
Description

CCA 175 Spark and Hadoop Developer is one of the most widely recognized Big Data certifications. This scenario-based certification exam demands basic programming using Python or Scala, along with Spark and other Big Data technologies.

This comprehensive course covers all aspects of the certification using Scala as the programming language.

  • Scala Fundamentals

  • Core Spark - Transformations and Actions (see the sketch after this list)

  • Spark SQL and Data Frames

  • File formats

  • Flume, Kafka and Spark Streaming

  • Apache Sqoop
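
As a taste of what "Core Spark - Transformations and Actions" covers, here is a minimal sketch of an exam-style task. The order_items data below is hypothetical and hard-coded for illustration; the snippet is runnable in spark-shell, where sc is pre-defined:

// Minimal sketch (hypothetical data): compute revenue per order using
// core Spark transformations (map, reduceByKey) and an action (collect)
val orderItems = sc.parallelize(Seq(
  "1,1,957,1,299.98,299.98",   // order_item_id,order_id,product_id,quantity,subtotal,price
  "2,2,1073,1,199.99,199.99",
  "3,2,502,5,250.00,50.00"
))

val revenuePerOrder = orderItems.
  map(line => {
    val fields = line.split(",")
    (fields(1).toInt, fields(4).toFloat)   // (order_id, subtotal)
  }).
  reduceByKey(_ + _)                       // transformation: sum subtotals per order

revenuePerOrder.collect().foreach(println) // action: bring results to the driver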

Exercises will be provided so that you can prepare before attempting the certification. The intention of the course is to build the confidence to take the certification exam.

All the demos are given on our state-of-the-art Big Data cluster. You can avail one week of complimentary lab access by filling in the form provided as part of the welcome message.

Who this course is for:
  • Any IT aspirant/professional willing to learn Big Data and take the CCA 175 certification exam
Course content
161 lectures 27:52:10
+ Scala Fundamentals
18 lectures 03:13:28
Setup Scala on Windows
07:23
Basic Programming Constructs
18:53
Functions
18:35
Object Oriented Concepts - Classes
17:42
Object Oriented Concepts - Objects
13:02
Object Oriented Concepts - Case Classes
11:14
Collections - Seq, Set and Map
08:56
Basic Map Reduce Operations
14:08
Setting up Data Sets for Basic I/O Operations
04:23
Basic I/O Operations and using Scala Collections APIs
16:23
Tuples
04:56
Development Cycle - Developing Source code
07:24
Development Cycle - Compile source code to jar using SBT
09:32
Development Cycle - Setup SBT on Windows
02:48
Development Cycle - Compile changes and run jar with arguments
04:21
Development Cycle - Setup IntelliJ with Scala
12:07
Development Cycle - Develop Scala application using SBT in IntelliJ
10:50
+ Getting Started
9 lectures 01:20:40
Introduction and Curriculum
05:45
Setup Environment - Options
01:45
Setup Environment - Locally
02:03
Setup Environment - using Cloudera Quickstart VM
07:22
Using Windows - Putty and WinSCP
10:33
Using Windows - Cygwin
14:46
HDFS Quick Preview
20:24
YARN Quick Preview
09:53
Setup Data Sets
08:09
+ Transform, Stage and Store - Spark
41 lectures 08:47:23
Introduction
05:15
Introduction to Spark
02:22
Setup Spark on Windows
23:15
Quick overview of Spark documentation
04:49
Initializing Spark job using spark-shell
18:39
Create Resilient Distributed Data Sets (RDD)
13:40
Previewing data from RDD
17:57
Reading different file formats - Brief overview using JSON
09:34
Transformations Overview
04:02
Manipulating Strings as part of transformations using Scala
13:44
Row level transformations using map
18:09
Row level transformations using flatMap
09:19
Filtering the data
18:03
Joining data sets - inner join
10:34
Joining data sets - outer join
17:29
Aggregations - Getting Started
04:07
Aggregations - using actions (reduce and countByKey)
15:14
Aggregations - understanding combiner
06:50
Aggregations using groupByKey - least preferred API for aggregations
21:13
Aggregations using reduceByKey
07:36
Aggregations using aggregateByKey
18:21
Sorting data using sortByKey
19:35
Global Ranking - using sortByKey with take and takeOrdered
12:47
By Key Ranking - Converting (K, V) pairs into (K, Iterable[V]) using groupByKey
06:21
Get topNPrices using Scala Collections API
10:49
Get topNPricedProducts using Scala Collections API
11:29
Get top n products by category using groupByKey, flatMap and Scala function
06:02
Set Operations - union, intersect, distinct as well as minus
19:39
Save data in Text Input Format
15:13
Save data in Text Input Format using Compression
11:28
Saving data in standard file formats - Overview
10:23
Revision of Problem Statement and Design the solution
04:12
Solution - Get Daily Revenue per Product - Launching Spark Shell
10:08
Solution - Get Daily Revenue per Product - Read and join orders and order_items
17:46
Solution - Get Daily Revenue per Product - Compute daily revenue per product id
13:41
Solution - Get Daily Revenue per Product - Read products data and create RDD
15:22

// Sort the data by date in ascending order and by daily revenue per product in descending order
val dailyRevenuePerProductSorted = dailyRevenuePerProductJoin.
  map(rec => ((rec._2._1._1, -rec._2._1._2), (rec._2._1._1, rec._2._1._2, rec._2._2))).
  sortByKey()
dailyRevenuePerProductSorted.take(100).foreach(println)
// ((order_date_asc, daily_revenue_per_product_id_desc), (order_date, daily_revenue_per_product, product_name))

// Get data to desired format – order_date,daily_revenue_per_product,product_name
val dailyRevenuePerProduct = dailyRevenuePerProductSorted.
  map(rec => rec._2._1 + "," + rec._2._2 + "," + rec._2._3)
dailyRevenuePerProduct.take(10).foreach(println)

// Save final output into HDFS in avro file format as well as text file format
// HDFS location – avro format: /user/YOUR_USER_ID/daily_revenue_avro_scala
// HDFS location – text format: /user/YOUR_USER_ID/daily_revenue_txt_scala
dailyRevenuePerProduct.saveAsTextFile("/user/dgadiraju/daily_revenue_txt_scala")
sc.textFile("/user/dgadiraju/daily_revenue_txt_scala").take(10).foreach(println)

// Copy both from HDFS to local file system
// /home/YOUR_USER_ID/daily_revenue_scala
mkdir daily_revenue_scala
hadoop fs -get /user/dgadiraju/daily_revenue_txt_scala \
  /home/dgadiraju/daily_revenue_scala/daily_revenue_txt_scala
cd daily_revenue_scala/daily_revenue_txt_scala/
ls -ltr
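
The comments above call for an avro save as well, but only the text-format save is shown. A minimal sketch of the avro step, assuming spark-shell 1.6.x was launched with the com.databricks:spark-avro package and that sqlContext is available; the column names below are illustrative:

// Assumption: launched as spark-shell --packages com.databricks:spark-avro_2.10:2.0.1
import sqlContext.implicits._

// Convert the (key, (order_date, daily_revenue_per_product, product_name)) RDD
// into a Data Frame so it can be written through the avro data source
val dailyRevenuePerProductDF = dailyRevenuePerProductSorted.
  map(rec => (rec._2._1, rec._2._2, rec._2._3)).
  toDF("order_date", "daily_revenue_per_product", "product_name")

dailyRevenuePerProductDF.write.
  format("com.databricks.spark.avro").
  save("/user/dgadiraju/daily_revenue_avro_scala")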

Solution - Get Daily Revenue per Product - Sort and save to HDFS
26:17
Solution - Add spark dependencies to sbt
08:01
Solution - Develop as Scala based application
25:34
Solution - Run locally using spark-submit
09:03
Solution - Ship and run it on big data cluster
13:21
+ Data Analysis - Spark SQL or HiveQL - 1.6.x
21 lectures 03:58:10
Different interfaces to run Hive queries
09:25
Create Hive tables and load data in text file format
25:00
Create Hive tables and load data in ORC file format
10:18
Using spark-shell to run Hive queries or commands
03:51
Functions - Getting Started
05:11
Functions - Manipulating Strings
22:23
Functions - Manipulating Dates
13:44
Functions - Aggregations
05:49
Functions - CASE
14:10
Row level transformations
08:30
Joins
18:10
Aggregations
11:41
Sorting
07:27
Set Operations
05:39
Analytics Functions - Aggregations
15:53
Analytics Functions - Ranking
08:39
Windowing Functions
07:48
Create Data Frame and Register as Temp table
15:40
Writing Spark SQL Applications - process data
08:38
Writing Spark SQL Applications - Save data into Hive tables
07:20
Data Frame Operations
12:54
+ Setup Hadoop and Spark Environment for Practice
8 lectures 42:59
Introduction to Setting up Environment for Practice
03:09
Overview of ITVersity Boxes GitHub Repository
03:11
Creating Virtual Machine
10:31
Starting HDFS and YARN
04:28
Gracefully Stopping Virtual Machine
05:41
Understanding Datasets provided in Virtual Machine
05:38
Using GitHub Content for the practice
05:11
Using Resources for Practice
05:10
+ Spark 2 - Data Processing - Overview
7 lectures 01:17:58
Introduction to the module
02:18
Starting Spark Context
10:14
Overview of Spark read APIs
18:16
Previewing Schema and Data
04:31
Overview of Data Frame APIs
07:41
Overview of Functions
18:15
Overview of Spark Write APIs
16:43
+ Spark 2 - Processing Column Data using Pre-defined Functions
18 lectures 02:15:46
Introduction to Pre-defined Functions
05:51
Creating Spark Session Object in Notebook
01:55
Create Dummy Data Frames for Practice
08:06
Categories of Functions
02:16
Using Special Functions - col
13:50
Using Special Functions - lit
04:44
String Manipulation Functions - Case Conversion and Length
06:44
String Manipulation - Extracting data from fixed length fields using substring
13:16
String Manipulation - Extracting data from delimited fields using split
09:04
String Manipulation - Concatenating Strings
03:37
String Manipulation - Padding Strings
11:10
String Manipulation - Trimming unwanted characters
05:23
Date and Time Functions - Overview
04:14
Date and Time Functions - Date Arithmetic
09:53
Date and Time Functions - Using trunc and date_trunc for to-date reports
07:34
Date and Time Functions - Using date_format and other functions
15:33
Date and Time Functions - dealing with unix timestamp
08:13
Pre-defined Functions - Conclusion
04:23
+ Spark 2 - Basic Transformations using Data Frames
15 lectures 01:36:10
Introduction to Basic Transformations using Data Frame APIs
02:51
Starting Spark Context
03:13
Overview of Filtering
05:24
Filtering - Reading Data and Understanding Schema
02:30
Filtering Data - Task 1 - Equal Operator
08:19
Filtering Data - Task 2 - Comparison Operators
03:41
Filtering Data - Task 3 - Boolean AND
05:22
Filtering Data - Task 4 - IN Operator
05:43
Filtering Data - Task 5 - Between and Like
09:09
Filtering Data - Task 6 - Using functions in Filter
09:48
Overview of Aggregations
08:41
Overview of Sorting
02:52
Solution - Get Delayed Counts - Part 1
06:47
Solution - Get Delayed Counts - Part 2
05:22
Solution - Getting Delayed Counts By Date
16:28
+ Joining Data Sets
10 lectures 49:03
Prepare and Validate Data Sets
04:43
Starting Spark Session or Spark Context
03:28
Analyze Data Sets for Joins
06:11
Eliminate Duplicate records from Data Frame
04:11
Recap of Basic Transformations
04:27
Joining Data Sets - Problem Statements
02:11
Overview of Joins
01:43
Inner Join - Get number of flights departed from US airports
09:17
Inner Join - Get number of flights departed from US States
05:08
Outer Join - Get Airports - Never Used
07:44