Udemy
2020-12-06 04:29:19
30-Day Money-Back Guarantee

This course includes:

  • 28 hours on-demand video
  • Full lifetime access
  • Access on mobile and TV

CCA 175 - Spark and Hadoop Developer - Python (pyspark)

Cloudera Certified Associate Spark and Hadoop Developer using Python as Programming Language
Bestseller
Rating: 4.2 out of 5 (1,339 ratings)
6,929 students
Created by Durga Viswanatha Raju Gadiraju, Itversity Support, Hindu Varma Datla, Teja Rayala
Last updated 12/2020
English
Italian [Auto]

What you'll learn

  • Entire curriculum of CCA Spark and Hadoop Developer
  • HDFS Commands
  • Python Fundamentals
  • Core Spark - Transformations and Actions
  • Spark SQL and Data Frames
Curated for the Udemy for Business collection

Requirements

  • Basic programming skills using any programming language
  • Cloudera QuickStart VM, a valid account for the ITVersity Big Data labs, or any Hadoop cluster where Hadoop, Hive, and Spark are well integrated
  • A 64-bit operating system with at least 4 GB RAM when using a proper cluster, or 16 GB RAM when running a virtual machine such as the Cloudera QuickStart VM

Description

CCA 175 Spark and Hadoop Developer is one of the well-recognized Big Data certifications. This scenario-based certification exam requires basic programming skills in Python or Scala, along with knowledge of Spark and other Big Data technologies.

This comprehensive course covers all aspects of the certification using Python as the programming language.

  • Python Fundamentals

  • Spark SQL and Data Frames

  • File formats

Please note that the syllabus was recently changed, and the exam now focuses primarily on Spark Data Frames and/or Spark SQL.

Exercises are provided so you can prepare before taking the certification exam. The course aims to build your confidence for the exam.

All demos are given on our state-of-the-art Big Data cluster. You can get one week of complimentary lab access by filling out the form provided in the welcome message.

Who this course is for:

  • Any IT aspirant or professional who wants to learn Big Data and take the CCA 175 certification exam

Featured review

Kushwanth Chowdary
38 courses
4 reviews
Rating: 5.0 out of 5 · 10 months ago
For anyone looking for PySpark experience, I would recommend this course. It is helpful, with real-time scenarios (use cases), and it might help in interviews as well. We can also quickly start practicing on the lab and platform, which really helps because you don't waste time setting things up.

Course content

17 sections • 214 lectures • 27h 55m total length

  • CCA 175 Spark and Hadoop Developer - Curriculum
    08:01 (Preview)
  • Using labs for preparation
    08:55
  • Setup Development Environment (Windows 10) - Introduction
    02:25
  • Setup Development Environment - Python and Spark - Pre-requisites
    04:12
  • Setup Development Environment - Python Setup on Windows
    03:07
  • Setup Development Environment - Configure Environment Variables
    02:31
  • Setup Development Environment - Setup PyCharm for developing Python applications
    05:28
  • Setup Development Environment - Pass run time arguments or parameters
    02:31
  • Setup Development Environment - Download Spark compressed tar ball
    01:38
  • Setup Development Environment - Install 7z for uncompress and untar on windows
    01:00
  • Setup Development Environment - Setup Spark
    02:26
  • Setup Development Environment - Install JDK
    06:05
  • Setup Development Environment - Configure environment variables for Spark
    03:46
  • Setup Development Environment - Install WinUtils - integrate Windows and HDFS
    06:30
  • Setup Development Environment - Integrate PyCharm and Spark on Windows 10
    07:06

  • Introduction and Setting up Python
    09:43
  • Basic Programming Constructs
    13:16
  • Functions in Python
    14:05
  • Preview
    16:12
  • Map Reduce operations on Python Collections
    12:52
  • Setting up Data Sets for Basic I/O Operations
    04:23
  • Basic I/O operations and processing data using Collections
    16:35
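The "Map Reduce operations on Python Collections" lectures and the "Get revenue for given order id" exercise use the filter/map/reduce style that later carries over to Spark RDDs. Below is a minimal pure-Python sketch, assuming comma-separated order_items records in the retail_db layout (order_item_id, order_item_order_id, order_item_product_id, order_item_quantity, order_item_subtotal, order_item_product_price). The sample records and the `get_order_revenue` helper are illustrative, not taken from the course.

```python
from functools import reduce

# Hypothetical sample records in the assumed order_items layout:
# order_item_id,order_item_order_id,order_item_product_id,
# order_item_quantity,order_item_subtotal,order_item_product_price
order_items = [
    "1,1,957,1,299.98,299.98",
    "2,2,1073,1,199.99,199.99",
    "3,2,502,5,250.0,50.0",
    "4,2,403,1,129.99,129.99",
]


def get_order_revenue(lines, order_id):
    """Sum order_item_subtotal for one order using filter, map and reduce."""
    # filter: keep only the items that belong to the requested order
    items = filter(lambda line: int(line.split(",")[1]) == order_id, lines)
    # map: project each record to its subtotal as a float
    subtotals = map(lambda line: float(line.split(",")[4]), items)
    # reduce: add the subtotals; the initial 0.0 covers orders with no items
    return reduce(lambda total, s: total + s, subtotals, 0.0)


print(round(get_order_revenue(order_items, 2), 2))  # 579.98
```

The same three steps map directly onto `rdd.filter`, `rdd.map`, and `rdd.reduce` once the lines come from a Spark RDD instead of a Python list.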

  • Get revenue for given order id - as application
    12:20
  • Setup Environment - Options
    01:45
  • Setup Environment - Locally
    02:03
  • Setup Environment - using Cloudera Quickstart VM
    07:22
  • Using Itversity platforms - Big Data Developer labs and forum
    08:55
  • Using itversity's big data labs
    09:00
  • Using Windows - Putty and WinSCP
    10:33
  • Using Windows - Cygwin
    14:46
  • HDFS Quick Preview
    20:24
  • YARN Quick Preview
    09:53
  • Setup Data Sets
    08:09

  • Introduction
    06:05
  • Preview
    02:22
  • Setup Spark on Windows
    23:15
  • Quick overview about Spark documentation
    04:38
  • Connecting to the environment
    03:48
  • Initializing Spark job using pyspark
    04:54
  • Preview
    18:28
  • Create RDD from collection - using parallelize
    04:53
  • Read data from different file formats - using sqlContext
    08:05
  • Row level transformations - String Manipulation
    11:00
  • Row Level Transformations - map
    12:24
  • Row Level Transformations - flatMap
    05:50
  • Filtering data using filter
    10:09
  • Joining Data Sets - Introduction
    05:16
  • Joining Data Sets - Inner Join
    10:34
  • Joining Data Sets - Outer Join
    14:38
  • Aggregations - Introduction
    03:00
  • Aggregations - count and reduce - Get revenue for order id
    12:52
  • Aggregations - reduce - Get order item with minimum subtotal for order id
    05:47
  • Aggregations - countByKey - Get order count by status
    05:57
  • Aggregations - understanding combiner
    06:50
  • Aggregations - groupByKey - Get revenue for each order id
    08:17
  • groupByKey - Get order items sorted by order_item_subtotal for each order id
    11:59
  • Aggregations - reduceByKey - Get revenue for each order id
    10:26
  • Aggregations - aggregateByKey - Get revenue and count of items for each order id
    14:29
  • Sorting - sortByKey - Sort data by product price
    09:59
  • Sorting - sortByKey - Sort data by category id and then by price descending
    10:48
  • Ranking - Introduction
    01:18
  • Ranking - Global Ranking using sortByKey and take
    02:48
  • Ranking - Global using takeOrdered or top
    07:28
  • Ranking - By Key - Get top N products by price per category - Introduction
    03:53
  • Ranking - By Key - Get top N products by price per category - Python collections
    04:41
  • Ranking - By Key - Get top N products by price per category - using flatMap
    03:06
  • Ranking - By Key - Get top N priced products - Introduction
    03:00
  • Ranking - By Key - Get top N priced products - using Python collections API
    13:06
  • Ranking - By Key - Get top N priced products - Create Function
    05:03
  • Ranking - By Key - Get top N priced products - integrate with flatMap
    04:16
  • Set Operations - Introduction
    01:05
  • Set Operations - Prepare data
    08:22
  • Set Operations - union and distinct
    05:14
  • Set Operations - intersect and minus
    08:04
  • Saving data into HDFS - text file format
    11:46
  • Saving data into HDFS - text file format with compression
    05:51
  • Saving data into HDFS using Data Frames - json
    11:17
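The aggregateByKey lecture ("Get revenue and count of items for each order id") relies on two functions: one that folds values into an accumulator within a partition (the combiner step), and one that merges accumulators from different partitions. The sketch below emulates those semantics in plain Python so it runs without a cluster; the two-partition split and the `aggregate_by_key` helper are hypothetical illustrations, not Spark internals.

```python
# (order_id, subtotal) pairs split across two hypothetical partitions
partitions = [
    [(1, 299.98), (2, 199.99), (2, 250.0)],
    [(2, 129.99), (4, 49.98)],
]

zero = (0.0, 0)  # (revenue, item_count); a tuple, so sharing it is safe


def seq_op(acc, subtotal):
    # Fold one value into a partition-local accumulator
    return (acc[0] + subtotal, acc[1] + 1)


def comb_op(a, b):
    # Merge two partition-local accumulators for the same key
    return (a[0] + b[0], a[1] + b[1])


def aggregate_by_key(partitions, zero, seq_op, comb_op):
    # Phase 1: fold values per key inside each partition (map-side combine)
    partials = []
    for part in partitions:
        local = {}
        for key, value in part:
            local[key] = seq_op(local.get(key, zero), value)
        partials.append(local)
    # Phase 2: merge the partial results across partitions (after the shuffle)
    result = {}
    for local in partials:
        for key, acc in local.items():
            result[key] = comb_op(result[key], acc) if key in result else acc
    return result


result = aggregate_by_key(partitions, zero, seq_op, comb_op)
print({k: (round(v[0], 2), v[1]) for k, v in sorted(result.items())})
```

In PySpark, the equivalent call on an RDD of (order_id, subtotal) pairs would be `rdd.aggregateByKey((0.0, 0), seq_op, comb_op)`. Unlike reduceByKey, the accumulator type (a tuple here) can differ from the value type (a float).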

  • Problem Statement
    01:53
  • Launching pyspark
    11:45
  • Reading data from HDFS and filtering
    08:14
  • Joining orders and order_items
    07:44
  • Aggregate to get daily revenue per product id
    06:53
  • Load products and convert into RDD
    10:01
  • Join and sort the data
    11:38
  • Save to HDFS and validate in text file format
    07:24
  • Saving data in avro file format
    11:58
  • Get data to local file system using get or copyToLocal
    04:51
  • Develop as application to get daily revenue per product
    07:27
  • Run as application on the cluster
    05:07
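The problem-statement lectures above build a daily revenue per product report step by step. Assuming the problem is defined as "consider only COMPLETE and CLOSED orders, sum order_item_subtotal per order date and product, and sort by date ascending and revenue descending", the pipeline can be sketched with plain Python dicts that mirror the RDD steps (filter, join, reduceByKey, join with products, sort). The sample records, product names, and the `daily_product_revenue` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical records in the assumed retail_db layouts
# orders: order_id,order_date,order_customer_id,order_status
orders = [
    "1,2013-07-25 00:00:00.0,11599,CLOSED",
    "2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT",
    "3,2013-07-25 00:00:00.0,12111,COMPLETE",
    "4,2013-07-26 00:00:00.0,8827,CLOSED",
]
# order_items: order_item_id,order_id,product_id,quantity,subtotal,price
order_items = [
    "1,1,957,1,299.98,299.98",
    "2,2,1073,1,199.99,199.99",
    "3,3,957,1,299.98,299.98",
    "4,3,502,5,250.0,50.0",
    "5,4,502,2,100.0,50.0",
]
products = {957: "Bike", 502: "Golf Polo", 1073: "Kayak"}  # hypothetical names


def daily_product_revenue(orders, order_items, product_names):
    # 1. Keep COMPLETE/CLOSED orders as order_id -> order date (date part only)
    order_dates = {}
    for line in orders:
        order_id, order_date, _customer, status = line.split(",")
        if status in ("COMPLETE", "CLOSED"):
            order_dates[int(order_id)] = order_date[:10]
    # 2. Inner join with order_items and 3. sum subtotals by (date, product_id)
    revenue = defaultdict(float)
    for line in order_items:
        fields = line.split(",")
        order_id, product_id = int(fields[1]), int(fields[2])
        if order_id in order_dates:  # drops items of filtered-out orders
            revenue[(order_dates[order_id], product_id)] += float(fields[4])
    # 4. Look up product names, then sort by date asc and revenue desc
    rows = [(date, product_names[pid], round(total, 2))
            for (date, pid), total in revenue.items()]
    return sorted(rows, key=lambda r: (r[0], -r[2]))


for row in daily_product_revenue(orders, order_items, products):
    print(row)
```

On a cluster, each numbered step becomes an RDD transformation, and the final result would be written to HDFS instead of printed.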

  • Different interfaces to run SQL - Hive, Spark SQL
    09:25
  • Create database and tables of text file format - orders and order_items
    25:00
  • Create database and tables of ORC file format - orders and order_items
    10:18
  • Running SQL/Hive Commands using pyspark
    05:16
  • Functions - Getting Started
    05:11
  • Functions - String Manipulation
    22:23
  • Functions - Date Manipulation
    13:44
  • Functions - Aggregate Functions in brief
    05:49
  • Functions - case and nvl
    14:10
  • Row level transformations
    08:30
  • Joining data between multiple tables
    18:10
  • Group by and aggregations
    11:41
  • Sorting the data
    07:27
  • Set operations - union and union all
    05:39
  • Analytics functions - aggregations
    15:53
  • Analytics functions - ranking
    08:39
  • Windowing functions
    07:48
  • Creating Data Frames and register as temp tables
    18:46
  • Write Spark Application - Processing Data using Spark SQL
    09:13
  • Write Spark Application - Saving Data Frame to Hive tables
    09:35
  • Data Frame Operations
    13:41
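The analytics ranking lecture covers functions such as `rank()` and `dense_rank()`, used in Spark SQL as, for example, `rank() OVER (PARTITION BY product_category_id ORDER BY product_price DESC)`. The two differ only on ties: `rank()` skips positions after a tie, while `dense_rank()` does not. The plain-Python sketch below emulates both so the difference is visible; the `rank_within_groups` helper and the sample rows are hypothetical.

```python
def rank_within_groups(rows, dense=False):
    """Emulate RANK()/DENSE_RANK() OVER (PARTITION BY category ORDER BY price DESC).

    rows: list of (category_id, product_id, price) tuples.
    Returns a list of (category_id, product_id, price, rank) tuples.
    """
    # Order by the partition key, then by price descending, as the window spec says
    ordered = sorted(rows, key=lambda r: (r[0], -r[2]))
    result = []
    prev_cat, prev_price, rnk, dense_rnk, pos = None, None, 0, 0, 0
    for cat, pid, price in ordered:
        if cat != prev_cat:
            # New partition: reset every counter
            prev_cat, prev_price, rnk, dense_rnk, pos = cat, None, 0, 0, 0
        pos += 1
        if price != prev_price:
            # New distinct price: RANK jumps to the row position,
            # DENSE_RANK advances by exactly one
            rnk, dense_rnk, prev_price = pos, dense_rnk + 1, price
        result.append((cat, pid, price, dense_rnk if dense else rnk))
    return result


# Hypothetical (category_id, product_id, price) rows with a tie in category 2
product_prices = [
    (2, 10, 59.98), (2, 11, 129.99), (2, 12, 129.99),
    (2, 13, 89.99), (3, 20, 34.99), (3, 21, 49.99),
]
for row in rank_within_groups(product_prices):
    print(row)
```

Keeping only rows whose rank is at most N gives the top N priced products per category, the same problem the Spark Core section solves with Python collections and flatMap.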

  • Introduction to Setting up Environment for Practice
    03:09
  • Overview of ITVersity Boxes GitHub Repository
    03:11
  • Creating Virtual Machine
    10:31
  • Starting HDFS and YARN
    04:28
  • Gracefully Stopping Virtual Machine
    05:41
  • Understanding Datasets provided in Virtual Machine
    05:38
  • Using GitHub Content for the practice
    05:11
  • Using Resources for Practice
    05:10

  • Introduction
    02:16
  • Review of Setup Steps for Spark Environment
    08:39
  • Using ITVersity labs
    06:32
  • Apache Spark Official Documentation (Very Important)
    07:20
  • Quick Review of Spark APIs
    12:30
  • Spark Modules
    05:01
  • Spark Data Structures - RDDs and Data Frames
    14:49
  • Develop Simple Application
    14:26
  • Apache Spark - Framework
    22:19

  • Introduction
    01:43
  • Data Frames - Overview
    12:22
  • Create Data Frames from Text Files
    16:18
  • Create Data Frames from Hive Tables
    05:49
  • Create Data Frames using JDBC
    17:14
  • Data Frame Operations - Overview
    09:02
  • Spark SQL - Overview
    04:00
  • Overview of Functions to manipulate data in Data Frame fields or columns
    05:52

  • Define Problem Statement - Get Daily Product Revenue
    06:54
  • Selection or Projection of Data in Data Frames
    10:27
  • Filtering Data from Data Frames
    16:32
  • Joining multiple Data Frames
    17:55
  • Perform Aggregations using Data Frames
    12:24
  • Sorting Data in Data Frames
    10:23
  • Development Life Cycle using Data Frames
    14:36
  • Run applications using Spark Submit
    08:57

Instructors

Durga Viswanatha Raju Gadiraju
Technology Adviser and Evangelist
  • 4.2 Instructor Rating
  • 9,067 Reviews
  • 167,342 Students
  • 18 Courses

13+ years of experience executing complex projects using a vast array of technologies, including Big Data and Cloud.

ITVersity, Inc. is a US-based organization that provides quality training for IT professionals, with a track record of training hundreds of thousands of professionals globally.

Our organization's priority is helping people build IT careers by providing the tools they need to upskill and cross-skill, such as high-quality material, labs, and live support.

At this time, our training offerings focus on the following areas:

* Application Development using Python and SQL

* Big Data and Business Intelligence

* Cloud

* Data Warehousing, Databases

Itversity Support
Support Account for ITVersity Courses.
  • 4.2 Instructor Rating
  • 8,424 Reviews
  • 151,567 Students
  • 15 Courses

We have built a team to provide support going forward. Messages sent to this account about our courses are forwarded to our Helpdesk, which routes them to our team.

Hindu Varma Datla
Software Engineer at ITVersity
  • 4.3 Instructor Rating
  • 3,862 Reviews
  • 45,956 Students
  • 4 Courses

3+ years of IT experience in Python (Django and Flask), Spark, Linux, SQL with various RDBMSs, JavaScript, Node.js, MongoDB, etc.

As a co-instructor, I will primarily provide support for ITVersity's Python, SQL, and related courses.

ITVersity, Inc. is a US-based organisation that provides quality training for IT professionals, with a track record of training hundreds of thousands of professionals globally.

Our organisation's priority is helping people build IT careers by providing the tools they need to upskill and cross-skill, such as high-quality material, labs, and live support.

At this time, our training offerings focus on the following areas:

* Application Development using Python and SQL

* Big Data and Business Intelligence

* Cloud

* Data Warehousing, Databases

Teja Rayala
Software Engineer at ITVersity Inc.
  • 4.3 Instructor Rating
  • 3,683 Reviews
  • 23,173 Students
  • 3 Courses

Experienced Data Engineer with a demonstrated history of working in the consumer goods industry. Skilled in Apache Airflow, Apache Kafka, Hive, Apache Spark, and Amazon Web Services (AWS). Strong information technology professional with a Master's degree focused in Analytics from University of Cincinnati.

ITVersity, Inc. is a US-based organisation providing quality training for IT professionals, with a track record of training hundreds of thousands of professionals globally.

Helping people build IT careers with high-quality content, labs, live support, etc., so they can upskill and cross-skill, is paramount for our organisation.

I will be overseeing support for ITVersity courses related to Data Engineering and DevOps Engineering.

At this time our training offerings are focused on the following areas:

* Application Development using Python and SQL

* Big Data and Business Intelligence

* Cloud

* Data Warehousing, Databases

© 2021 Udemy, Inc.