Udemy
30-Day Money-Back Guarantee

Taming Big Data using Spark & Python

Working on Big Data projects and writing the CCA 175 exam made easy, with project scenarios and practice questions for CCA 175
Rating: 0.0 out of 5 (0 ratings)
3 students
Created by Anshul Roy
Published 3/2019
English
English [Auto]

What you'll learn

  • Big Data and its ecosystem: Hadoop, Sqoop, Hive, Flume, Kafka, and Spark using Python, Spark SQL, and Spark Streaming
  • Both concepts (theory and architecture) and hands-on practicals
  • Assignments and project scenarios modeled on real projects
  • Practice questions for CCA 175 Certification
  • Process continual streams of data with Spark Streaming
  • Build, deploy, and run Spark scripts on Hadoop clusters
  • Transform structured data using SparkSQL and DataFrames
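
As a rough illustration of the "continual streams" item above: Spark Streaming's DStream model treats a stream as a sequence of small batches and runs the same job on each one. The sketch below imitates that idea in plain Python only. It does not use the Spark API, the names `micro_batches` and `word_counts` are made up for illustration, and it groups records by a fixed count rather than by a time interval, which is a simplification.

```python
from collections import Counter
from itertools import islice
from typing import Iterable, Iterator, List


def micro_batches(stream: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Cut a (possibly endless) stream into consecutive batches of at most batch_size records."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return  # the stream is exhausted
        yield batch


def word_counts(batch: List[str]) -> Counter:
    """Count the words in one batch, the way a per-batch word count job would."""
    counts = Counter()
    for line in batch:
        counts.update(line.split())
    return counts


if __name__ == "__main__":
    # Hypothetical input lines standing in for records arriving from Kafka or Flume.
    lines = ["spark kafka", "kafka flume", "spark spark", "hive"]
    for number, batch in enumerate(micro_batches(lines, batch_size=2), start=1):
        print(number, dict(word_counts(batch)))
```

Each batch is processed independently here; carrying state across batches is a separate topic that this sketch leaves out.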

Requirements

  • Basic programming skills
  • Cloudera QuickStart VM or your own Hadoop setup. Either works with the course.
  • A laptop with at least 6 GB of RAM to run the VM (if you use the VM provided with the course). Alternatively, you can follow the course to install the tools locally.
  • SQL skills are an advantage

Description

The course is for those who are completely new to Big Data and its tools, want to learn them, and want to be comfortable applying them in projects. It is also for those who have some knowledge of Big Data tools but want to deepen it and work confidently in projects. Because of its extensive scenario implementations, the course also suits people preparing for Big Data certifications such as CCA 175. The course includes a practice test for CCA 175.


The course comes with fully functional Big Data labs on Cloudera and Windows VMs, so you rarely need to pay for a cluster to practice the tools. That makes the course a one-time investment.


In the course, we will learn how to use Big Data tools such as Hadoop, Flume, Kafka, Spark, and Scala, which are among the most sought-after tech skills on the market today.


In this course you will:


1. Use Python and Spark to analyze Big Data.

2. Take a practice test for the CCA 175 exam, available at the end of the course.

3. Work through extensive, realistic project scenarios with solutions written the way you would write them in real projects.

4. Use Sqoop to import data from traditional relational databases into HDFS and Hive.

5. Use Flume and Kafka to process streaming data.

6. Use Hive to store and query data, and to partition tables.

7. Use Spark Streaming to consume streaming data from Kafka and Flume.
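
Item 4 above can be pictured as assembling a `sqoop import` command line. The sketch below builds one as an argument list in Python. The JDBC URL, user, table, and target directory are placeholders, and the helper `build_sqoop_import` is hypothetical; it is not part of Sqoop or of the course material. It only constructs the command, since running it would need a working Sqoop installation and a reachable database.

```python
import shlex
from typing import List, Optional


def build_sqoop_import(
    jdbc_url: str,
    username: str,
    table: str,
    target_dir: str,
    num_mappers: int = 1,
    where: Optional[str] = None,
    hive_import: bool = False,
) -> List[str]:
    """Return the argument list for a basic `sqoop import` (hypothetical helper)."""
    args = [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "-P",  # ask for the password on the console instead of passing it inline
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", str(num_mappers),
    ]
    if where:
        # Row filter applied on the database side, e.g. "order_id > 100"
        args += ["--where", where]
    if hive_import:
        # Also load the imported data into a Hive table
        args.append("--hive-import")
    return args


if __name__ == "__main__":
    # All values below are made-up placeholders.
    command = build_sqoop_import(
        "jdbc:mysql://localhost/retail_db", "demo", "orders",
        "/user/demo/orders", num_mappers=2, where="order_id > 100",
    )
    print(shlex.join(command))  # a shell-safe string that could be pasted into a terminal
```

Keeping the command as a list rather than one long string avoids quoting mistakes when a value such as the `--where` filter contains spaces.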


Big Data skills are in high demand right now, and with this course you can learn them quickly and easily! You will also learn about the settings in basic setup files such as "hdfs-site.xml" and "core-site.xml", which are good to know when working on a project.
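
For orientation, Hadoop's `*-site.xml` files are lists of `<property>` entries, each with a `<name>` and a `<value>`. The sketch below reads such a file with Python's standard library. The sample content and the helper `read_hadoop_conf` are illustrative assumptions, not taken from the course; the values are placeholders for a single-node setup.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Illustrative hdfs-site.xml content; the values are placeholders.
SAMPLE_HDFS_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/data/namenode</value>
  </property>
</configuration>
"""


def read_hadoop_conf(xml_text: str) -> Dict[str, str]:
    """Turn the <property> entries of a Hadoop site file into a name -> value dict."""
    root = ET.fromstring(xml_text)
    conf: Dict[str, str] = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value", default="")
        if name:
            conf[name.strip()] = value.strip()
    return conf


if __name__ == "__main__":
    for key, value in read_hadoop_conf(SAMPLE_HDFS_SITE).items():
        print(f"{key} = {value}")
```

The same function would work on the text of a `core-site.xml` file, since both files share this layout.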


The course focuses on upskilling people who do not know Big Data tools, with the goal of bringing them up to the level needed to work on Big Data projects without difficulty.


This course comes with project scenarios and multiple datasets to work with.


After completing this course you will feel comfortable putting Big Data, Python, and Spark on your resume, and you will be able to work with them and implement them in projects!


Thanks and I will see you inside the course!

Who this course is for:

  • Anyone who wants to learn Big Data technologies and move into the field.
  • Those who want a real feel for project-like scenarios while learning the concepts.
  • Those looking for a one-stop shop for the required Big Data tools, with theory, concepts, practicals, practice scenarios, and project scenarios using the Python programming language.

Course content

10 sections • 80 lectures • 23h 18m total length

  • Preview
    04:54
  • Preview
    03:05

  • Preview
    35:06
  • Preview
    27:40
  • Hadoop Architecture - Part 3 - Understanding Job Tracker & Task Tracker
    29:26
  • Hadoop Refresh & File Systems
    21:45
  • Hadoop Terminologies & Configurations in XML Files
    34:29
  • Hadoop Commands on Windows or Windows VM - Part 1
    32:12
  • Hadoop Commands on Windows or Windows VM - Part 2
    15:16
  • Hadoop Commands on Cloudera Quick Start VM
    38:25

  • Sqoop Architecture
    11:20
  • Sqoop Eval on Windows/ Windows VM
    28:23
  • Sqoop Eval on Windows - Using -e & --query options
    03:02
  • Sqoop List Database and List Tables - Used for creating Generic Code
    11:21
  • Sqoop Import Command - Understanding and Analysing the Map-Reduce Functionality
    32:28
  • Sqoop Import - Append Mode of Execution
    15:23
  • Sqoop Import - Overwrite option & Different File Formats supported
    20:23
  • Sqoop Import - Using Where & Columns Options to filter the data import
    27:38
  • Sqoop Import - Executing User Specific Query with Where Clause
    12:16
  • Sqoop Import - Incremental Load Execution
    14:25
  • Sqoop Jobs - Create, List & Execute Sqoop Jobs
    04:31
  • Sqoop Import All Option to Import all tables from MySQL to HDFS
    24:42
  • Sqoop Import - Import from MySQL To Hive - Basic Import
    33:03
  • Sqoop Import - Import from MySQL To Hive - More Options
    19:54
  • Sqoop Import All - Import from MySQL to Hive using Import All
    08:40
  • Sqoop Import - from Mainframe - A basic know how
    02:48
  • Sqoop Export - Bring Data from HDFS to MySQL
    31:42
  • Sqoop Assignment for Practice
    04:45

  • Hive - Introduction & Features
    09:33
  • Hive - Architecture & Map-Reduce Execution
    23:13
  • Hive Tables
    08:42
  • Hive Partitioning & Bucketing - Concepts and Difference
    16:26
  • Hive Query Language - Overview and Syntax
    11:16
  • Hive QL - Practicals - Create Database & Tables & load sample data
    14:40
  • Hive QL - Practicals - Load Huge Data to Managed Tables
    22:06
  • Hive QL - Practicals - Creating and Loading Managed & External Tables
    22:29
  • Hive QL - Practicals - Partitioning in Hive
    36:16
  • Hive QL - Practicals - Bucketing in Hive
    19:31
  • Hive User Defined Functions
    05:21
  • Hive Performance Tuning Methods
    17:23

  • Flume - Concepts, Usage, Features & Advantages
    11:48
  • Flume Architecture
    22:43
  • Flume Data Flows, Contextual Routing & Other Concepts
    20:03
  • Basics of Flume Configurations
    26:11
  • Setup of Telnet in Windows
    03:57
  • Flume Practicals - Simple Flume Job using NetCat
    10:56
  • Flume Practicals - Flume Job using EXEC
    08:20
  • Flume Practicals - Flume Job using Sequence Generator
    03:59
  • Flume Practicals - Flume Job using Sequence Generator on HDFS
    09:28
  • Flume Practicals - Flume Job using Twitter on Windows
    43:34
  • Flume Practicals - Flume Job using Twitter on Cloudera
    12:25
  • Flume Practicals - Flume Job using Twitter on File Channel
    14:12
  • Flume Practicals - Flume Job using Twitter to Hive Sink
    07:56
  • Flume Multiplexing - One Source, One Channel & Two Sink - Logger and HDFS Sinks
    17:09
  • Industry Usage of Flume
    06:19

  • Kafka Concepts and Architecture 1
    36:06
  • Kafka Concepts and Architecture 2
    24:20
  • Kafka Concepts and Architecture 3
    51:46
  • Kafka Sample Execution on Cloudera
    17:49
  • Flume and Kafka Together
    18:55

  • Basics of coding environment for python
    09:44
  • Executing Print in CLI & Jupyter Notebook
    09:12
  • Creating Variables & Indented Code in Python
    15:58
  • Python Variables - Initialize, Assign & Reassign
    11:45
  • Python Math Functionalities
    08:28
  • Python Math Help
    00:44

  • Spark Architecture
    38:46
  • Spark Components, Lazy Executions, DAG, SparkSQL, Performance Tuning etc.
    55:25
  • Spark - Shuffles, Coalesce, Repartition & Shared Variables
    14:57
  • Spark Streaming Concepts & DStream
    11:41
  • Spark - RDD vs DataFrame vs Datasets
    07:07
  • Spark - Catalyst Optimizer and Tungsten Engine
    12:12

  • Overall Big Data Project Structure
    17:05
  • Project Scenario - Bring Data from BI Database to Data Lake in Layer1
    11:17
  • Project Scenario 2
    10:41
  • Project Scenario 3 - Bring Files from Local File System to HDFS in Data lake
    13:56
  • Project Scenario 4 - Create Generic Jobs to read data from Data lake to layer 2
    03:33
  • Project Scenario 5 - Use SparkSQL to read data from layer 2 and write to Layer 3
    08:15
  • Project Scenario 6 - Merge Multiple Files
    09:05

  • Practice Test in PDF for CCA 175 Exam
    04:37

Instructor

Anshul Roy
Machine Learning Engineer @ Adastra Germany
  • 3.8 Instructor Rating
  • 140 Reviews
  • 2,334 Students
  • 5 Courses

I am an experienced Machine Learning Engineer with expertise in Big Data technologies and BI tools (IBM DataStage). I have experience implementing machine learning using Spark, Scala, and Python. I am also an experienced DataStage, Python, Spark, Scala, R, and machine learning trainer, enrolled with many consultancies. I also work as a freelancer in my free time.

© 2021 Udemy, Inc.