Udemy
30-Day Money-Back Guarantee

Master Big Data - Apache Spark/Hadoop/Sqoop/Hive/Flume

In-depth course on Big Data - Apache Spark , Hadoop , Sqoop , Flume & Apache Hive, Big Data Cluster setup
Rating: 4.3 out of 5 (326 ratings)
3,970 students
Created by Navdeep Kaur
Last updated 1/2021
English
English [Auto]

What you'll learn

  • Hadoop Distributed File System (HDFS) and its commands.
  • Lifecycle of a Sqoop command.
  • Sqoop import command to migrate data from MySQL to HDFS.
  • Sqoop import command to migrate data from MySQL to Hive.
  • Working with various file formats, compressions, file delimiters, where clauses and queries while importing data.
  • Understand split-by and boundary queries.
  • Use incremental mode to migrate data from MySQL to HDFS.
  • Using Sqoop export, migrate data from HDFS to MySQL.
  • Using Sqoop export, migrate data from Hive to MySQL.
  • Understand Flume architecture.
  • Using Flume, ingest data from Twitter and save it to HDFS.
  • Using Flume, ingest data from netcat and save it to HDFS.
  • Using Flume, ingest data from exec and show it on the console.
  • Flume interceptors.
Curated for the Udemy for Business collection

Requirements

  • None

Description

In this course, you will start by learning what the Hadoop Distributed File System (HDFS) is and the most common Hadoop commands required to work with it.


Then you will be introduced to Sqoop Import:

  • Understand the lifecycle of a Sqoop command.

  • Use the Sqoop import command to migrate data from MySQL to HDFS.

  • Use the Sqoop import command to migrate data from MySQL to Hive.

  • Use various file formats, compressions, file delimiters, where clauses and queries while importing data.

  • Understand split-by and boundary queries.

  • Use incremental mode to migrate data from MySQL to HDFS.
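
To make the split-by and boundary query topic concrete, here is a minimal Python sketch of the idea. By default Sqoop asks the database for the minimum and maximum of the split column (the boundary query) and gives each mapper its own slice of that range. The function name `split_ranges` and the even-split rule are assumptions for illustration only; this is a simplified model, not Sqoop's actual splitter code.

```python
def split_ranges(min_val: int, max_val: int, num_mappers: int) -> list[tuple[int, int]]:
    """Divide the inclusive key range [min_val, max_val] into contiguous
    chunks, one per mapper (simplified model of a split-by column)."""
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    if max_val < min_val:
        raise ValueError("max_val must not be smaller than min_val")

    total = max_val - min_val + 1
    # Never create more chunks than there are key values.
    num_mappers = min(num_mappers, total)
    base, extra = divmod(total, num_mappers)

    ranges = []
    lo = min_val
    for i in range(num_mappers):
        # The first `extra` chunks take one additional key each.
        size = base + (1 if i < extra else 0)
        hi = lo + size - 1
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges


# Pretend the boundary query returned MIN(id) = 1 and MAX(id) = 100.
for lo, hi in split_ranges(1, 100, 4):
    print(f"mapper reads rows WHERE id >= {lo} AND id <= {hi}")
```

The sketch also hints at why the choice of split column matters: if the key values are unevenly distributed within the range, some mappers receive far more rows than others.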


Next, you will learn to use Sqoop Export to migrate data:

  • What Sqoop export is.

  • Using Sqoop export, migrate data from HDFS to MySQL.

  • Using Sqoop export, migrate data from Hive to MySQL.
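
As a rough illustration of what an export invocation looks like, the Python sketch below assembles a `sqoop export` command line. The flags used (`--connect`, `--username`, `--table`, `--export-dir`, `--input-fields-terminated-by`) are standard Sqoop options, but the helper name `sqoop_export_command` and every value passed to it (JDBC URL, user, table, HDFS path) are hypothetical placeholders, not taken from the course.

```python
import shlex


def sqoop_export_command(jdbc_url, username, table, export_dir, field_delimiter=","):
    """Build the argument list for a `sqoop export` run.

    The password is deliberately left out; Sqoop can prompt for it (-P)
    or read it from a protected file (--password-file).
    """
    return [
        "sqoop", "export",
        "--connect", jdbc_url,          # JDBC URL of the target MySQL database
        "--username", username,
        "--table", table,               # existing MySQL table that receives the rows
        "--export-dir", export_dir,     # HDFS directory holding the files to export
        "--input-fields-terminated-by", field_delimiter,
    ]


# Hypothetical values, for illustration only.
cmd = sqoop_export_command(
    jdbc_url="jdbc:mysql://localhost/retail_db",
    username="retail_user",
    table="orders_export",
    export_dir="/user/hive/warehouse/orders",
)
print(shlex.join(cmd))
```

When the source is a Hive-managed text table, the export directory is typically the table's warehouse directory, and the field delimiter has to match the one the table was written with.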



After that, you will learn about Apache Flume:

  • Understand Flume architecture.

  • Using Flume, ingest data from Twitter and save it to HDFS.

  • Using Flume, ingest data from netcat and save it to HDFS.

  • Using Flume, ingest data from exec and show it on the console.

  • Describe Flume interceptors and see examples of using interceptors.

  • Flume multiple agents.

  • Flume consolidation.
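
The Flume architecture listed above (source, interceptors, channel, sink) can be pictured with a small in-memory model. This is a toy sketch in plain Python, not Flume itself: the names `Event`, `run_agent` and `drop_blank` are assumptions made up for the example, while `timestamp_interceptor` mimics what Flume's timestamp interceptor does, adding a `timestamp` header in milliseconds to each event.

```python
import time
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Event:
    """A Flume-style event: a body plus string headers."""
    body: str
    headers: dict = field(default_factory=dict)


def timestamp_interceptor(event, clock=time.time):
    """Add the processing time in milliseconds as a header."""
    event.headers["timestamp"] = str(int(clock() * 1000))
    return event


def drop_blank(event):
    """Filtering interceptor: returning None drops the event."""
    return event if event.body.strip() else None


def apply_interceptors(event, interceptors):
    for intercept in interceptors:
        event = intercept(event)
        if event is None:
            return None
    return event


def run_agent(lines, interceptors, sink):
    """Source -> interceptors -> channel -> sink, all in one process."""
    channel = deque()                      # stands in for a memory channel
    for line in lines:                     # the source turns each line into an event
        event = apply_interceptors(Event(body=line), interceptors)
        if event is not None:
            channel.append(event)
    while channel:                         # the sink drains the channel
        sink(channel.popleft())


# A console-logging sink, similar in spirit to showing exec output on the console.
run_agent(["first line", "", "second line"], [drop_blank, timestamp_interceptor], print)
```

In a real agent the source, channel and sink are declared in a properties file and run independently, and the channel buffers events so a slow sink does not block the source.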


In the next section, you will learn about Apache Hive:

  • Hive Intro

  • External & Managed Tables

  • Working with Different File Formats - Parquet, Avro

  • Compressions

  • Hive Analysis

  • Hive String Functions

  • Hive Date Functions

  • Partitioning

  • Bucketing
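
The difference between the last two topics, partitioning and bucketing, can be sketched in a few lines of Python. Partitioning places rows in a directory named after the partition column's value (the `column=value` layout Hive uses), while bucketing spreads rows over a fixed number of files by hashing a column. The sketch assumes a non-negative integer bucket column, where the bucket is taken as the key modulo the bucket count; table path, column names and rows are hypothetical.

```python
from collections import defaultdict


def partition_dir(table_path: str, partition_col: str, value) -> str:
    """Directory that holds all rows sharing one partition value."""
    return f"{table_path}/{partition_col}={value}"


def bucket_for(key: int, num_buckets: int) -> int:
    """Bucket index for a non-negative integer key (simplified hashing)."""
    return key % num_buckets


def layout(rows, table_path, partition_col, bucket_col, num_buckets):
    """Group rows by (partition directory, bucket index)."""
    files = defaultdict(list)
    for row in rows:
        directory = partition_dir(table_path, partition_col, row[partition_col])
        bucket = bucket_for(row[bucket_col], num_buckets)
        files[(directory, bucket)].append(row)
    return dict(files)


# Hypothetical rows, for illustration only.
rows = [
    {"id": 1, "country": "IN"},
    {"id": 2, "country": "IN"},
    {"id": 3, "country": "US"},
    {"id": 4, "country": "US"},
]
for (directory, bucket), members in sorted(layout(rows, "/warehouse/customers", "country", "id", 2).items()):
    print(directory, "bucket", bucket, [r["id"] for r in members])
```

A query that filters on the partition column only needs to read the matching directory, which is the main reason partitioning speeds up analysis on large tables.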


Finally, you will learn about Apache Spark:

  • Spark Intro

  • Cluster Overview

  • RDD

  • DAG/Stages/Tasks

  • Actions & Transformations

  • Transformation & Action Examples

  • Spark DataFrames

  • Spark DataFrames - working with different file formats & compression

  • DataFrame APIs

  • Spark SQL

  • DataFrame Examples

  • Spark with Cassandra Integration
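
To give a feel for the transformation examples, the sketch below models a word-frequency job with plain Python stand-ins for `flatMap` and `reduceByKey`. It is an assumption-laden illustration: the helper names `flat_map` and `reduce_by_key` and the sample lines are made up, and the code runs eagerly on one machine, whereas Spark evaluates transformations lazily and distributes them across a cluster.

```python
from itertools import chain


def flat_map(func, items):
    """Apply func to every item and flatten the results into one list."""
    return list(chain.from_iterable(func(item) for item in items))


def reduce_by_key(func, pairs):
    """Combine all values that share a key using func."""
    result = {}
    for key, value in pairs:
        result[key] = func(result[key], value) if key in result else value
    return result


# Hypothetical input lines, standing in for a text file.
lines = ["spark makes big data simple", "big data needs spark"]

words = flat_map(str.split, lines)                  # one element per word
pairs = [(word, 1) for word in words]               # map each word to (word, 1)
counts = reduce_by_key(lambda a, b: a + b, pairs)   # sum the ones per word

for word, count in sorted(counts.items()):
    print(word, count)
```

The same three steps (split into words, pair each word with 1, sum per key) are the usual shape of a word count on an RDD.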


Who this course is for:

  • Anyone who wants to learn big data in detail

Course content

10 sections • 87 lectures • 8h 22m total length

  • Preview 01:26
  • Preview 05:24
  • Preview 09:16
  • Yarn Cluster Overview
    07:41
  • Cloudera vm setup
    00:10
  • Cluster Setup on Google Cloud
    21:20
  • GCP Cluster Fixes
    00:13
  • Environment Update
    00:42

  • Sqoop Introduction
    Preview 15:48
  • Managing Target Directories
    07:26
  • Preview 08:24
  • Working with Avro File Format
    11:35
  • Working with Different Compressions
    10:08
  • Conditional Imports
    04:26
  • Split-by and Boundary Queries
    08:27
  • Field delimiters
    03:18
  • Incremental Appends
    11:38
  • Sqoop-Hive Cluster Fix
    00:05
  • Sqoop Hive Import
    03:31
  • Sqoop List Tables/Database
    04:13
  • Sqoop Assignment1
    1 question
  • Sqoop Assignment2
    1 question
  • Sqoop Import Practice1
    04:57
  • Sqoop Import Practice2
    03:32

  • Export from HDFS to MySQL
    03:39
  • Export from Hive to MySQL
    02:30
  • Export Avro Compressed to MySQL
    07:30
  • Bonus Lecture: Sqoop with Airflow
    02:57

  • Flume Introduction & Architecture
    10:07
  • Exec Source and Logger Sink
    03:41
  • Moving data from Twitter to HDFS
    09:25
  • Moving data from NetCat to HDFS
    04:39
  • Flume Interceptors
    01:56
  • Flume Interceptor Example
    04:53
  • Flume Multi-Agent Flow
    06:49
  • Flume Consolidation
    06:11

  • Hive Introduction
    03:41
  • Hive Database
    08:29
  • Hive Managed Tables
    06:23
  • Hive External Tables
    02:26
  • Hive Inserts
    05:30
  • Hive Analytics
    04:21
  • Working with Parquet
    03:29
  • Compressing Parquet
    04:27
  • Working with Fixed File Format
    03:04
  • Alter Command
    06:12
  • Hive String Functions
    06:21
  • Hive Date Functions
    05:39
  • Hive Partitioning
    07:16
  • Hive Bucketing
    03:44

  • Spark Intro
    03:46
  • Resilient Distributed Datasets
    02:52
  • Cluster Overview
    06:51
  • DAG Overview
    10:06
  • Spark on GCS Cluster
    01:48

  • Map/FlatMap Transformation
    04:28
  • Filter/Intersection
    04:00
  • Union/Distinct Transformation
    02:23
  • GroupByKey/ Group people based on Birthday months
    05:53
  • ReduceByKey / Total Number of students in each Subject
    06:44
  • SortByKey / Sort students based on their rollno
    06:03
  • MapPartition / MapPartitionWithIndex
    06:20
  • Change number of Partitions
    03:34
  • Join / join email address based on customer name
    03:06
  • Spark Actions
    06:05

  • Scala Tuples
    03:05
  • Filter Error Logs
    10:22
  • Frequency of word in Text File
    08:35
  • Population of each city
    03:53
  • Orders placed by Customers
    09:20
  • Average rating of movie
    07:04

  • Dataframe Intro
    02:16
  • Dataframe from JSON Files
    08:42
  • Dataframe from Parquet Files
    07:26
  • Dataframe from CSV Files
    05:14
  • Dataframe from Avro File
    07:13
  • Working with XML
    03:22
  • Working with Columns
    05:23
  • Working with String
    04:05
  • Working with Dates
    03:47
  • Dataframe Filter API
    02:50
  • DataFrame API Part1
    04:51
  • DataFrame API Part2
    06:25
  • Spark SQL
    01:41
  • Working with Hive Tables in Spark
    02:34

  • Creating Spark RDD from Cassandra Table
    09:13
  • Processing Cassandra data in Spark
    08:18
  • Cassandra Rows to Case Class
    02:33
  • Saving Spark RDD to Cassandra
    02:58

Instructor

Navdeep Kaur
TechnoAvengers.com (Founder)
  • 4.3 Instructor Rating
  • 1,824 Reviews
  • 22,198 Students
  • 8 Courses

I am a big data architect with 11 years of industry experience across different technologies and domains. I have a keen interest in providing training in new technologies. I hold the CCA175 Hadoop and Spark developer certification and the AWS solution architect certification. I love guiding people and helping them achieve new goals.

© 2021 Udemy, Inc.