Udemy
Learn Spark and Hadoop Overnight on GCP
Rating: 4.1 out of 5 (22 ratings)
1,457 students

Learn Hands-on by Building Your Own System on Spark and Hadoop
Created by CS PRO
Last updated 8/2018
English

What you'll learn

  • For E-Commerce Data Load and Operation Setting Up Hadoop and Spark
  • Up and Running With Spark on GCP

Course content

2 sections, 41 lectures, 3h 34m total length
  • Introduction to E-Commerce Data Load and Operation Setting Up Hadoop & Spark (3:19)

    Explore big data in e-commerce by setting up Hadoop and Spark on Google Cloud Platform. Understand the HDFS architecture, high availability, and cold storage for enterprise data.

  • Data Explosion and Reduction in Storage Cost (4:24)
  • When Data Is Referred to as Big Data (5:50)

    Discover how enterprises tackle big data through the three Vs (velocity, volume, and variety), using Hadoop to store and process vast amounts of data robustly for 10x returns.

  • Computer Science Behind Big Data Processing (2:57)

    Understand the theoretical computer science behind big data processing: analyze the time complexity of loop-based code and see how divide-and-conquer and memory distribution let Hadoop handle large datasets.

  • Hot and Cold Data (6:00)

    Explore hot and cold data in enterprise architectures, storing hot data in ERP main memory for transactions and using Hadoop for cold data analytics with OLAP.

  • Hadoop Architecture (5:37)

    Understand Hadoop architecture with a master name node and data nodes, enabling scalable, low-cost storage, high availability, and cloud-based ERP cold data management via IaaS.

  • Hadoop Cluster Data Operation (5:37)

    Explore the Hadoop cluster architecture, including the name node and data nodes, and learn how files are written as 128 MB blocks with replication for fault tolerance.

  • High Availability and Replication for Enterprise Part 1 (3:57)

    Enable high availability for enterprise systems by replicating data across multiple nodes with a configurable replication factor, guided by the name node, heartbeat monitoring, and rack awareness.

  • High Availability and Replication for Enterprise Part 2 (4:06)

    Explore how Hadoop achieves high availability through replication, a standby name node, and a secondary name node, with ZooKeeper coordination and the fsimage and edit log files, on Google Cloud Platform.

  • GCP Dataproc and Modern Big Data Lifecycle (2:19)

    Discover how Google Cloud Dataproc enables an end-to-end big data lifecycle and rapid Hadoop and Spark deployment in minutes.

  • Data Load into HDFS or Storage Bucket (3:36)

    Learn how to load data into HDFS or a storage bucket on GCP, compare unified storage with HDFS, and spin up HDFS for scalable map-reduce processing.

  • Configuring and Running Hadoop in GCP with Dataproc (9:13)
  • SSH Inside the Master Node and HDFS File System (6:16)

    Learn to access the Spark environment on the GCP master node, explore Hadoop HDFS, start the Spark and PySpark shells, use the web UI, and safely terminate clusters to save costs.

  • Summary - Part 1 (1:20)
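
The divide-and-conquer idea from the "Computer Science Behind Big Data Processing" lecture can be sketched in plain Python. A single loop over n records costs O(n) on one machine. Splitting the records into p partitions lets p workers each handle roughly n/p records, and the partial results are then merged. This is a minimal sketch that assumes a word-count workload. The partitions here run one after another in a single process, whereas Hadoop or Spark would run them on separate nodes. The function names (`split_into_partitions`, `count_words`, `word_count`) are hypothetical and are not part of any Hadoop or Spark API.

```python
from collections import Counter
from functools import reduce


def split_into_partitions(lines, num_partitions):
    """Divide: deal the input lines round-robin into num_partitions chunks."""
    partitions = [[] for _ in range(num_partitions)]
    for i, line in enumerate(lines):
        partitions[i % num_partitions].append(line)
    return partitions


def count_words(partition):
    """Conquer: count the words in one chunk, independently of every other chunk."""
    counts = Counter()
    for line in partition:
        counts.update(line.split())
    return counts


def word_count(lines, num_partitions=4):
    """Count each chunk separately, then merge the partial counts pairwise."""
    partials = [count_words(p) for p in split_into_partitions(lines, num_partitions)]
    return reduce(lambda a, b: a + b, partials, Counter())
```

Calling `word_count(["spark hadoop", "hadoop hdfs", "spark spark"], num_partitions=2)` produces the same totals as a single loop: spark 3, hadoop 2, hdfs 1. Only the per-chunk work is independent. The merge step is the part that a distributed engine has to coordinate.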
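
The split described in the "Hot and Cold Data" lecture can be expressed as a simple tiering rule. This sketch assumes that a record's temperature depends only on how recently it was accessed. The 90-day window, the `Record` type, and `split_hot_cold` are hypothetical choices for illustration. They are not values taken from an ERP system or from Hadoop.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Record:
    key: str
    last_accessed: date


def split_hot_cold(records, today, hot_window_days=90):
    """Return (hot, cold): hot records were accessed within the window, cold ones earlier.

    hot_window_days is an illustrative assumption, not a Hadoop or ERP setting.
    """
    cutoff = today - timedelta(days=hot_window_days)
    hot = [r for r in records if r.last_accessed >= cutoff]
    cold = [r for r in records if r.last_accessed < cutoff]
    return hot, cold
```

In the architecture the lecture describes, the hot list would stay in the ERP system's main memory for transactions. The cold list would be offloaded to Hadoop for OLAP-style analytics.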
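
The block arithmetic from the "Hadoop Cluster Data Operation" lecture can be worked through in a few lines. HDFS splits a file into fixed-size blocks (128 MB by default, set by `dfs.blocksize`) and stores each block `dfs.replication` times (3 by default). The helper names below are hypothetical. The sketch only computes sizes and does not connect to a cluster.

```python
BLOCK_SIZE_MB = 128  # HDFS default block size (dfs.blocksize)
REPLICATION = 3      # HDFS default replication factor (dfs.replication)


def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return the size of every block; only the last one may be smaller than block_size_mb."""
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    sizes = [block_size_mb] * full_blocks
    if remainder:
        sizes.append(remainder)
    return sizes


def storage_plan(file_size_mb, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Summarize the blocks, block replicas, and raw disk footprint of one file."""
    blocks = split_into_blocks(file_size_mb, block_size_mb)
    return {
        "blocks": blocks,
        "block_replicas": len(blocks) * replication,
        "raw_storage_mb": sum(blocks) * replication,
    }


print(storage_plan(300))
# {'blocks': [128, 128, 44], 'block_replicas': 9, 'raw_storage_mb': 900}
```

For a 300 MB file, this gives blocks of 128, 128, and 44 MB, 9 block replicas, and about 900 MB of raw disk usage. HDFS does not pad the last block to 128 MB, so a small remainder does not consume a full block of disk space.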
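
The replica placement and heartbeat monitoring covered in "High Availability and Replication for Enterprise Part 1" can be sketched as a simplified model of HDFS's default policy for a replication factor of 3. The first replica goes on the writer's node. The second goes on a node in a different rack. The third goes on a different node in that same remote rack. Nodes whose last heartbeat is older than a timeout are treated as dead and skipped. Several details are assumptions for illustration: the function names, the deterministic "first sorted node" choice, and the timeout value. The real name node also considers free space and load, picks nodes randomly, and falls back to other placements instead of failing when the topology is too small.

```python
def live_nodes(last_heartbeat, now, timeout_s):
    """Return nodes whose most recent heartbeat is no older than timeout_s seconds."""
    return sorted(node for node, ts in last_heartbeat.items() if now - ts <= timeout_s)


def place_three_replicas(writer, rack_of, live):
    """Simplified rack-aware placement for replication factor 3.

    1st replica: the writer's own node if it is live, else the first live node.
    2nd replica: a live node on a different rack from the 1st.
    3rd replica: a different live node on the same rack as the 2nd.
    """
    if not live:
        raise ValueError("no live data nodes")
    first = writer if writer in live else live[0]
    remote = [n for n in live if rack_of[n] != rack_of[first]]
    if not remote:
        raise ValueError("rack-aware placement needs live nodes on at least two racks")
    second = remote[0]
    same_rack = [n for n in live if rack_of[n] == rack_of[second] and n != second]
    if not same_rack:
        raise ValueError("the remote rack needs a second live node")
    return [first, second, same_rack[0]]


rack_of = {"dn1": "/rack1", "dn2": "/rack1", "dn3": "/rack2", "dn4": "/rack2", "dn5": "/rack2"}
last_heartbeat = {"dn1": 100, "dn2": 100, "dn3": 5, "dn4": 100, "dn5": 100}
live = live_nodes(last_heartbeat, now=120, timeout_s=30)  # dn3 is stale, so it is excluded
print(place_three_replicas("dn1", rack_of, live))  # ['dn1', 'dn4', 'dn5']
```

Placing two replicas on one remote rack is the trade-off the default policy makes. The data survives the loss of a whole rack, and only one copy has to cross between racks during the write.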
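
A toy checkpoint can illustrate the role of the fsimage and edit log files from "High Availability and Replication for Enterprise Part 2". The name node records each namespace change in an edit log. Periodically, the secondary name node (or the standby name node in an HA setup) replays those edits on top of the last fsimage to produce a new one, which keeps the log from growing without bound. In this sketch the namespace is a plain dict from path to replication factor, and the edit operations are hypothetical tuples. Real fsimage and edit log files are binary formats managed by Hadoop.

```python
def apply_edit(namespace, edit):
    """Apply one logged operation (a hypothetical tuple format) to the namespace dict."""
    op = edit[0]
    if op in ("create", "set_replication"):
        _, path, replication = edit
        namespace[path] = replication
    elif op == "rename":
        _, src, dst = edit
        namespace[dst] = namespace.pop(src)
    elif op == "delete":
        _, path = edit
        namespace.pop(path, None)
    else:
        raise ValueError(f"unknown edit operation: {op}")


def checkpoint(fsimage, edit_log):
    """Replay the edit log onto a copy of the fsimage; return (new fsimage, emptied log)."""
    new_image = dict(fsimage)
    for edit in edit_log:
        apply_edit(new_image, edit)
    return new_image, []


fsimage = {"/data/books.csv": 3}
edit_log = [
    ("create", "/data/ratings.csv", 3),
    ("set_replication", "/data/books.csv", 2),
    ("rename", "/data/ratings.csv", "/data/ratings_2018.csv"),
]
print(checkpoint(fsimage, edit_log))
```

Because the merge only needs the previous fsimage and the log, a second machine can perform it. This frees the active name node from the work and leaves it with a recent image to load after a restart or failover.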

Requirements

  • Basic knowledge of Hadoop and Spark is required

Description

This is a comprehensive hands-on course on Spark and Hadoop.

  • This course focuses on big data and the open-source solutions around it.

  • We need these tools for the e-commerce part of Project CORE (Create your Own Recommendation Engine), a one-of-a-kind project for learning technology end to end.

  • We will explore Hadoop, one of the most prominent big data solutions.

  • We will look at the why and the how of Hadoop, its ecosystem, its architecture, and its basic inner workings, and we will spin up our first Hadoop cluster in under 2 minutes on Google Cloud.

  • We will use this course in Project CORE, a comprehensive project on hands-on technologies. In Project CORE, you will learn more about building your own system with big data, Spark, machine learning, SAPUI5, Angular4, D3JS, and SAP® HANA®.

  • This course gives you a brief introduction to Apache Spark™, a fast and general engine for large-scale data processing.

  • Project CORE uses Spark to manage big data on the HDFS file system: we store 1.5 million book records in Spark and implement a collaborative filtering algorithm.

  • Spark offers over 80 high-level operators that make it easy to build parallel apps, and you can use it interactively from the Scala, Python, and R shells.

  • Spark powers a stack of libraries including SQL and DataFrames, MLlib for machine learning, GraphX, and Spark Streaming. You can combine these libraries seamlessly in the same application. 

  • Runs Everywhere - Spark runs on Hadoop, Mesos, standalone, or in the cloud.   
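
Project CORE implements collaborative filtering on Spark. Spark's MLlib provides an alternating least squares (ALS) recommender for this kind of task. This sketch swaps in a simpler technique, user-based collaborative filtering with cosine similarity in plain Python, to show the idea on a tiny in-memory dataset. It is not the project's implementation. The function names and sample ratings are hypothetical. Because it compares every pair of users, it would not scale to 1.5 million records without a distributed engine such as Spark.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse rating dicts (item -> rating)."""
    dot = sum(a[i] * b[i] for i in set(a) & set(b))
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(ratings, user, top_n=3):
    """Predict ratings for items the user has not rated, weighted by user similarity."""
    target = ratings[user]
    weighted_sum, similarity_sum = {}, {}
    for other, their_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(target, their_ratings)
        if sim <= 0:
            continue  # ignore users with no overlapping taste
        for item, rating in their_ratings.items():
            if item in target:
                continue  # only recommend unseen items
            weighted_sum[item] = weighted_sum.get(item, 0.0) + sim * rating
            similarity_sum[item] = similarity_sum.get(item, 0.0) + sim
    predicted = {item: weighted_sum[item] / similarity_sum[item] for item in weighted_sum}
    # Highest predicted rating first; ties broken alphabetically for stable output.
    return sorted(predicted.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


ratings = {
    "alice": {"book_a": 5, "book_b": 4},
    "bob": {"book_a": 5, "book_b": 5, "book_c": 4},
    "carol": {"book_d": 2, "book_e": 3},
}
print(recommend(ratings, "alice"))
```

Here alice shares two books with bob and none with carol, so only bob's unseen book, `book_c`, is recommended to her. The prediction is bob's rating of it, 4.0.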




Who this course is for:

  • Hadoop Learners
  • Hadoop Developers