Delta Lake with Apache Spark using Scala
What you'll learn
- Learn Delta Lake with Apache Spark in a few hours
- Basic to advanced knowledge of Delta Lake
- Hands-on practice with Delta Lake
- Learn Delta Lake with Apache Spark using Scala on the Databricks platform
- Learn how to leverage the power of Delta Lake in a Spark environment!
- Learn about the Databricks platform!
Course content
- Preview (03:21)
- Preview (01:30)
- Introduction to Data Lake (01:09)
- Key Features of Delta Lake (04:57)
- Introduction to Spark (04:04)
- Free Account creation in Databricks (01:51)
- Provisioning a Spark Cluster (02:15)
- Basics about notebooks (07:29)
- Dataframes (04:47)
- (Hands On) Create a table (06:38)
- (Hands On) Write a table (14:12)
- Preview (06:52)
- Schema validation (02:50)
- (Hands On) Update table schema (03:01)
- Table Metadata (01:53)
- Delete from a table (01:44)
- Update a Table (02:11)
- Vacuum (01:59)
- History (01:34)
- Concurrency Control (01:08)
- Optimistic concurrency control (02:33)
- Migrate Workloads to Delta Lake (05:23)
- Optimize Performance with File Management (01:13)
- FAQ (Interview Question on Optimization) 1 (01:47)
- FAQ (Interview Question on Optimization) 2 (01:50)
- FAQ (Interview Question on Optimization) 3 (00:51)
- Auto Optimize (02:45)
- FAQ (Interview Question on Auto Optimize) 4 (00:50)
- FAQ (Interview Question on Auto Optimize) 5 (01:06)
- Optimize Performance with Caching (01:11)
- Delta and Apache Spark caching (03:26)
- Cache a subset of the data (01:37)
- Isolation Levels (01:06)
- Best Practices (02:56)
- FAQ (Interview Question) 6 (01:06)
- FAQ (Interview Question) 7 (00:37)
- FAQ (Interview Question) 8 (00:42)
- FAQ (Interview Question) 9 (00:20)
- FAQ (Interview Question) 10 (00:26)
- FAQ (Interview Question) 11 (00:28)
- FAQ (Interview Question) 12 (00:27)
- FAQ (Interview Question) 13 (00:43)
- FAQ (Interview Question) 14 (00:55)
- FAQ (Interview Question) 15 (01:39)
- FAQ (Interview Question) 16 (00:31)
- FAQ (Interview Question) 17 (00:32)
- FAQ (Interview Question) 18 (01:00)
- FAQ (Interview Question) 19 (01:25)
- Important Lecture (00:20)
- Bonus Lecture (00:52)
Requirements
- Basic knowledge of Apache Spark, Scala, and SQL is necessary for this course
Description
You will learn Delta Lake with Apache Spark using Scala on the Databricks platform.
Learn the latest Big Data Technology - Spark! And learn to use it with one of the most popular programming languages, Scala!
One of the most valuable technology skills is the ability to analyze huge data sets, and this course is specifically designed to bring you up to speed on one of the best technologies for this task, Apache Spark! The top technology companies like Google, Facebook, Netflix, Airbnb, Amazon, NASA, and more are all using Spark to solve their big data problems!
Spark can perform up to 100x faster than Hadoop MapReduce, which has caused an explosion in demand for this skill! And because the Spark 3.0 DataFrame framework is still relatively new, you can quickly become one of the most knowledgeable people in the job market!
Delta Lake is an open-source storage layer that brings reliability to data lakes. Delta Lake provides ACID transactions, scalable metadata handling, and unifies streaming and batch data processing. Delta Lake runs on top of your existing data lake and is fully compatible with Apache Spark APIs.
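To make this concrete, here is a minimal Scala sketch of that compatibility: you write and read a Delta table with the ordinary Spark DataFrame reader and writer, only the format name changes. It assumes a Spark session (named `spark`, as in a Databricks notebook) with the Delta Lake library available, and the path `/tmp/delta/events` is purely illustrative.

```scala
// Write a DataFrame in Delta format; the Delta transaction log
// underneath this path is what provides the ACID guarantees.
val data = spark.range(0, 5).toDF("id")
data.write.format("delta").save("/tmp/delta/events")

// Read it back with the plain Spark DataFrame API.
val df = spark.read.format("delta").load("/tmp/delta/events")
df.show()
```

The only difference from writing Parquet directly is `format("delta")`; existing Spark pipelines typically need little more than that change to adopt Delta Lake.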
Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming.
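As a small taste of those high-level APIs, a hedged Scala sketch of a typical DataFrame aggregation, assuming a `SparkSession` named `spark` as provided by a Databricks notebook (the column names and data are made up for illustration):

```scala
import spark.implicits._

// Build a tiny DataFrame from local data and run a grouped aggregation
// with the Spark SQL / DataFrame API.
val sales = Seq(
  ("books", 12.0),
  ("books", 8.5),
  ("games", 20.0)
).toDF("category", "amount")

sales.groupBy("category").sum("amount").show()
```

The same query could equally be expressed in SQL via `spark.sql(...)`; the DataFrame and SQL APIs share the same optimized engine.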
Topics Included in the Course
Introduction to Delta Lake
Introduction to Data Lake
Key Features of Delta Lake
Introduction to Spark
Free Account creation in Databricks
Provisioning a Spark Cluster
Basics about notebooks
Dataframes
Create a table
Write a table
Read a table
Schema validation
Update table schema
Table Metadata
Delete from a table
Update a Table
Vacuum
History
Concurrency Control
Optimistic concurrency control
Migrate Workloads to Delta Lake
Optimize Performance with File Management
Auto Optimize
Optimize Performance with Caching
Delta and Apache Spark caching
Cache a subset of the data
Isolation Levels
Best Practices
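Several of the table-maintenance topics above (delete, update, history, vacuum) follow one pattern in Scala. A minimal sketch, assuming the delta-spark library is on the classpath and a Delta table already exists at the illustrative path `/tmp/delta/events`:

```scala
import io.delta.tables.DeltaTable
import org.apache.spark.sql.functions.{expr, lit}

val table = DeltaTable.forPath(spark, "/tmp/delta/events")

// Delete and update rows; each call is a single atomic commit.
table.delete("id < 2")
table.update(expr("id == 3"), Map("id" -> lit(30)))

// Inspect the table's commit history, then remove files no longer
// referenced by the transaction log (subject to the retention period).
table.history().show()
table.vacuum()
```

`history()` returns an ordinary DataFrame of commits, which is also the starting point for time travel with `versionAsOf`.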
Frequently Asked Interview Questions
About Databricks:
Databricks lets you start writing Spark code instantly so you can focus on your data problems.
Who this course is for:
- Beginner Apache Spark developers, Big Data engineers or developers, software developers, machine learning engineers, data scientists, data analysts, analysts
Instructor
I am a Solution Architect with 12+ years of experience in the Banking, Telecommunication, and Financial Services industries, across a diverse range of roles in Credit Card, Payments, Data Warehouse, and Data Center programmes.
As a Big Data and Cloud Architect, I work as part of a Big Data team to provide software solutions.
Responsibilities include:
- Support all Hadoop-related issues
- Benchmark existing systems, analyse existing system challenges/bottlenecks, and propose the right solutions to eliminate them based on various Big Data technologies
- Analyse and define the pros and cons of various technologies and platforms
- Define use cases, solutions and recommendations
- Define Big Data strategy
- Perform detailed analysis of business problems and technical environments
- Define pragmatic Big Data solutions based on analysis of customer requirements
- Define pragmatic Big Data cluster recommendations
- Educate customers on various Big Data technologies to help them understand the pros and cons of Big Data
- Data Governance
- Build Tools to improve developer productivity and implement standard practices
I am sure the knowledge gained in these courses will give you an extra edge in your career.
All the best!!