System Design for Big Data Pipelines

Name: System Design for Big Data Pipelines
Rating: 4.2 (61 reviews)

Analyze, Design and Build scalable, resilient and cost-effective Big Data pipelines with a methodical process

Created by V2 Maestros, LLC

Last updated 4/2023

English

What you'll learn

Learn about the building blocks of a big data pipeline, their functions and challenges
Adopt an end-to-end methodical approach to designing a big data pipeline
Explore techniques to ensure overall scaling of a big data pipeline
Study design patterns for building blocks, their advantages, shortcomings, applications and available technologies
Focus additionally on Infrastructure, Operations and Security for Big Data deployments
Exercise the learnings in the course with a Batch and Realtime use case study

Course content

15 sections • 90 lectures • 6h 32m total length

Need for Quality Pipeline Design (3:46)
Discuss the need for quality pipeline design for big data pipelines. Explore the key activities in building such a design.
Course Coverage and Pre-requisites (4:16)
Familiarize yourself with the covered topics, out-of-scope topics and pre-requisites for the course.
Cloud Serverless Technologies (1:50)
Discuss how serverless technologies from cloud providers relate to the contents of this course.

The Big Data Pipeline Network (3:02)
Describe the overall pipeline network and the building blocks in the network
Data Acquisition Blocks (3:21)
Discuss the features and challenges for the data acquisition block in a big data pipeline
Data Transport Blocks (3:01)
Discuss the features and challenges for the data transport block in a big data pipeline
Data Processing Blocks (2:49)
Discuss the features and challenges for the data processing block in a big data pipeline
Data Storage Blocks (2:28)
Discuss the features and challenges for the data storage block in a big data pipeline
Data Serving Blocks (2:26)
Discuss the features and challenges for the data serving block in a big data pipeline
Data Pipeline Infrastructure (2:38)
Discuss the features and challenges for the pipeline infrastructure in a big data pipeline
Data Pipeline Operations (3:16)
Discuss the features and challenges for the operations block in a big data pipeline

System Design Process Overview (3:25)
Study the overall System Design Process to be followed for Big Data Pipeline Design
Analyze Functional Requirements (5:40)
Explore the functional requirements provided for the use case and look for key indicators that require special attention for big data processing.
Analyze Pipeline Input (5:19)
Analyze the input data to the big data pipeline to understand various characteristics like format, protocol and availability schedules
Analyze Non-functional Requirements (3:44)
Analyze the non-functional requirements for the big data pipelines, especially those that relate to big data like scalability and fault tolerance
Draw a Pipeline Flowchart (2:53)
Create a pipeline flowchart that captures the steps and workflow needed to convert inputs to outputs
Create a Skeleton Design (4:01)
Add Big Data specific patterns and techniques to the flowchart and create a skeleton design
Analyze Scaling (5:20)
Analyze scaling of the skeleton architecture to ensure horizontal scalability and detect bottlenecks.
Select Technologies (5:28)
Choose the right technologies for the building blocks used in the solution
Design Infrastructure and Operations (3:35)
Design infrastructure, Security and Serviceability for the big data pipeline
Develop a Test Strategy (3:09)
Create a test strategy for testing the big data pipeline that covers regression, scaling and automation

Batch vs Realtime Pipelines (8:35)
Compare the characteristics of Batch Pipelines and Realtime Pipelines and analyze suitability for use cases
Distributed Architectures (4:54)
Distributed Architectures help ensure horizontal scalability for handling big data traffic. Discuss the key features and levers for distributed architectures
Microservices based Architectures (6:40)
The principles of Microservices architectures still apply when designing big data pipelines. Explore key principles and how they apply to big data pipelines.
Batch Pipelines - Best Practices (5:32)
Discuss key best practices when designing batch big data pipelines
Realtime Pipelines - Best Practices (5:40)
Discuss key design practices when designing realtime big data pipelines
Performance Benchmarking for Big Data Pipelines (5:22)
Explore the options for benchmarking performance for a big data pipeline

File Transfer Pattern (4:43)
Analyze the File Transfer Pattern for Acquisition. Discuss its advantages, shortcomings, use cases and available technologies.
Extraction Client Pattern (4:17)
Analyze the Extraction Client Pattern for Acquisition. Discuss its advantages, shortcomings, use cases and available technologies.
Ingestion API Pattern (4:47)
Analyze the Ingestion API Pattern for Acquisition. Discuss its advantages, shortcomings, use cases and available technologies.
Pub Sub Acquisition Pattern (4:00)
Analyze the Pub Sub Pattern for Acquisition. Discuss its advantages, shortcomings, use cases and available technologies.
Data Acquisition Design Practices (5:06)
Explore Design Best Practices for Big Data Acquisition

Extract Load Pattern (4:09)
Analyze the Extract Load Pattern for Data Transport. Discuss its advantages, shortcomings, use cases and available technologies.
Request Response Pattern (5:43)
Analyze the Request Response Pattern for Data Transport. Discuss its advantages, shortcomings, use cases and available technologies.
Event Streaming Pattern (6:19)
Analyze the Event Streaming Pattern for Data Transport. Discuss its advantages, shortcomings, use cases and available technologies.
Data Transport Design Practices (4:10)
Explore some Best Practices for Big Data Transport Design

Data Processing Patterns (4:43)
Explore several Data Processing Patterns that can be used for Big Data Processing Design.
Distributed Processing with Big Data (4:51)
Study how Big Data Processing Engines work behind the scenes to process data in a horizontally scalable manner
Batch Processing Design Practices - Part 1 (8:56)
Discuss best practices for designing batch processing jobs for big data processing
Batch Processing Design Practices - Part 2 (6:32)
Discuss best practices for designing batch processing jobs for big data processing
Stream Processing Design Practices (6:40)
Discuss best practices for designing stream processing jobs for big data processing
Batch vs Realtime Processing (3:47)
Study the differences between batch and realtime processing jobs. Explore how the design changes depending on which one applies
Input and Output Considerations for Processing (5:24)
Discuss the importance of, and techniques for, reading inputs and writing outputs in a scalable manner inside a processing job
Processing Engine Technologies (5:11)
Compare popular processing engine technologies available in the market today.

Distributed File System Pattern (5:19)
Analyze the Distributed File System Pattern for Data Storage. Discuss its advantages, shortcomings, use cases and available technologies.
Relational Database Pattern (4:59)
Analyze the Relational Database Pattern for Data Storage. Discuss its advantages, shortcomings, use cases and available technologies.
Document Database Pattern (4:41)
Analyze the Document Database Pattern for Data Storage. Discuss its advantages, shortcomings, use cases and available technologies.
Columnar Database Pattern (3:29)
Analyze the Columnar Database Pattern for Data Storage. Discuss its advantages, shortcomings, use cases and available technologies.
Graph Database Pattern (3:35)
Analyze the Graph Database Pattern for Data Storage. Discuss its advantages, shortcomings, use cases and available technologies.
Distributed Cache Pattern (4:20)
Analyze the Distributed Cache Pattern for Data Storage. Discuss its advantages, shortcomings, use cases and available technologies.
Data Storage Design Practices - 1 (5:12)
Discuss Data Storage Best Practices when building big data pipelines
Data Storage Design Practices - 2 (3:54)
Discuss Data Storage Best Practices when building big data pipelines

Query Interface Pattern (4:52)
Analyze the Query Interface Pattern for Data Serving. Discuss its advantages, shortcomings, use cases and available technologies.
Serving API Pattern (4:37)
Analyze the Serving API Pattern for Data Serving. Discuss its advantages, shortcomings, use cases and available technologies.
Push Client Pattern (3:36)
Analyze the Push Client Pattern for Data Serving. Discuss its advantages, shortcomings, use cases and available technologies.
Publish Subscribe Pattern (3:04)
Analyze the Publish Subscribe Pattern for Data Serving. Discuss its advantages, shortcomings, use cases and available technologies.
Data Serving Design Practices (5:00)
Discuss Best Practices for Data Serving when building big data pipelines

Infrastructure Technologies (6:00)
Discuss the infrastructure technologies available for deploying and operating big data technologies
Microservices Deployments (2:03)
Use the microservices deployment patterns for building and deploying building blocks in a big data pipeline
Processing Jobs Deployments (6:43)
Discuss the deployment options for deploying processing jobs in a big data pipeline. Compare their benefits and use cases
Databases and Queues Deployments (5:11)
Discuss the deployment options for deploying databases and queues in a big data pipeline. Compare their benefits and use cases
Geographical Distribution (6:03)
Review the use cases where geographically distributed pipelines are needed. Discuss some best practices for such pipelines

Requirements

Big Data Technology Concepts
Familiarity with Big Data Technologies like Apache Spark, Apache Kafka and NoSQL
Development / Deployment Experience with Big Data Technologies and Pipelines
Software Design and Development Experience including Cloud & Microservices

Description

Big data technologies have grown rapidly over the past few years and have penetrated every domain and industry in software development. Working with them has become a core skill for a software engineer. Robust and effective big data pipelines are needed to support the growing volume of data and applications in the big data world. These pipelines have become business critical and help increase revenues and reduce costs.

Do quality big data pipelines happen by magic? High quality designs that are scalable, reliable and cost effective are needed to build and maintain these pipelines.

How do you build an end-to-end big data pipeline that leverages big data technologies and practices effectively to solve business problems? How do you integrate them in a scalable and reliable manner? How do you deploy, secure and operate them? How do you look at the overall forest and not just the individual trees? This course focuses on this skill gap.

What are the topics covered in this course?

We start off by discussing the building blocks of big data pipelines, their functions and challenges.

We introduce a structured design process for building big data pipelines.

We then discuss individual building blocks, focusing on the design patterns available, their advantages, shortcomings, use cases and available technologies.

We recommend several best practices across the course.

Finally, we implement two use cases to illustrate how to apply what is learned in the course to real-world problems: one batch use case and one real-time use case.

Who this course is for:

Big Data Pipeline Designers & Architects
Big Data Developers looking to move into Design/Architecture roles
Software Architects looking to gain Big Data Experience

Course content

Introduction & Expectations: 3 lectures • 10min

Building Blocks for Big Data Pipelines: 8 lectures • 23min

System Design Process: 10 lectures • 43min

Scalable Pipelines - Design Principles: 6 lectures • 37min

Data Acquisition Design: 5 lectures • 23min

Data Transport Design: 4 lectures • 20min

Data Processing & Transformation Design: 8 lectures • 46min

Storage Design: 8 lectures • 35min

Serving Design: 5 lectures • 21min

Infrastructure and Deployments: 5 lectures • 26min
