PostgreSQL Replication, High Availability HA and Scalability

Rating: 4.3 (448 reviews)

Solutions for Scaling Postgres with Master-Slave Replication, PgBouncer, PgPool II, HAProxy, Partitioning, Sharding

Created by Lucian Oprea

Last updated 1/2025

English

What you'll learn

Assess your scaling needs
How to scale reads using Replication and Load-Balancing
Which is the best Replication solution for a certain use case
How to manage database connections with PgBouncer connection pooler
How to make use of multiple PostgreSQL instances in the cloud (Google Cloud)
How to achieve High-Availability
How to perform Automatic Failover using PgPool II
How to scale writes using Partitioning and Sharding

Course content

10 sections • 81 lectures • 3h 8m total length

Why Scale PostgreSQL? • 1:26
Quiz: Why Scale PostgreSQL?
Vertical Scaling • 1:13
Quiz: Vertical Scaling
Horizontal Scaling • 2:41
Learn how horizontal scaling expands beyond fixed CPU limits by exploring replication options, distribution mechanisms, connection pooling, queuing, partitioning, sharding, and multi-master approaches for high availability and scalable performance.
Horizontal Scaling
CAP Theorem Explained • 3:48
PostgreSQL vs. NoSQL • 1:59
Explore Cassandra's availability and partition tolerance, its consistency trade-offs, and PostgreSQL scaling options within the context of relational databases and distributed systems.
Quiz: PostgreSQL vs. NoSQL
Use case: Consistent and Available System • 1:11
Explore how transactions ensure consistency in banking operations, updating both balances or none, and prioritize data consistency over partial availability in high availability systems.
Quiz: Use case: Consistent and Available System
Use case: Available and Partition-tolerant System • 1:01
Explore how availability and partition tolerance shape a system, balancing data consistency with user experience as likes and pictures are tracked, given scale and the impossibility of meeting all requirements.
Quiz: Use case: Available and Partition-tolerant System
Read Versus Write Bound Workload • 2:45
Quiz: Read Versus Write Bound Workload
How will statistics answer all questions? • 1:15
Quiz: How will statistics answer all questions?
Enable Statistics • 4:40
Quiz: Enable Statistics
Replication • 1:27
Explore PostgreSQL replication to boost performance and high availability by copying data from a master server to other database servers, using built-in options or middleware and relying on logs.
Quiz: Replication
Load Balancing • 1:29
Learn how to distribute read traffic across PostgreSQL replicas with a load balancer, creating a replica read pool and using tools like HAProxy and Pgpool-II.
Connection Pooling • 1:20
Quiz: Connection Pooling
Queuing • 0:53
Use queuing to smooth write traffic by delaying real-time persistence of writes within an acceptable timeframe. If traffic remains high, split the dataset to scale in parallel.
Partitioning • 0:37
Partitioning splits a table into multiple tables, enabling queries to scan smaller tables and indexes without the application noticing. It keeps partitions in the same database, avoiding shard complexities.
Sharding • 0:48
Quiz: Sharding
Multi-master • 1:45
Explore multi-master replication, where multiple nodes hold identical data and accept writes via bidirectional replication, available since PostgreSQL 9.4. Understand challenges such as conflicting updates, sequences, upgrades, and failover testing.
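
As a rough illustration of the "Read Versus Write Bound Workload" and "Enable Statistics" lectures above, the sketch below classifies a workload from cumulative tuple counters like those in PostgreSQL's pg_stat_database view. The function name, the 0.8 threshold, and the use of tup_fetched alone as the read counter are assumptions made for this example, not something taken from the course.

```python
def classify_workload(stats, threshold=0.8):
    """Crude heuristic: label a database read-bound, write-bound, or mixed.

    `stats` is a dict of cumulative counters named after columns of the
    pg_stat_database view. The threshold is an illustrative assumption.
    """
    reads = stats["tup_fetched"]
    writes = stats["tup_inserted"] + stats["tup_updated"] + stats["tup_deleted"]
    total = reads + writes
    if total == 0:
        return "idle"
    read_share = reads / total
    if read_share >= threshold:
        return "read-bound"
    if read_share <= 1 - threshold:
        return "write-bound"
    return "mixed"


if __name__ == "__main__":
    sample = {"tup_fetched": 900, "tup_inserted": 50,
              "tup_updated": 40, "tup_deleted": 10}
    print(classify_workload(sample))
```

A read-bound result points toward replication and load balancing; a write-bound result points toward queuing, partitioning, or sharding.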

What is Streaming Replication? • 2:12
Asynchronous vs. Synchronous Replication • 2:32
Quiz: Asynchronous vs. Synchronous Replication
Hands-on - Initialise Primary Database • 1:28
Learn to configure PostgreSQL for streaming and asynchronous replication by initializing a primary database cluster with initdb, creating data directories, system tables, and the default postgres database.
Hands-on - Initialise Primary Database
Configuring the Primary for Replication • 3:46
Configure the primary for replication by editing postgresql.conf to enable remote access with listen_addresses, create a replication user with the replication flag, update pg_hba.conf, and restart.
Quiz: Configuring the Primary for Replication
Configuring the Replica Instance • 3:00
Quiz: Configuring the Replica Instance
Testing Replication Setup • 2:52
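
When testing a streaming replication setup, one common check is how far the replica trails the primary in the write-ahead log. The sketch below computes that gap in bytes from two LSN strings in PostgreSQL's `high/low` hexadecimal format. The helper names are made up for this example; on a real system the inputs would typically come from `pg_current_wal_lsn()` on the primary and `pg_last_wal_replay_lsn()` on the replica (PostgreSQL 10 or later).

```python
def lsn_to_int(lsn: str) -> int:
    """Convert an LSN such as '0/3000060' into a byte position.

    The part before the slash is the high 32 bits, the part after it the
    low 32 bits, both written in hexadecimal.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)


def replication_lag_bytes(primary_lsn: str, replica_lsn: str) -> int:
    """Bytes of WAL the replica still has to replay (hypothetical helper)."""
    return lsn_to_int(primary_lsn) - lsn_to_int(replica_lsn)


if __name__ == "__main__":
    print(replication_lag_bytes("0/3000060", "0/3000000"))  # 96
```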

What is Logical Replication in Postgres? • 5:53
Understand PostgreSQL logical replication, with publisher and subscriber roles, enabling selective table replication, streaming decoded statements, and efficient migrations with less data transfer than physical replication.
Quiz: What is Logical Replication in Postgres?
Setting-up Postgres Servers for Logical Replication • 4:44
Set up two PostgreSQL database clusters on the same machine to demonstrate logical replication. Configure the publisher and subscriber ports, set the WAL level to logical, then start both instances to test.
Quiz: Setting-up Postgres Servers for Logical Replication
Selective Copy of the Data • 3:15
Perform a selective copy of data by creating a test database and table, using pg_dump to transfer the schema to a subscriber via port 5434.
Quiz: Selective Copy of the Data
Create the Publication • 0:46
Create a publication for table one, or specify multiple tables or all tables, and pair it with a subscription to enable replication.
Create the Subscription • 3:01
Create a subscription to pull changes from publications, enable replication slots and initial snapshot, and verify replicated inserts, updates, deletes, and truncates on the subscriber.
Quiz: Create the Subscription
Limitations of Logical Replication • 2:22
Understand the limitations of logical replication in PostgreSQL: schema and sequences are not replicated, schema changes require manual syncing with pg_dump, and only regular and partitioned tables are supported.
Quiz: Limitations of Logical Replication
Monitoring Logical Replication • 3:10
Monitor PostgreSQL replication with the pg_stat_replication view to see key metrics, including lag between master and standby in seconds or milliseconds, and set alerts.
Best use-cases for Logical Replication • 1:48
Use logical replication for flexibility; it doesn't require schemas, works across major versions, supports Windows to Linux replication, multiple subscriptions, selective operations, and incremental changes with triggers, boosting performance on slow networks.
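
The publication and subscription steps above boil down to two SQL statements, one run on the publisher and one on the subscriber. The sketch below only builds those statements as strings so their shape is visible. The object names, the publisher port 5433, and the helper functions are assumptions for illustration; the helpers do no identifier quoting, so they are not suitable for untrusted input.

```python
def create_publication_sql(name, tables=None):
    """Build a CREATE PUBLICATION statement (run on the publisher).

    With no table list, the publication covers all tables.
    """
    if tables:
        target = "FOR TABLE " + ", ".join(tables)
    else:
        target = "FOR ALL TABLES"
    return f"CREATE PUBLICATION {name} {target};"


def create_subscription_sql(name, conninfo, publication):
    """Build a CREATE SUBSCRIPTION statement (run on the subscriber)."""
    return (f"CREATE SUBSCRIPTION {name} "
            f"CONNECTION '{conninfo}' PUBLICATION {publication};")


if __name__ == "__main__":
    # Hypothetical names and port, chosen only for this example.
    print(create_publication_sql("my_pub", ["table1"]))
    print(create_subscription_sql(
        "my_sub", "host=localhost port=5433 dbname=test", "my_pub"))
```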

Introduction • 0:52
Handle hundreds to thousands of concurrent connections in large applications by using a connection pooler such as PgBouncer to boost PostgreSQL performance beyond 350 transactions per second.
Quiz: Introduction
Fundamental concepts of connection pooling • 2:23
PgBouncer acts as a proxy between the application and PostgreSQL, maintaining a pool of open connections to quickly serve requests with a minimal memory footprint.
Quiz: Fundamental concepts of connection pooling
Building a PgBouncer Setup • 1:59
Build a PgBouncer setup to front a PostgreSQL test database, run a stress test, and note its 1-to-1 connections and the need for external load balancing for high availability.
Quiz: Building a PgBouncer Setup
Installing and Configuring PgBouncer • 0:37
Install PgBouncer using your OS package manager (yum on Red Hat or CentOS, apt-get on Ubuntu or Debian, or brew on macOS), or download and build it from source.
Creating a basic configuration file for PgBouncer • 3:15
Install PgBouncer and copy the default config from the OS directory. Edit the file to configure backend servers, alias test_tb, authentication, and a dummy superuser.
Quiz: Installing and Configuring PgBouncer
Connecting to PgBouncer • 1:13
Launch PgBouncer with the configuration file. Connect to the postgres database on port 6432 using the database alias and the postgres user, and check the log entries.
Advanced Settings for Performance • 2:55
Boost performance by tuning PgBouncer: set min_pool_size, keep the default pool size per user per database at 20, cap the queue with max_client_conn, and keep nodes nearby.
Quiz: Advanced Settings for Performance
Pool Modes • 1:51
Explore pool modes for PostgreSQL connections, including session, transaction, and statement modes, and learn how each affects connection lifecycle, concurrency, and application behavior.
A simple benchmark • 3:57
The benchmark shows that PgBouncer dramatically increases throughput and lowers connection time for many short-lived connections, from 3.21 ms to 0.23 ms, using 20 clients and 1000 transactions.
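
To make the configuration lectures above more concrete, here is a minimal sketch that renders a basic pgbouncer.ini using Python's standard configparser module. The setting names (listen_port, pool_mode, max_client_conn, default_pool_size, min_pool_size and so on) are real PgBouncer options; the alias `test_db`, the backend connection string, the file name `userlist.txt`, and every value are assumptions picked for the example rather than the course's actual configuration.

```python
import configparser
import io


def render_pgbouncer_ini(alias, backend, pool_mode="transaction"):
    """Return the text of a minimal pgbouncer.ini (illustrative values only)."""
    cfg = configparser.ConfigParser(interpolation=None)
    # [databases] maps the alias clients connect to onto a backend server.
    cfg["databases"] = {alias: backend}
    cfg["pgbouncer"] = {
        "listen_addr": "127.0.0.1",
        "listen_port": "6432",
        "auth_type": "md5",
        "auth_file": "userlist.txt",
        "pool_mode": pool_mode,          # session, transaction, or statement
        "max_client_conn": "100",        # cap on client connections
        "default_pool_size": "20",       # server connections per user/database
        "min_pool_size": "5",            # connections kept open in advance
    }
    buffer = io.StringIO()
    cfg.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    print(render_pgbouncer_ini(
        "test_db", "host=127.0.0.1 port=5432 dbname=postgres"))
```

With a file like this, a client would connect to port 6432 and the alias instead of connecting to PostgreSQL directly.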

Introduction • 0:26
Implement a load balancing solution for PostgreSQL replicas using Google Cloud, enabling horizontal scaling and improved high availability.
Quiz: Introduction
Key Components • 0:37
Key Characteristics of the Architecture • 0:57
Balance load for read connections to improve read availability and move toward high availability. Route read requests via HAProxy to replicas, while writes connect to the primary.
Creating PostgreSQL Instances on Google Cloud • 4:36
Create a primary PostgreSQL instance with two read replicas on Google Cloud to boost read capacity, using private IPs in the same region and a least-privilege test user.
Quiz: Creating PostgreSQL Instances on Google Cloud
Creating a GCE for HAProxy • 3:59
Create a Google Compute Engine instance to run HAProxy, install the load balancer, and configure the Cloud SDK and APIs to enable PostgreSQL high availability.
Configure HAProxy for Load-Balancing • 6:15
Quiz: Configure HAProxy for Load-Balancing
Testing Load-Balancing • 3:49
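
The architecture above sends writes to the primary and spreads reads across the replicas. The sketch below models only that routing decision, using round-robin over a list of replica names. It is a toy model with invented names, not HAProxy's implementation; in the setup described here, the client picks the read or write endpoint and the load balancer then distributes the read connections.

```python
import itertools


class ReadWriteRouter:
    """Toy router: writes go to the primary, reads rotate over replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Fall back to the primary when no replica is configured.
        self._read_targets = itertools.cycle(replicas or [primary])

    def route(self, is_write):
        """Return the name of the server that should handle the request."""
        if is_write:
            return self.primary
        return next(self._read_targets)


if __name__ == "__main__":
    router = ReadWriteRouter("primary", ["replica-1", "replica-2"])
    for is_write in (False, False, True, False):
        print(router.route(is_write))
```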

Introduction • 0:57
Quiz: Introduction
Which Tables Need Partitioning? • 2:06
Quiz: Which Tables Need Partitioning?
How should the Tables be Partitioned? • 1:47
Partitioning can dramatically boost performance when done right. Identify partition keys from workflows and joins, and apply one of four methods to minimize scanned data.
Quiz: How should the Tables be Partitioned?
Declarative vs. Inheritance Partitioning • 0:44
Creating a Partitioned Table • 4:22
Demonstrate range partitioning in PostgreSQL by splitting a customers table into age-based partitions, migrating with minimal downtime, and identifying each row’s partition via a system column.
Quiz: Creating a Partitioned Table
Partitioning Methods • 1:45
Explore partitioning methods to optimize PostgreSQL performance, including range and list partitions, hash partitioning, and multilevel partitioning, with examples using sales dates, departments, regions, and product keys.
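
Range partitioning, as in the age-based customers example above, assigns each row to the partition whose range contains its key. The sketch below mimics that lookup in plain Python. The boundaries and partition names are hypothetical; lower bounds are inclusive and upper bounds exclusive, which matches how PostgreSQL interprets `FOR VALUES FROM ... TO ...`.

```python
import bisect

# Hypothetical age boundaries: [min, 18), [18, 40), [40, 65), [65, max).
BOUNDS = [18, 40, 65]
PARTITIONS = [
    "customers_under_18",
    "customers_18_to_39",
    "customers_40_to_64",
    "customers_65_plus",
]


def partition_for(age):
    """Return the name of the partition that would hold a row with this age."""
    # bisect_right counts the boundaries <= age, which is the partition index.
    return PARTITIONS[bisect.bisect_right(BOUNDS, age)]


if __name__ == "__main__":
    for age in (17, 18, 39, 40, 65):
        print(age, partition_for(age))
```

A query that filters on the partition key only needs to touch the matching partition, which is where the performance gain comes from.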

Introduction • 3:03
Explore strategies to handle growing write traffic by sharding data into logical partitions distributed across physical nodes, and employ functional partitioning to scale PostgreSQL databases efficiently.
Quiz: Introduction
Pain Points of Sharding • 4:15
Quiz: Pain Points of Sharding
How to Partition Data in PostgreSQL • 3:23
Identify a partition key from the data model, such as user ID, and implement horizontal partitioning with a data mapping table to route queries and maintain unique primary keys across shards.
Quiz: How to Partition Data in PostgreSQL
Second Level Sharding • 2:05
Explore second level sharding with multiple partition keys, such as user ID and article ID, to view data efficiently from different angles and cross the boundaries between shards for comments.
Quiz: Second Level Sharding
Querying Across Shards • 1:38
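
The idea of logical shards distributed across physical nodes, with a mapping table to route queries, can be sketched as follows. The shard count, node names, and modulo-based assignment are assumptions for illustration only; the course may use a different scheme.

```python
# Hypothetical layout: 8 logical shards spread over 2 physical nodes.
LOGICAL_SHARDS = 8
SHARD_TO_NODE = {shard: f"node-{shard % 2}" for shard in range(LOGICAL_SHARDS)}


def shard_for(user_id):
    """Map a partition key (here a user ID) to a logical shard."""
    return user_id % LOGICAL_SHARDS


def node_for(user_id, mapping=SHARD_TO_NODE):
    """Look up the physical node that stores this user's rows."""
    return mapping[shard_for(user_id)]


if __name__ == "__main__":
    for user_id in (8, 10, 11):
        print(user_id, shard_for(user_id), node_for(user_id))
```

Keeping more logical shards than physical nodes means a shard can later be moved to a new node by updating the mapping, without changing how keys are assigned to shards.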

Why High Availability? • 0:38
Organizations must plan for a wide range of failure scenarios to achieve high availability and business continuity, ensuring the database remains available even when some parts fail.
Steps to achieve High Availability • 1:56
Achieve high availability by replicating data to a standby replica, exploring log shipping, streaming, and logical replication at a high level, and choosing manual or automatic failover with failback considerations.
Quiz: Steps to achieve High Availability
Essential Questions to set-up High Availability • 5:20
Log-Shipping Replication • 1:23
PostgreSQL uses warm standby or log shipping to archive changes and recover on standby servers. Hot standby reduces delay, while streaming and logical replication now cover most replication use cases.
Quiz: Log-Shipping Replication
Streaming Replication and Logical Replication • 2:30
Quiz: Streaming Replication and Logical Replication
Cascading Replication • 1:16
Explore cascading replication in PostgreSQL, streaming changes from the primary to standbys. Compare topologies: primary-fed standbys, or one-secondary-per-standby, or a secondary feeding multiple targets for faster failover and balanced load.
Synchronous vs. Asynchronous Replication • 2:04
Quiz: Synchronous vs. Asynchronous Replication
Automatic Failover and Always-on Strategy • 1:20
Simple HA Solution Example • 0:51
Explore a high-availability example with one primary and one standby, where the client connects to the primary for reads and writes; failover is manual via a trigger file or the promote command.
Better HA Solution Example • 1:41

Introduction • 0:37
Pgpool-II Features • 3:18
Quiz: Pgpool-II Features
Configure Pgpool-II with Streaming Replication • 1:07
Implement high availability by using PostgreSQL built-in streaming replication to synchronize data, then use Pgpool-II for load balancing and automatic failover, with Pgpool-II routing writes to the primary and reads to the replica.
Quiz: Configure Pgpool-II with Streaming Replication
Setting up Streaming Replication • 5:52
Learn to configure PostgreSQL streaming replication to enable read load balancing, write separation, and failover by setting up a primary, creating a replica using base backup, and configuring replication access.
Configuring Pgpool-II for Load Balancing • 6:41
Configure Pgpool-II to balance load across a primary and standby in a streaming replication setup, directing inserts and deletes to the primary and selects to the standby.
Quiz: Configuring Pgpool-II for Load Balancing
Testing load-balancing & read/write separation • 3:24
Configure Pgpool for PostgreSQL High-Availability • 0:57
Configuring PostgreSQL Primary Server • 2:33
Quiz: Configuring PostgreSQL Primary Server
Configuring Pgpool-II Server • 2:24
Configuring PostgreSQL Replica Server • 1:23
Quiz: Configuring PostgreSQL Replica Server
Testing The Failover • 2:11
Restoring failed nodes • 1:56
Quiz: Restoring failed nodes
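
Automatic failover comes down to detecting that the primary is unhealthy and promoting a standby. The sketch below shows one possible selection rule: keep the current primary while it is healthy, otherwise pick the healthy standby that has replayed the most WAL. This is a simplified model with invented field names, not Pgpool-II's actual logic; Pgpool-II runs its own health checks and then calls a user-supplied failover command.

```python
def choose_primary(nodes, current_primary):
    """Return the node that should act as primary.

    `nodes` maps a node name to {"healthy": bool, "replay_lsn": int}
    (a hypothetical structure for this example).
    """
    if nodes[current_primary]["healthy"]:
        return current_primary
    candidates = [
        name for name, state in nodes.items()
        if state["healthy"] and name != current_primary
    ]
    if not candidates:
        raise RuntimeError("no healthy standby available for promotion")
    # Prefer the standby that has replayed the most WAL, to limit data loss.
    return max(candidates, key=lambda name: nodes[name]["replay_lsn"])


if __name__ == "__main__":
    cluster = {
        "pg-1": {"healthy": False, "replay_lsn": 500},
        "pg-2": {"healthy": True, "replay_lsn": 480},
        "pg-3": {"healthy": True, "replay_lsn": 495},
    }
    print(choose_primary(cluster, "pg-1"))
```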

Requirements

You need access to a Windows/Mac/Linux PC with 10GB of free disk space
Basic familiarity with database objects such as tables and indexes is expected
Some familiarity with Linux will be helpful

Description

PostgreSQL is one of the most powerful and easy-to-use database management systems. It has strong support from the community and is being actively developed with a new release every year.

PostgreSQL supports the most advanced features included in SQL standards. It also provides NoSQL capabilities and very rich data types and extensions. All of this makes PostgreSQL a very attractive solution in software systems.

In this course, we discuss the problem of building scalable solutions based on PostgreSQL that use the resources of several servers. There is a natural limitation for such systems: there is always a compromise between performance, reliability, and consistency. It's possible to improve one aspect, but others will suffer. In this course, we'll see how to find the best match for our use cases, so that we know exactly which aspects need scaling and avoid the common pitfalls of distributed systems.

Scaling PostgreSQL is a journey. You should come out of this course more prepared to assess your scaling needs and understand how to scale reads and how to scale writes.

Each of the solutions presented in this course improves some aspect of scalability, but each also adds some complexity, and possibly a limitation or constraint.

We have to ask the right questions to get the system requirements, and this is why we dedicate an entire lecture to the questions we have to ask ourselves before starting the scaling journey.

After this course, we should come out more prepared and understand how to scale reads.

We have several options for replication, depending on whether we favor performance or flexibility.

Replication can be used as a backup or a standby solution that would take over in case the main server crashes.

Replication can also be used to improve the performance of a software system by making it possible to distribute the load on several database servers.

Then, if we have some form of replication in place, we could ask ourselves whether we want to allow several computers to serve the same data.

To achieve this, we should have a mechanism to distribute the requests. We’ll see here two of the most popular options available.

Next, if the number of database connections is large, then we’ll probably want to use a connection pooler. Again, we’ll cover two options here.

We’ll also see how to scale writes, and how to make your traffic growth more predictable by adding queuing to your architecture.

Then, we’ll check partitioning for those cases when we have to deal with big tables.

Also, we’ll check sharding to scale writes, and all the complex decisions that come with it.

Finally, we’ll briefly look at the multi-master solution, which is a relatively new concept that seems to be promising.

If our goal is to achieve only High availability, or the ability to continue working even in the situation where one part of the cluster fails, we can check out only those solutions.

The prerequisite for HA is to put a replication strategy in place.

Then, we can use tools to allow a second server to take over quickly, if the primary server fails.

Introduction to Scaling PostgreSQL

Why scale PostgreSQL?
What is Vertical Scaling?
What is Horizontal Scaling?
Read Versus Write Bound Workloads
Why Statistics are essential?
How to enable and make use of Statistics? (Hands-on)
How to scale Postgres for Reads?
How replication helps to scale out?
What are the Load-Balancers?
How to scale Postgres for Writes?
How to make use of Queues?
How could Partitioning and Sharding help in scaling out?
What is the Multi-Master solution about?

Understanding the Limitations of Scaling out PostgreSQL

CAP Theorem Explained
PostgreSQL vs. Cassandra
Use case: CA Systems
Use case: AP Systems

How to use Streaming Replication?

What is Streaming Replication?
Asynchronous vs. Synchronous Replication
How to Initialise Primary Database? (Hands-on)
How to Configure the Primary for Replication? (Hands-on)
How to Configure the Replica Instance? (Hands-on)
Testing Replication Setup (Hands-on)

How to use Logical Replication?

What is Logical Replication in Postgres?
Step by step Logical Replication setup
How to setup the servers for Logical Replication? (Hands-on)
How to make a selective Copy of the Data? (Hands-on)
How to Create the Publication? (Hands-on)
How to Create the Subscription? (Hands-on)
Postgres Limitations of Logical Replication
How to Monitor Logical Replication? (Hands-on)
Best use-cases for using Logical Replication

How to make use of PgBouncer?

What is PgBouncer?
Fundamental concepts of connection pooling
How to build a PgBouncer Setup? (Hands-on)
How to install and configure PgBouncer? (Hands-on)
How to create a basic configuration file for PgBouncer? (Hands-on)
How to connect to PgBouncer? (Hands-on)
Explaining Advanced Settings for Performance
Which are the available Pool Modes?
Executing a benchmark with PgBouncer (Hands-on)

How to scale PostgreSQL in Google Cloud?

Introduction
Key Components on Google Cloud
Key Characteristics of the Architecture
How to create PostgreSQL Instances on Google Cloud? (Hands-on)
How to create a Google Cloud Engine (GCE) for HAProxy? (Hands-on)
How to configure HAProxy for Load-Balancing? (Hands-on)
Testing Load-Balancing

How to make use of PostgreSQL Partitioning?

What is Partitioning?
Which Tables Need Partitioning?
How should the Tables be Partitioned?
Declarative vs. Inheritance Partitioning
How to create a Partitioned Table? (Hands-on)
Partitioning Methods

How to Shard PostgreSQL?

What is Sharding?
Pain-Points of Sharding?
What is Second Level Sharding?
What is good Sharding?
How to query across multiple Shards?

How to setup High Availability (HA) on PostgreSQL?

Why High Availability?
Steps to achieve High Availability
Essential Questions to ask before setting-up High Availability
Log-Shipping Replication
Streaming Replication and Logical Replication
Cascading Replication
Synchronous vs. Asynchronous Replication
Automatic Failover and Always-on Strategy
Simple HA Solution Example
Better HA Solution Example

How to make use of PgPool II?

What is PgPool II?
Pgpool-II Features
How to Configure Pgpool-II with Streaming Replication? (Hands-on)
How to setup Streaming Replication? (Hands-on)
How to Configure Pgpool-II for Load Balancing? (Hands-on)
Testing load-balancing & read/write separation (Hands-on)
How to Configure Pgpool for PostgreSQL High-Availability? (Hands-on)
How to Configure PostgreSQL Primary Server? (Hands-on)
How to Configure Pgpool-II Server? (Hands-on)
How to Configure PostgreSQL Replica Server? (Hands-on)
Testing The Failover (Hands-on)
How to restore failed nodes? (Hands-on)

Who this course is for:

Software Engineers interested in designing Scalable and HA solutions on top of PostgreSQL
Database Administrators
Everyone interested in building better PostgreSQL applications

Course content

Scaling PostgreSQL • 17 lectures • 30min

Streaming Replication • 6 lectures • 16min

Logical Replication • 8 lectures • 25min

PgBouncer • 9 lectures • 19min

Scaling PostgreSQL with Google Cloud and HAProxy • 7 lectures • 21min

Partitioning • 6 lectures • 12min

Sharding • 5 lectures • 14min

PostgreSQL High Availability • 10 lectures • 19min

PgPool II • 12 lectures • 32min

Bonus Section • 1 lecture • 1min