Standalone vs Distributed Mode

A free video tutorial from Stephane Maarek | AWS Certified Cloud Practitioner, Solutions Architect, Developer
Best Selling Instructor, 10x AWS Certified, Kafka Guru

Lecture description

Learn about the two modes for launching Kafka Connect, standalone mode and distributed mode, and their pros and cons.

Learn more from the full course

Apache Kafka Series - Kafka Connect Hands-on Learning

Kafka Connect - Learn How to Source Twitter Data, Store in Apache Kafka Topics & Sink in ElasticSearch and PostgreSQL

04:23:59 of on-demand video • Updated March 2024

Configure and run Apache Kafka Source and Sink Connectors
Learn concepts behind Kafka Connect & the Kafka Connect architecture
Launch a Kafka Connect Cluster using Docker Compose
Deploy Kafka Connectors in Standalone and Distributed Mode
Write your own Kafka Connector
So you have two ways of running your Kafka Connect workers, either in standalone or in distributed mode, and we will get to try out both in this course. We'll try standalone first, and then we'll use distributed mode for the rest of the course.

So standalone mode first. It's basically a single process: a single worker runs all your connectors and tasks. The configuration is bundled with your process, and it's very easy to get started with. It's super useful for development and testing, for example when you're writing your own Kafka connector. But it's not fault tolerant: if that process fails or dies, you're left without a connector. It doesn't scale horizontally; at best you can scale vertically by having a better CPU, but that's it. And it's really hard to monitor, because it's a single standalone process.

Now, distributed mode. You have multiple workers, they're servers basically, and they run your connectors and your tasks. The configuration is not bundled with the workers; it's submitted using a REST API, and we'll see how to use that REST API in detail. It's super easy to scale: you just add workers, you just add more servers, and automatically these new workers will pick up tasks and execute them. And finally, it's fault tolerant. If a worker dies, and we'll see this in the next class, all the tasks are rebalanced onto the available workers, and your connectors can still go on. So it's really nice: you get fault tolerance and you get horizontal scalability, and all of that makes it really useful for production deployment of connectors.

So remember, standalone mode is made for development and testing, and distributed mode is made for production deployment of connectors.
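
To make the "configuration is submitted using a REST API" point a bit more concrete, here is a minimal sketch in Python of what creating a connector on a distributed-mode worker could look like. It only builds the HTTP request (a POST to the /connectors endpoint with a JSON body holding a name and a config map). The worker address, the connector name, the file path and the topic are all assumptions made up for illustration; the course itself may use different tools and values.

```python
import json
import urllib.request

# Assumption: a distributed-mode Connect worker is reachable at this address
# (8083 is the default REST port; adjust to your own setup).
CONNECT_URL = "http://localhost:8083"


def build_create_connector_request(name, config, base_url=CONNECT_URL):
    """Build (but do not send) a POST /connectors request."""
    payload = {"name": name, "config": config}
    return urllib.request.Request(
        url=f"{base_url}/connectors",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request; this needs a running Connect worker."""
    with urllib.request.urlopen(request) as response:
        return response.status, response.read().decode("utf-8")


# Hypothetical connector: stream lines of a local file into a topic.
example_config = {
    "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
    "tasks.max": "1",
    "file": "/tmp/demo-input.txt",  # hypothetical path
    "topic": "demo-topic",          # hypothetical topic name
}

request = build_create_connector_request("demo-file-source", example_config)
print(request.get_method(), request.full_url)
```

Calling send(request) against a live worker would submit the connector. In standalone mode there is no such step: the same key/value settings would instead live in a properties file handed to the single worker process at startup.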