From 0 to 1: The Cassandra Distributed Database

A complete guide to getting started with cluster management and queries on Cassandra

Created by Loony Corn

Last updated 10/2016

English

What you'll learn

Set up and manage a cluster, keyspaces and column families
Run queries using the CQL command shell
Design primary keys and secondary indexes with partitioning and clustering considerations
Use the Cassandra Java driver to connect and run queries on the cluster

Course content

10 sections • 46 lectures • 5h 54m total length

You, This Course and Us • 1:45
Discover how Cassandra delivers scalable, high-performance data management as a distributed NoSQL database, using a columnar data model with keyspaces and column families, and CQL, its SQL-like query language.

A Column-Oriented Database • 10:39
Cassandra manages huge datasets using its columnar layout, which is more efficient and saves space because each row stores only the columns it actually has.
Requirements For A Product Catalog System • 8:07
What are the requirements of a product catalog system, and why do we need a distributed, columnar, decentralized database to manage it?
What Is Cassandra? • 8:33
What use cases does Cassandra work with? When would you use Cassandra over other databases?
Cassandra Vs HBase • 4:37
How does Cassandra stack up against HBase, the columnar store available in the Hadoop ecosystem?

Install Cassandra (Mac and Unix based systems) • 4:34
Install and set up Cassandra on your machine.
Install the Cassandra Cluster Manager (Mac and Unix) • 2:20
Download the Cassandra Cluster Manager (CCM) from GitHub, unzip and install it, then create a test cluster running Cassandra version 3.6 with the ccm create test command.
Install Maven On Your Machine • 2:20
Download the latest Maven binary from Apache, unzip it, move the folder to a convenient location, and update the PATH in your bash profile so Maven commands are accessible from the shell.
[For Linux/Mac OS Shell Newbies] Path and other Environment Variables • 8:25
If you are unfamiliar with software that requires working in a shell/command-line environment, this video will be helpful. It explains how to update the PATH environment variable, which is needed to set up most shell-based software on Linux and Mac.

Columns And Column Families • 8:02
Cassandra does not have tables; it has column families instead!
Super Column Family And Keyspace • 7:17
Explore the super column family, a logical grouping of columns within a column family, along with its drawbacks (no indexing and memory-intensive reads), then look at composite keys and keyspaces.
Comparing Cassandra With A Relational Database • 4:19
Explore how Cassandra's data model maps to relational concepts, using keyspaces and nested maps to explain row keys and column families, while emphasizing a paradigm shift away from relational normalization.
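The "nested maps" picture of the data model can be sketched in plain Java. This is a conceptual model only, assuming string keys and values; NestedMapKeyspace is a hypothetical class, not part of Cassandra or its driver. It shows why rows with different attribute sets leave no empty cells: each row stores only the columns written to it.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Conceptual model only (hypothetical class, not a Cassandra API):
// keyspace -> column family -> row key -> (column name -> value).
class NestedMapKeyspace {
    // Column family name -> (row key -> sorted columns of that row)
    private final Map<String, SortedMap<String, SortedMap<String, String>>> columnFamilies =
            new TreeMap<>();

    // Store one cell; column families and rows are created on demand.
    void put(String columnFamily, String rowKey, String column, String value) {
        columnFamilies
                .computeIfAbsent(columnFamily, name -> new TreeMap<>())
                .computeIfAbsent(rowKey, key -> new TreeMap<>())
                .put(column, value);
    }

    // Columns actually stored for a row (an empty map if the row does not exist).
    SortedMap<String, String> row(String columnFamily, String rowKey) {
        SortedMap<String, SortedMap<String, String>> cf = columnFamilies.get(columnFamily);
        if (cf == null || !cf.containsKey(rowKey)) {
            return new TreeMap<>();
        }
        return cf.get(rowKey);
    }

    public static void main(String[] args) {
        NestedMapKeyspace catalog = new NestedMapKeyspace();
        // Two products in the same column family with different attribute sets.
        catalog.put("products", "p1", "title", "Laptop");
        catalog.put("products", "p1", "ram", "16GB");
        catalog.put("products", "p2", "title", "T-shirt");
        catalog.put("products", "p2", "size", "M");
        System.out.println(catalog.row("products", "p1")); // {ram=16GB, title=Laptop}
        System.out.println(catalog.row("products", "p2")); // {size=M, title=T-shirt}
    }
}
```

Row p1 has no "size" entry and row p2 has no "ram" entry; nothing is stored for the missing columns, which is the point of the columnar layout.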

Connect To Cassandra And Create A Keyspace • 6:54
Connect to Cassandra using the CQL shell and create a keyspace named catalog with the simple replication strategy and a replication factor of three. Learn how replicas support fault tolerance across nodes.
Column Families And Their Properties • 12:02
All the configuration options available on a column family.
Modify Column Families • 2:42
Modify a column family in Cassandra: add a text column using ALTER COLUMNFAMILY with ADD, and adjust gc_grace_seconds, which controls how long tombstones (deletion markers) are kept before they can be purged.
Insert Data Into A Column Family • 6:52
Insert data into a Cassandra column family via the CQL shell, covering column-value pairs, timestamps, TTLs, and the columnar storage layout with tombstones, then verify the result with SELECT *.
Advanced Data Types: Collections And Counters • 10:56
Collections and counters allow you to store rich data in your column family.
Update Simple And Collection Data Types • 15:54
Update simple and collection data types in Cassandra, using WHERE clauses and the IN operator to select rows, and manage lists, sets, and maps with add, remove, and replace operations and TTLs.
Manage Cluster Roles • 5:01
Learn how to manage Cassandra roles: create roles with or without passwords, assign superuser status, and list or drop roles from the shell, with notes on password authentication.

Partition Keys: Distributing Data Across Cluster Nodes • 12:14
Primary keys are made up of partition and clustering keys. Partition keys determine how data is distributed across a cluster.
Partition Keys: Properties • 5:08
Cassandra offers three partitioners: RandomPartitioner (MD5) and Murmur3Partitioner (the default, and faster), which both distribute data uniformly, and the deprecated ByteOrderedPartitioner, which supports range queries over partition keys but causes hotspots and load-balancing problems.
Clustering Keys: Data Layout On A Node • 3:36
Primary keys are made up of partition and clustering keys. Clustering keys determine how data is laid out on a single node.
Restrictions On Partition Keys • 14:38
The design of your partition keys determines which queries are valid in your cluster. See the restrictions on queries based on partition keys.
Restrictions On Clustering Keys • 9:12
The design of your clustering keys determines which queries are valid in your cluster. See the restrictions on queries based on clustering keys.
Secondary Indexes • 8:32
Secondary indexes let you query on columns that are not part of the primary key. There are trade-offs, though!
Restrictions On Secondary Indexes • 8:52
Discover how secondary indexes change Cassandra's query rules: a query must restrict either all partition key columns or none of them. Learn the limitations, such as no IN and no ORDER BY, and how CONTAINS works with indexed collections.
Allow Filtering • 2:27
Explore how ALLOW FILTERING lets you run queries that would otherwise be rejected, for example scanning the listings column family and filtering for product IDs greater than a given value, at the cost of unpredictable performance.
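The split between partition keys and clustering keys can be illustrated with a simplified token ring. This sketch assumes an MD5 hash read as a non-negative 128-bit token, in the spirit of RandomPartitioner (Murmur3Partitioner, the default, uses a different hash), evenly spaced node tokens and a single replica per partition; TokenRingSketch and its methods are hypothetical, not Cassandra's implementation. The partition key alone decides which node holds a row, while the clustering key decides the order of rows inside that partition.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Simplified sketch (hypothetical class): partition key -> token -> node,
// and rows inside a partition kept sorted by clustering key.
class TokenRingSketch {
    // Node token -> node name; each node owns the range (previous token, its token].
    private final TreeMap<BigInteger, String> ring = new TreeMap<>();
    // Node name -> partition key -> (clustering key -> row value).
    private final Map<String, Map<String, SortedMap<String, String>>> storage = new TreeMap<>();

    TokenRingSketch(List<String> nodes) {
        BigInteger space = BigInteger.ONE.shiftLeft(128); // MD5 tokens lie in [0, 2^128)
        BigInteger step = space.divide(BigInteger.valueOf(nodes.size()));
        for (int i = 0; i < nodes.size(); i++) {
            // Evenly spaced node tokens: step, 2 * step, ..., n * step
            ring.put(step.multiply(BigInteger.valueOf(i + 1)), nodes.get(i));
        }
    }

    // MD5-based token, loosely modelled on RandomPartitioner.
    static BigInteger token(String partitionKey) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(partitionKey.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest); // interpret the digest as a non-negative number
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // First node whose token is >= the key's token, wrapping around to the first node.
    String ownerOf(String partitionKey) {
        Map.Entry<BigInteger, String> entry = ring.ceilingEntry(token(partitionKey));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    void insert(String partitionKey, String clusteringKey, String value) {
        storage.computeIfAbsent(ownerOf(partitionKey), node -> new TreeMap<>())
               .computeIfAbsent(partitionKey, key -> new TreeMap<>())
               .put(clusteringKey, value);
    }

    // All rows of a partition live on one node, already ordered by clustering key.
    SortedMap<String, String> partition(String partitionKey) {
        Map<String, SortedMap<String, String>> node = storage.get(ownerOf(partitionKey));
        SortedMap<String, String> rows = node == null ? null : node.get(partitionKey);
        return rows == null ? new TreeMap<>() : rows;
    }

    public static void main(String[] args) {
        TokenRingSketch cluster = new TokenRingSketch(List.of("node1", "node2", "node3"));
        // Partition key = category, clustering key = product ID (illustrative values).
        cluster.insert("laptops", "p300", "Laptop C");
        cluster.insert("laptops", "p100", "Laptop A");
        cluster.insert("shirts", "p200", "T-shirt");
        System.out.println("laptops lives on " + cluster.ownerOf("laptops"));
        System.out.println(cluster.partition("laptops")); // {p100=Laptop A, p300=Laptop C}
    }
}
```

Because ownerOf needs the full partition key to compute a token, the sketch also hints at why the restriction lectures require the partition key to be specified in a query unless a secondary index or ALLOW FILTERING is involved.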

Write Consistency Levels And Hinted Handoff • 12:18
Learn Cassandra's write consistency levels and the hinted handoff mechanism: how the coordinator node enforces ONE, QUORUM and ALL, and how LOCAL_QUORUM applies across data centers.
Read Consistency Levels • 11:18
Learn Cassandra's read consistency levels, including ONE, ALL, QUORUM and LOCAL_QUORUM: how the coordinator reads data from the fastest replica and compares hashes from the others, how read repair works, and what changes across data centers.
Replication Factors And Quorum Value • 8:14
Explore how the replication factor and the quorum value determine read and write consistency in Cassandra, using the formula quorum = ceil((sum of replication factors across data centers + 1) / 2), and what happens at the extremes.
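The quorum arithmetic can be checked with a few lines of Java. QuorumMath is a hypothetical helper; it computes floor(n / 2) + 1, which gives the same value as ceil((n + 1) / 2) for every integer n. The last method encodes the common rule of thumb that a read is guaranteed to overlap the latest write when the replicas read plus the replicas written exceed the replication factor.

```java
import java.util.Map;

// Sketch of the quorum arithmetic (hypothetical helper class, not a driver API).
class QuorumMath {
    // quorum = floor(n / 2) + 1, equal to ceil((n + 1) / 2) for integer n.
    static int quorum(int replicas) {
        return replicas / 2 + 1;
    }

    // QUORUM counts replicas in all data centers.
    static int quorumAcrossDataCenters(Map<String, Integer> rfPerDc) {
        int total = rfPerDc.values().stream().mapToInt(Integer::intValue).sum();
        return quorum(total);
    }

    // LOCAL_QUORUM only counts replicas in the coordinator's own data center.
    static int localQuorum(Map<String, Integer> rfPerDc, String localDc) {
        return quorum(rfPerDc.get(localDc));
    }

    // Read and write replica sets must overlap for a read to see the latest write.
    static boolean readsSeeLatestWrite(int readAcks, int writeAcks, int replicationFactor) {
        return readAcks + writeAcks > replicationFactor;
    }

    public static void main(String[] args) {
        Map<String, Integer> rf = Map.of("dc1", 3, "dc2", 3);
        System.out.println(quorumAcrossDataCenters(rf));  // 4 (floor(6 / 2) + 1)
        System.out.println(localQuorum(rf, "dc1"));       // 2 (floor(3 / 2) + 1)
        System.out.println(readsSeeLatestWrite(2, 2, 3)); // true: QUORUM reads + QUORUM writes, RF = 3
        System.out.println(readsSeeLatestWrite(1, 1, 3)); // false: ONE reads + ONE writes, RF = 3
    }
}
```

At the extremes, ONE for both reads and writes is fastest but gives no overlap guarantee, while ALL for either side guarantees overlap but fails as soon as a single replica is down.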

Overview Of Cassandra Storage Components • 6:38
Explore how Cassandra stores data on a node by examining memtables, commit logs, and SSTable components, including Bloom filters, an index file, a summary file, and a data file.
The SSTable And Its Components • 9:44
Learn how an SSTable organizes data with an on-disk index file and data file, a summary held in memory, and in-memory Bloom filters, and how these are used to locate and read rows efficiently.
Row Cache And Key Cache • 3:14
Explore two storage components, the row cache and the key cache: how they cache row data and primary key offsets in memory, and how their read-through behavior works.
Anatomy Of A Write Request • 8:32
Explore how Cassandra handles a write, from commit log and memtable updates to tombstones and row cache eviction. Understand how memtables are flushed to SSTables and how compaction merges SSTables and removes tombstones.
Anatomy Of A Read Request And The Gossip Protocol • 7:25
Understand how Cassandra reads a row via the row cache, Bloom filters, key cache, and index, then merges data from SSTables and memtables, and how the gossip protocol and hinted handoff fit in.
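The write and read paths described in this section can be modelled as a toy log-structured store. This is a heavily simplified sketch, not Cassandra's storage engine: ToyStorageEngine is hypothetical, there is no row cache, key cache, Bloom filter, index or summary file, a memtable is flushed after a fixed number of entries, and compaction drops tombstones immediately instead of waiting for gc_grace_seconds. It does show the core ideas: writes go to the commit log and the memtable, full memtables become immutable SSTables, reads merge all sources by timestamp, and compaction merges SSTables and purges deleted data.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Toy model of the write/read path (hypothetical class, heavily simplified).
class ToyStorageEngine {
    // A cell with its write timestamp; value == null marks a tombstone.
    record Cell(String value, long timestamp) {}

    private final List<String> commitLog = new ArrayList<>();                  // append-only log
    private SortedMap<String, Cell> memtable = new TreeMap<>();                // in memory, sorted by key
    private final List<SortedMap<String, Cell>> sstables = new ArrayList<>();  // immutable once flushed
    private final int flushThreshold;

    ToyStorageEngine(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    void write(String key, String value, long timestamp) {
        commitLog.add(key + "=" + value + "@" + timestamp); // 1. append to the commit log
        memtable.put(key, new Cell(value, timestamp));     // 2. update the memtable
        if (memtable.size() >= flushThreshold) {
            flush();                                       // 3. flush a full memtable
        }
    }

    // A delete is just a write of a tombstone.
    void delete(String key, long timestamp) {
        write(key, null, timestamp);
    }

    void flush() {
        if (!memtable.isEmpty()) {
            sstables.add(Collections.unmodifiableSortedMap(memtable));
            memtable = new TreeMap<>();
            commitLog.clear(); // flushed data no longer needs to be replayed from the log
        }
    }

    // Read: merge the memtable and every SSTable; the newest timestamp wins.
    String read(String key) {
        Cell newest = memtable.get(key);
        for (SortedMap<String, Cell> sstable : sstables) {
            Cell cell = sstable.get(key);
            if (cell != null && (newest == null || cell.timestamp() > newest.timestamp())) {
                newest = cell;
            }
        }
        return newest == null ? null : newest.value(); // a tombstone reads as "not found"
    }

    // Compaction: merge all SSTables into one, keep the newest cell per key, drop tombstones.
    void compact() {
        SortedMap<String, Cell> merged = new TreeMap<>();
        for (SortedMap<String, Cell> sstable : sstables) {
            for (Map.Entry<String, Cell> e : sstable.entrySet()) {
                merged.merge(e.getKey(), e.getValue(),
                        (older, candidate) -> candidate.timestamp() > older.timestamp() ? candidate : older);
            }
        }
        merged.values().removeIf(cell -> cell.value() == null);
        sstables.clear();
        sstables.add(Collections.unmodifiableSortedMap(merged));
    }

    int sstableCount() {
        return sstables.size();
    }

    public static void main(String[] args) {
        ToyStorageEngine engine = new ToyStorageEngine(2);
        engine.write("p1", "Laptop", 1);
        engine.write("p2", "T-shirt", 2);     // memtable full: flushed to SSTable 1
        engine.write("p1", "Laptop Pro", 3);
        engine.delete("p2", 4);               // memtable full: flushed to SSTable 2
        System.out.println(engine.read("p1")); // Laptop Pro
        System.out.println(engine.read("p2")); // null (tombstone)
        engine.compact();
        System.out.println(engine.sstableCount() + " " + engine.read("p1")); // 1 Laptop Pro
    }
}
```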

Overview And Basic Setup • 4:28
Set up a Cassandra cluster with the Cluster Manager, run CQL queries, and build a mini CMS in Java with three Maven modules (product, listing, and persistence) using the DataStax Java driver for Cassandra.
Create A Session And Execute Our First Query • 7:39
Learn to create a keyspace and run your first Cassandra query in Java by defining the query, obtaining a singleton session, and executing on the cluster.
Create A Column Family • 3:27
Create a column family in a Cassandra keyspace from Java, mirroring the CQL version: select the keyspace, obtain a session, and execute the statement that creates the listings column family.
Check If A Column Family Has Been Created • 4:59
Learn to verify that a column family was created, using the Java driver: obtain a session and cluster, access the keyspace metadata, and check whether the column family exists.
Insert Data Into The Listings Column Family • 9:13
Create a listing object with an attribute map and listing ID, then insert it into the CMS listings column family using a prepared statement built with the query builder.
Insert Data Into The Products Column Family • 9:59
Create a products column family in the cms keyspace with a three-column primary key (category, brand, product ID) to support searches by category and brand, then insert products.
Search For Products • 13:32
Learn to query the products column family by brand and category using a product class and a persistence handler, with a prepared statement built by the query builder and results mapped to product objects.
Delete A Listing • 4:17
Delete a product listing from the Cassandra listings column family by using the listing persistence handler, a prepared delete statement, and a session to remove specified product IDs.
Update Multiple Column Families Using Logged Batch • 14:42
Use a logged batch in Cassandra to atomically update the title in both the products and listings column families, leveraging the batch log (written to two replicas) and prepared statements.

Requirements

The basics of SQL and traditional relational databases
The basics of Java in order to use the Cassandra Java library

Description

Taught by a team that includes two Stanford-educated ex-Googlers and two ex-Flipkart Lead Analysts. This team has decades of practical experience working with large-scale data processing.

Has your data gotten huge, unwieldy and hard to manage with a traditional database? Is your data unstructured with an expanding list of attributes? Do you want to ensure your data is always available even with server crashes? Look beyond Hadoop: the Cassandra distributed database is the solution to your problems.

Let's parse that.

Huge, unwieldy data: This course helps you set up a cluster with multiple nodes to distribute data across machines

Unstructured: Cassandra is a columnar store. There are no empty cells or space wasted when you store data with variable and expanding attributes

Always available: Cassandra uses partitioning and replication to ensure that your data is available even when nodes in a cluster go down

What's included in this course:

The Cassandra Cluster Manager (CCM) to set up and manage your cluster
The Cassandra Query Language (CQL) to create keyspaces, column families, perform CRUD operations on column families and other administrative tasks
Designing primary keys and secondary indexes, partitioning and clustering keys
Restrictions on queries based on primary and secondary key design
Tunable consistency using quorum and local quorum, and read and write consistency across the nodes of a cluster
Architecture and Storage components: Commit Log, MemTable, SSTables, Bloom Filters, Index File, Summary File and Data File
A real world project: A Miniature Catalog Management System using the Cassandra Java driver

Who this course is for:

Yup! Engineers and analysts who understand traditional, relational databases and want to move to big data storage systems
Nope! Students who are just starting out understanding databases and have no prior experience with one

Course content by section

You, This Course and Us • 1 lecture • 2min

Introduction: Cassandra as a distributed, decentralized, columnar store • 4 lectures • 32min

Install And Set Up • 4 lectures • 18min

The Cassandra Cluster Manager • 2 lectures • 19min

The Cassandra Data Model • 3 lectures • 20min

Shell Commands • 7 lectures • 1hr

Keys And Indexes: Primary Keys, Partition Keys, Clustering Keys, Secondary Indexes • 8 lectures • 1hr 5min

Tunable Consistency • 3 lectures • 32min

Storage Systems • 5 lectures • 36min

A Mini-Project: A Miniature Catalog Management System In Java • 9 lectures • 1hr 12min
