
Discover how Cassandra delivers scalable, high-performance data management as a distributed NoSQL database, using a columnar data model with keyspaces and column families, and CQL (the Cassandra Query Language), an SQL-like interface.
Cassandra manages huge datasets using its columnar layout, which is more efficient and saves space.
What are our requirements for a product catalog system, and why do we need a distributed, columnar, decentralized database to manage it?
What use cases does Cassandra work with? When would you use Cassandra over other databases?
How does Cassandra stack up against HBase? HBase is the columnar store available in the Hadoop ecosystem.
Install and set up Cassandra on your machine.
Download and unzip the Cassandra Cluster Manager (CCM) from GitHub, install it, then create a test cluster running Cassandra version 3.6 with ccm create test -v 3.6.
Install Maven by downloading the latest binary from Apache, unzip it, move the folder to a convenient location, and update your bash profile path so Maven commands are accessible.
If you are unfamiliar with software that requires working in a shell/command-line environment, this video will be helpful. It explains how to update the PATH environment variable, which is needed to set up most Linux/Mac shell-based software.
Get started using the Cassandra Cluster Manager
Explore basic CCM commands to view the Easybuy cluster status and inspect node details. Learn to list clusters, stop and show nodes, and remove a cluster when needed.
Cassandra does not have tables; it has column families instead!
Explore the super column family as a logical grouping within a column family, including its lack of indexing and memory-intensive reads, and get an introduction to composite keys and keyspaces.
Explore how Cassandra's data model maps to relational concepts, using keyspaces and nested maps to explain row keys and column families, while emphasizing a paradigm shift away from relational normalization.
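The nested-map view of the data model can be sketched in a few lines of Python. This is only an illustration, not how Cassandra stores data internally; the keyspace contents, row keys, and the get_column helper are hypothetical names made up for the example.

```python
# Hypothetical sample data: a keyspace as nested maps.
# keyspace -> column family -> row key -> {column name: value}
catalog = {
    "products": {
        "p-1001": {"title": "Phone", "brand": "Acme"},
        # Rows in the same column family need not share the same columns.
        "p-1002": {"title": "Laptop", "brand": "Acme", "color": "grey"},
    }
}


def get_column(keyspace, column_family, row_key, column):
    """Walk the nested maps and return one column value, or None if absent."""
    return keyspace[column_family][row_key].get(column)


print(get_column(catalog, "products", "p-1002", "color"))  # grey
print(get_column(catalog, "products", "p-1001", "color"))  # None
```

Note that the second row carries a column the first row lacks, which is the kind of flexibility a fixed relational schema does not give you.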
Connect to Cassandra using the CQL shell (cqlsh) and create a keyspace named catalog with SimpleStrategy replication and a replication factor of three. Learn how replicas support fault tolerance across nodes.
Review all the configuration options available on a column family.
Modify a column family in Cassandra by adding a text column with ALTER COLUMNFAMILY and the ADD clause, and adjust gc_grace_seconds, which controls how long tombstones are kept before they can be removed.
Insert data into a Cassandra column family via the CQL shell, detailing column-value pairs, timestamps, TTLs, and the columnar storage layout with tombstones, then verify the result with SELECT *.
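To see why a replication factor of three tolerates node failures, here is a rough sketch of simple replica placement: the row's owning node plus the next nodes clockwise on the ring. The node names and the simple_strategy_replicas function are assumptions for illustration; a real cluster works with tokens rather than list positions.

```python
def simple_strategy_replicas(ring, primary_index, replication_factor):
    """Return the owner node plus the following nodes clockwise on the ring."""
    count = min(replication_factor, len(ring))
    return [ring[(primary_index + i) % len(ring)] for i in range(count)]


# Hypothetical four-node ring, listed in clockwise order.
ring = ["node1", "node2", "node3", "node4"]
replicas = simple_strategy_replicas(ring, primary_index=2, replication_factor=3)
print(replicas)  # ['node3', 'node4', 'node1']
```

With three copies on three different nodes, any two of them can fail and one copy of the row is still available.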
Collections and counters allow you to store rich data in your column family.
Update simple and collection data types in Cassandra by using WHERE clauses and the IN operator to select rows, and manage lists, sets, and maps with add, remove, and replace operations and TTLs.
Learn how to manage Cassandra roles: create roles with or without passwords, assign superuser status, and drop or list roles using the shell, with notes on password authentication.
Primary keys are made up of partition and clustering keys. Partition keys determine how data is distributed across a cluster.
Cassandra offers three partitioners: RandomPartitioner (MD5) and Murmur3Partitioner (the default, and faster), which both distribute data uniformly, and ByteOrderedPartitioner, which supports range queries but is discouraged because it causes hotspots and load-balancing issues.
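A hash-based partitioner can be sketched as follows: hash the partition key to a token, then find the node whose token range covers it. This is a simplified model in the spirit of the MD5-based random partitioner; the TokenRing class, node names, and token assignments are hypothetical, and real token computation differs in its details.

```python
import bisect
import hashlib
from collections import Counter


def md5_token(partition_key):
    """Hash a partition key to a 128-bit integer token."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


class TokenRing:
    """Each node owns the token range that ends at its own token."""

    def __init__(self, node_tokens):
        self._entries = sorted((token, node) for node, token in node_tokens.items())
        self._tokens = [token for token, _ in self._entries]

    def owner(self, partition_key):
        token = md5_token(partition_key)
        # First node whose token is >= the key's token, wrapping around the ring.
        index = bisect.bisect_left(self._tokens, token) % len(self._entries)
        return self._entries[index][1]


MAX_TOKEN = 2 ** 128
ring = TokenRing({
    "node1": MAX_TOKEN // 4,
    "node2": MAX_TOKEN // 2,
    "node3": 3 * MAX_TOKEN // 4,
    "node4": MAX_TOKEN - 1,
})

# Count how many of 10,000 hypothetical keys land on each node.
load = Counter(ring.owner(f"product-{i}") for i in range(10_000))
print(load)
```

Because the hash scatters keys evenly, each node ends up with roughly a quarter of the rows. The price is that neighbouring keys land on unrelated nodes, which is why range queries over partition keys need an order-preserving partitioner.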
Primary keys are made up of partition and clustering keys. Clustering keys determine how data is laid out on a single node.
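As a rough picture of what clustering keys do, the sketch below keeps the rows of one partition sorted by clustering key, so a range of keys is a single contiguous slice. The Partition class and the sample rows are hypothetical and only model the ordering, not the on-disk format.

```python
import bisect


class Partition:
    """Rows of a single partition, kept sorted by clustering key."""

    def __init__(self):
        self._keys = []   # clustering keys in sorted order
        self._rows = {}   # clustering key -> row

    def insert(self, clustering_key, row):
        if clustering_key not in self._rows:
            bisect.insort(self._keys, clustering_key)
        self._rows[clustering_key] = row

    def range(self, start, end):
        """Rows with start <= clustering key < end, read as one slice."""
        low = bisect.bisect_left(self._keys, start)
        high = bisect.bisect_left(self._keys, end)
        return [self._rows[key] for key in self._keys[low:high]]


partition = Partition()
for key in (30, 10, 20):          # inserted out of order
    partition.insert(key, f"row{key}")

print(partition.range(10, 30))    # ['row10', 'row20']
```

Rows arrive in any order but are always read back in clustering-key order, which is what makes range and ordering queries on clustering keys cheap within a partition.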
The design of partition keys determines what queries are valid in your cluster. See the restrictions on queries based on partition keys.
The design of clustering keys determines what queries are valid in your cluster. See the restrictions on queries based on clustering keys.
Allow querying on additional columns by enabling secondary indexes. There are trade-offs when using them, though!
Discover how secondary indexes change Cassandra query rules, requiring all partition keys or none. Learn limitations such as no IN and no ORDER BY, and how CONTAINS works with indexed collections.
Explore how ALLOW FILTERING lets you run queries with restricted index columns by scanning the listings column family and filtering for product ID greater than a value, with unpredictable performance.
Explain Cassandra's write consistency levels and the hinted handoff mechanism, detailing how coordinator nodes enforce ONE, QUORUM, and ALL, with LOCAL_QUORUM across data centers.
Explain read consistency levels in Cassandra, including ONE, ALL, QUORUM, and LOCAL_QUORUM, how the coordinator reads full data from the fastest replica and compares hashes from the others, read repair, and cross-data-center considerations.
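The coordinator's decision can be modelled in a few lines. This is a simplified single-data-center sketch: the function names and replica names are hypothetical, LOCAL_QUORUM is not modelled, and it assumes a write is rejected up front when too few replicas are reachable.

```python
def required_acks(level, replication_factor):
    """Number of replica acknowledgements the coordinator waits for."""
    table = {
        "ONE": 1,
        "QUORUM": replication_factor // 2 + 1,
        "ALL": replication_factor,
    }
    return table[level]


def coordinate_write(level, replica_status):
    """replica_status maps replica name -> True if the replica is reachable.

    Returns (succeeded, replicas that need a hint replayed later).
    """
    replication_factor = len(replica_status)
    live = [name for name, up in replica_status.items() if up]
    down = [name for name, up in replica_status.items() if not up]
    if len(live) < required_acks(level, replication_factor):
        return False, []   # not enough replicas: the write is rejected
    return True, down      # hints are kept for the unreachable replicas


status = {"n1": True, "n2": True, "n3": False}
print(coordinate_write("QUORUM", status))  # (True, ['n3'])
print(coordinate_write("ALL", status))     # (False, [])
```

With one of three replicas down, a QUORUM write still succeeds and the missed replica is brought up to date later from the hint, while an ALL write fails.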
Explore how replication factor and quorum determine read and write consistency in Cassandra, using the formula quorum = ceil((sum of replication factors across data centers + 1)/2), and look at the extreme cases.
Explore how Cassandra stores data on a node by examining memtables, commit logs, and SSTable components, including bloom filters, an index file, a summary file, and a data file.
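The quorum formula is easy to check with a small worked example. The sketch below also applies the commonly cited rule that reads see the latest write when read replicas + write replicas > replication factor; the function names are made up for illustration.

```python
import math


def quorum(replication_factors):
    """quorum = ceil((sum of replication factors across data centers + 1) / 2)."""
    total = sum(replication_factors)
    return math.ceil((total + 1) / 2)


def overlaps(read_replicas, write_replicas, replication_factor):
    """True when every read set must share at least one replica with every write set."""
    return read_replicas + write_replicas > replication_factor


print(quorum([3]))      # 2  (one data center, RF 3)
print(quorum([3, 3]))   # 4  (two data centers, RF 3 each)

rf = 3
print(overlaps(quorum([rf]), quorum([rf]), rf))  # True:  QUORUM reads + QUORUM writes
print(overlaps(1, 1, rf))                        # False: ONE reads + ONE writes
print(overlaps(1, rf, rf))                       # True:  ONE reads + ALL writes
```

The last three lines are the extremes: ONE on both sides is fastest but a read can miss the latest write, while pairing ALL writes with ONE reads is consistent but any single replica failure blocks writes.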
Learn how an SSTable organizes data with an on-disk index file, a data file, a memory-resident summary, and in-memory row bloom filters to locate and read rows efficiently.
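A bloom filter answers "is this row definitely not in this SSTable?" without touching disk. The sketch below is a generic textbook version, not Cassandra's implementation; the class name, bit-array size, and use of SHA-256 are assumptions chosen for readability.

```python
import hashlib


class BloomFilter:
    """May report false positives, but never false negatives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        # Derive several bit positions per key by salting the hash input.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for position in self._positions(key):
            self.bits[position] = 1

    def might_contain(self, key):
        return all(self.bits[position] for position in self._positions(key))


row_filter = BloomFilter()
for row_key in ("p-1001", "p-1002"):
    row_filter.add(row_key)

print(row_filter.might_contain("p-1001"))  # True: the key was added
```

If might_contain returns False, the SSTable can be skipped entirely; if it returns True, the summary and index are consulted to find the row, and occasionally the row turns out not to be there.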
Explore Cassandra storage components, focusing on row cache and key cache, how they cache row data and primary key offsets, and their read-through behavior in memory.
Explore how Cassandra handles a write, from commit log and memtable updates to tombstones and row cache eviction. Understand how memtables flush to SSTables and how compaction merges SSTables and removes tombstones.
Understand how Cassandra reads a row via the row cache, bloom filters, key cache, and index, then merges SSTables and memtables, and how it leverages the gossip protocol and hinted handoff.
Set up a Cassandra cluster with the cluster manager, run CQL queries, and build a Java mini CMS with three Maven modules (product, listing, and persistence) using the DataStax Cassandra Driver.
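The write path can be modelled with a toy node. This is a heavily simplified sketch: the Node class, the flush threshold of two rows, and the TOMBSTONE marker are hypothetical, the row cache is left out, and compaction here drops tombstones immediately rather than waiting for the grace period.

```python
TOMBSTONE = object()  # marker written in place of a value on delete


class Node:
    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only record for crash recovery
        self.memtable = {}     # in-memory: key -> (value, timestamp)
        self.sstables = []     # immutable flushed tables, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value, timestamp):
        self.commit_log.append((key, value, timestamp))  # 1. log first
        current = self.memtable.get(key)
        if current is None or timestamp >= current[1]:   # 2. newest write wins
            self.memtable[key] = (value, timestamp)
        if len(self.memtable) >= self.memtable_limit:    # 3. flush when full
            self.flush()

    def delete(self, key, timestamp):
        self.write(key, TOMBSTONE, timestamp)  # a delete is just another write

    def flush(self):
        if self.memtable:
            self.sstables.append(dict(sorted(self.memtable.items())))
            self.memtable = {}
            self.commit_log.clear()  # flushed data no longer needs the log

    def compact(self):
        merged = {}
        for table in self.sstables:
            for key, (value, timestamp) in table.items():
                if key not in merged or timestamp > merged[key][1]:
                    merged[key] = (value, timestamp)
        live = {k: v for k, v in sorted(merged.items()) if v[0] is not TOMBSTONE}
        self.sstables = [live]


node = Node()
node.write("a", 1, timestamp=1)
node.write("b", 2, timestamp=2)   # memtable full: first SSTable flushed
node.delete("a", timestamp=3)
node.write("c", 3, timestamp=4)   # second SSTable, holding the tombstone for "a"
node.compact()
print(node.sstables)  # [{'b': (2, 2), 'c': (3, 4)}]
```

Nothing is ever updated in place: the delete of "a" is written as a tombstone in a newer SSTable, and only compaction merges the tables and discards both the old value and the tombstone.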
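The merge step of a read can be sketched as "collect every version of the row and keep the newest". The function below is a minimal model with hypothetical names and sample data; caches, bloom filters, and the index lookups that decide which SSTables to open are left out.

```python
TOMBSTONE = object()  # marker stored in place of a value for deleted rows


def read_row(key, memtable, sstables):
    """Return the newest value for key, or None if it is missing or deleted.

    memtable and each SSTable map key -> (value, timestamp).
    """
    versions = [table[key] for table in [memtable, *sstables] if key in table]
    if not versions:
        return None
    value, _timestamp = max(versions, key=lambda version: version[1])
    return None if value is TOMBSTONE else value


# Hypothetical state of one node.
sstables = [
    {"p-1": ("Phone", 1), "p-2": ("Laptop", 2)},
    {"p-2": (TOMBSTONE, 5)},             # p-2 was deleted later
]
memtable = {"p-1": ("Phone v2", 7)}      # newest update, not yet flushed

print(read_row("p-1", memtable, sstables))  # Phone v2
print(read_row("p-2", memtable, sstables))  # None
```

A row's data may be spread over the memtable and several SSTables, so the timestamp decides which version wins, and a tombstone that is newer than the data hides the row.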
Learn to create a keyspace and run your first Cassandra query in Java by defining the query, obtaining a singleton session, and executing on the cluster.
Create a column family in a Cassandra keyspace using Java, mirroring the CQL statement, by selecting the keyspace, obtaining a session, and executing the listings column family creation.
Learn to verify a created column family in Cassandra using the Java driver by obtaining a session and cluster, accessing keyspace metadata, and checking that the column family exists.
Create a listing object with an attribute map and listing ID, then insert it into the CMS listings column family using a prepared statement built with the query builder.
Create a products column family in the cms keyspace with a three-column primary key (category, brand, product id) to support category and brand searches, then insert products.
Learn to query a Cassandra product column family by brand and categories using a product class and persistence handler, with a prepared statement and the query builder, mapping results to product objects.
Delete a product listing from the Cassandra listings column family by using the listing persistence handler, a prepared delete statement, and a session to remove specified product IDs.
Use a logged batch in Cassandra to atomically update the title in both the products and listings column families, leveraging the batchlog, two replicas, and prepared statements.
Taught by a team which includes two Stanford-educated ex-Googlers and two ex-Flipkart Lead Analysts. This team has decades of practical experience working with large-scale data processing.
Has your data gotten huge, unwieldy and hard to manage with a traditional database? Is your data unstructured with an expanding list of attributes? Do you want to ensure your data is always available even with server crashes? Look beyond Hadoop: the Cassandra distributed database is the solution to your problems.
Let's parse that.
What's included in this course: