- Learning Apache Cassandra
- Mat Brown
- 423 words
- 2025-04-04 21:09:03
Compound keys represent parent-child relationships
In the "What Cassandra offers, and what it doesn't" section of Chapter 1, "Getting Up and Running with Cassandra", you learned that Cassandra is not a relational database, despite some surface similarities. Specifically, this means that Cassandra has no built-in concept of relationships between data in different tables. There are no foreign key constraints and no JOIN clause in SELECT statements; in fact, there is no way to read from multiple tables in the same query. Whereas relational databases are designed to explicitly account for the relationships between data in different tables, whether they're one-to-one, one-to-many, or many-to-many, Cassandra has no built-in mechanism for describing or traversing inter-table relationships.
That being said, Cassandra's compound primary key structure provides an ample affordance for a particular kind of relationship—the parent-child relationship. This is a specific type of one-to-many relationship in which the "one" side plays a unique role with respect to the "many" side; we can say that the "one" is a parent or a container for the "many". We've already seen two examples of this: a user's status updates are children of the user themselves; and the comments about a status update are children of that status update.
This relationship is represented quite transparently in the compound primary key structure. The partition key acts as a reference to the parent, and the clustering column uniquely identifies the row among its siblings. This is why we used both the status_update_username and status_update_id columns for the partition key in our status_update_replies table; these columns together provide a full reference to the reply's parent, namely the status update to which it's a reply.
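To make the split between partition key and clustering column concrete, here is a minimal in-memory sketch in Python (a toy model, not Cassandra driver code) of a table shaped like status_update_replies. It assumes a schema of PRIMARY KEY ((status_update_username, status_update_id), id); the id, author_username and body column names are assumptions for illustration, and plain integers stand in for the time-based UUIDs a real table would more likely use.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Reply:
    status_update_username: str  # partition key, part 1: owner of the parent status update
    status_update_id: int        # partition key, part 2: id of the parent status update
    id: int                      # clustering column (assumed name): unique among sibling replies
    author_username: str         # regular column (assumed)
    body: str                    # regular column (assumed)


class StatusUpdateRepliesTable:
    """Toy model of PRIMARY KEY ((status_update_username, status_update_id), id)."""

    def __init__(self):
        # partition key tuple -> {clustering column value -> row}
        self._partitions = defaultdict(dict)

    def insert(self, reply):
        partition_key = (reply.status_update_username, reply.status_update_id)
        # Writing the same full primary key again replaces the row, like a CQL upsert.
        self._partitions[partition_key][reply.id] = reply

    def replies_to(self, username, status_update_id):
        # All children of one parent live in a single partition, ordered by clustering column.
        partition = self._partitions.get((username, status_update_id), {})
        return [partition[key] for key in sorted(partition)]


replies = StatusUpdateRepliesTable()
replies.insert(Reply("alice", 1, 2, "bob", "Nice!"))
replies.insert(Reply("alice", 1, 1, "carol", "Agreed"))
replies.insert(Reply("alice", 2, 1, "dave", "Different parent"))
print([r.body for r in replies.replies_to("alice", 1)])  # ['Agreed', 'Nice!']
```

Because both parent columns together form the partition key, replies to alice's second status update never mix with replies to her first; the username alone is not a complete reference to any parent.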
It's worth emphasizing that not every one-to-many relationship is a parent-child relationship. For instance, on a blogging platform, we'd expect a blog post to have at least a couple of many-to-one relationships, namely, an author relation and a blog relation. Only one of these can be a parent-child relationship; in the blog example, it seems natural to think of the parent of a blog post as the blog.
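A minimal sketch of why only one of those relations can be the parent, assuming a hypothetical blog_posts table with PRIMARY KEY ((blog_id), post_id) and the author stored as an ordinary column. The table, its column names and the partitions_read counter are all illustrative; the counter only shows how many partitions each lookup has to touch in this in-memory model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class BlogPost:
    blog_id: str          # partition key: the parent relation
    post_id: int          # clustering column: unique within one blog
    author_username: str  # regular column: the other many-to-one relation
    title: str


class BlogPostsTable:
    """Toy model of a hypothetical table with PRIMARY KEY ((blog_id), post_id)."""

    def __init__(self):
        self._partitions = defaultdict(dict)  # blog_id -> {post_id -> post}
        self.partitions_read = 0              # illustrative cost counter

    def insert(self, post):
        self._partitions[post.blog_id][post.post_id] = post

    def posts_in_blog(self, blog_id):
        # Following the parent relation reads exactly one partition.
        self.partitions_read += 1
        partition = self._partitions.get(blog_id, {})
        return [partition[key] for key in sorted(partition)]

    def posts_by_author(self, author_username):
        # The author relation is not part of the key, so every partition must be checked.
        self.partitions_read += len(self._partitions)
        return [
            post
            for partition in self._partitions.values()
            for _, post in sorted(partition.items())
            if post.author_username == author_username
        ]


posts = BlogPostsTable()
posts.insert(BlogPost("cooking", 1, "alice", "Bread"))
posts.insert(BlogPost("cooking", 2, "bob", "Soup"))
posts.insert(BlogPost("travel", 1, "alice", "Lisbon"))
print([p.title for p in posts.posts_in_blog("cooking")])  # ['Bread', 'Soup']
print([p.title for p in posts.posts_by_author("alice")])  # ['Bread', 'Lisbon']
print(posts.partitions_read)                              # 3
```

The same posts could instead be partitioned by author, but then the blog relation would be the one reduced to an ordinary column; a single table cannot make both relations its parent.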
Our Cassandra data models can only accommodate a single parent relation for a given table because the parent relation is expressed as the partition key column(s) of the table. Not all table schemas fit this line of reasoning; sometimes a partition key is just a partition key, such as a time-series table that partitions by date. However, parent-child relationships provide a fruitful framework for Cassandra data modeling across a wide variety of applications.
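For contrast, a minimal sketch of a partition key that is just a partition key: a hypothetical events table partitioned by UTC calendar day, assuming PRIMARY KEY ((day), occurred_at, event_id). The day bucket is computed from each row's timestamp rather than pointing at a parent row; the table name, column names and the choice of UTC days are illustrative assumptions.

```python
from collections import defaultdict
from datetime import datetime, timezone


class EventsByDayTable:
    """Toy model of a hypothetical table with PRIMARY KEY ((day), occurred_at, event_id)."""

    def __init__(self):
        # day string -> {(occurred_at, event_id) -> payload}
        self._partitions = defaultdict(dict)

    @staticmethod
    def day_bucket(occurred_at):
        # Expects a timezone-aware datetime; the partition key is derived from the data
        # itself rather than being a reference to a parent.
        return occurred_at.astimezone(timezone.utc).date().isoformat()

    def insert(self, occurred_at, event_id, payload):
        # event_id keeps two events with the same timestamp from overwriting each other.
        self._partitions[self.day_bucket(occurred_at)][(occurred_at, event_id)] = payload

    def events_on(self, day):
        # One day's events come from one partition, in clustering order (time, then id).
        partition = self._partitions.get(day, {})
        return [partition[key] for key in sorted(partition)]


events = EventsByDayTable()
events.insert(datetime(2024, 1, 15, 21, 9, tzinfo=timezone.utc), "e2", "login")
events.insert(datetime(2024, 1, 15, 8, 0, tzinfo=timezone.utc), "e1", "signup")
events.insert(datetime(2024, 1, 16, 0, 1, tzinfo=timezone.utc), "e3", "logout")
print(events.events_on("2024-01-15"))  # ['signup', 'login']
```

Here the rows in a partition are siblings only in the sense that they happened on the same day; there is no parent entity for the partition key to refer to.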