Friday, April 17, 2015

Cassandra Table Design

Our customer wants a table that has hundreds of thousands of rows stored in it per day. No problem, Cassandra handles it well. Our performance test shows that with a Replication Factor of 3, writes in a 5-node cluster take a touch less than 1ms and reads about 2ms per row. If you want to pull out more rows, then it's even better - 1000 rows takes about 400ms.

However, our customer wants to slice and dice the data, pivoting on an arbitrary set of columns, aggregating values as they go. Now, Cassandra is great at storing data but does not offer the richness you'd get in a SQL DB for retrieving subsets of it. You really want to search on primary keys. You can create indexes but they just create invisible tables for you. And even if you search on primary keys, you need the right permutation of keys to avoid WITH FILTERING.

So, our first stab got us creating way too many tables because of our customer requirements. The number of column families we can pivot on causes the number of tables to grow factorially. If our customer requires us to pivot on say 5 columns, that's a combination of


where n in our case is the 5 potential choices on offer and r is the number the user chooses.

Of course, our user can choose any number, from 1 to 5, so the number of permutations now looks like:


r = 1

Which is starting to cause the number of tables to balloon since

up to 3 choices means 7 tables
up to 4 choices means 15 tables
up to 5 choices means 31 tables

The number of rows also starts rising like:

r = 1

p = 1


where r and p are just indexes and dp is the number of distinct elements for a given dimension p.

Note: this is the worse case scenario. It depends on your data how many rows there will actually be. As it happens, since our data "clumps" around certain regions, our numbers were much smaller than this.

But this article has a very interesting way of querying the data for just what you want and using one table in which to do it. To leverage it, we need to pre-compute the aggregations. As it happens, we're using Apache Spark for that then dumping it into Cassandra.