June 27, 2017

Instaclustr adds new dynamic resizing feature for Apache Cassandra

By Ben Slater

First published by instaclustr.com on June 26 2017


Apache Cassandra is well known for its ability to scale to whatever size you could possibly need. However, the traditional approach of adding or removing nodes to change cluster capacity can be resource intensive and time-consuming: scaling a cluster this way can take hours or even days.

Instaclustr’s new dynamic resizing capability dramatically changes this picture by allowing you to scale the processing capacity of a Cassandra cluster up or down, online, in minutes, with a couple of button clicks or a single API call. Our benchmarking demonstrates a tenfold improvement in cluster throughput achieved in under 90 minutes. This process can be reversed in the same amount of time to dramatically reduce your infrastructure costs.

This facility provides a huge step forward for organisations that want to efficiently use the proven, open source power of Apache Cassandra while providing flexible processing capacity to allow for patterns such as:

  • Weekly or daily large batch analytic processing;
  • Periodic large scale data ingestion;
  • Peak processing requirements associated with promotions or big events (e.g. Super Bowl, Black Friday); and
  • Meeting varying demand across different times of the day.

We have been able to deliver this capability by building on the flexibility of the AWS environment, the sophisticated monitoring and provisioning capability of our managed service and the inherent capability of Apache Cassandra to deal with nodes being taken offline for maintenance without skipping a beat. At a high level the process works as follows:

  1. Cluster health is checked using Instaclustr’s monitoring system, including synthetic transactions.
  2. The cluster’s schema is checked to ensure it is configured with the redundancy required for the operation.
  3. Cassandra on the node is stopped, and the AWS instance associated with the node is switched to a smaller or larger size.
  4. Cassandra is restarted. As EBS is used for the data volumes, no data is lost and no restreaming of data is necessary.
  5. The cluster is monitored until all nodes have come up cleanly and have been processing transactions for at least one minute (again, using our synthetic transaction monitoring), and the process then moves on to the next nodes.

Nodes can be resized one at a time, or concurrently. Concurrent resizing allows up to one rack at a time to be resized.
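To make the sequence above concrete, here is a minimal Python sketch of a rolling resize. It is an illustration only, not Instaclustr’s implementation: the `Node` class, `plan_batches`, `rolling_resize` and the `is_healthy` and `schema_ok` callables are all hypothetical names, nodes are plain in-memory objects, and no AWS or Cassandra calls are made. Where the real process waits for a minute of clean transaction processing, the sketch does a single health check.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Node:
    """Hypothetical in-memory stand-in for a Cassandra node."""
    name: str
    rack: str
    size: str
    running: bool = True


def plan_batches(nodes, concurrent=False):
    """Return resize batches: one node per batch, or one rack per batch."""
    if not concurrent:
        return [[node] for node in nodes]
    ordered = sorted(nodes, key=lambda n: n.rack)
    return [list(group) for _, group in groupby(ordered, key=lambda n: n.rack)]


def rolling_resize(nodes, new_size, is_healthy, schema_ok, concurrent=False):
    """Resize all nodes batch by batch and return the batches processed.

    is_healthy(nodes) and schema_ok() are caller-supplied callables that
    stand in for the monitoring and schema checks (assumed, not a real API).
    """
    if not is_healthy(nodes):        # step 1: health check before starting
        raise RuntimeError("cluster unhealthy; refusing to start")
    if not schema_ok():              # step 2: redundancy check
        raise RuntimeError("schema lacks the redundancy this operation needs")

    processed = []
    for batch in plan_batches(nodes, concurrent):
        for node in batch:
            node.running = False     # step 3: stop Cassandra on the node
            node.size = new_size     # step 3: switch the instance size
            node.running = True      # step 4: restart; data volume is untouched
        # Step 5: a real system would poll until the cluster has been healthy
        # for a sustained period; this sketch checks once.
        if not is_healthy(nodes):
            names = [n.name for n in batch]
            raise RuntimeError(f"cluster unhealthy after resizing {names}")
        processed.append([n.name for n in batch])
    return processed


cluster = [
    Node("n1", "rack-a", "small"),
    Node("n2", "rack-b", "small"),
    Node("n3", "rack-a", "small"),
]
all_up = lambda nodes: all(n.running for n in nodes)
batches = rolling_resize(cluster, "large", all_up, lambda: True, concurrent=True)
print(batches)
```

Grouping by rack in the concurrent case reflects the one-rack-at-a-time limit described above: with replicas spread across racks, taking down a single rack should still leave other copies of the data available.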

This is a very similar operation to the one that our tech-ops team has routinely carried out hundreds, if not thousands, of times for patching, upgrades and other maintenance on customer clusters. So, we’re very confident that a properly configured application can keep using Cassandra throughout this operation without missing a beat. That said, “properly configured” is important, so we highly recommend testing your application against this function before using it in production.

Click here for more details and to read the full article by Instaclustr.