Here is the list of operations managed by CassKop at the cluster level, each of which has a dedicated status in each rack. These operations are applied at the Cassandra cluster level, as opposed to pod operations, which are executed at the pod level and are discussed in the next section.
Cluster operations must only be triggered by a change made on the CassandraCluster object. Some updates in the
CassandraCluster CRD object are forbidden and will be gently dismissed by CassKop:
Some updates in the CassandraCluster CRD object will trigger a rolling update of the whole cluster, such as:
Some updates in the CassandraCluster CRD object will not trigger changes on the cluster, but will only change the future behavior of CassKop.
CassKop manages rolling updates for each statefulset in the cluster. Each statefulset then performs the rolling update of its pods according to the partition defined for that statefulset.
The first operation required in a Cassandra cluster is the initialization.
In this phase, CassKop creates the `CassandraCluster.Status` section with an entry for each DC/rack declared.
We could also see the Initializing status if we later decide to add a DC to our topology.
For the demo, we will create this CassandraCluster without a topology section. When no topology has been specified, CassKop creates a default topology and status.
The default topology added by CassKop is:
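A minimal sketch of what this default topology looks like in the CRD; the names `dc1` and `rack1` are the defaults I'd expect, so verify against your CassKop version:

```yaml
# Default topology generated by CassKop when none is provided (illustrative)
topology:
  dc:
    - name: dc1
      rack:
        - name: rack1
```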
`CassandraCluster.spec.nodesPerRacks` defines the number of Cassandra nodes CassKop must create in each of its racks. In our example, there is only one default rack, so CassKop will create only 2 nodes.
With the default topology there will be no Kubernetes node affinity to spread the Cassandra nodes across the cluster, and CassKop will create only one rack and one DC for Cassandra. This is not recommended, as you may lose data in case of hardware failure.
When the initialization has ended, you should have a status similar to:
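As an illustration, the status section may look roughly like the following sketch; the field names and the seedlist FQDN format are from memory and may differ in your CassKop version:

```yaml
# Illustrative CassandraCluster status after initialization (not authoritative)
status:
  cassandraRackStatus:
    dc1-rack1:
      cassandraLastAction:
        name: Initializing
        status: Done
      phase: Running
  lastClusterAction: Initializing
  lastClusterActionStatus: Done
  phase: Running
  seedlist:
    - cassandra-demo-dc1-rack1-0.cassandra-demo-dc1-rack1   # placeholder FQDN
```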
- The status of the last action is `Done` in each rack.
- The status of the cluster is `Done`.
- The phase is `Running`, which means that each rack has the desired number of nodes.
We asked for 2 `nodesPerRacks` and we have one default rack, so we end up with 2 Cassandra nodes in our cluster.
The seedlist has been initialized and stored in `CassandraCluster.status.seedlist`. It has also been configured in each of the Cassandra pods.
We can also confirm that Cassandra knows about the DC and rack names we have deployed:
In this example, I added a topology defining 2 Cassandra DCs and 3 racks in total.
With this topology section I also reference some Kubernetes node labels, which will be used to spread the Cassandra nodes of each rack on different groups of Kubernetes servers.
We can see here that we can give a specific configuration for the number of pods in dc2 (`nodesPerRacks`).
We can also configure the Cassandra pods of each DC with a different `num_tokens` value, using the appropriate parameter in the config.
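Such a topology section might be sketched as follows; the label keys and values are placeholders to adapt to your own node labels, and the `numTokens` field name is an assumption to verify against your CassKop version:

```yaml
# Illustrative topology: 2 DCs, 3 racks, with nodesPerRacks and num_tokens
# overridden for dc2 (label keys/values are placeholders)
topology:
  dc:
    - name: dc1
      labels:
        failure-domain.example.com/site: site1
      rack:
        - name: rack1
          labels:
            failure-domain.example.com/rack: rack1
        - name: rack2
          labels:
            failure-domain.example.com/rack: rack2
    - name: dc2
      nodesPerRacks: 3      # overrides the global spec.nodesPerRacks for dc2
      numTokens: 512        # different num_tokens for this DC
      labels:
        failure-domain.example.com/site: site2
      rack:
        - name: rack1
          labels:
            failure-domain.example.com/rack: rack3
```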
CassKop will create a statefulset for each rack and start creating the Cassandra cluster, beginning with the nodes of the first rack. When CassKop has finished its operations on rack 1, it will process the next rack, and so on.
The status may be similar to:
The creation of the cluster is ongoing. We can see that, based on the cluster topology, CassKop has created the SeedList.
CassKop computes a seedlist with 3 nodes in each datacenter (if possible). The Cassandra seeds are always the first Cassandra nodes of a statefulset (starting with index 0).
When all racks are in status `Done`, the `CassandraCluster.status.lastClusterActionStatus` is changed to `Done`.
We can see that internally Cassandra also knows the desired topology:
You can find in the cassandra-configuration section how you can use a ConfigMap to override the configuration.
Currently, CassKop doesn't monitor changes inside the ConfigMap. If you want to change a parameter in a file in the current ConfigMap, you must create a new ConfigMap with the updated version, and then ask CassKop to use the new ConfigMap name.
If we add/change/remove `CassandraCluster.spec.configMapName`, CassKop will start a rolling update of the Cassandra nodes in each rack, starting from the first rack defined in the topology.
First, we need to create the example ConfigMap:
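A sketch of such a ConfigMap; the name `cassandra-configmap-v2` and the file content are placeholders:

```yaml
# Hypothetical ConfigMap carrying an updated cassandra.yaml fragment
apiVersion: v1
kind: ConfigMap
metadata:
  name: cassandra-configmap-v2
data:
  cassandra.yaml: |
    # ... updated Cassandra parameters ...
```

Pointing `CassandraCluster.spec.configMapName` at this new name is what triggers the rolling update.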
Then we apply the change in the `CassandraCluster` object. We can see the `CassandraCluster.Status` updated by CassKop:
CassKop won't make a rolling update on the next rack until the status of the current rack becomes `Done`.
The operation proceeds rack by rack.
CassKop allows you to change the Cassandra docker image and gracefully redeploy your whole cluster.
If we change `CassandraCluster.spec.baseImage` and/or `CassandraCluster.spec.version`, CassKop will start to perform a rolling update on the whole cluster (each rack sequentially) in order to change the version of the Cassandra Docker image on all nodes.
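A sketch of such a change; the image name and version below are placeholders:

```yaml
# Changing the Cassandra image triggers a rolling update of the whole cluster
spec:
  baseImage: orangeopensource/cassandra-image   # placeholder image name
  version: 3.11.4                               # placeholder target version
```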
You can change the Docker image used to:
- change the version of Cassandra
- change the version of Java
- change some configuration parameters for Cassandra or the JVM, if you don't overwrite them with a ConfigMap
The status may be similar to:
We can see that CassKop has started to update `dc1-rack1` and has changed its status accordingly.
Once it has finished the first rack, it processes the next one:
And when all racks are Done:
This provides a central view to monitor what is happening on the Cassandra cluster.
CassKop allows you to configure your Cassandra pods' resources (memory and CPU).
If we change `CassandraCluster.spec.resources`, then CassKop will start a rolling update on the whole cluster (each rack sequentially) to apply the new resources on all nodes.
For example, to increase Memory/CPU requests and/or limits:
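A sketch of such a change; the values below are placeholders:

```yaml
# Increasing requests/limits triggers the UpdateResources rolling update
spec:
  resources:
    requests:
      cpu: '1'
      memory: 2Gi
    limits:
      cpu: '2'
      memory: 3Gi
```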
Then CassKop should output the status:
We can see that it has staged the `UpdateResources` action in all racks (`status=ToDo`) and has started the action in the first rack. When the first rack is `Done`, it will follow with the next rack, and so on.
Upon completion, the status may look like:
The scaling of the cluster is managed through the `nodesPerRacks` parameter and through the number of DCs and racks defined in the topology section.
See the NodesPerRacks section.
If the ScaleUp (or ScaleDown) changes the SeedList, and `spec.autoUpdateSeedList` is set to `true`, then CassKop will schedule a new operation, `UpdateSeedList`, which will trigger a rolling update to apply the new seedlist on all nodes once the scaling is done.
CassKop allows you to scale up your Cassandra cluster.
The global parameter `CassandraCluster.spec.nodesPerRacks` specifies the number of Cassandra nodes we want in each rack. It is possible to override this for a particular DC in the topology section.
In this case, we ask to scale up the nodes of the second DC:
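A sketch of the override, assuming a global `nodesPerRacks` of 1 that we raise to 2 for dc2 only:

```yaml
# ScaleUp of dc2 only: the DC-level nodesPerRacks overrides the global value
spec:
  nodesPerRacks: 1
  topology:
    dc:
      - name: dc1
      - name: dc2
        nodesPerRacks: 2   # was 1: triggers a ScaleUp on dc2's racks
```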
CassKop takes into account the new target and starts applying modifications in the cluster:
We can see that CassKop:
- has started the `ScaleUp` action;
- has found that the SeedList must be updated and, because `autoUpdateSeedList=true`, has staged (`status=Configuring`) the `UpdateSeedList` operation for each rack.
When CassKop ends the ScaleUp action in `dc2-rack1`, it will also stage this rack with `UpdateSeedList=Configuring`. Once all racks are in this state, CassKop will turn each rack to status `UpdateSeedList=ToDo`, meaning that it can start the operation.
From then on, CassKop will iterate on each rack, one after the other, with the statuses:
- `UpdateSeedList=Ongoing`, meaning that it is currently doing a rolling update on the rack to apply the new SeedList;
- `UpdateSeedList=Done`, meaning that the operation is done on that rack.
See the evolution of the status:
Here is the final topology seen from nodetool.
Note that nodetool prints the IPs of the nodes, while Kubernetes works with names:
After the ScaleUp has finished, CassKop must execute a Cassandra `cleanup` on each node of the cluster.
This can be manually triggered by setting appropriate labels on each pod.
CassKop can automate this if `spec.autoPilot` is true: it sets the labels on each pod of the cluster with a `ToDo` state, and then finds those pods to sequentially execute these actions.
See the podOperation Cleanup section.
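As a sketch, the pod labels involved might look like this; the exact label names are an assumption to verify against your CassKop version:

```yaml
# Hypothetical pod labels marking a pending cleanup PodOperation
metadata:
  labels:
    operation-name: cleanup
    operation-status: ToDo
```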
For a ScaleDown, CassKop must perform a clean Cassandra `decommission` prior to actually scaling down the cluster at the Kubernetes level.
Currently, this is done by CassKop requesting the decommission through a Jolokia call and waiting for it to be performed (Cassandra node status = decommissioned) before updating the Kubernetes statefulset (removing the pod).
If we ask to scale down by more than 1 node at a time, CassKop will iterate, scaling down a single node at a time, until it reaches the requested number of nodes.
Also, CassKop will refuse a ScaleDown to 0 for a DC if some data is still replicated to it.
To launch a ScaleDown, we simply need to decrease the value of `nodesPerRacks`.
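A sketch of such a change, assuming dc2 currently has 2 nodes per rack:

```yaml
# ScaleDown of dc2: decreasing nodesPerRacks triggers a decommission first
spec:
  topology:
    dc:
      - name: dc2
        nodesPerRacks: 1   # was 2: one node per rack will be decommissioned
```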
We can see in the example below that:
- CassKop has started the `ScaleDown` action;
- CassKop has found that the SeedList must be updated, and it has staged (`status=ToDo`) the `UpdateSeedList` operation.
When CassKop completes the ScaleDown in `dc2-rack1`, it will also stage this rack with `UpdateSeedList=ToDo`. Once all racks are in this state, CassKop will turn each rack to status `UpdateSeedList=Ongoing`, meaning that it can start the operation.
Then, CassKop will iterate on each rack, one after the other, with the statuses:
- `UpdateSeedList=Finalizing`, meaning that it is currently doing a rolling update on the rack to update the SeedList;
- `UpdateSeedList=Done`, meaning that the operation is done.
It also shows that the ScaleDown is `Done`. CassKop will then rolling-update all racks, one by one, in order to update the Cassandra seedlist.
The UpdateSeedList is done automatically by CassKop when the parameter `CassandraCluster.spec.autoUpdateSeedList` is true (the default).
The CassandraCluster object is used to define your cluster configuration. Some fields of the underlying Kubernetes objects can't be updated. These fields are taken from the CRD, and to make sure we don't update them (which would put the Kubernetes objects in error), CassKop is configured to simply ignore/revert unauthorized changes to them.
Example, with this CRD deployed:
If we try to update the `dataStorageClass`, nothing will happen, and we can see messages like these in the CassKop logs:
If you performed the modification by updating your local CRD file and applying it with kubectl, you must revert it to the old value.
- Prior to deleting a DC, you must have scaled down all its racks to 0; if not, CassKop will refuse and correct the CRD.
- Prior to a ScaleDown to 0, CassKop will ensure that no more data is replicated to the DC; if not, CassKop will refuse and correct the CRD. Because CassKop wants the same number of pods in all racks, we decided not to allow removing a single rack; this change will be reverted too.
You must scale down to 0 before you remove a DC. You must change the replication factor before doing a ScaleDown to 0 for a DC.
In a normal production environment, CassKop will have spread its Cassandra pods on different k8s nodes. If the team in charge of the machines needs to make some operations on a host, they can drain it.
The Kubernetes drain command asks the scheduler to evict all pods on the target node, and for many workloads k8s will reschedule them on other machines. In the case of CassKop's Cassandra pods, they won't be scheduled on another host, because they use local storage and are pinned to a specific host through the PersistentVolumeClaim Kubernetes object.
Example: we drain node008 for a maintenance operation.
All pods will be evicted; those that can be will be rescheduled on other hosts. Our Cassandra pod won't be able to be scheduled elsewhere because of the PVC, and we can see these messages in the k8s events:
They explain that 1 node is unschedulable (the one we just drained), and that the 5 other nodes can't host our pod because they have a volume node affinity conflict (our pod has an affinity to node008).
Once the team has finished its maintenance operation, it can bring the host back into the Kubernetes cluster. From then on, k8s will be able to reschedule the Cassandra pod into the cluster so that it can rejoin the ring.
Immediately, the pending pod is rescheduled and started on the host. If the interruption was not too long, there is nothing more to do: the node will join the ring and resynchronize with the cluster. If it was too long, it may be necessary to schedule some PodOperations, which you will find in the next sections of this document.
If a k8s admin asks to drain a node, this may not be allowed by the CassandraCluster, depending on its current state and on the configuration of its PDB (usually, only 1 node is allowed to be in disruption).
node008 will be flagged as SchedulingDisabled so that it won't take new workloads. The drain will evict all possible pods, but if there is already an ongoing disruption on the Cassandra cluster, it won't be allowed to evict the Cassandra pod.
Example of a PDB:
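An illustrative PDB matching the situation described here; the names, labels, and counts are placeholders:

```yaml
# Hypothetical PDB: 1 pod may be unavailable; 13/14 pods healthy, so no
# further disruption is allowed
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: cassandra-demo
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      cassandracluster: cassandra-demo
status:
  currentHealthy: 13
  desiredHealthy: 13
  disruptionsAllowed: 0
  expectedPods: 14
```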
In this example we see that we allow only 1 pod to be unavailable; our cluster wants 14 pods but only 13 are healthy, which is why the PDB won't allow the eviction of an additional pod.
To be able to continue, we need to wait, or take appropriate actions, so that the Cassandra cluster no longer has any unavailable nodes.
In the case of a major host failure, it may not be possible to bring the node back to life. In this case, we can consider that our Cassandra node is lost, and we will want to replace it on another host.
There are 2 possible solutions, both of which require some manual actions. For the first one:
- We use the CassKop client to schedule a Cassandra `removenode` for the failing node. This triggers the `removenode` PodOperation by setting the appropriate labels on a Cassandra pod.
- Once the node is properly removed, we can free the link between the pod and the failing host by removing the associated PersistentVolumeClaim. This allows Kubernetes to reschedule the pod on another free host.
- Once the node is back in the cluster, we need to apply a `cleanup` on all nodes. You can pause the cleanup and check its status.
In some cases, it may be preferable to replace the node. Because we use a statefulset to deploy the Cassandra pods, by definition all pods are identical, and we can't execute specific actions on a specific node at startup.
For that, CassKop provides the ability to execute a `pre_run.sh` script, which can be changed using the CRD ConfigMap.
To see how to use the ConfigMap, see Overriding Configuration using configMap.
For example, if we want to replace the node cassandra-test-dc1-rack2-1, we first need to retrieve its IP address, for example from nodetool status.
Then we can edit the ConfigMap to modify the `pre_run.sh` script:
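A sketch of what the edited ConfigMap could contain; the IP address, ConfigMap name, and jvm.options path are placeholders to adapt to your deployment:

```yaml
# Hypothetical pre_run.sh enabling replace_address_first_boot only on the
# replaced pod (IP and path are placeholders)
apiVersion: v1
kind: ConfigMap
metadata:
  name: cassandra-configmap-v1
data:
  pre_run.sh: |
    test "$(hostname)" = 'cassandra-test-dc1-rack2-1' \
      && echo "-Dcassandra.replace_address_first_boot=10.100.150.51" >> /etc/cassandra/jvm.options
```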
So the operation will be:
- Edit the ConfigMap with the appropriate CASSANDRA_REPLACE_NODE IP for the targeted pod name.
- Delete the PVC data-cassandra-test-dc1-rack2-1.
- The pod will boot and execute the `pre_run.sh` script prior to the /run.sh.
- The new pod replaces the dead one by re-syncing the content, which can take some time depending on the data size.
- Do not forget to edit the ConfigMap again and remove the specific line with the replace-node instruction.