Hello, readers! This article explains how to run Apache Cassandra on Kubernetes with a StatefulSet, with a step-by-step demonstration.
So, let us begin! 🙂
Apache Cassandra – Overview
We have heard about many types of databases. Whether our data is structured or unstructured, databases are an effective way to keep it persistent and safe.
Apache Cassandra is one such database. It is a NoSQL (wide-column) database, which lets us keep non-relational data persistent and safe.
It provides persistent storage to many containerized applications in Kubernetes.
As part of the demonstration, we will add new Cassandra instances that discover one another through a seed list and join the Cassandra cluster.
Note: The term ‘node’ is used in both Cassandra and Kubernetes to refer to a member of a cluster. In this demonstration, the Pods created by the StatefulSet are the members (nodes) of the Cassandra cluster. When they run as Pods in Kubernetes, they are scheduled onto Kubernetes Nodes.
Prerequisites in place!
Before getting started with the deployment of Cassandra on Kubernetes, we need to make sure the below prerequisites are in place –
- A Kubernetes cluster with a minimum of two worker nodes.
- The kubectl command-line tool installed and configured to communicate with the cluster.
For the purpose of demonstration, we will be making use of Minikube to have a Kubernetes setup.
1. Creation of a headless service
First, we need a headless service to enable DNS lookups between the Cassandra Pods, client requests, and the Kubernetes cluster.
This service lets the Cassandra Pods discover one another by name.
```yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app: cassandra
  name: cassandra
spec:
  clusterIP: None
  ports:
    - port: 9042
  selector:
    app: cassandra
```
kubectl apply -f service.yaml
Let us now verify the Cassandra service –
kubectl get svc cassandra
```
NAME        TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE
cassandra   ClusterIP   None         <none>        9042/TCP   112m
```
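Because the service is headless (clusterIP: None), each StatefulSet Pod gets a stable DNS record of the form `<pod-name>.<service-name>.<namespace>.svc.cluster.local`. As a quick sketch (assuming the `default` namespace, which the manifest later relies on), the three replicas will receive these names:

```python
# Build the stable DNS names a headless service gives StatefulSet pods.
# Pattern: <pod-name>.<service-name>.<namespace>.svc.cluster.local
def pod_dns_names(statefulset, service, namespace, replicas):
    return [
        f"{statefulset}-{i}.{service}.{namespace}.svc.cluster.local"
        for i in range(replicas)
    ]

names = pod_dns_names("cassandra", "cassandra", "default", 3)
for n in names:
    print(n)
# cassandra-0.cassandra.default.svc.cluster.local
# cassandra-1.cassandra.default.svc.cluster.local
# cassandra-2.cassandra.default.svc.cluster.local
```

These are the names clients and peers can resolve to reach a specific Cassandra node.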
2. Build a Cassandra ring with a Kubernetes Statefulset
Having created the service for lookups, it is now time to create a Cassandra ring with Pods acting as nodes. We will do this through a StatefulSet in Kubernetes.
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: cassandra
  labels:
    app: cassandra
spec:
  serviceName: cassandra
  replicas: 3
  selector:
    matchLabels:
      app: cassandra
  template:
    metadata:
      labels:
        app: cassandra
    spec:
      terminationGracePeriodSeconds: 1800
      containers:
        - name: cassandra
          image: gcr.io/google-samples/cassandra:v13
          imagePullPolicy: Always
          ports:
            - containerPort: 7000
              name: intra-node
            - containerPort: 7001
              name: tls-intra-node
            - containerPort: 7199
              name: jmx
            - containerPort: 9042
              name: cql
          resources:
            limits:
              cpu: "500m"
              memory: 1Gi
            requests:
              cpu: "500m"
              memory: 1Gi
          securityContext:
            capabilities:
              add:
                - IPC_LOCK
          lifecycle:
            preStop:
              exec:
                command:
                  - /bin/sh
                  - -c
                  - nodetool drain
          env:
            - name: MAX_HEAP_SIZE
              value: 512M
            - name: HEAP_NEWSIZE
              value: 100M
            - name: CASSANDRA_SEEDS
              value: "cassandra-0.cassandra.default.svc.cluster.local"
            - name: CASSANDRA_CLUSTER_NAME
              value: "K8Demo"
            - name: CASSANDRA_DC
              value: "DC1-K8Demo"
            - name: CASSANDRA_RACK
              value: "Rack1-K8Demo"
            - name: POD_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
          readinessProbe:
            exec:
              command:
                - /bin/bash
                - -c
                - /ready-probe.sh
            initialDelaySeconds: 15
            timeoutSeconds: 5
          # These volume mounts are persistent. They are like inline claims,
          # but not exactly, because the names need to match exactly one of
          # the stateful pod volumes.
          volumeMounts:
            - name: cassandra-data
              mountPath: /cassandra_data
  # These are converted to volume claims by the controller
  # and mounted at the paths mentioned above.
  # Do not use these in production until ssd GCEPersistentDisk or other ssd pd.
  volumeClaimTemplates:
    - metadata:
        name: cassandra-data
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: fast
        resources:
          requests:
            storage: 1Gi
---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: fast
provisioner: k8s.io/minikube-hostpath
parameters:
  type: pd-ssd
```
This Statefulset triggers the creation of three pods in total within the Cassandra ring.
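The readinessProbe in the manifest runs /ready-probe.sh inside the container; in the google-samples image this script essentially checks `nodetool status` for the Pod's own IP and requires the state UN (Up/Normal). A simplified Python model of that check, using a hypothetical sample of nodetool output:

```python
# Simplified model of the readiness check: a Cassandra node is "ready"
# when its own row in `nodetool status` shows state UN (Up/Normal).
SAMPLE_STATUS = """\
Datacenter: DC1-K8Demo
Status=Up/Down |/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens  Owns   Host ID   Rack
UN  172.17.0.5  104.5 KiB  32      66.1%  aaaa      Rack1-K8Demo
UJ  172.17.0.6  85.2 KiB   32      ?      bbbb      Rack1-K8Demo
"""

def is_ready(status_output, pod_ip):
    for line in status_output.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[1] == pod_ip:
            return parts[0] == "UN"  # Up and in Normal state
    return False  # node not in the ring yet

print(is_ready(SAMPLE_STATUS, "172.17.0.5"))  # True: up and normal
print(is_ready(SAMPLE_STATUS, "172.17.0.6"))  # False: still joining (UJ)
```

Until the probe succeeds, Kubernetes keeps the Pod out of the service endpoints, so clients never reach a node that has not finished joining the ring.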
Let us now push the changes to the Kubernetes cluster-
kubectl apply -f cassandra-statefulset.yaml
As soon as the Cassandra nodes start, they use the seed list to discover and bootstrap the other nodes present within the Cassandra cluster.
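To make the role of CASSANDRA_SEEDS concrete, here is a toy model (an illustration only, not real Cassandra gossip): each joining node contacts the seed, learns the current membership, and then announces itself so later joiners can see it too.

```python
# Toy model of seed-based discovery (illustration only, not real gossip):
# every joining node asks the seed for the current member list, then
# registers itself so that later joiners learn about it as well.
class Seed:
    def __init__(self, name):
        self.members = [name]  # the seed is the first member of the ring

    def join(self, node_name):
        known = list(self.members)      # membership the newcomer learns
        self.members.append(node_name)  # newcomer becomes visible to others
        return known

seed = Seed("cassandra-0")
print(seed.join("cassandra-1"))  # ['cassandra-0']
print(seed.join("cassandra-2"))  # ['cassandra-0', 'cassandra-1']
```

This is why the manifest points CASSANDRA_SEEDS at cassandra-0's stable DNS name: the first Pod created by the StatefulSet acts as the entry point for the rest of the ring.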
Validation of the Apache Cassandra Statefulset
Let us understand the implemented Statefulset for Cassandra through the below pointers-
- This demonstration uses the `fast` StorageClass backed by the Minikube hostpath provisioner; the volume claim templates are converted to PersistentVolumeClaims by the controller.
- By default, we request 1Gi of storage with ReadWriteOnce access.
- The StatefulSet sets environment variables such as CASSANDRA_CLUSTER_NAME and CASSANDRA_SEEDS (the seed name to look up), among others.
- It creates three pods through the StatefulSet as part of the Cassandra ring.
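The volumeClaimTemplates yield one PersistentVolumeClaim per Pod, named `<template-name>-<pod-name>`, which is how each replica keeps its own data across restarts. A quick sketch of the claims this StatefulSet produces:

```python
# StatefulSet volumeClaimTemplates create one PVC per pod, named
# <template-name>-<statefulset-name>-<ordinal>.
def pvc_names(template, statefulset, replicas):
    return [f"{template}-{statefulset}-{i}" for i in range(replicas)]

for pvc in pvc_names("cassandra-data", "cassandra", 3):
    print(pvc)
# cassandra-data-cassandra-0
# cassandra-data-cassandra-1
# cassandra-data-cassandra-2
```

Because the claim name is tied to the Pod ordinal, a rescheduled Pod reattaches to the same volume it used before.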
kubectl get statefulset cassandra
```
NAME        DESIRED   CURRENT   AGE
cassandra   3         0         135m
```
Finally, let us inspect the pods –
kubectl get pods -l="app=cassandra"
```
NAME          READY   STATUS    RESTARTS   AGE
cassandra-0   1/1     Running   0          132m
cassandra-1   1/1     Running   0          132m
cassandra-2   1/1     Running   0          131m
```
This marks the end of this topic. Feel free to comment below, in case you come across any questions.
For more such posts related to Kubernetes, Stay tuned with us.
Till then, Happy Learning! 🙂