Run Apache Cassandra on Kubernetes with StatefulSet


Hello, readers! This article explains in detail how to run Apache Cassandra on Kubernetes with a StatefulSet, using a worked example.

So, let us begin! 🙂


Apache Cassandra – Overview

We have all come across many types of databases. Whether the data is structured or unstructured, a database is an effective way to keep it persistent and safe.

Apache Cassandra is one such database. It is a distributed NoSQL database (a wide-column store), so it lets us persist data that does not fit a fixed relational schema.

On Kubernetes, it can serve as the persistent data store for containerized applications.

In this demonstration, new Cassandra instances are discovered by the database as they join the Cassandra cluster.

Note: Both Cassandra and Kubernetes use the term ‘node’ to mean a member of a cluster. In this demonstration, the Pods created by the StatefulSet are the nodes of the Cassandra cluster. Those Pods are in turn scheduled onto Kubernetes Nodes.


Prerequisites in place!

Before deploying Cassandra on Kubernetes, we need to make sure that the following prerequisites are in place:

  1. A Kubernetes cluster with a minimum of two worker nodes.
  2. The kubectl command line tool, installed and configured to talk to the cluster.

For this demonstration, we will use Minikube to provide the Kubernetes setup.
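
The StatefulSet used later in this article requests 1Gi of memory and 500m of CPU for each of its three Pods, so a small Minikube instance may not be able to schedule all of them. As a hedged sketch, the wrapper below (start_demo_cluster is a made-up name for illustration) starts Minikube with the sizes suggested in the Kubernetes project's own Cassandra tutorial. Treat the numbers as a starting point, not a requirement.

```shell
#!/bin/sh
# start_demo_cluster is a hypothetical wrapper, not a Minikube command.
# It starts Minikube with more memory (in MB) and CPUs than a small default,
# so that three Cassandra Pods requesting 1Gi / 500m each can be scheduled.
start_demo_cluster() {
  minikube start --memory 5120 --cpus=4
}

# Usage:
#   start_demo_cluster
```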


1. Creation of a headless service

First, we need a headless service. It provides the DNS lookups that the Cassandra Pods and their clients use to find one another inside the Kubernetes cluster.

Because the service has no cluster IP (clusterIP: None), DNS resolves its name directly to the addresses of the individual Pods rather than to a single load-balanced address.

service.yaml

apiVersion: v1
kind: Service
metadata:
  labels:
    app: cassandra
  name: cassandra
spec:
  clusterIP: None
  ports:
  - port: 9042
  selector:
    app: cassandra

Let us apply the manifest to the cluster:

kubectl apply -f service.yaml

Let us now verify the Cassandra service:

kubectl get svc cassandra

Output:

NAME        TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE
cassandra   ClusterIP   None         <none>        9042/TCP   112m
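
With a headless service in front of a StatefulSet, each Pod gets a stable DNS name of the form <pod>.<service>.<namespace>.svc.<cluster-domain>. The small sketch below prints the names that the three Pods created in the next step would receive. The helper name pod_fqdn is made up for illustration, and the cluster domain is assumed to be the default cluster.local, which may differ on your cluster.

```shell
#!/bin/sh
# pod_fqdn is a hypothetical helper, not part of kubectl.
# It prints the stable DNS name of one StatefulSet Pod behind a headless Service.
# Assumption: the cluster domain is the default "cluster.local".
pod_fqdn() {
  statefulset="$1"
  ordinal="$2"
  service="$3"
  namespace="${4:-default}"
  printf '%s-%s.%s.%s.svc.cluster.local\n' "$statefulset" "$ordinal" "$service" "$namespace"
}

# Names for a three-replica StatefulSet called "cassandra" in the "default" namespace.
for i in 0 1 2; do
  pod_fqdn cassandra "$i" cassandra default
done
```

The first name printed, cassandra-0.cassandra.default.svc.cluster.local, is the value used for CASSANDRA_SEEDS in the StatefulSet manifest.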

2. Build a Cassandra ring with a Kubernetes StatefulSet

With the lookup service in place, it is time to create a Cassandra ring in which Pods act as the nodes. We will do this with a Kubernetes StatefulSet.

Example: Cassandra_statefulset.YAML

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: cassandra
  labels:
    app: cassandra
spec:
  serviceName: cassandra
  replicas: 3
  selector:
    matchLabels:
      app: cassandra
  template:
    metadata:
      labels:
        app: cassandra
    spec:
      terminationGracePeriodSeconds: 1800
      containers:
      - name: cassandra
        image: gcr.io/google-samples/cassandra:v13
        imagePullPolicy: Always
        ports:
        - containerPort: 7000
          name: intra-node
        - containerPort: 7001
          name: tls-intra-node
        - containerPort: 7199
          name: jmx
        - containerPort: 9042
          name: cql
        resources:
          limits:
            cpu: "500m"
            memory: 1Gi
          requests:
            cpu: "500m"
            memory: 1Gi
        securityContext:
          capabilities:
            add:
              - IPC_LOCK
        lifecycle:
          preStop:
            exec:
              command: 
              - /bin/sh
              - -c
              - nodetool drain
        env:
          - name: MAX_HEAP_SIZE
            value: 512M
          - name: HEAP_NEWSIZE
            value: 100M
          - name: CASSANDRA_SEEDS
            value: "cassandra-0.cassandra.default.svc.cluster.local"
          - name: CASSANDRA_CLUSTER_NAME
            value: "K8Demo"
          - name: CASSANDRA_DC
            value: "DC1-K8Demo"
          - name: CASSANDRA_RACK
            value: "Rack1-K8Demo"
          - name: POD_IP
            valueFrom:
              fieldRef:
                fieldPath: status.podIP
        readinessProbe:
          exec:
            command:
            - /bin/bash
            - -c
            - /ready-probe.sh
          initialDelaySeconds: 15
          timeoutSeconds: 5
        # These volume mounts are persistent. They are like inline claims,
        # but not exactly because the names need to match exactly one of
        # the stateful pod volumes.
        volumeMounts:
        - name: cassandra-data
          mountPath: /cassandra_data
  # These are converted to volume claims by the controller
  # and mounted at the paths mentioned above.
  # do not use these in production until ssd GCEPersistentDisk or other ssd pd
  volumeClaimTemplates:
  - metadata:
      name: cassandra-data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: fast
      resources:
        requests:
          storage: 1Gi
---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: fast
provisioner: k8s.io/minikube-hostpath
parameters:
  type: pd-ssd

This StatefulSet creates three Pods, which together form the Cassandra ring.

Let us now apply the manifest to the Kubernetes cluster:

kubectl apply -f Cassandra_statefulset.YAML

As each Cassandra node starts, it uses the seed list (here cassandra-0, set through CASSANDRA_SEEDS) to discover the other nodes in the Cassandra cluster and bootstrap itself into the ring.
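
One way to see whether the nodes have found each other is to run Cassandra's nodetool inside one of the Pods. The following is a minimal sketch. The wrapper name ring_status is invented for illustration, and it assumes the Pod names used in this article and that nodetool is on the container's PATH, as the preStop hook in the manifest suggests.

```shell
#!/bin/sh
# ring_status is a hypothetical wrapper name; kubectl exec and nodetool status
# are the real commands being run.
# It asks one Cassandra Pod (cassandra-0 by default) for its view of the ring.
ring_status() {
  pod="${1:-cassandra-0}"
  kubectl exec "$pod" -- nodetool status
}

# Usage, once the Pods are running:
#   ring_status              # asks cassandra-0
#   ring_status cassandra-2  # asks another member of the ring
```

In the nodetool output, a node that has joined and is healthy is listed with the status UN (Up/Normal).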


Validation of the Apache Cassandra StatefulSet

Let us review the StatefulSet we applied for Cassandra through the points below:

  1. The volumeClaimTemplates section requests storage from the "fast" storage class, which this demonstration backs with the Minikube hostpath provisioner. The StatefulSet controller turns the template into one PersistentVolumeClaim per Pod.
  2. Each claim requests 1Gi of storage with the ReadWriteOnce access mode.
  3. The StatefulSet sets environment variables such as CASSANDRA_CLUSTER_NAME and CASSANDRA_SEEDS (the seed node to contact).
  4. It creates three Pods as part of the Cassandra ring.

Let us check the StatefulSet:

kubectl get statefulset cassandra

Output:

NAME        DESIRED   CURRENT   AGE
cassandra   3         0         135m
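
By default a StatefulSet creates its Pods one at a time, in order, and waits for each to become ready before starting the next, so the current count can lag behind the desired count for a while. Rather than polling by hand, the rollout can be awaited. This is a sketch only: wait_for_cassandra is a hypothetical helper name and the ten-minute default timeout is an arbitrary choice.

```shell
#!/bin/sh
# wait_for_cassandra is a hypothetical helper, not a kubectl subcommand.
# It blocks until the "cassandra" StatefulSet has rolled out all replicas,
# or until the timeout (default 600s, an arbitrary value) expires.
wait_for_cassandra() {
  limit="${1:-600s}"
  kubectl rollout status statefulset/cassandra --timeout="$limit"
}

# Usage:
#   wait_for_cassandra      # wait up to 600 seconds
#   wait_for_cassandra 5m   # wait up to 5 minutes
```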

Finally, let us inspect the Pods:

kubectl get pods -l="app=cassandra"

Output:

NAME          READY     STATUS    RESTARTS   AGE
cassandra-0   1/1       Running   0          132m
cassandra-1   1/1       Running   0          132m
cassandra-2   1/1       Running   0          131m
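
Each of these Pods also owns a PersistentVolumeClaim created from the volumeClaimTemplates entry. Claims are named <template>-<statefulset>-<ordinal>, so the names to expect can be worked out ahead of time. The sketch below does that with a made-up helper, pvc_name; the real claims can then be compared against it with kubectl get pvc.

```shell
#!/bin/sh
# pvc_name is a hypothetical helper used only to illustrate the naming rule
# for claims created from a StatefulSet's volumeClaimTemplates:
#   <template name>-<statefulset name>-<ordinal>
pvc_name() {
  printf '%s-%s-%s\n' "$1" "$2" "$3"
}

# Expected claim names for the manifest in this article.
for i in 0 1 2; do
  pvc_name cassandra-data cassandra "$i"
done

# To compare with the cluster, run:
#   kubectl get pvc
```

For the manifest above this prints cassandra-data-cassandra-0, cassandra-data-cassandra-1 and cassandra-data-cassandra-2.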

Conclusion

This brings us to the end of this topic. Feel free to comment below in case you have any questions.

For more posts related to Kubernetes, stay tuned with us.

Till then, Happy Learning! 🙂
