The etcd server is the only stateful component of the Kubernetes cluster. Kubernetes stores all API objects and settings on the etcd server. An etcd backup is enough to restore the Kubernetes cluster’s state completely. Kubernetes disaster recovery plans therefore often include backing up the etcd cluster and using infrastructure as code to recreate cloud servers.
In this blog, we will cover how to back up and restore etcd in Kubernetes. We have a set of hands-on labs that you must perform in order to learn Docker & Kubernetes and clear the CKA certification exam. Cluster Architecture, Installation & Configuration, which includes etcd backup and restore, has a total weightage of 25% in the exam.
What is Etcd?
Etcd is a distributed key-value store with high consistency that provides a secure mechanism to store data that must be accessible by a distributed system or cluster of machines. It gracefully handles leader elections during network partitions and can tolerate machine failure, even of the leader node.
Etcd is used in a variety of different applications. It is most famous for being the core datastore for Kubernetes, the de facto standard for container orchestration. Cloud-native apps that use etcd can have more continuous uptime and keep operating even when individual servers fail. Applications read and write to etcd, which distributes configuration data and provides redundancy and robustness for node configuration.
Kubernetes and Etcd
Etcd is Kubernetes‘ primary datastore; it stores and replicates all Kubernetes cluster state. Because etcd is such a vital component of a Kubernetes cluster, it’s critical that it’s configured and managed properly.
The cluster configuration of etcd can be challenging because it is a distributed, consensus-based system. Bootstrapping, maintaining quorum, adjusting cluster membership, making backups, dealing with disaster recovery, and monitoring crucial events are all time-consuming and difficult operations that require specialist knowledge. We can back up and restore etcd from within the cluster, and for an HA setup etcd can also run on separate servers.
Prerequisites
Make sure you have a K8s cluster deployed already.
Learn How To Setup A Three Node Kubernetes Cluster For CKA
Installing and Placing Etcd Binaries
Users mostly interact with etcd by putting or getting the value of a key. We do that using etcdctl, a command line tool for interacting with the etcd server. In this section, we download the etcd binaries so that we have the etcdctl tool available.
1) Create a temporary directory for the ETCD binaries.
$ mkdir -p /tmp/etcd && cd /tmp/etcd

2) Download the latest etcd release binaries:
$ curl -s https://api.github.com/repos/etcd-io/etcd/releases/latest | grep browser_download_url | grep linux-amd64 | cut -d '"' -f 4 | wget -qi -
3) Unzip the compressed binaries:
$ tar xvf *.tar.gz

4) Move the binaries to /usr/local/bin and clean up the temporary directory:
$ cd etcd-*/
$ sudo mv etcd* /usr/local/bin/
$ cd ~
$ rm -rf /tmp/etcd
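Optionally, verify that the binaries are on your PATH. As a quick illustration of the put/get interaction mentioned earlier, you can also write and read a key; note that the put/get example below assumes a standalone, unauthenticated etcd listening on localhost, while the cluster’s etcd used in the rest of this post requires the TLS flags shown later.
$ etcdctl version
$ ETCDCTL_API=3 etcdctl put foo "bar"
$ ETCDCTL_API=3 etcdctl get foo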
Find K8s Manifest Location
In the cluster, we can check the default static pod manifest location with the help of the kubelet config file.
# cat /var/lib/kubelet/config.yaml
From this manifest location, you can find the Kubernetes static pod definitions, including the kube-apiserver and etcd pods. In the etcd pod definition you can then check the certificate file paths and the data-dir location.
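For example, on a default kubeadm setup (the paths may differ on your cluster), you can pull these locations out directly:
$ grep staticPodPath /var/lib/kubelet/config.yaml
$ grep -E 'cert-file|key-file|trusted-ca-file|data-dir' /etc/kubernetes/manifests/etcd.yaml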
How to backup the Etcd & Restore it
As noted earlier, the etcd server is the only stateful component of the Kubernetes cluster; Kubernetes stores all API objects and settings on it.
Backing up this storage is enough to restore the Kubernetes cluster’s state completely.
Taking Snapshot and Verifying it:
1) Check the flags that you need to include in the backup command:
$ ETCDCTL_API=3 etcdctl snapshot save -h

2) Take a snapshot of the etcd datastore using etcdctl:
$ sudo ETCDCTL_API=3 etcdctl snapshot save snapshot.db --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key

3) View that the snapshot was successful:
$ sudo ETCDCTL_API=3 etcdctl snapshot status snapshot.db

Important Note: If you are backing up and then restoring the cluster, do not run the status command against the snapshot after taking the backup; this might tamper with the snapshot file and cause the restore process to fail.
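If you want a record of the snapshot that you can verify later without pointing etcdctl at the file again, a plain checksum is one non-invasive option (this is a suggestion, not part of the standard workflow):
$ sha256sum snapshot.db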
Backing-up The Certificates And Key Files:
Archive the contents of the etcd PKI directory so the certificates are saved as well:
$ sudo tar -zcvf etcd.tar.gz /etc/kubernetes/pki/etcd
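If the certificates ever need to be restored from this archive, for example on a rebuilt control plane node, they can be extracted back to their original path; a minimal sketch, assuming the archive was created with the command above:
$ sudo tar -xzvf etcd.tar.gz -C /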
Restoring Etcd From Snapshot & Verify:
1) Check the present state of the cluster, which is what the snapshot taken above contains:
$ kubectl get all

2) To verify the restore later, create a new pod now; since this pod is not present in the snapshot, it should disappear when we restore the content with the restore command:
$ kubectl run testing-restore --image=nginx
$ kubectl get pods
Note: In recent Kubernetes versions the --generator flag is deprecated, so use the kubectl run command shown above to create the pod.

3) Check the flags that you need to include in the restore command:
$ ETCDCTL_API=3 etcdctl snapshot restore -h

4) To restore, we first have to delete the present etcd content. So let's look at the etcd manifest and grab all the details we need for the restore command:
$ cat /etc/kubernetes/manifests/etcd.yaml

5) Delete the present content of etcd and execute the restore command:
$ sudo rm -rf /var/lib/etcd
$ sudo ETCDCTL_API=3 etcdctl snapshot restore snapshot.db --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --name=kubeadm-master --data-dir=/var/lib/etcd --initial-cluster=kubeadm-master=https://10.0.0.4:2380 --initial-cluster-token=etcd-cluster-1 --initial-advertise-peer-urls=https://10.0.0.4:2380
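Because the data is restored into the same data-dir that the etcd static pod uses, the etcd and kube-apiserver pods normally come back on their own; if they do not, restarting the kubelet is a common way to force the static pods to be recreated (an operational suggestion, not part of the original steps):
$ sudo systemctl restart kubelet
$ kubectl get pods -n kube-system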

6) Verify that the cluster is back to the state at which the snapshot was taken; the testing-restore pod should no longer be present:
$ kubectl get pods
$ kubectl get all

Congratulations! We have now successfully completed the backup and restore process for the etcd cluster in Kubernetes.
Conclusion
For a Kubernetes cluster with a single control plane and infrequent API server changes, regular etcd backups are a fantastic option. Backing up the etcd cluster on a regular basis will reduce the window of potential data loss.
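As a minimal sketch of how such periodic backups might be scheduled, assuming the default kubeadm certificate paths and a /var/backups/etcd directory that already exists, a root cron entry could look like this:
$ sudo crontab -e
# take an etcd snapshot every 6 hours
0 */6 * * * ETCDCTL_API=3 /usr/local/bin/etcdctl snapshot save /var/backups/etcd/snapshot-$(date +\%F-\%H\%M).db --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key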