User Guide for Kubernetes#
Overview#
This guide introduces how to install and configure the ADPS agent, and how to properly use ADPS to back up and restore Kubernetes.
The backup and restore features supported by ADPS include:
Backup sources: Namespaces, workloads
Backup type: Full backup
Backup targets: Standard storage pool, deduplication storage pool, tape library pool, object storage pool, and LAN-free pool
Backup schedules: Immediate, one-time, hourly, daily, weekly, and monthly
Data processing: Data compression, data encryption, multiple channels, reconnection, speed limit, and replication
Restore types: Point-in-time restore
Restore targets: Original host, different host
Planning and preparation#
Before you install the agent, check the following prerequisites:
You have already installed and configured other backup components, including the backup server and the storage server.
You have created a user with roles of operator and administrator on the ADPS console. Log in to the console with this user to back up and restore the resource.
Note
The administrator role can install and configure agents, activate licenses, and authorize users. The operator role can create backup/restore jobs.
Install and configure the agent#
Verify the compatibility#
Before you install the agent, ensure that your Kubernetes version is on the Aurreum Data Protection Suite’s compatibility lists.
Kubernetes 1.17/1.18/1.19/1.20/1.21/1.22/1.23/1.26
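As a rough pre-check, a node's reported version can be compared against the list above. The following is a minimal sketch, not part of ADPS: the function name is_supported_k8s_version is hypothetical, and it assumes that only the major.minor part of the version matters for compatibility.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with ADPS): succeeds when the given
# Kubernetes version belongs to a major.minor release on the list above.
SUPPORTED_K8S_VERSIONS="1.17 1.18 1.19 1.20 1.21 1.22 1.23 1.26"

is_supported_k8s_version() {
  local version="${1#v}"   # strip a leading "v": v1.19.5 -> 1.19.5
  local major_minor supported
  major_minor="$(printf '%s\n' "$version" | cut -d. -f1,2)"
  for supported in $SUPPORTED_K8S_VERSIONS; do
    [ "$supported" = "$major_minor" ] && return 0
  done
  return 1
}

# Example: is_supported_k8s_version v1.19.5 && echo "supported"
```

The version string to pass in is the VERSION column that kubectl get nodes prints for each node.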
Install the agent#
ADPS supports the backup and restore of Kubernetes clusters on Linux.
The agent can be installed through images. To install the agent, do the following:
Take the master node for example. If you cannot operate on the master node, copy the admin.conf file from the master node to the target node.
scp /etc/kubernetes/admin.conf root@<node IP>:/etc/kubernetes/
echo "export KUBECONFIG=/etc/kubernetes/admin.conf" >> ~/.bash_profile
source ~/.bash_profile
Load the image (all nodes require this step).
docker load -i agent-k8s-version.tar
Check the image.
sudo docker images | grep k8s   # Or use: ctr images ls | grep k8s
Create a namespace and specify the hostid.
kubectl create ns backup
uuidgen -r | sed "s/-//g"   # Specify the hostid in the configmap by yourself
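The hostid step can be wrapped in a small helper that also checks the result before you paste it into the agent configuration. This is only a sketch: the function names are hypothetical, and the 32-hex-character format is an assumption inferred from the uuidgen command above and from the sample HOSTID_0 value in this guide.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of ADPS.

# Generate a hostid: a random UUID with the dashes removed.
# Falls back to the Linux kernel's UUID source if uuidgen is unavailable.
new_hostid() {
  local uuid
  uuid="$(uuidgen -r 2>/dev/null)" || uuid="$(cat /proc/sys/kernel/random/uuid)"
  printf '%s\n' "$uuid" | sed "s/-//g"
}

# Succeed only if the argument looks like a hostid (assumed: 32 hex characters).
is_valid_hostid() {
  [[ "$1" =~ ^[0-9a-fA-F]{32}$ ]]
}

# Example: hostid="$(new_hostid)"; is_valid_hostid "$hostid" && echo "$hostid"
```

Validating the value first catches a common slip, such as pasting the UUID with its dashes still in place.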
Configure the cluster.yaml as follows:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: backup-k8s
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: default
  namespace: backup

kubectl apply -f cluster.yaml
Configure the agent as follows (You can deploy multiple agents to run multiple jobs. Agents can be switched among nodes. When you deploy multiple agents, they may be scheduled to the same node.)
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: backup-agent
  namespace: backup
spec:
  selector:
    matchLabels:
      app: backup-agent
  serviceName: backup-agent
  replicas: 1 # Replica number
  template:
    metadata:
      labels:
        app: backup-agent
    spec:
      containers:
      - env:
        - name: BACKUPD_HOST
          value: 172.16.30.197
        - name: BACKUPD_PORT
          value: "50305"
        - name: HOSTID_0 # Specify the hostid from number 0
          value: 9e26580745ab41fea2f80bf96e739186
        - name: BACKUPD_SSL
          value: "false"
        - name: POD_IMAGE # Modify the value. You can use the docker images | grep pause command to check the value
          value: registry.aliyuncs.com/k8sxio/pause:3.2
        - name: POD_NAME
          value: backup-pod
        - name: DEPLOY_METHOD
          value: statefulset
        - name: HOSTNAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        # - name: BACKUPD_ACCESS_KEY
        #   value: "42fff1271225cf15198e55a886e78945"
        - name: BACKUP_NODE
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        # Modify the image name. You can use the docker images or ctr image ls command to check the image name
        image: registry.docker.aurreum.com/stable/focal/agent-k8s:version
        imagePullPolicy: IfNotPresent
        name: agent
        resources: {}
        securityContext:
          privileged: true
        volumeMounts:
        - mountPath: /var/log/adps
          name: log-volume
        - mountPath: /var/opt/aurreum/adps/agent
          name: opt-volume
        - mountPath: /var/lib/kubelet/pods
          mountPropagation: HostToContainer
          name: pods-path
        - mountPath: /dev
          mountPropagation: HostToContainer
          name: dev
      hostIPC: true
      hostNetwork: true
      hostPID: true
      #nodeName: k8s-master-85 # This field is not required by default, but if you use LAN-free pools, you must specify the node.
      #nodeSelector:
      #  kubernetes.io/hostname: k8s-master-85 # This field is not required by default, but if you use LAN-free pools, you must specify the node.
      volumes:
      - hostPath:
          path: /var/lib/kubelet/pods
        name: pods-path
      - hostPath:
          path: /dev
        name: dev
      - hostPath:
          path: /opt/data/opt_volume
        name: opt-volume
      - hostPath:
          path: /opt/data/log_volume
        name: log-volume
After the installation, use the kubectl get pod -n backup command to check whether the agent is running (backup is an example of the namespace. Change it according to your settings).
[root@k8s-master-106 ~]# kubectl get pod -n backup
NAME             READY   STATUS    RESTARTS   AGE
backup-agent-0   1/1     Running   3          7h46m
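The final check can also be scripted. The sketch below reads the kubectl get pod output on standard input; the function name agents_running is hypothetical, and the backup-agent- prefix assumes that you kept the StatefulSet name used in this guide.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `kubectl get pod -n <namespace>` output on stdin.
# Succeeds only if at least one backup-agent pod is listed and every such
# pod is Running with all of its containers ready.
agents_running() {
  awk '
    $1 ~ /^backup-agent-/ {
      seen = 1
      split($2, ready, "/")                    # READY column, e.g. "1/1"
      if ($3 != "Running" || ready[1] != ready[2]) bad = 1
    }
    END { exit ((seen && !bad) ? 0 : 1) }
  '
}

# Example: kubectl get pod -n backup | agents_running && echo "agent is up"
```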
Backup#
Backup types#
ADPS provides full backup for Kubernetes.
Full backup
Backs up all the data on Kubernetes and enough logs to recover the data.
Backup policies#
ADPS provides six backup schedule types: immediate, one-time, hourly, daily, weekly, and monthly.
Immediate: ADPS will immediately start the job after it is created.
One-time: ADPS will perform the job at the specified time once only.
Hourly: ADPS will perform the job periodically at the specified hour or minute interval within the configured time range.
Daily: ADPS will perform the job periodically at the specified time and day intervals.
Weekly: ADPS will perform the job periodically at the specified time and week intervals.
Monthly: ADPS will perform the job periodically at the specified dates and times.
You can set an appropriate backup policy based on your situation and requirements. Usually, we recommend the following common backup policy:
Perform a full backup once a week when the application traffic is relatively low (for example, on the weekend) to ensure that you have a recoverable point in time every week.
Before you begin#
Before you back up and restore Kubernetes, check the following:
Check the resource status.
(1) Click Resource > Resource. The Resource page appears.
(2) Check whether the host and the Kubernetes resource are on the page with an Online state. If they are offline, check whether the agent service and the Kubernetes service are running.
Check storage pools.
(1) From the menu, click Storage > Storage pool. The Storage pool page appears.
(2) Check whether the display area lists any storage pools. If there are none, create a storage pool and authorize it for the current user. For details, see Add a storage pool in Aurreum Data Protection Suite Administrator’s Guide.
Note
If you want to use LAN-free pools, do the following:
Run the modprobe iscsi_tcp command on all hosts to load the iscsi_tcp kernel module. You can use the lsmod | grep iscsi_tcp command to check whether the module is loaded.
Run the /usr/sbin/iscsid command in the agent pod.
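For a scripted version of the module check, the sketch below parses lsmod output from standard input. The function name iscsi_tcp_loaded is hypothetical. It matches the module name exactly, because a plain grep for iscsi_tcp also matches the related libiscsi_tcp module.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `lsmod` output on stdin and succeeds only if a
# module named exactly iscsi_tcp appears in the first column.
iscsi_tcp_loaded() {
  awk '$1 == "iscsi_tcp" { found = 1 } END { exit (found ? 0 : 1) }'
}

# Example on a host: lsmod | iscsi_tcp_loaded || sudo modprobe iscsi_tcp
```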
Check the Kubernetes cluster status.
Use the following Linux commands to check the Kubernetes cluster status.
(1) Use the systemctl status kubelet command to check whether the kubelet service is active (running).
[root@k8s-master-106 ~]# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since Thu 2022-10-20 16:47:15 CST; 53min ago
     Docs: https://kubernetes.io/docs/
 Main PID: 1169 (kubelet)
    Tasks: 29
   Memory: 161.6M
   CGroup: /system.slice/kubelet.service
           └─1169 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --network-plugin...
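If you want to script this check, the sketch below looks for the Active line in the status output. The function name kubelet_active is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `systemctl status kubelet` output on stdin and
# succeeds only if the unit reports "active (running)".
kubelet_active() {
  grep -Eq '^[[:space:]]*Active:[[:space:]]+active \(running\)'
}

# Example: systemctl status kubelet | kubelet_active && echo "kubelet is running"
```

On a live node, systemctl is-active --quiet kubelet is a shorter alternative that reports the unit state through its exit status.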
(2) Use the kubectl get nodes -owide command to check the version of the Kubernetes cluster and whether all nodes are Ready. Ensure that the versions of all nodes are not older than v1.17.0. Otherwise, the CSI driver is not supported.
[root@k8s-master-106 ~]# kubectl get nodes -owide
NAME             STATUS   ROLES    AGE   VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION                CONTAINER-RUNTIME
k8s-master-106   Ready    master   30d   v1.19.5   172.16.12.106   <none>        CentOS Linux 7 (Core)   5.4.219-1.el7.elrepo.x86_64   docker://19.3.11
k8s-node-107     Ready    <none>   30d   v1.19.5   172.16.12.107   <none>        CentOS Linux 7 (Core)   5.4.219-1.el7.elrepo.x86_64   docker://19.3.11
k8s-node-108     Ready    <none>   30d   v1.19.5   172.16.12.108   <none>        CentOS Linux 7 (Core)   5.4.219-1.el7.elrepo.x86_64   docker://19.3.11
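Both conditions in this step (all nodes Ready, no node older than v1.17.0) can be checked in one pass. The following is a sketch with hypothetical function names; it relies on sort -V from GNU coreutils for version ordering and treats any STATUS other than exactly Ready as a failure.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of ADPS.

# Succeeds if version $1 is at least version $2 (a leading "v" is ignored).
version_at_least() {
  local have="${1#v}" want="${2#v}"
  [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n1)" = "$want" ]
}

# Reads `kubectl get nodes -owide` output on stdin. Succeeds only if at least
# one node is listed and every node is Ready and not older than v1.17.0.
nodes_ready_and_supported() {
  local name status roles age version rest count=0
  while read -r name status roles age version rest; do
    [ -z "$name" ] && continue            # skip blank lines
    [ "$name" = "NAME" ] && continue      # skip the header row
    count=$((count + 1))
    [ "$status" = "Ready" ] || return 1
    version_at_least "$version" "v1.17.0" || return 1
  done
  [ "$count" -gt 0 ]
}

# Example: kubectl get nodes -owide | nodes_ready_and_supported
```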
(3) Use the kubectl get pod -A command to check whether all the pods in the Kubernetes cluster are Running.
[root@k8s-master-106 ~]# kubectl get pod -A
NAMESPACE     NAME                                       READY   STATUS    RESTARTS   AGE
backup        backup-agent-0                             1/1     Running   3          7h15m
kube-system   calico-kube-controllers-6c89d944d5-sgspg   1/1     Running   9          30d
kube-system   calico-node-442p9                          1/1     Running   9          30d
kube-system   calico-node-hqlx4                          1/1     Running   7          30d
kube-system   calico-node-srnvz                          1/1     Running   7          30d
kube-system   coredns-59c898cd69-bcf7r                   1/1     Running   9          30d
kube-system   coredns-59c898cd69-t97vc                   1/1     Running   10         30d
kube-system   etcd-k8s-master-106                        1/1     Running   9          30d
kube-system   kube-apiserver-k8s-master-106              1/1     Running   13         30d
kube-system   kube-controller-manager-k8s-master-106     1/1     Running   76         30d
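On a cluster with many pods, scanning the list by eye is error-prone. The sketch below prints only the pods that are not Running; the function name report_pods_not_running is hypothetical. Note that it also flags pods of finished jobs (STATUS Completed), which you may consider healthy.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `kubectl get pod -A` output on stdin and prints
# "namespace/name status" for every pod whose STATUS is not Running.
# Succeeds only if no such pod is found.
report_pods_not_running() {
  awk '
    NR > 1 && NF && $4 != "Running" { print $1 "/" $2, $4; bad = 1 }
    END { exit (bad ? 1 : 0) }
  '
}

# Example: kubectl get pod -A | report_pods_not_running
```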
(4) Use the kubectl get sc command to check whether the corresponding StorageClass of the CSI driver exists and whether the RECLAIMPOLICY of the StorageClass is Delete. If the RECLAIMPOLICY is not Delete, the temporary PV and volumes in Ceph created during backup cannot be deleted after the backup.
[root@k8s-master-106 ~]# kubectl get sc
NAME         PROVISIONER        RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
csi-rbd-sc   rbd.csi.ceph.com   Delete          Immediate           true                   30d
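The reclaim policy check can be sketched as follows. The function name sc_reclaims_with_delete is hypothetical, and csi-rbd-sc in the example is simply the StorageClass name from the sample output above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `kubectl get sc` output on stdin and succeeds
# only if the StorageClass named in $1 exists with RECLAIMPOLICY "Delete".
sc_reclaims_with_delete() {
  local sc_name="$1"
  # The default StorageClass is printed as "<name> (default)"; drop that
  # marker so the columns line up before splitting on whitespace.
  sed 's/ (default)//' |
    awk -v name="$sc_name" '
      $1 == name { found = 1; policy = $3 }
      END { exit ((found && policy == "Delete") ? 0 : 1) }
    '
}

# Example: kubectl get sc | sc_reclaims_with_delete csi-rbd-sc
```

To read the field directly instead of parsing the table, kubectl get sc csi-rbd-sc -o jsonpath='{.reclaimPolicy}' prints the policy of a single StorageClass.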
(5) Use the ceph -s command to check whether the Ceph cluster status is normal and ensure that:
the Ceph cluster has enough space
the Ceph version is later than v14.0
the kernel version is later than or equal to 5.1
[root@k8s-master-106 ~]# ceph -s
  cluster:
    id:     948d9908-dd20-4866-beea-e798e82f0252
    health: HEALTH_OK

  services:
    mon: 1 daemons, quorum k8s-master-106 (age 24h)
    mgr: k8s-master-106(active, since 24h)
    osd: 3 osds: 3 up (since 24h), 3 in (since 5d)

  data:
    pools:   1 pools, 128 pgs
    objects: 825 objects, 2.3 GiB
    usage:   9.8 GiB used, 590 GiB / 600 GiB avail
    pgs:     128 active+clean

  io:
    client:   62 KiB/s wr, 0 op/s rd, 4 op/s wr
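Two of these conditions lend themselves to a quick script: the health line and the kernel version. The sketch below uses hypothetical function names, treats only HEALTH_OK as normal, and follows the "later than or equal to 5.1" wording above. It does not check free space or the Ceph version.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of ADPS.

# Reads `ceph -s` output on stdin; succeeds only if health is HEALTH_OK.
ceph_health_ok() {
  grep -Eq '^[[:space:]]*health:[[:space:]]*HEALTH_OK'
}

# Succeeds if the kernel release in $1 (as printed by `uname -r`) is 5.1 or later.
kernel_at_least_5_1() {
  local release="${1%%-*}"   # drop the distribution suffix: 5.4.219-1.el7 -> 5.4.219
  local major minor rest
  IFS=. read -r major minor rest <<< "$release"
  [[ "$major" =~ ^[0-9]+$ && "$minor" =~ ^[0-9]+$ ]] || return 1
  [ "$major" -gt 5 ] || { [ "$major" -eq 5 ] && [ "$minor" -ge 1 ]; }
}

# Example: ceph -s | ceph_health_ok && kernel_at_least_5_1 "$(uname -r)"
```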
Note
Since v1.20, Kubernetes removes metadata.selfLink by default, but some applications, such as nfs-client-provisioner, still use it. To use these applications, enable metadata.selfLink. Modify the /etc/kubernetes/manifests/kube-apiserver.yaml file, add the line - --feature-gates=RemoveSelfLink=false to the startup parameters, and run kubectl apply -f kube-apiserver.yaml.
CSI reference: ceph-csi.git
Snapshot reference: external-snapshotter.git
Log in to the resource#
Before you create a backup or restore job, log in to the Kubernetes instance and authenticate the identity. You can use the access key of the current ADPS user to log in to the resource. This method is suitable for the following scenarios:
You cannot get the OS user’s username and password.
The user’s password changes frequently.
Note
Access key authentication is not enabled by default. To enable this feature, log in to the ADPS console, go to Settings, open the Security tab, and select the Access key login instance checkbox.
To get the access key, log in to the console, click Personal settings > Account settings on the upper right corner, find Access key on the Preferences tab, and click View.
To log in to the resource, do the following:
From the menu, click Resource > Resource. The Resource page appears.
From the host list, find the host where the K8s resource resides. If you have many hosts, use the search bar to find the host quickly. Click the host to expand its resource list.
Click Login beside the resource. The Login window appears.
In the Login window, enter the access key of the current ADPS user, and click Login.
If your information is correct, you will be prompted that you have logged in to the resource successfully.
Create a backup job#
To create a backup job, do the following:
From the menu, click Backup. The backup job wizard appears.
At the Hosts and resources step, select the host and the Kubernetes resource. The wizard goes to the next step automatically.
At the Backup source step, select Full backup. Click Next.
At the Backup target step, select a storage pool. Click Next.
At the Backup schedule step, set the job schedule. For details, see Backup policies. Click Next.
Select Immediate. ADPS performs the job immediately after it is created.
Select One time and set the start time for the job.
Select Hourly. Set the start time, end time, and time interval for job execution. The unit can be hour(s) or minute(s).
Select Daily. Set the start time and enter the time interval for job execution. The unit is day(s).
Select Weekly. Set the start time, enter the time interval, and select the days of the week for job execution. The unit is week(s).
Select Monthly. Set the start time and months for job execution. You can select calendar dates in a month or specific days of a week.
At the Backup options step, set the common and advanced options according to your needs. For details, see Backup options. Click Next.
At the Finish step, set the job name and confirm the job information. Click Submit.
After the submission, you will be redirected to the Job page automatically. On this page, you can start, modify, and delete the job.
Backup options#
ADPS provides the following backup options:
Common options
Option | Description |
---|---|
Compression | Fast is enabled by default. Backup data is compressed at the source side for transmission. It can reduce the backup time, improve backup efficiency, and save backup space. |
Channels | Multiple channels can improve backup efficiency. The default value is 1, and the value ranges from 1 to 255. |
Advanced options
Option | Description |
---|---|
Reconnection time | The value ranges from 1 to 60 minutes. If the network connection is reset abnormally, the job continues provided that the connection is restored within the set time. |
Speed limit | Limits data transfer speed or disk read/write speed for different time periods. The unit can be KiB/s, MiB/s, or GiB/s. |
Precondition | Checked before the job starts. If the precondition is invalid, the job execution is aborted and the job state is set to idle. |
Pre-/Post-script | The pre-script is executed after the job starts and before the resource is backed up. The post-script is executed after the resource is backed up. |
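If you record job settings in your own runbooks or scripts before entering them in the wizard, the documented ranges can be checked up front. This is a sketch with hypothetical function names. It only encodes the ranges stated above (channels 1 to 255, reconnection time 1 to 60 minutes) and does not talk to ADPS.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; they only check values against the documented ranges.

# Succeeds if $1 is a non-negative integer within [$2, $3].
in_range() {
  [[ "$1" =~ ^[0-9]+$ ]] && [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
}

valid_channels()             { in_range "$1" 1 255; }
valid_reconnection_minutes() { in_range "$1" 1 60; }

# Example: valid_channels 4 && valid_reconnection_minutes 10 && echo "options OK"
```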
Restore#
ADPS provides point-in-time restore for Kubernetes.
Point-in-time restore
Restores Kubernetes to a specified point in time. The restore target can be the original host or a different host.
Before you begin#
To restore the instance to a different host, install the agent on that Kubernetes cluster, activate the licenses, and authorize user access to the resource.
Create a point-in-time restore job#
To create a point-in-time restore job, do the following:
From the menu, click Restore. The restore job wizard appears.
At the Hosts and resources step, select the host and the Kubernetes resource. The wizard goes to the next step automatically.
At the Backup sets step, do the following:
(1) From the Restore type list, select Point-in-time restore.
(2) In the Restore source section, select a backup set for the restore job.
(3) Click Next.
At the Restore target step, select a host and resource as the target. The wizard goes to the next step automatically.
At the Restore schedule step, set the job schedule. Click Next.
Select Immediate. ADPS will perform the job immediately after its creation.
Select One time and set the start time for the job.
At the Restore options step, set the options according to your needs. See Restore options. Click Next.
At the Finish step, set the job name and confirm the job information. Click Submit.
After the submission, you will be redirected to the Job page. You can start, modify, and delete the job.
Note
During restore, the data may be restored successfully while the job fails because the related configuration resources of the original service have conflicts. In this case, you need to create the service resources and modify the IP and port manually.
Restore options#
ADPS provides the following restore options:
Advanced options
Option | Description |
---|---|
Reconnection time | The value ranges from 1 to 60 minutes. If the network connection is reset abnormally, the job continues provided that the connection is restored within the set time. |
Resumption buffer size | Specifies the resumption buffer size. The default value is 10 MiB. A larger resumption buffer consumes more physical storage, but it can prevent data loss when the throughput of the business system is high. |
Speed limit | Limits data transfer speed or disk read/write speed for different time periods. The unit can be KiB/s, MiB/s, or GiB/s. |
Precondition | Checked before the job starts. If the precondition is invalid, the job execution is aborted and the job state is set to idle. |
Pre-/Post-script | The pre-script is executed after the job starts and before the resource is restored. The post-script is executed after the resource is restored. |
Limitations#
Feature | Limitations |
---|---|
Backup | 1. Cannot back up volumes mounted on block devices. |
Environment | 1. The kernel of the Ceph cluster must be later than 5.1. |
Restore | 1. Cannot restore the Ceph-CSI driver. |
Glossary#
Term | Description |
---|---|
CustomResourceDefinition | Abbreviated as CRD. A declaration that allows users to define their own resource objects. |
namespace | Kubernetes (K8s) supports multiple virtual clusters which rely on the same physical cluster. These virtual clusters are called namespaces. By allocating resources to different namespaces, resources can be logically isolated and more easily managed for different uses. Resources in the same namespace must have unique names, while resources in different namespaces can share the same name. |
Master | A node that is the main control component of K8s. |
Node | A component in a K8s cluster which can be a physical or virtual machine. |
Pod | The smallest deployable unit of computing that users can create and manage in Kubernetes. A Pod is a group of one or more containers. |
Service | A method for exposing a network application that is running as one or more Pods in a cluster. |
NodePort | A network service that is used to expose the application to the internet. |
StorageClass | Provides a way for administrators to describe the classes of storage they offer. |
VolumeSnapshotClass | Provides a way to describe the “classes” of storage when provisioning a volume snapshot. |
VolumeSnapshotContent | A snapshot taken from a volume in the cluster that has been provisioned by an administrator. It is a resource in the cluster just like a PersistentVolume is a cluster resource. |
VolumeSnapshot | A request for a snapshot of a volume by a user. It is similar to a PersistentVolumeClaim. |
PersistentVolume | Abbreviated as PV. A piece of storage in the cluster that has been provisioned by an administrator or dynamically provisioned using Storage Classes. |
PersistentVolumeClaim | Abbreviated as PVC. A request for storage by a user. |
Secret | An object that contains a small amount of sensitive data such as a password, a token, or a key. |
ConfigMap | An API object used to store non-confidential data in key-value pairs. |
ServiceAccount | Provides an identity for processes that run in a Pod, and maps to a ServiceAccount object. |
LimitRange | A policy to constrain the resource allocations (limits and requests) that you can specify for each applicable object kind (such as Pod or PersistentVolumeClaim) in a namespace. |
CSI | Container Storage Interface. An industry standard interface rule created by community members including Kubernetes, Mesos, and Docker. CSI is a standard for exposing arbitrary block and file storage systems to containerized workloads on Container Orchestration Systems (COs). |
Clone | Creates a copy of an existing Kubernetes volume. The copy can be used like any other standard volume. Clone is supported only with CSI drivers. |
ReplicationController | Abbreviated as RC. It can create and manage a specified number of pod replicas. |
ReplicaSet | Abbreviated as RS. It maintains a stable set of replica Pods running at any given time and can replace RC. |
Deployment | Can be regarded as a superset of RC. In addition to providing functions such as Pod management, it also provides new features such as rollback and version recording. Generally, we do not create RC/RS directly, but create higher-level Deployment resources to automatically create RC/RS. |
ApiServer | Provides HTTP REST interfaces for creating, deleting, reading, updating, and monitoring various K8s resource objects (Pod, RC, Service, and so on), and is the data bus and data center of the entire system. |
metadata | This document uses metadata to define the resource data other than persistent volume storage data in namespaces. |
DaemonSet | Ensures that all (or some) Nodes run a copy of a Pod. As nodes are added to the cluster, Pods are added to them. As nodes are removed from the cluster, those Pods are garbage collected. Deleting a DaemonSet will clean up the Pods it created. |
StatefulSet | Manages the deployment and scaling of a set of Pods and provides guarantees about the ordering and uniqueness of these Pods. |
kubectl | The command line tool (CLI) of Kubernetes. It is the required management tool for Kubernetes users and administrators. |