User Guide for Kubernetes#

Overview#

This guide describes how to install and configure the ADPS agent and how to use ADPS to back up and restore Kubernetes.

The backup and restore features supported by ADPS include:

Backup sources: Namespaces, workloads
Backup types: Full backup
Backup targets: Standard storage pool, deduplication storage pool, tape library pool, object storage pool, and LAN-free pool
Backup schedules: Immediate, one-time, hourly, daily, weekly, and monthly
Data processing: Data compression, data encryption, multiple channels, reconnection, speed limit, and replication
Restore types: Point-in-time restore
Restore targets: Original host, different host

Planning and preparation#

Before you install the agent, check the following prerequisites:

You have already installed and configured other backup components, including the backup server and the storage server.
You have created a user with the operator and administrator roles on the ADPS console. Log in to the console as this user to back up and restore the resource.

Note

The administrator role can install and configure agents, activate licenses, and authorize users. The operator role can create backup/restore jobs.

Install and configure the agent#

Verify the compatibility#

Before you install the agent, ensure that your Kubernetes version is on the Aurreum Data Protection Suite’s compatibility lists.

Kubernetes 1.17/1.18/1.19/1.20/1.21/1.22/1.23/1.26
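
As a rough pre-check, the list above can be encoded in a small shell helper. This is only a sketch: the function name is_supported_k8s is hypothetical, and it compares the major.minor part of a version string against the versions listed in this guide only.

```shell
# Minor versions copied from the compatibility list in this guide
SUPPORTED_K8S="1.17 1.18 1.19 1.20 1.21 1.22 1.23 1.26"

# Hypothetical helper: succeeds if the given version (for example v1.19.5)
# has a major.minor part that appears in SUPPORTED_K8S
is_supported_k8s() {
  k8s_ver="${1#v}"                                        # drop a leading "v"
  k8s_minor="$(printf '%s' "$k8s_ver" | cut -d. -f1,2)"   # keep major.minor
  for k8s_s in $SUPPORTED_K8S; do
    [ "$k8s_s" = "$k8s_minor" ] && return 0
  done
  return 1
}
```

For example, is_supported_k8s v1.19.5 succeeds, while is_supported_k8s v1.24.0 fails because 1.24 is not in the list. The version to pass in can be read from the VERSION column of kubectl get nodes.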

Install the agent#

ADPS supports the backup and restore of Kubernetes clusters on Linux.

The agent can be installed through images. To install the agent, do the following:

The following steps use the master node as an example. If you cannot operate on the master node, copy the admin.conf file from the master node to the target node and set the KUBECONFIG environment variable there.

scp /etc/kubernetes/admin.conf root@<node IP>:/etc/kubernetes/
echo "export KUBECONFIG=/etc/kubernetes/admin.conf" >> ~/.bash_profile
source ~/.bash_profile
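
Running the echo command above more than once appends duplicate lines to ~/.bash_profile. A minimal sketch of an idempotent variant follows; the function name add_kubeconfig_export and its optional file argument are illustrative assumptions, not part of ADPS.

```shell
# Hypothetical helper: append the KUBECONFIG export to a profile file
# only if the exact line is not already present
add_kubeconfig_export() {
  kc_profile="${1:-$HOME/.bash_profile}"
  kc_line='export KUBECONFIG=/etc/kubernetes/admin.conf'
  # -x matches the whole line, -F treats the pattern as a fixed string
  grep -qxF "$kc_line" "$kc_profile" 2>/dev/null \
    || printf '%s\n' "$kc_line" >> "$kc_profile"
}
```

Calling add_kubeconfig_export with no argument targets ~/.bash_profile; run source ~/.bash_profile afterwards as shown above.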

Load the image (all nodes require this step).
```
docker load -i agent-k8s-version.tar
```

Check the image.

sudo docker images| grep k8s # Or use the ctr images ls | grep k8s command

Create a namespace and specify the hostid.

kubectl create ns backup

uuidgen -r | sed "s/-//g"   # Generates a hostid. Set it yourself as the HOSTID_0 value in the agent configuration

Configure the cluster.yaml as follows:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: backup-k8s
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: default
  namespace: backup


kubectl apply -f cluster.yaml

Configure the agent as follows. You can deploy multiple agents to run multiple jobs. Agents can move among nodes, and when you deploy multiple agents, they may be scheduled to the same node.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: backup-agent
  namespace: backup
spec:
  selector:
    matchLabels:
      app: backup-agent
  serviceName: backup-agent
  replicas: 1 # Replica number
  template:
    metadata:
      labels:
        app: backup-agent
    spec:
      containers:
        - env:
            - name: BACKUPD_HOST
              value: 172.16.30.197
            - name: BACKUPD_PORT
              value: "50305"
            - name: HOSTID_0  # Specify the hostid from number 0
              value: 9e26580745ab41fea2f80bf96e739186
            - name: BACKUPD_SSL
              value: "false"
            - name: POD_IMAGE
              # Modify the value. You can use the docker images | grep pause command to check the value
              value: registry.aliyuncs.com/k8sxio/pause:3.2
            - name: POD_NAME
              value: backup-pod
            - name: DEPLOY_METHOD
              value: statefulset
            - name: HOSTNAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            # - name: BACKUPD_ACCESS_KEY
            #   value: "42fff1271225cf15198e55a886e78945"
            - name: BACKUP_NODE
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          # Modify the image name. You can use the docker images or ctr image ls command to check the image name
          image: registry.docker.aurreum.com/stable/focal/agent-k8s:version
          imagePullPolicy: IfNotPresent
          name: agent
          resources: {}
          securityContext:
            privileged: true
          volumeMounts:
            - mountPath: /var/log/adps
              name: log-volume
            - mountPath: /var/opt/aurreum/adps/agent
              name: opt-volume
            - mountPath: /var/lib/kubelet/pods
              mountPropagation: HostToContainer
              name: pods-path
            - mountPath: /dev
              mountPropagation: HostToContainer
              name: dev
      hostIPC: true
      hostNetwork: true
      hostPID: true
      #nodeName: k8s-master-85  # This field is not required by default, but if you use LAN-free pools, you must specify the node.
      #nodeSelector:
      #     kubernetes.io/hostname: k8s-master-85  # This field is not required by default, but if you use LAN-free pools, you must specify the node.
      volumes:
        - hostPath:
            path: /var/lib/kubelet/pods
          name: pods-path
        - hostPath:
            path: /dev
          name: dev
        - hostPath:
            path: /opt/data/opt_volume
          name: opt-volume
        - hostPath:
            path: /opt/data/log_volume
          name: log-volume
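
When replicas is greater than 1, each agent presumably needs its own hostid. The sketch below prints one env entry per replica so that the entries can be pasted into the env list. It rests on an assumption: this guide only shows HOSTID_0, and the HOSTID_1, HOSTID_2, ... naming is inferred from the comment "Specify the hostid from number 0", so confirm it before relying on it. The function name print_hostid_env is hypothetical.

```shell
# Hypothetical helper: print env entries HOSTID_0 .. HOSTID_<n-1>, each with a
# freshly generated hostid in the same format as: uuidgen -r | sed "s/-//g"
# ASSUMPTION: replica i reads HOSTID_<i>; the guide itself only shows HOSTID_0
print_hostid_env() {
  hid_count="$1"
  hid_i=0
  while [ "$hid_i" -lt "$hid_count" ]; do
    if command -v uuidgen >/dev/null 2>&1; then
      hid="$(uuidgen -r | sed 's/-//g')"
    else
      # Fallback on Linux when uuidgen is not installed
      hid="$(sed 's/-//g' /proc/sys/kernel/random/uuid)"
    fi
    # Quote the value so that YAML always treats it as a string
    printf '%s\n' "- name: HOSTID_$hid_i" "  value: \"$hid\""
    hid_i=$((hid_i + 1))
  done
}
```

For example, print_hostid_env 3 prints three entries, named HOSTID_0 to HOSTID_2. The value is quoted because an unquoted hostid made up only of digits would be parsed as a number, and env values must be strings.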

After the installation, run the kubectl get pod -n backup command to check whether the agent is running. Here, backup is the example namespace; change it to match your settings.

[root@k8s-master-106 ~]# kubectl get pod -n backup
NAME             READY   STATUS    RESTARTS   AGE
backup-agent-0   1/1     Running   3          7h46m
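
To script this check instead of reading the table by eye, the output can be piped through a small filter. The sketch below is an illustration only; the function name all_pods_running is hypothetical. It locates the READY and STATUS columns from the header line, so it also works with the extra NAMESPACE column printed by kubectl get pod -A.

```shell
# Hypothetical helper: reads `kubectl get pod` output on stdin and succeeds
# only if at least one pod is listed and every pod is Running and fully ready
all_pods_running() {
  awk '
    NR == 1 {
      # Find the READY and STATUS columns from the header line
      for (i = 1; i <= NF; i++) {
        if ($i == "READY")  rc = i
        if ($i == "STATUS") sc = i
      }
      next
    }
    {
      seen = 1
      split($rc, r, "/")          # READY looks like 1/1
      if ($sc != "Running" || r[1] != r[2]) bad = 1
    }
    END { if (bad || !seen || !rc || !sc) exit 1 }
  '
}
```

A typical use would be kubectl get pod -n backup | all_pods_running && echo "agent is running". Pods in any status other than Running, including Completed, make the check fail.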

Activate licenses and authorize users#

After the agent installation, go back to the Resource page. The host with the agent installed appears on the page.

To activate licenses and authorize users, do the following:

From the menu, click Resource > Resource. The Resource page appears.
On the Resource page, select the host. Click the Register icon. Then the Activate window appears.
In the Activate window, select the resource and click Submit.
After you activate the license, the Authorize window appears. From the User group list, select user groups to authorize access to the resource. Click Submit.

Note

If you are prompted with “No enough licenses”, contact the administrator to add licenses.
If you have many agents, install them first, and then batch register, batch activate, and batch authorize the agents and resources for convenience. For details, see Batch register/Batch activate/Batch authorize in Aurreum Data Protection Suite Administrator’s Guide.

Backup#

Backup types#

ADPS provides full backup for Kubernetes.

Full backup

Backs up all the data on Kubernetes and enough logs to recover the data.

Backup policies#

ADPS provides six backup schedule types: immediate, one-time, hourly, daily, weekly, and monthly.

Immediate: ADPS will immediately start the job after it is created.
One-time: ADPS will perform the job at the specified time once only.
Hourly: ADPS will perform the job periodically at the specified hour/minute intervals within the time range according to the setting.
Daily: ADPS will perform the job periodically at the specified time and day intervals.
Weekly: ADPS will perform the job periodically at the specified time and week intervals.
Monthly: ADPS will perform the job periodically at the specified dates and times.

You can set an appropriate backup policy based on your situation and requirements. Usually, we recommend the following common backup policy:

Perform a full backup once a week when application traffic is relatively low (for example, on the weekend) to ensure that you have a recoverable point in time every week.

Before you begin#

Before you back up and restore Kubernetes, check the following:

Check the resource status.

(1) Click Resource > Resource. The Resource page appears.

(2) Check whether the host and the Kubernetes resource are on the page with an Online state. If they are offline, check whether the agent service and the Kubernetes service are running.
Check storage pools.

(1) From the menu, click Storage > Storage pool. The Storage pool page appears.

(2) Check whether the display area has any storage pools. If there are none, create a storage pool and authorize it for the current user. For details, see Add a storage pool in Aurreum Data Protection Suite Administrator’s Guide.
Note

If you want to use LAN-free pools, do the following:
1. Run the modprobe iscsi_tcp command on all hosts to load the iscsi_tcp kernel module. You can use the lsmod | grep iscsi_tcp command to check whether the module is loaded.
2. Run the /usr/sbin/iscsid command in the agent pod.
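
Note that lsmod | grep iscsi_tcp also matches the related libiscsi_tcp module. If you want an exact match in a script, a filter such as the following sketch can be used. The function name module_listed is hypothetical, and the pod name and namespace in the usage comments are the example values from this guide.

```shell
# Hypothetical helper: succeeds if the module named in $1 appears as an exact
# module name (first column) in lsmod-style output read from stdin
module_listed() {
  awk -v mod="$1" '$1 == mod { found = 1 } END { if (!found) exit 1 }'
}

# Possible usage on each host (requires root), shown as comments:
#   lsmod | module_listed iscsi_tcp || sudo modprobe iscsi_tcp
# Possible way to start iscsid inside the example agent pod:
#   kubectl exec -n backup backup-agent-0 -- /usr/sbin/iscsid
```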

Check the Kubernetes cluster status.

Use the following Linux commands to check the Kubernetes cluster status.

(1) Use the systemctl status kubelet command to check whether the kubelet service is active (running).

[root@k8s-master-106 ~]# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
  Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/kubelet.service.d
          └─10-kubeadm.conf
  Active: active (running) since Thu 2022-10-20 16:47:15 CST; 53min ago
    Docs: https://kubernetes.io/docs/
Main PID: 1169 (kubelet)
    Tasks: 29
  Memory: 161.6M
  CGroup: /system.slice/kubelet.service
          └─1169 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --network-plugin...

(2) Use the kubectl get nodes -owide command to check the version of the Kubernetes cluster and whether all nodes are Ready. Ensure that the versions of all nodes are not older than v1.17.0. Otherwise, the CSI driver is not supported.

[root@k8s-master-106 ~]# kubectl get nodes -owide
NAME             STATUS   ROLES    AGE   VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION                CONTAINER-RUNTIME
k8s-master-106   Ready    master   30d   v1.19.5   172.16.12.106   <none>        CentOS Linux 7 (Core)   5.4.219-1.el7.elrepo.x86_64   docker://19.3.11
k8s-node-107     Ready    <none>   30d   v1.19.5   172.16.12.107   <none>        CentOS Linux 7 (Core)   5.4.219-1.el7.elrepo.x86_64   docker://19.3.11
k8s-node-108     Ready    <none>   30d   v1.19.5   172.16.12.108   <none>        CentOS Linux 7 (Core)   5.4.219-1.el7.elrepo.x86_64   docker://19.3.11
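
The two conditions in step (2), every node Ready and no node older than v1.17, can be checked mechanically. The following is a sketch under the assumption that the output has the STATUS and VERSION columns shown above; the function name nodes_ok is hypothetical.

```shell
# Hypothetical helper: reads `kubectl get nodes` output on stdin and fails if
# any node is not Ready or reports a version older than v1.17
nodes_ok() {
  awk '
    NR == 1 {
      # Find the STATUS and VERSION columns from the header line
      for (i = 1; i <= NF; i++) {
        if ($i == "STATUS")  sc = i
        if ($i == "VERSION") vc = i
      }
      next
    }
    {
      seen = 1
      v = $vc
      sub(/^v/, "", v)            # v1.19.5 -> 1.19.5
      split(v, p, ".")
      if ($sc != "Ready" || p[1] < 1 || (p[1] == 1 && p[2] < 17)) bad = 1
    }
    END { if (bad || !seen || !sc || !vc) exit 1 }
  '
}
```

For example, kubectl get nodes -owide | nodes_ok || echo "check node status and versions". A cordoned node, whose status reads Ready,SchedulingDisabled, is treated as not Ready by this sketch.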

(3) Use the kubectl get pod -A command to check whether all the pods in the Kubernetes cluster are Running.

[root@k8s-master-106 ~]# kubectl get pod -A
NAMESPACE      NAME                                         READY   STATUS      RESTARTS   AGE
backup         backup-agent-0                               1/1     Running     3          7h15m
kube-system    calico-kube-controllers-6c89d944d5-sgspg     1/1     Running     9          30d
kube-system    calico-node-442p9                            1/1     Running     9          30d
kube-system    calico-node-hqlx4                            1/1     Running     7          30d
kube-system    calico-node-srnvz                            1/1     Running     7          30d
kube-system    coredns-59c898cd69-bcf7r                     1/1     Running     9          30d
kube-system    coredns-59c898cd69-t97vc                     1/1     Running     10         30d
kube-system    etcd-k8s-master-106                          1/1     Running     9          30d
kube-system    kube-apiserver-k8s-master-106                1/1     Running     13         30d
kube-system    kube-controller-manager-k8s-master-106       1/1     Running     76         30d

(4) Use the kubectl get sc command to check whether the corresponding StorageClass of the CSI driver exists and whether the RECLAIMPOLICY of the StorageClass is Delete. If the RECLAIMPOLICY is not Delete, the temporary PV and volumes in Ceph created during backup cannot be deleted after the backup.

[root@k8s-master-106 ~]# kubectl get sc
NAME         PROVISIONER        RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
csi-rbd-sc   rbd.csi.ceph.com   Delete          Immediate           true                   30d
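
The table output is awkward to parse in scripts because a default StorageClass is printed with a "(default)" suffix after its name. A sketch that reads the policy through JSONPath instead is shown below; both function names are hypothetical.

```shell
# Prints one "name<TAB>reclaimPolicy" line per StorageClass
list_sc_policies() {
  kubectl get sc -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.reclaimPolicy}{"\n"}{end}'
}

# Hypothetical helper: reads "name<TAB>reclaimPolicy" lines on stdin and prints
# the names of StorageClasses whose reclaim policy is not Delete
non_delete_storageclasses() {
  awk -F '\t' 'NF >= 2 && $2 != "Delete" { print $1 }'
}
```

Running list_sc_policies | non_delete_storageclasses prints nothing when every StorageClass uses Delete. Any name it does print is a StorageClass whose temporary backup volumes would not be cleaned up, as described in step (4).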

(5) Use the ceph -s command to check whether the Ceph cluster status is normal and ensure that:

the Ceph cluster has enough space
the Ceph version is later than v14.0
the kernel version is later than or equal to 5.1

[root@k8s-master-106 ~]# ceph -s
  cluster:
    id:     948d9908-dd20-4866-beea-e798e82f0252
    health: HEALTH_OK

  services:
    mon: 1 daemons, quorum k8s-master-106 (age 24h)
    mgr: k8s-master-106(active, since 24h)
    osd: 3 osds: 3 up (since 24h), 3 in (since 5d)

  data:
    pools:   1 pools, 128 pgs
    objects: 825 objects, 2.3 GiB
    usage:   9.8 GiB used, 590 GiB / 600 GiB avail
    pgs:     128 active+clean

  io:
    client:   62 KiB/s wr, 0 op/s rd, 4 op/s wr
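
The ceph -s output does not show the kernel version, so that requirement has to be checked separately, for example from uname -r on the relevant nodes. The sketch below compares a kernel release string against a minimum major.minor version. The function name kernel_at_least is hypothetical, and the 5.1 threshold is taken from the list above, treated as inclusive.

```shell
# Hypothetical helper: succeeds if the kernel release in $1 (for example
# 5.4.219-1.el7.elrepo.x86_64) is at least major $2, minor $3
kernel_at_least() {
  kv_major="${1%%.*}"               # text before the first dot
  kv_rest="${1#*.}"                 # text after the first dot
  kv_minor="${kv_rest%%[!0-9]*}"    # leading digits of the remainder
  [ "$kv_major" -gt "$2" ] && return 0
  [ "$kv_major" -eq "$2" ] && [ "${kv_minor:-0}" -ge "$3" ]
}
```

For example, kernel_at_least "$(uname -r)" 5 1 || echo "kernel is older than 5.1" reports hosts that do not meet the requirement.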

Note

Since v1.20, Kubernetes removes metadata.selfLink by default, but some applications, such as nfs-client-provisioner, still use it. To use these applications, enable metadata.selfLink: edit the /etc/kubernetes/manifests/kube-apiserver.yaml file, add the line - --feature-gates=RemoveSelfLink=false to the startup parameters, and run kubectl apply -f kube-apiserver.yaml.
CSI reference: ceph-csi.git
Snapshot reference: external-snapshotter.git

Log in to the resource#

Before you create a backup or restore job, log in to the Kubernetes instance and authenticate the identity. You can use the access key of the current ADPS user to log in to the resource. This method is suitable for the following scenarios:

You cannot get the OS user’s username and password.
The user’s password changes frequently.

Note

Access key authentication is not enabled by default. To enable this feature, log in to the ADPS console, go to Settings, open the Security tab, and select the Access key login instance checkbox.
To get the access key, log in to the console, click Personal settings > Account settings on the upper right corner, find Access key on the Preferences tab, and click View.

To log in to the resource, do the following:

From the menu, click Resource > Resource. The Resource page appears.
From the host list, find the host where the K8s resource resides. If you have many hosts, use the search bar to find the host quickly. Click the host to expand its resource list.
Click Login beside the resource. The Login window appears.
In the Login window, enter the access key of the current ADPS user, and click Login.
If your information is correct, you will be prompted that you have logged in to the resource successfully.

Create a backup job#

To create a backup job, do the following:

From the menu, click Backup. The backup job wizard appears.
At the Hosts and resources step, select the host and the Kubernetes resource. The wizard goes to the next step automatically.
At the Backup source step, select Full backup. Click Next.
At the Backup target step, select a storage pool. Click Next.
At the Backup schedule step, set the job schedule. For details, see Backup policies. Click Next.
- Select Immediate. ADPS performs the job immediately after it is created.
- Select One time and set the start time for the job.
- Select Hourly. Set the start time, end time, and time interval for job execution. The unit can be hour(s) or minute(s).
- Select Daily. Set the start time and enter the time interval for job execution. The unit is day(s).
- Select Weekly. Set the start time, enter the time interval, and select the specific dates in a week for job execution. The unit is week.
- Select Monthly. Set the start time and months for job execution. You can select the natural dates in one month or select the specific dates in one week.
At the Backup options step, set the common and advanced options according to your needs. For details, see Backup options. Click Next.
At the Finish step, set the job name and confirm the job information. Click Submit.
After the submission, you will be redirected to the Job page automatically. On this page, you can start, modify, and delete the job.

Backup options#

ADPS provides the following backup options:

Common options

Backup common options#
Option	Description
Compression	Fast is enabled by default. Backup data is compressed at the source side for transmission. It can reduce the backup time, improve backup efficiency, and save backup space.
Channels	It can improve backup efficiency. The default value is 1 and the value ranges from 1 to 255. We recommend a value the same as the number of CPU cores. If the value exceeds the core number, the efficiency improvement will not be obvious.

Advanced options

Backup advanced options#
Option	Description
Reconnection time	The value ranges from 1 to 60 minutes. If the network is reset abnormally, the job continues as long as the connection recovers within the set time.
Speed limit	Limits data transfer speed or disk read/write speed for different time periods. The unit can be KiB/s, MiB/s, or GiB/s.
Precondition	Checked before the job starts. If the precondition is invalid, the job is aborted and its state becomes idle.
Pre-/Post-script	The pre-script is executed after the job starts and before the resource is backed up. The post-script is executed after the resource is backed up.

Restore#

ADPS provides point-in-time restore for Kubernetes.

Point-in-time restore

Restores Kubernetes to a specified point in time. The restore target can be the original host or a different host.

Before you begin#

To restore the instance to a different host, install the agent on that Kubernetes cluster, activate the licenses, and authorize user access to the resource.

Create a point-in-time restore job#

To create a point-in-time restore job, do the following:

From the menu, click Restore. The restore job wizard appears.
At the Hosts and resources step, select the host and the Kubernetes resource. The wizard goes to the next step automatically.
At the Backup sets step, do the following:

(1) From the Restore type list, select Point-in-time restore.

(2) In the Restore source section, select a backup set for the restore job.

(3) Click Next.
At the Restore target step, select a host and resource as the target. The wizard goes to the next step automatically.
At the Restore schedule step, set the job schedule. Click Next.
- Select Immediate. ADPS will perform the job immediately after its creation.
- Select One time and set the start time for the job.
At the Restore options step, set the options according to your needs. See Restore options. Click Next.
At the Finish step, set the job name and confirm the job information. Click Submit.
After the submission, you will be redirected to the Job page. You can start, modify, and delete the job.

Note

During restore, the data may be restored successfully while the job fails because the related configuration resources of the original service have conflicts. In this case, you need to create the service resources and modify the IP and port manually.

Restore options#

ADPS provides the following restore options:

Advanced options

Restore advanced options#
Option	Description
Reconnection time	The value ranges from 1 to 60 minutes. If the network is reset abnormally, the job continues as long as the connection recovers within the set time.
Resumption buffer size	Specifies the resumption buffer size. The default value is 10 MiB. A larger resumption buffer consumes more physical storage, but it can prevent data loss when the throughput of the business system is high.
Speed limit	Limits data transfer speed or disk read/write speed for different time periods. The unit can be KiB/s, MiB/s, or GiB/s.
Precondition	Checked before the job starts. If the precondition is invalid, the job is aborted and its state becomes idle.
Pre-/Post-script	The pre-script is executed after the job starts and before the resource is restored. The post-script is executed after the resource is restored.

Limitations#

Feature	Limitations
Backup	1. Cannot back up volumes mounted on block devices. 2. Can only back up workloads using Ceph-CSI/CephFS-CSI persistent volumes. 3. Cannot back up a whole cluster. 4. Does not support the NFS file system.
Environment	1. The kernel of the Ceph cluster must be later than 5.1. 2. The Kubernetes version cannot be older than v1.17.0. 3. The Ceph version must be later than v14.0. 4. Does not support IPv6 dual-stack networking. 5. LAN-free pools using iSCSI are supported.
Restore	1. Cannot restore the Ceph-CSI driver. 2. Cannot restore agent-related pods.

Glossary#

Term	Description
CustomResourceDefinition	Abbreviated as CRD. A resource that allows users to define their own custom resource objects.
namespace	Kubernetes (K8s) supports multiple virtual clusters which rely on the same physical cluster. These virtual clusters are called namespaces. By allocating resources to different namespaces, resources can be logically isolated and more easily managed for different uses. Resources in the same namespace must have unique names, while resources in different namespaces can share the same name.
Master	A node that is the main control component of K8s.
Node	A component in a K8s cluster which can be a physical or virtual machine.
Pod	The smallest deployable units of computing that users can create and manage in Kubernetes. A Pod is a group of one or more containers.
Service	A method for exposing a network application that is running as one or more Pods in a cluster.
NodePort	A network service that is used to expose the application to the internet.
StorageClass	Provides a way for administrators to describe the classes of storage they offer.
VolumeSnapshotClass	Provides a way to describe the “classes” of storage when provisioning a volume snapshot.
VolumeSnapshotContent	A snapshot taken from a volume in the cluster that has been provisioned by an administrator. It is a resource in the cluster just like a PersistentVolume is a cluster resource.
VolumeSnapshot	A request for snapshot of a volume by a user. It is similar to a PersistentVolumeClaim.
PersistentVolume	Abbreviated as PV. A piece of storage in the cluster that has been provisioned by an administrator or dynamically provisioned using Storage Classes.
PersistentVolumeClaim	Abbreviated as PVC. A request for storage by a user.
Secret	An object that contains a small amount of sensitive data such as a password, a token, or a key.
ConfigMap	An API object used to store non-confidential data in key-value pairs.
ServiceAccount	Provides an identity for processes that run in a Pod, and maps to a ServiceAccount object.
LimitRange	A policy to constrain the resource allocations (limits and requests) that you can specify for each applicable object kind (such as Pod or PersistentVolumeClaim) in a namespace.
CSI	Container Storage Interface. An industry standard interface rule created by community members including Kubernetes, Mesos, and Docker. CSI is a standard for exposing arbitrary block and file storage systems to containerized workloads on Container Orchestration Systems (COs).
Clone	Creates a copy of an existing Kubernetes volume. The copy can be used like any other standard volume. Cloning is supported only with CSI drivers.
ReplicationController	Abbreviated as RC. It can create and manage a specified number of pod replicas.
ReplicaSet	Abbreviated as RS. It maintains a stable set of replica Pods running at any given time and can replace RC.
Deployment	Can be regarded as a superset of RC. In addition to providing functions such as Pod management, it also provides new features such as rollback and version recording. Generally, we do not create RC/RS directly, but create higher-level Deployment resources to automatically create RC/RS.
ApiServer	Provides HTTP REST interfaces for creating, deleting, reading, updating, and monitoring various K8s resource objects (Pod, RC, Service, and so on), and is the data bus and data center of the entire system.
metadata	This document uses metadata to define the resource data other than persistent volume storage data in namespaces.
DaemonSet	Ensures that all (or some) Nodes run a copy of a Pod. As nodes are added to the cluster, Pods are added to them. As nodes are removed from the cluster, those Pods are garbage collected. Deleting a DaemonSet will clean up the Pods it created.
StatefulSet	Manages the deployment and scaling of a set of Pods and provides guarantees about the ordering and uniqueness of these Pods.
kubectl	The command line tool (CLI) of Kubernetes. It is the required management tool for Kubernetes users and administrators.
