Backups and cloud computing

Regardless of which cloud data provider you use, you will need a cloud backup solution that automatically backs up your cloud environment and protects your apps against data loss. You will want a backup strategy that can guarantee business continuity after unexpected failures, along with a solid disaster recovery procedure for restoring data. When devising a disaster recovery plan, first consider the tools your cloud service provider offers: those tools tend to be better integrated with the platform and may remove the need for third-party backup software. Cloud backups and disaster recovery do not have to mean spending large amounts of money on backup software. The amount of data you need to protect also matters when choosing between remote backup solutions and the platform's own cloud storage. When dealing with a large volume of data, a storage system that is physically colocated with your app can be the cheaper backup option, because you save on network transmission fees.

Therefore, when you build a cloud native application and look for backup solutions for data recovery, one obvious place to start is the cloud service provider's own tools. Besides potentially lowering data transport fees by keeping your backups close to the data, you only have to learn one user interface or command language, which flattens the learning curve of deploying a cloud native app. If all you need is simple file-sync-based cloud-to-cloud backup to provide disaster recovery for your cloud storage, you do not necessarily need to spend money on advanced backup software.

If you want to simplify the backup process further and need a turnkey site recovery solution, you can explore third-party packages that back up and recover your entire data set and applications, or just specific files.

If you are on Google Cloud

Google Cloud provides a mechanism to snapshot disks automatically, which can be used as a form of online backup. You get a friendly user interface for setting up snapshot schedules. A schedule automatically creates new snapshots and deletes old ones, so you can maintain a rolling window of snapshots covering a set time period in the past.

The snapshot UI

You can find the snapshot UI at https://console.cloud.google.com/compute/snapshots. Click the ‘CREATE SNAPSHOT SCHEDULE’ link at the top. This lets you specify the schedule name (which you will use when you assign the schedule to a disk later), a snapshot storage location (a choice that has ramifications for data transport costs), the frequency at which snapshots should be taken, and a deletion rule. For instance, you can take daily snapshots and delete those older than one week. You can choose multi-regional or regional storage; picking the same region as your disks keeps costs down!

Assign the schedule to a disk

Now that you have a schedule, you can assign it to any number of disks by using its name on the following screen: https://console.cloud.google.com/compute/disks. Click the name of the disk you want snapshots for, then click Edit on the screen that comes up and select the snapshot schedule from the pick list. The schedule you created earlier should show up in that list. The snapshots will bear the name of the disk plus the date and time they were taken, followed by some auto-generated characters.

Setting up the automatic snapshots from the command line

The following commands create the snapshot schedule and attach it to a disk from the command line, so you can do the same thing from a script.

gcloud compute resource-policies create snapshot-schedule <schedule name> \
    --description="Daily schedule"  \
    --start-time=05:00 \
    --daily-schedule \
    --max-retention-days=7 \
    --region=us-central1

gcloud compute disks add-resource-policies <disk-name> \
    --resource-policies=<schedule name> \
    --zone=us-central1-a
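
If you want to confirm that the schedule actually got attached, one quick sanity check (using the same placeholder names and region as above) is to describe the resources; the disk's resourcePolicies field should list the schedule:

gcloud compute resource-policies describe <schedule name> \
    --region=us-central1
gcloud compute disks describe <disk-name> \
    --zone=us-central1-a \
    --format="value(resourcePolicies)"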

Create a pod using the snapshot

Following the above steps, you will always have a series of recent backups for your cloud application. But now you have to bring up your Kubernetes pod using a snapshot.

There are, in theory, multiple ways of doing this. Ideally, when working with Kubernetes, you would use its own resources and commands. There are Kubernetes annotations that are supposed to work, but I have not been able to get them to work in released versions of the GKE API. They all seem to rely on at least a few alpha APIs, which means we are not supposed to use them in production. Not only are they hard to enable, but your cluster would automatically be deleted after 30 days under Google's policy for clusters with alpha APIs enabled. That is not the kind of surprise most of us want in production.

Create a disk from a snapshot using Google’s proprietary commands

The following works reliably on Google Cloud. It uses gcloud commands, so it won't work if you migrate your cluster to another cloud platform, but at least you can count on it working on GKE.

gcloud config set project <your project id>
gcloud config set compute/zone us-central1-a

gcloud container clusters get-credentials <your cluster name>

gcloud compute disks create <persistent disk name> \
    --size 200GB \
    --source-snapshot <the source snapshot name> \
    --type pd-standard \
    --zone us-central1-a

Make sure you don’t reuse the same disk name when you create and delete disks. If you do, the old disk may not have been deleted yet and the creation of the new disk will stay pending forever. It’s a good idea to do what Google itself does for many of the resources it creates: attach a random sequence of characters to each of your disk names. Then you can create and delete disks as you please!
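
As an illustration, here is one way to do that in a shell script; the mydisk prefix and the rest of the flags are just placeholders matching the example above:

# Hypothetical naming scheme: append a random 8-character hex suffix to each disk name.
DISK_NAME="mydisk-$(tr -dc 'a-f0-9' < /dev/urandom | head -c 8)"

gcloud compute disks create "${DISK_NAME}" \
    --size 200GB \
    --source-snapshot <the source snapshot name> \
    --type pd-standard \
    --zone us-central1-a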

If you always want to create the disk from the most recent snapshot, you can use the following gcloud command. To be safe, we filter for snapshots whose names start with the disk name, so that we only get the snapshots for that disk. The name of the latest snapshot is stored in a variable so that we can use it when we create the disk:

MOST_RECENT_SNAPSHOT_NAME=$(gcloud compute snapshots list \
    --filter="name ~ ^mydisknameprefix" \
    --sort-by=~creationTimestamp \
    --format="value(name)" | head -n 1)
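
With the snapshot name stored in the variable, the disk-creation command from earlier can simply reference it; a small sketch using the same placeholders:

gcloud compute disks create <persistent disk name> \
    --size 200GB \
    --source-snapshot "${MOST_RECENT_SNAPSHOT_NAME}" \
    --type pd-standard \
    --zone us-central1-a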

Don’t keep disks around. Use snapshots instead!

The technique above makes it possible to always start up a pod with a fresh disk that is initialized from a snapshot, which obviates the need to keep disks around. Unfortunately, you cannot look inside a snapshot directly, so you cannot, for example, inspect its file system to see whether a critical file was backed up. You first have to create a disk from the snapshot and mount that disk in a pod. I set up a system where all disks are always initialized from snapshots. This gives me a great deal of control when I want to start my pods at specific recovery points: I can simply redeploy my pods pointing at older snapshots, which puts me back to that point in time without any additional backup and restore procedures. Plus, it works pretty fast!
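
If you do need to peek inside a restored disk, a throwaway pod like the following can mount it read-only. This is only a sketch: the snapshot-inspector name, the busybox image, and the inspect.yaml file name are illustrative, and <persistent disk name> is a disk you have already created from a snapshot.

# inspect.yaml - hypothetical one-off pod for inspecting a disk restored from a snapshot
apiVersion: v1
kind: Pod
metadata:
  name: snapshot-inspector
spec:
  containers:
  - name: shell
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - name: restored
      mountPath: /mnt/restored
      readOnly: true
  volumes:
  - name: restored
    gcePersistentDisk:
      pdName: <persistent disk name>
      fsType: ext4
      readOnly: true

After applying it with kubectl apply -f inspect.yaml, a command like kubectl exec -it snapshot-inspector -- ls /mnt/restored lets you check whether the files you care about made it into the snapshot.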

Now create the pod

In Kubernetes, we provision pods with storage by creating persistent volumes and persistent volume claims. Normally you can simply create a volume claim and a volume will be provisioned for you automatically. But if you want your volumes to be initialized from snapshots on Google Cloud, your best bet for now is to take these steps manually. We have already created a disk; here is how to tell Kubernetes to use it. Put the following into a file called claim.yaml and issue kubectl apply -f claim.yaml.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: <persistent volume name>
spec:
  storageClassName: ""
  capacity:
    storage: 200Gi
  accessModes:
  - ReadWriteOnce
  gcePersistentDisk:
    pdName: <persistent disk name>
    fsType: ext4
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: <persistent volume claim name>
spec:
  storageClassName: ""
  volumeName: <persistent volume name>
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 200Gi

So we take the disk created in the previous step and create a Kubernetes PersistentVolume out of it. Then we create a PersistentVolumeClaim, which we can then use to configure pods. Note that the 200Gi size seems to be a magic number for Google. If you provision a smaller disk, Google will create it, but it will generate warnings that such “small” disks may result in poor I/O performance. So provision 200Gi disks at a minimum.
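
A quick way to confirm that the claim bound correctly (using the same placeholder names) is to check that both objects report a Bound status:

kubectl get pv <persistent volume name>
kubectl get pvc <persistent volume claim name>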

In the final step, we create a Deployment that uses the claim. Put the code below into a file called pod.yaml and issue the command kubectl create -f pod.yaml.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: <pod name>
  labels:
    app: <pod name>
spec:
  replicas: 1
  selector:
    matchLabels:
      app: <pod name>
  template:
    metadata:
      labels:
        app: <pod name>
    spec:
      containers:
      - image: <docker image name>
        name: <container name>
        volumeMounts:
        - name: <volume name>
          mountPath: /var/<your mount point>
      volumes:
      - name: <volume name>
        persistentVolumeClaim:
          claimName: <persistent volume claim name>
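
Once the deployment is applied, a quick sanity check (assuming the same placeholder names) is to wait for the rollout and then list the mount point inside one of the pods:

kubectl rollout status deployment/<pod name>
kubectl get pods -l app=<pod name>
kubectl exec -it <one of the pod names> -- ls /var/<your mount point>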

As you can see, we have the following steps in total:

  • Create the snapshot
  • Create the disk from the snapshot
  • Create the persistent volume from the disk
  • Create the persistent volume claim from the persistent volume
  • Create a volume for the pod from the claim

And voilà, you have a volume for your pod! Hopefully, this process will get a little easier as the tools mature. As I mentioned earlier, there are better APIs, but they are not mature yet.

Using snapshots for backups works great for testing

If you have a mechanism in place that can start your system from a point in time using snapshots, you can easily create a test environment by simply bringing up a second set of pods from the same snapshots. This makes setting up staging environments for integration testing really easy.
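
As a sketch of that idea (the -test suffix, the testing namespace, and the claim-test.yaml/pod-test.yaml file names are purely illustrative), you can clone the latest snapshot into a second disk and point copies of the earlier manifests at it:

# Hypothetical: build a test environment from the same snapshot as production.
gcloud compute disks create <persistent disk name>-test \
    --size 200GB \
    --source-snapshot "${MOST_RECENT_SNAPSHOT_NAME}" \
    --type pd-standard \
    --zone us-central1-a

# Copies of claim.yaml and pod.yaml that reference the new disk under
# different resource names can then be applied into a separate namespace.
kubectl create namespace testing
kubectl apply -f claim-test.yaml -n testing
kubectl apply -f pod-test.yaml -n testing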