From Shell Scripting to Kubernetes - Postgres Backups With pg_dump [Part 1]

There are different ways to do Kubernetes database backups. Why use pg_dump? This option has two benefits: simplicity and consistency. It ships with the standard PostgreSQL distribution, and it keeps making consistent backups even while the database is in use, without blocking other users accessing the database. In this tutorial, I will explain how to create PostgreSQL backups using this command.


While we are using Velero to do Kubernetes backups, I felt a bit uneasy about the database backups, so I decided to also keep some good old SQL dumps for our Postgres databases. On VMs we have used this type of backup for quite some time; for the databases running in Kubernetes, however, we didn't.

If you want to understand how this has been created and brush up your Kubernetes knowledge, read on.

The Script

This is a simple shell script based on pg_dump which creates an SQL dump for each database. Even though it's less important in Kubernetes, to stay consistent with the style we use on our VMs, we will dump each database to its own file. This means we are not using pg_dumpall; instead, we will loop over the list of databases and dump each one with pg_dump.
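As a reference, here is a minimal sketch of what such a per-database dump script might look like. This is not the exact script from the repository; the file name pg-backup.sh, the BACKUP_DIR variable, and the reliance on the standard libpq environment variables (PGHOST, PGUSER, PGPASSWORD) are assumptions.

```shell
# A minimal sketch of the per-database dump loop; save it as pg-backup.sh.
# Assumes PGHOST/PGUSER/PGPASSWORD are set by the pod's environment.
cat > /tmp/pg-backup.sh <<'EOF'
#!/bin/bash
set -euo pipefail

# Where the dumps go; in the pod this is the mounted volume.
BACKUP_DIR="${BACKUP_DIR:-/data}"
TIMESTAMP=$(date +%Y%m%d-%H%M%S)

# List every non-template database, then dump each one to its own file
# with pg_dump (instead of a single pg_dumpall archive).
for db in $(psql -At -c "SELECT datname FROM pg_database WHERE NOT datistemplate;"); do
    pg_dump --dbname="$db" --file="${BACKUP_DIR}/${db}-${TIMESTAMP}.sql"
done
EOF
chmod +x /tmp/pg-backup.sh
```

The psql flags `-At` (unaligned, tuples only) give one database name per line, which keeps the loop simple.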

There are a few ways in which this may differ from other scripts found online.

The Docker Image

The Dockerfile for building the Docker image is very simple: we start from the bitnami/postgresql image, which already has all the PostgreSQL client tools, and add the script.

Setting a 'do nothing' ENTRYPOINT is useful in this case because you can start a container or a pod from the image, exec 'inside' it, and test things.
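A sketch of what such a Dockerfile could look like (the script name, the image tag, and the sleep-based 'do nothing' entrypoint are assumptions):

```dockerfile
# Start from the Bitnami image, which already includes pg_dump, psql, etc.
FROM bitnami/postgresql:16

# Add the dump script (the name is an assumption)
COPY pg-backup.sh /pg-backup.sh

# 'do nothing' entrypoint: keeps the container alive so you can exec into it
ENTRYPOINT ["sleep", "infinity"]
```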

Steps to build the image and push it to the vvang repository on Docker Hub:

docker build -t pgdump:0.5 .

docker tag pgdump:0.5 vvang/pgdump:0.5

docker push vvang/pgdump:0.5

Preparing the Kubernetes Test Environment

I have tested this on KIND (local cluster) and DigitalOcean managed Kubernetes so I'll just assume you have access with kubectl and helm to a Kubernetes cluster.

I have created a namespace pg-helm and I used helm to install postgres in there:

kubectl create ns pg-helm

helm install -n pg-helm --set auth.postgresPassword='p123456' mypg bitnami/postgresql

As you can see in the instructions displayed by the helm command, you can access the Postgres server using the service mypg-postgresql (also from other namespaces), with user postgres and the password stored in the secret mypg-postgresql:

kubectl -n pg-helm get services

kubectl -n pg-helm get secrets

kubectl get secret --namespace pg-helm mypg-postgresql -o jsonpath="{.data.postgres-password}" | base64 -d

If you uninstall the helm chart, the PVC won't be deleted, as a safety measure for you. If you reinstall it, the old PVC and postgres data will be used.

Kubernetes Basic YAMLs

I have decided to use dedicated volumes for the dumps, so first we'll need to create a PVC; see the file pvc.yml.
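A pvc.yml along these lines would do; the storage size and the reliance on the cluster's default storage class are assumptions:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mypg-pgdump
  namespace: pg-helm
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi   # example size; adjust to your database volume
```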

Then, we'll create a deployment (in fact, a simple pod would suffice) which can be used to test the dump script or to do restores. A few things to note in the file deployment.yml:

  • all the objects created (PVC, deployment, cronjob) have the same name, in this case mypg-pgdump, and the deployment uses the volume created by the PVC
  • the init container fixes the owner/permissions on the directory where the volume is mounted. This is necessary because the container runs not as root but as user 1001 (inherited from the base Docker image). On KIND, this directory gets 777 permissions, but on other clusters, after mount, the directory is owned by root with mode 755.
  • the main container uses environment variables for its configuration and, for the Postgres password, references the secret created by helm above
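Put together, deployment.yml could look roughly like this sketch; the busybox init image, the /data mount path, and the use of the standard PGHOST/PGUSER/PGPASSWORD variables are assumptions, while the secret name and key match the ones created by the helm release above:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mypg-pgdump
  namespace: pg-helm
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mypg-pgdump
  template:
    metadata:
      labels:
        app: mypg-pgdump
    spec:
      # fix ownership of the mounted volume for the non-root user 1001
      initContainers:
        - name: fix-perms
          image: busybox
          command: ["sh", "-c", "chown -R 1001:1001 /data"]
          volumeMounts:
            - name: dumps
              mountPath: /data
      containers:
        - name: pgdump
          image: vvang/pgdump:0.5
          env:
            - name: PGHOST
              value: mypg-postgresql
            - name: PGUSER
              value: postgres
            - name: PGPASSWORD
              valueFrom:
                secretKeyRef:
                  name: mypg-postgresql
                  key: postgres-password
          volumeMounts:
            - name: dumps
              mountPath: /data
      volumes:
        - name: dumps
          persistentVolumeClaim:
            claimName: mypg-pgdump
```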

After you create the PVC and the deployment, you can test things:

kubectl apply -f pvc.yml

kubectl apply -f deployment.yml

kubectl -n pg-helm get pods

# note the mypg-pgdump-... name above and use it in the next command

kubectl -n pg-helm exec -ti mypg-pgdump-6cdfc4c966-kq6j2 -- bash

# now you are 'inside' the container

/     # this will run the backup

ls -la /data   # this will show the dumps


If this is working, everything is ok and you can proceed with the cronjob. Otherwise, you can use various debug commands inside the mypg-pgdump pod.

The Cronjob

Kubernetes has a dedicated object for cronjobs and this is pretty similar to a Deployment. Look at the file cronjob.yml where in the spec section you will recognize most of it. Only the schedule and restartPolicy are new.

The schedule syntax is the same as the standard Unix cron daemon (no surprise here). You may change the schedule: line to see some results faster.
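For orientation, a cronjob.yml could look roughly like the sketch below, reusing the same container setup as deployment.yml. The 02:00 daily schedule, the script path, and the environment variable names are assumptions:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: mypg-pgdump
  namespace: pg-helm
spec:
  schedule: "0 2 * * *"   # every day at 02:00; shorten while testing
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: pgdump
              image: vvang/pgdump:0.5
              command: ["/pg-backup.sh"]   # script name is an assumption
              env:
                - name: PGHOST
                  value: mypg-postgresql
                - name: PGUSER
                  value: postgres
                - name: PGPASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: mypg-postgresql
                      key: postgres-password
              volumeMounts:
                - name: dumps
                  mountPath: /data
          volumes:
            - name: dumps
              persistentVolumeClaim:
                claimName: mypg-pgdump
```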

kubectl apply -f cronjob.yml

kubectl -n pg-helm get cj

And that’s it. Stay tuned for part 2, where we will discuss Helm and how to convert these static YAML files.

You may find all the files here:

About the author

Viorel Anghel has 20+ years of experience as an IT professional, taking on various roles, such as Systems Architect, Sysadmin, Network Engineer, SRE, DevOps, and Tech Lead. He has a background in Unix/Linux systems administration, high availability, scalability, change, and config management. Also, Viorel is a Red Hat Certified Engineer and AWS Certified Solutions Architect, working with Docker, Kubernetes, Xen, AWS, GCP, Cassandra, Kafka, and many other technologies. He currently serves as Head of Cloud and Infrastructure at eSolutions.