
GitHub Enterprise Backup: Why It’s Important

Running GitHub Enterprise backups is crucial for streamlining your DevOps processes and securing your repositories.

What’s more, without secure GitHub backups, you risk losing critical code and data.

This is where GitHub Enterprise Backup Utilities come in handy. 

GitHub Enterprise Backup Utilities is a backup and recovery system for GitHub Enterprise Server, built around two utilities: ghe-backup and ghe-restore.

Together, these tools take periodic, consistent, application-aware snapshots of GitHub Enterprise Server instances over a Secure Shell (SSH) connection.

If you’re still on the fence about running GitHub Enterprise backups, read on to find out how they help secure your repository data.


1. Lets you take snapshots of repository data

GitHub Enterprise Backup Utilities lets you take regular snapshots of your data and restore them easily.

You can use the following commands after running an initial backup:

  • ghe-backup takes incremental snapshots of your repository data, along with full snapshots of all your other important data stores.
  • ghe-restore restores a snapshot to the same GitHub Enterprise Server appliance or to a separate one. Before you can use this command, you must add the backup host’s SSH key to the target appliance.

The commands run on the host where you installed Backup Utilities.

Running these commands regularly protects your repository data and keeps a record of its state, so you can respond quickly if data is corrupted, compromised, or lost.

Backup and restore behavior is set in your own configuration file, backup.config.

You can use the sample configuration file (backup.config-example) as a template when setting up your environment for backing up and restoring.
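Because backup.config is made up of shell-style KEY=value assignments, it can be useful to read the values back and sanity-check them before scheduling any jobs. The Python sketch below assumes a simple file with one assignment per line. It is not a full shell parser. GHE_HOSTNAME, GHE_DATA_DIR, and GHE_NUM_SNAPSHOTS are settings used by Backup Utilities. The read_backup_config helper and the example values are hypothetical.

```python
from __future__ import annotations

import shlex


def read_backup_config(text: str) -> dict[str, str]:
    """Read simple KEY=value assignments from backup.config-style text.

    Assumption: one shell-style assignment per line. Comments and blank
    lines are skipped. Variable expansion and multi-line values are not
    handled, so this is a sanity check, not a shell parser.
    """
    settings: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # shlex removes shell quoting and trailing comments: '"/data" # x' -> /data
        parts = shlex.split(value, comments=True)
        settings[key.strip()] = parts[0] if parts else ""
    return settings


# Hypothetical values for illustration only.
example = """
GHE_HOSTNAME="github.example.com"
GHE_DATA_DIR="/data/ghe-backups"
GHE_NUM_SNAPSHOTS=24
"""

config = read_backup_config(example)
print(config["GHE_DATA_DIR"])  # /data/ghe-backups
```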

Other essential considerations when running the backup and restore commands include the following. 

  • ghe-restore accepts several command-line options. For example, suppose you use an external MySQL service but want to restore from a snapshot taken before you enabled it (or the reverse). In that case, you must first migrate the MySQL data outside the backup-utils context, then pass the --skip-mysql flag to ghe-restore.
  • Restoring to a new GitHub Enterprise Server instance restores the license, settings, and certificate data.

However, you must review and save the settings before using the instance so that all migrations run and all required services start.

If you restore to an instance that is already configured, license, certificate, and settings data are not restored. This avoids overwriting manual configuration on the restore host.

  • GitHub Actions data stored on your external storage provider is not included in regular GitHub Enterprise Server backups, so you must back it up separately.
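When scripting restores, building the ghe-restore command as an argument list makes optional flags such as --skip-mysql explicit. This is a minimal sketch. It assumes Backup Utilities is installed in /opt/backup-utils and that options come before the target host. The build_restore_command helper and the hostname are hypothetical, and the only option modelled is the --skip-mysql flag described above.

```python
from __future__ import annotations

import shlex


def build_restore_command(
    target_host: str,
    skip_mysql: bool = False,
    install_dir: str = "/opt/backup-utils",
) -> list[str]:
    """Assemble a ghe-restore invocation as an argument list.

    Assumptions: Backup Utilities lives in install_dir, options go before
    the target host, and --skip-mysql is the only option modelled here.
    """
    argv = [f"{install_dir}/bin/ghe-restore"]
    if skip_mysql:
        # Only after migrating MySQL data outside the backup-utils context.
        argv.append("--skip-mysql")
    argv.append(target_host)
    return argv


cmd = build_restore_command("ghe-new.example.com", skip_mysql=True)
print(shlex.join(cmd))
# /opt/backup-utils/bin/ghe-restore --skip-mysql ghe-new.example.com
```

On the backup host, passing the list to subprocess.run(cmd, check=True) runs the command without a shell, which removes any risk of quoting mistakes.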

2. Allows you to schedule regular backups

Running regular GitHub Enterprise backups is crucial to protecting and preserving your business-critical code and data.

The challenge is that running backups manually takes a lot of time and effort, and any gap between backups leaves your data exposed to risks such as:

  • Data loss because of rogue employees or simply human error
  • Server crashes
  • Unplanned downtime
  • Accidental repository and data deletions
  • Data breaches due to phishing, malware, ransomware attacks, and hacking

A more efficient and effective approach is to schedule GitHub Enterprise backups to run automatically.

Schedule regular backups on the backup host using cron(8) or a similar command scheduling service.

The backup frequency determines your backup plan’s worst-case Recovery Point Objective (RPO), which is the largest window of recent changes you could lose.
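As a rough illustration, the sketch below estimates the worst-case RPO using a simplified model. In this model, a snapshot captures the data as of its start time and becomes usable only when it finishes. The worst_case_rpo function and the 10-minute backup duration are assumptions for illustration, not measurements.

```python
from datetime import timedelta


def worst_case_rpo(interval: timedelta, backup_duration: timedelta) -> timedelta:
    """Estimate the largest window of changes a failure could wipe out.

    Simplified model (an assumption): a snapshot reflects the data as of
    the moment it starts and is usable only once it finishes. If the
    primary fails just before the next snapshot completes, everything
    written since the previous snapshot started is lost.
    """
    return interval + backup_duration


# Hypothetical 10-minute backup run.
hourly = worst_case_rpo(timedelta(hours=1), timedelta(minutes=10))
nightly = worst_case_rpo(timedelta(days=1), timedelta(minutes=10))
print(hourly)   # 1:10:00
print(nightly)  # 1 day, 0:10:00
```

Under this model, switching from nightly to hourly snapshots shrinks the worst-case loss window from just over a day to just over an hour.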

The examples below assume Backup Utilities is installed in /opt/backup-utils. Create the crontab entry under the same user that issues manual backup and recovery commands. That user must also have write access to the configured GHE_DATA_DIR directory.

Important note: Tune the GHE_NUM_SNAPSHOTS option in backup.config to match your backup frequency. By default, the ten most recent snapshots are retained. Adjust this number based on your backup frequency and available storage.
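To see how GHE_NUM_SNAPSHOTS and backup frequency interact, the hypothetical helpers below do two things. The first computes how much history a given number of snapshots spans. The second computes how many snapshots you would need to reach back over a chosen period. Both assume snapshots are taken at a fixed interval.

```python
import math
from datetime import timedelta


def history_window(num_snapshots: int, interval: timedelta) -> timedelta:
    """Time between the oldest and newest retained snapshot."""
    return interval * max(num_snapshots - 1, 0)


def snapshots_needed(retention: timedelta, interval: timedelta) -> int:
    """Smallest snapshot count whose oldest snapshot reaches `retention`
    back from the newest one, assuming a fixed interval."""
    return math.ceil(retention / interval) + 1


print(history_window(10, timedelta(hours=1)))                   # 9:00:00
print(snapshots_needed(timedelta(days=1), timedelta(hours=1)))  # 25
```

With the default of ten hourly snapshots, the oldest snapshot is only nine hours older than the newest. Keeping a full day of hourly history takes 25.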

Below are some examples of scheduling backups with the GitHub Enterprise Backup Utilities.

  • Schedule hourly backup snapshots, with verbose informational output written to a log file and errors sent by email.

Use this sample command:

MAILTO=admin@example.com

0 * * * * /opt/backup-utils/bin/ghe-backup -v 1>>/opt/backup-utils/backup.log 2>&1

  • Schedule nightly backup snapshots using this command:

MAILTO=admin@example.com

0 0 * * * /opt/backup-utils/bin/ghe-backup -v 1>>/opt/backup-utils/backup.log 2>&1

Running regular GitHub Enterprise backups gives you a consistent history of your repository data and reduces the impact of data loss and corruption.

3. Provides a backup snapshot file structure

You could have a library of repository data snapshots, but without a consistent file structure, finding the snapshot you need can be difficult.

GitHub Enterprise Backup Utilities provides a solution. 

Backup Utilities stores snapshots in rotating incremental directories, each named after the date and time the snapshot was taken.
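Timestamped directory names make it easy to pick a restore point programmatically. This sketch assumes names in YYYYMMDDTHHMMSS form (for example, 20140724T010000) and skips any other entries, such as a symlink to the latest snapshot. Check the actual layout of your GHE_DATA_DIR before relying on it. The function names are hypothetical.

```python
from __future__ import annotations

from datetime import datetime

SNAPSHOT_FORMAT = "%Y%m%dT%H%M%S"  # assumed naming scheme, e.g. 20140724T010000


def parse_snapshots(names: list[str]) -> list[datetime]:
    """Return snapshot timestamps in ascending order, ignoring other entries."""
    stamps = []
    for name in names:
        try:
            stamps.append(datetime.strptime(name, SNAPSHOT_FORMAT))
        except ValueError:
            continue  # not a snapshot directory name, e.g. a symlink or log file
    return sorted(stamps)


def snapshot_at_or_before(names: list[str], target: datetime) -> str | None:
    """Pick the newest snapshot taken no later than `target`."""
    candidates = [s for s in parse_snapshots(names) if s <= target]
    if not candidates:
        return None
    return candidates[-1].strftime(SNAPSHOT_FORMAT)


entries = ["20140724T010000", "20140726T010000", "20140725T010000", "current"]
print(snapshot_at_or_before(entries, datetime(2014, 7, 25, 18, 30)))
# 20140725T010000
```

In practice, you would pass in the entries of your GHE_DATA_DIR directory, for example from os.listdir.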

Each snapshot directory contains a complete snapshot of all essential data stores. Repository, Search, and Pages data are stored efficiently using hard links, so unchanged files are shared between snapshots instead of being copied again.
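Hard links save space because snapshots share a single copy of unchanged data. The Python sketch below mimics that idea with ordinary files. It hard-links files that are identical to the previous snapshot and copies only new or changed ones. It illustrates the technique and does not reflect how Backup Utilities implements it internally. make_snapshot is a hypothetical name.

```python
from __future__ import annotations

import filecmp
import os
import shutil
import tempfile
from pathlib import Path


def make_snapshot(source: Path, previous: Path | None, dest: Path) -> None:
    """Copy `source` into `dest`, hard-linking files unchanged since `previous`.

    Unchanged files share an inode with the previous snapshot, so they use
    no extra data blocks. New or changed files get a fresh copy.
    """
    for src_file in sorted(source.rglob("*")):
        if not src_file.is_file():
            continue
        rel = src_file.relative_to(source)
        target = dest / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        old = previous / rel if previous is not None else None
        if old is not None and old.is_file() and filecmp.cmp(src_file, old, shallow=False):
            os.link(old, target)  # same content: point at the existing inode
        else:
            shutil.copy2(src_file, target)  # new or changed: store a copy


def inode(path: Path) -> int:
    return path.stat().st_ino


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    live = root / "live"
    live.mkdir()
    (live / "unchanged.txt").write_text("same content")
    (live / "changed.txt").write_text("version 1")
    make_snapshot(live, None, root / "snap1")
    (live / "changed.txt").write_text("version 2")
    make_snapshot(live, root / "snap1", root / "snap2")
    shared = inode(root / "snap1/unchanged.txt") == inode(root / "snap2/unchanged.txt")
    separate = inode(root / "snap1/changed.txt") != inode(root / "snap2/changed.txt")

print(shared, separate)  # True True
```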

Important note: Preserve symlinks when archiving backup snapshots. Excluding or dereferencing symlinks, or storing snapshot contents on a filesystem that doesn’t support symlinks, can cause problems when you restore data.
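If you archive snapshots with your own tooling, make sure the archiver stores symlinks as links and does not follow them. Below is a minimal Python sketch that uses the standard tarfile module with dereference=False. archive_snapshot and the file names are hypothetical.

```python
import os
import tarfile
import tempfile
from pathlib import Path


def archive_snapshot(snapshot_dir: Path, archive_path: Path) -> None:
    """Archive a snapshot directory without dereferencing symlinks.

    With dereference=False (tarfile's default, spelled out here), a symlink
    is stored as a link entry, not as a copy of its target. Files that share
    an inode inside the archive are stored as hard links.
    """
    with tarfile.open(archive_path, "w:gz", dereference=False) as tar:
        tar.add(snapshot_dir, arcname=snapshot_dir.name)


with tempfile.TemporaryDirectory() as tmp:
    snap = Path(tmp) / "20140724T010000"  # hypothetical snapshot name
    snap.mkdir()
    (snap / "data.txt").write_text("payload")
    os.symlink("data.txt", snap / "data-link")
    out = Path(tmp) / "snapshot.tar.gz"
    archive_snapshot(snap, out)
    with tarfile.open(out, "r:gz") as tar:
        member = tar.getmember("20140724T010000/data-link")
        kept_as_symlink = member.issym()
        link_target = member.linkname

print(kept_as_symlink, link_target)  # True data.txt
```

GNU tar behaves the same way by default. Avoid its -h (--dereference) option when archiving snapshots.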

GitHub Enterprise Backup Utilities also provides a backup structure for MS SQL Server.

The GitHub Actions service uses MS SQL Server as its backend data store, so every snapshot includes a suite of backup files for the MS SQL Server databases.

Backup Utilities uses a three-level backup strategy for these databases.

The GHE_MSSQL_BACKUP_CADENCE setting controls this strategy. At each snapshot, Backup Utilities takes either a full (F), differential (D), or transaction log (T) backup, depending on the configured cadence.
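The sketch below models how a three-level cadence might decide which backup to take at each snapshot. It assumes GHE_MSSQL_BACKUP_CADENCE holds three comma-separated intervals in minutes, one each for full, differential, and transaction log backups (for example, 10080,1440,15). Check backup.config-example for the exact format your version expects. The decision logic and function names are hypothetical.

```python
from __future__ import annotations


def parse_cadence(value: str) -> tuple[int, int, int]:
    """Split an assumed 'full,differential,transaction-log' minutes string."""
    full, diff, log = (int(part) for part in value.split(","))
    return full, diff, log


def choose_backup_type(
    cadence: str, minutes_since_full: float, minutes_since_full_or_diff: float
) -> str:
    """Pick the MSSQL backup type for the current snapshot (hypothetical logic).

    A full backup is due once the full interval has elapsed. Otherwise, a
    differential is due once the differential interval has elapsed since
    the last full or differential backup. Otherwise, only a transaction log
    backup is taken.
    """
    # The log interval does not affect this choice in this simplified model.
    full_every, diff_every, _log_every = parse_cadence(cadence)
    if minutes_since_full >= full_every:
        return "full"
    if minutes_since_full_or_diff >= diff_every:
        return "differential"
    return "transaction_log"


cadence = "10080,1440,15"  # assumed: weekly full, daily differential, 15-minute logs
print(choose_backup_type(cadence, 10080, 60))   # full
print(choose_backup_type(cadence, 3000, 1500))  # differential
print(choose_backup_type(cadence, 3000, 60))    # transaction_log
```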

At each snapshot, hard links to previous backup files are created, which saves disk space.

Only newly created backup files are transferred from the appliance to the backup host.

A newly created full or differential backup becomes the new hard-link source for subsequent snapshots and the new baseline for transaction log backups.

During a restore, the suite of backup files is applied in sequence: the full backup first, then the differential backup, then the transaction log backups in chronological order.
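The hypothetical sketch below makes this restore order concrete. It takes a set of MS SQL Server backup records and returns them in the order just described. The order is: the newest full backup, then the newest differential taken after it, then later transaction log backups from oldest to newest. The MssqlBackup record and the file labels are assumptions for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class MssqlBackup:
    kind: str  # "full", "differential", or "transaction_log"
    taken_at: datetime
    name: str  # hypothetical label, for display only


def restore_chain(backups: list[MssqlBackup]) -> list[MssqlBackup]:
    """Order backups as full -> differential -> transaction logs (oldest first).

    Uses the newest full backup, the newest differential taken after it
    (if any), then every transaction log taken after that baseline.
    """
    fulls = [b for b in backups if b.kind == "full"]
    if not fulls:
        raise ValueError("a restore needs at least one full backup")
    base = max(fulls, key=lambda b: b.taken_at)
    chain = [base]
    diffs = [b for b in backups if b.kind == "differential" and b.taken_at > base.taken_at]
    if diffs:
        base = max(diffs, key=lambda b: b.taken_at)
        chain.append(base)
    logs = [b for b in backups if b.kind == "transaction_log" and b.taken_at > base.taken_at]
    return chain + sorted(logs, key=lambda b: b.taken_at)


day = datetime(2024, 1, 1)
suite = [
    MssqlBackup("transaction_log", day.replace(hour=1, minute=30), "log-0130"),
    MssqlBackup("full", day, "full-0000"),
    MssqlBackup("transaction_log", day.replace(minute=15), "log-0015"),
    MssqlBackup("differential", day.replace(hour=1), "diff-0100"),
    MssqlBackup("transaction_log", day.replace(hour=1, minute=15), "log-0115"),
]
print([b.name for b in restore_chain(suite)])
# ['full-0000', 'diff-0100', 'log-0115', 'log-0130']
```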


Run regular GitHub Enterprise backups

Protect your GitHub Enterprise repository data from security risks and keep your assets intact by running regular GitHub Enterprise backups. 

Make GitHub Enterprise Backup Utilities part of your backup strategy. It helps you limit the damage from security incidents and other failures that lead to costly data loss and DevOps process interruptions.