Backup and Restore
Backup and Restore are integrated features provided by tablets managed by Vitess. As well as using backups for data integrity, Vitess will also create and restore backups for provisioning new tablets in an existing shard.
Concepts
Vitess supports pluggable interfaces for both Backup Storage Services and Backup Engines.
Before backing up or restoring a tablet, ensure that the tablet knows which Backup Storage system and Backup Engine you are using. To do so, pass the command-line flags described below when starting any vttablet or vtctld that has access to the location where you are storing backups.
Backup Storage Services
Currently, Vitess has plugins for:
- File (using a path on shared storage, e.g. an NFS mount)
- Google Cloud Storage
- Amazon S3
- Ceph
Backup Engines
The engine is the technology used for generating the backup. Currently Vitess has plugins for:
- Builtin: Shut down an instance and copy all the database files (default)
- XtraBackup: An online backup using Percona's XtraBackup
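As a rough sketch of how one storage plugin is paired with one engine, the fragment below assembles the relevant flags for a vttablet that uses the file plugin with the builtin engine. The directory `/mnt/vitess-backups` is a hypothetical path, and the fragment only prints the flags; every other flag a real vttablet needs is omitted. The flags themselves are described in the configuration table in the next section.

```shell
# Hypothetical shared-storage path (for example an NFS mount); adjust as needed.
BACKUP_ROOT="/mnt/vitess-backups"

# Flags that tell vttablet (or vtctld) which storage plugin and engine to use.
BACKUP_FLAGS="--backup_storage_implementation=file \
--file_backup_storage_root=${BACKUP_ROOT} \
--backup_engine_implementation=builtin"

# Print the flags one per line so they can be reviewed before use.
# The variable is left unquoted on purpose so that it splits on spaces.
printf '%s\n' ${BACKUP_FLAGS}
```

Keeping the flags in one variable makes it easier to pass the same set to every vttablet and vtctld that shares the backup location.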
VTTablet and Vtctld configuration
The following options can be used to configure VTTablet and Vtctld for backups:
Flag | Description |
---|---|
backup_storage_implementation | Specifies the implementation of the Backup Storage interface to use. Current plugin options available are: `file`, `gcs`, `s3`, `ceph`. |
backup_engine_implementation | Specifies the implementation of the Backup Engine to use. Current options available are: `builtin` (default), `xtrabackup`. |
backup_storage_hook | If set, the content of every file to back up is sent to a hook. The hook receives the data for each file on stdin and should echo the transformed data to stdout. Anything the hook prints to stderr is printed in the vttablet logs. Hooks should be located in the `vthook` subdirectory of the `VTROOT` directory. The hook receives a `-operation write` or a `-operation read` parameter depending on the direction of the data processing. For instance, `write` would be for encryption, and `read` would be for decryption. |
backup_storage_compress | Controls whether backups are compressed by the Vitess code. Defaults to true; use `--backup_storage_compress=false` to disable. Disabling is meant for use with a `--backup_storage_hook` hook that already compresses the data, to avoid compressing the data twice. |
compression-level | Selects the compression level (from `1..9`) used with the builtin compressors. It has no effect if you are using an external compressor. Defaults to 1 (fastest compression). |
compression-engine-name | Indicates which compression engine to use. The default value is `pargzip`. If using an external compressor (see below), this should be a compatible compression engine, because the value is saved to the MANIFEST when creating the backup and can be used to decompress it. |
external-compressor | Instead of compressing inside the vttablet process, use the external command to compress the input. The compressed stream needs to be written to `STDOUT`. An example command to compress with an external compressor using the fastest mode and lowest CPU priority: `--external-compressor "nice -n 19 pigz -1 -c"`. If the backup is supported by one of the builtin engines, make sure to use `--compression-engine-name` so it can be restored without requiring `--external-decompressor` to be defined. |
external-compressor-extension | Sets the correct extension when writing the file. Only used for the `xtrabackupengine`. Example: `--external-compressor-extension ".gz"` |
external-decompressor | Use an external decompressor to process the backups. This overrides the builtin decompressor, which would automatically select the best engine based on the MANIFEST information. The decompressed stream needs to be written to `STDOUT`. An example of how to use an external decompressor: `--external-decompressor "pigz -d -c"` |
file_backup_storage_root | For the `file` plugin, this identifies the root directory for backups. This path must exist on shared storage to provide a global backup view for all vtctlds and vttablets. |
gcs_backup_storage_bucket | For the `gcs` plugin, this identifies the bucket to use. |
s3_backup_aws_region | For the `s3` plugin, this identifies the AWS region. |
s3_backup_storage_bucket | For the `s3` plugin, this identifies the AWS S3 bucket. |
ceph_backup_storage_config | For the `ceph` plugin, this identifies the path to a text file with a JSON object as configuration. The JSON object requires the following keys: `accessKey`, `secretKey`, `endPoint` and `useSSL`. The bucket name is computed from the keyspace name and shard name, so different keyspaces and shards use separate buckets. |
restart_before_backup | If set, perform a clean MySQL shutdown and startup cycle. Note that this does not execute any `FLUSH` statements. This enables users to work around xtrabackup DDL issues. |
restore_from_backup | Indicates that, when started with an empty MySQL instance, the tablet should restore the most recent backup from the specified storage plugin. |
restore_from_backup_ts | If set, restore the latest backup taken at or before this timestamp rather than using the most recent one. Example: "2021-04-29.133050". (Vitess 12.0+) |
xbstream_restore_flags | The flags to pass to the xbstream command during restore. These should be space separated and will be added to the end of the command. They need to match the ones used for backup, e.g. `--compress` / `--decompress`, `--encrypt` / `--decrypt`. |
xtrabackup_root_path | For the xtrabackup backup engine, directory location of the xtrabackup executable, e.g., `/usr/bin` |
xtrabackup_backup_flags | For the xtrabackup backup engine, flags to pass to the backup command. These should be space separated and will be added to the end of the command. |
xtrabackup_stream_mode | For the xtrabackup backup engine, which mode to use if streaming. Valid values are `tar` and `xbstream`. Defaults to `tar`. |
xtrabackup_user | For the xtrabackup backup engine, required user that xtrabackup will use to connect to the database server. This user must have all necessary privileges. For details, please refer to the xtrabackup documentation. |
xtrabackup_stripes | For the xtrabackup backup engine, if greater than 0, use data striping across this many destination files to parallelize data transfer and decompression. |
xtrabackup_stripe_block_size | For the xtrabackup backup engine, size in bytes of each block that gets sent to a given stripe before rotating to the next stripe. Defaults to `102400`. |
xtrabackup_prepare_flags | Flags to pass to the prepare command. These should be space separated and will be added to the end of the command. |
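
The stdin/stdout contract described for `backup_storage_hook` can be illustrated with a small sketch. The function name `backup_hook` is hypothetical, gzip is only a stand-in transform (the table mentions encryption as a typical use), and the sketch assumes the operation arrives as two separate arguments, `-operation` followed by `write` or `read`. A real hook would be an executable placed in the `vthook` directory rather than a shell function.

```shell
# Sketch of a hook body, written as a function so it can be tried locally.
# gzip is an illustrative transform, not something Vitess requires.
backup_hook() {
  op=""
  # Assumption: arguments look like "-operation write" or "-operation read".
  while [ "$#" -gt 1 ]; do
    if [ "$1" = "-operation" ]; then
      op="$2"
    fi
    shift
  done

  case "$op" in
    write) gzip -c ;;      # backup direction: stdin -> transformed stdout
    read)  gzip -d -c ;;   # restore direction: reverse the transform
    *)
      # Messages on stderr end up in the vttablet logs.
      echo "backup_hook: unknown operation '$op'" >&2
      return 1
      ;;
  esac
}

# Local round trip: data written through the hook is read back unchanged.
printf 'sample backup data\n' | backup_hook -operation write | backup_hook -operation read
```

Because this particular transform already compresses the data, it would pair with `--backup_storage_compress=false` so the data is not compressed twice.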
Authentication
Note that for the Google Cloud Storage plugin, we currently only support Application Default Credentials. This means that access to Google Cloud Storage (GCS) is automatically granted by virtue of the fact that you're already running within Google Compute Engine (GCE) or Google Kubernetes Engine (GKE).
For this to work, the GCE instances must have been created with the scope that grants read-write access to GCS. When using GKE, you can do this for all the instances it creates by adding `--scopes storage-rw` to the `gcloud container clusters create` command.
Backup Frequency
We recommend taking backups regularly, for example by setting up a cron job.
To determine the proper frequency for creating backups, consider the amount of time that you keep replication logs (see the binlog_expire_logs variables) and allow enough time to investigate and fix problems in the event that a backup operation fails.
For example, suppose you typically keep four days of replication logs and you create daily backups. In that case, even if a backup fails, you have at least a couple of days from the time of the failure to investigate and fix the problem.
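The arithmetic behind this example can be sketched as follows. The variable names are illustrative, and the calculation rests on a simplifying assumption: when a backup fails, the last good backup is one backup interval old and remains usable until the replication logs written since it start to expire.

```shell
# Illustrative numbers from the example above, in days.
BINLOG_RETENTION_DAYS=4
BACKUP_INTERVAL_DAYS=1

# Assumption: the last good backup is one interval old at the time of failure
# and stays usable until the logs written since it begin to expire.
SLACK_DAYS=$((BINLOG_RETENTION_DAYS - BACKUP_INTERVAL_DAYS))

echo "Days available to fix a failed backup: ${SLACK_DAYS}"
```

Under these assumptions, a longer backup interval or a shorter log retention shrinks the window directly, which is a quick way to sanity-check a proposed schedule.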
Concurrency
The backup and restore processes simultaneously copy and either compress or decompress multiple files to increase throughput. You can control the concurrency using command-line flags:
- The vtctl Backup command uses the `--concurrency` flag.
- vttablet uses the `--restore_concurrency` flag.
If the network link is fast enough, the concurrency value roughly corresponds to the number of CPUs the process uses during the backup or restore.
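One possible way to pick a starting value is to derive it from the number of CPUs on the host. The sketch below is only a heuristic: the "use half of the CPUs" rule is an assumption made for illustration, not a Vitess recommendation, and the script only prints the flags rather than running any Vitess command.

```shell
# Number of online CPUs (getconf is available on Linux and macOS).
CPUS=$(getconf _NPROCESSORS_ONLN)

# Assumption: leave roughly half of the CPUs for mysqld and query serving.
CONCURRENCY=$((CPUS / 2))
[ "${CONCURRENCY}" -ge 1 ] || CONCURRENCY=1

echo "vtctl Backup flag: --concurrency=${CONCURRENCY}"
echo "vttablet flag:     --restore_concurrency=${CONCURRENCY}"
```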
Backup Compression
By default, vttablet backups are compressed using pargzip, which generates gzip-compatible files.
You can select one of the other supported builtin engines, or use an external process to do the compression/decompression for you. Using an external process has some advantages, such as being able to set the scheduling priority or to dedicate specific CPU cores to compression, which is not possible when compressing inside the vttablet process.
The built-in supported engines are:
Compression:
- pargzip (default)
- pgzip
- lz4
- zstd
Decompression:
- pgzip
- lz4
- zstd
To change which compression engine to use, set the `--compression-engine-name` flag. The compression engine is also saved to the backup manifest, which is read during decompression to select the right engine (so even if the flag is changed later, the vttablet will still be able to restore previous backups).
If you want to use an external compressor/decompressor, you can do this by setting:
- `--external-compressor` with the command that will actually compress the stream;
- `--external-compressor-extension` (only if using xtrabackupengine) with the extension to use for the saved file;
- `--compression-engine-name` with the compatible engine that can decompress it. Use `external` if you are using an external engine not included in the above supported list. This value will be saved to the backup MANIFEST. If it is not set (or the engine is `external`), backups cannot be restored unless you also pass the parameter below;
- `--external-decompressor` with the command used to decompress the files.
The vttablet process launches the external process, passes the input stream via STDIN, and expects the process to write the compressed or decompressed stream to STDOUT.
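The STDIN/STDOUT contract can be checked locally before wiring a command into vttablet. In the sketch below, gzip stands in for the pigz commands shown in the flag descriptions above, purely so the round trip runs on machines where pigz is not installed; the variable names are illustrative.

```shell
# gzip is a stand-in here; the flag descriptions above use pigz, e.g.
#   --external-compressor "nice -n 19 pigz -1 -c"
#   --external-decompressor "pigz -d -c"
COMPRESS_CMD="gzip -1 -c"
DECOMPRESS_CMD="gzip -d -c"

# The contract: read the stream on STDIN, write the result to STDOUT.
# The command variables are left unquoted so they split into words.
ROUND_TRIP=$(printf 'backup stream\n' | ${COMPRESS_CMD} | ${DECOMPRESS_CMD})

echo "${ROUND_TRIP}"
```

If the round trip does not reproduce the input, the command is probably writing to a file instead of STDOUT or expecting a file argument instead of STDIN.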
If you are using an external compressor and want to move to a builtin engine:
- If the engine is supported according to the list above, you just need to make sure your `--compression-engine-name` is correct, and you can remove the `--external-compressor` parameter.
- If you want to move away from an unsupported engine to a builtin one, then you have to:
  - First change the `--compression-engine-name` to a supported one and remove the `--external-compressor`.
  - Once the first backup is completed, you can then remove `--external-decompressor`.
  - After this, all new backups will be done using the new engine. Restoring an older backup will still require the `--external-decompressor` flag to be provided.
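
The migration away from an unsupported external engine can be pictured as three sets of compression flags. This is only an illustration: `my-compressor` is a placeholder command, `zstd` is picked arbitrarily from the supported list, and the script just prints the flag sets instead of starting vttablet.

```shell
# Phase 0: backups are compressed by an external tool that no builtin engine
# can decompress ("my-compressor" is a placeholder, not a real command).
PHASE0='--external-compressor "my-compressor -c" --compression-engine-name=external --external-decompressor "my-compressor -d -c"'

# Phase 1: switch new backups to a builtin engine, but keep the external
# decompressor so the older backups can still be restored.
PHASE1='--compression-engine-name=zstd --external-decompressor "my-compressor -d -c"'

# Phase 2: once the first backup with the new engine has completed.
PHASE2='--compression-engine-name=zstd'

printf 'phase 0: %s\n' "${PHASE0}"
printf 'phase 1: %s\n' "${PHASE1}"
printf 'phase 2: %s\n' "${PHASE2}"
```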