VTOrc

VTOrc is the automated fault detection and repair tool of Vitess. It started off as a fork of the Orchestrator, which was then custom-fitted to the Vitess use-case running as a Vitess component. An overview of the architecture of VTOrc can be found on this page.

Setting up VTOrc lets you avoid performing the InitShardPrimary step. It automatically detects that the new shard doesn't have a primary and elects one for you. It detects any configuration problems in the cluster and fixes them. Here is the list of things VTOrc can do for you:

Recovery NameDescriptionFix that VTOrc does
ClusterHasNoPrimaryVTOrc detects when a shard doesn't have any primary tablet electedVTOrc runs PlannedReparentShard to elect a new primary
DeadPrimaryVTOrc detects when the primary tablet is deadVTOrc runs EmergencyReparentShard to elect a different primary
PrimaryIsReadOnly, PrimarySemiSyncMustBeSet, PrimarySemiSyncMustNotBeSetVTOrc detects when the primary tablet has configuration issues like being read-only, semi-sync being set or not being setVTOrc fixes the configurations on the primary.
NotConnectedToPrimary, ConnectedToWrongPrimary, ReplicationStopped, ReplicaIsWritable, ReplicaSemiSyncMustBeSet, ReplicaSemiSyncMustNotBeSetVTOrc detects when a replica has configuration issues like not being connected to the primary, connected to the wrong primary, replication stopped, replica being writable, semi-sync being set or not being setVTOrc fixes the configurations on the replica.

Flags #

For a full list of supported flags, please look at VTOrc reference page.

UI, API and Metrics #

For information about the UI, API and metrics that VTOrc exports, please consult this page.

Example invocation of VTOrc #

You can bring VTOrc using the following invocation:

vtorc --topo_implementation etcd2 \
  --topo_global_server_address "localhost:2379" \
  --topo_global_root /vitess/global \
  --port 15000 \
  --log_dir=${VTDATAROOT}/tmp \
  --recovery-period-block-duration "10m" \
  --instance-poll-time "1s" \
  --topo-information-refresh-duration "30s" \
  --alsologtostderr

You can optionally add a clusters_to_watch flag that contains a comma separated list of keyspaces or keyspace/shard values. If specified, VTOrc will manage only those clusters.

Durability Policies #

All the failovers that VTOrc performs will be honoring the durability policies. Please be careful in setting the desired durability policies for your keyspace because this will affect what situations VTOrc can recover from and what situations will require manual intervention.

Running VTOrc using the Vitess Operator #

To find information about deploying VTOrc using Vitess Operator please take a look at this page.