UI, API and Metrics

UI

To use the UI, the --port flag must be provided.

Currently, the /debug/status page lists the recent recoveries that VTOrc has performed.

[Image: VTOrc recent recoveries]

APIs

VTOrc supports the following APIs, which can be used to monitor VTOrc and change its behaviour.

| API | Additional notes |
|---|---|
| /api/problems | Lists all the instances that have any problems. The problems range from replication not running to errant GTIDs. Supports filtering by keyspace and shard name. |
| /api/disable-global-recoveries | Disables global recoveries in VTOrc, so that VTOrc does not repair any failures it detects. |
| /api/enable-global-recoveries | Enables global recoveries in VTOrc. |
| /debug/health | Outputs the health of the VTOrc process. |
| /debug/liveness | Outputs the liveness of the VTOrc process. |
| /api/replication-analysis | Shows the replication analysis of VTOrc. Output is in JSON format. |
| /api/errant-gtids | Shows the tablets that have errant GTIDs as detected by VTOrc. Output is in JSON format. Supports filtering by keyspace and shard name. |
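As an illustration of the filtering described above, here is a minimal Python sketch that builds a filtered API URL and fetches the JSON response. The address localhost:16000, the helper names build_api_url and fetch_json, and the query parameter names keyspace and shard are assumptions made for this example; check them against your VTOrc version before relying on them.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical address: the VTOrc host plus the value passed to --port.
VTORC_ADDR = "http://localhost:16000"


def build_api_url(path, keyspace=None, shard=None, base=VTORC_ADDR):
    """Build a VTOrc API URL, adding keyspace/shard filters when given.

    The query parameter names "keyspace" and "shard" are assumptions.
    """
    params = {}
    if keyspace is not None:
        params["keyspace"] = keyspace
    if shard is not None:
        params["shard"] = shard
    query = "?" + urlencode(params) if params else ""
    return base.rstrip("/") + path + query


def fetch_json(url, timeout=5.0):
    """Send a GET request and decode the response body as JSON."""
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With a running VTOrc, a call such as fetch_json(build_api_url("/api/errant-gtids", keyspace="commerce", shard="0")) would return the decoded JSON for that shard; "commerce" and "0" are placeholder names.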

Metrics

Metrics are available on the /debug/vars page. VTOrc exports the following metrics:

| Metric | Usage |
|---|---|
| PendingRecoveries | The number of recoveries in progress which haven't completed. |
| RecoveriesCount | The number of recoveries run. This is further subdivided for all the different recoveries. |
| SuccessfulRecoveries | The number of successful recoveries run. This is further subdivided for all the different recoveries. |
| FailedRecoveries | The number of recoveries that failed. This is further subdivided for all the different recoveries. |
| ErrantGtidTabletCount | The number of tablets with errant GTIDs as detected by VTOrc. |
| DetectedProblems | Binary gauge that shows the active problems that VTOrc has detected. This is further subdivided by TabletAlias, Keyspace, and Shard. |
| planned_reparent_counts | Number of times Planned Reparent Shard has been run. It is further subdivided by the keyspace, shard and the result of the operation. |
| emergency_reparent_counts | Number of times Emergency Reparent Shard has been run. It is further subdivided by the keyspace, shard and the result of the operation. |
| reparent_shard_operation_timings | Timings of reparent shard operations indexed by the type of operation. |

If there is some information about VTOrc that you would like to see on the /debug/status page, or if you would like support for an API or metric to be added, please let us know on Slack in the #feat-vtorc channel.
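To read the metrics listed above programmatically, the /debug/vars output can be parsed as JSON. The following Python sketch picks out a few of the recovery metrics from such a document. The helper names and the exact JSON shapes (a plain number for an undivided metric, a name-to-count mapping for a subdivided one) are assumptions for illustration, not a description of VTOrc's guaranteed output.

```python
import json

# Metric names taken from the list of exported metrics.
RECOVERY_METRICS = (
    "PendingRecoveries",
    "RecoveriesCount",
    "SuccessfulRecoveries",
    "FailedRecoveries",
    "ErrantGtidTabletCount",
)


def select_metrics(raw_json, names=RECOVERY_METRICS):
    """Return only the named metrics that are present in a /debug/vars document."""
    data = json.loads(raw_json)
    return {name: data[name] for name in names if name in data}


def metric_total(value):
    """Total a metric that is either a plain number or a mapping of
    sub-category name to count (assumed shape for subdivided metrics)."""
    if isinstance(value, dict):
        return sum(value.values())
    return value
```

For instance, given a hypothetical payload such as {"PendingRecoveries": 0, "FailedRecoveries": {"RecoveryA": 1, "RecoveryB": 2}}, select_metrics keeps both entries and metric_total reports 3 failed recoveries across the two placeholder recovery names.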
