vtorc
VTOrc is the automated fault detection and repair tool of Vitess.
Example Usage
Start VTOrc as follows:
export TOPOLOGY_FLAGS="--topo_implementation etcd2 --topo_global_server_address localhost:2379 --topo_global_root /vitess/global"
export VTDATAROOT="/tmp"
vtorc \
  $TOPOLOGY_FLAGS \
  --log_dir $VTDATAROOT/tmp \
  --port 15000 \
  --recovery-period-block-duration "10m" \
  --instance-poll-time "1s" \
  --topo-information-refresh-duration "30s" \
  --alsologtostderr
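
The unquoted `$TOPOLOGY_FLAGS` in the command above relies on shell word splitting to separate the flags. As a minimal sketch, the same invocation can be assembled in a bash array, which keeps each flag and value as a distinct argument. The array name `vtorc_args` is illustrative only, and the command is printed here rather than executed:

```shell
#!/usr/bin/env bash
# Sketch: assemble the flags from the example above in a bash array.
# `vtorc_args` is a hypothetical name chosen for this illustration.
VTDATAROOT="/tmp"

vtorc_args=(
  --topo_implementation etcd2
  --topo_global_server_address localhost:2379
  --topo_global_root /vitess/global
  --log_dir "$VTDATAROOT/tmp"
  --port 15000
  --recovery-period-block-duration "10m"
  --instance-poll-time "1s"
  --topo-information-refresh-duration "30s"
  --alsologtostderr
)

# Print the full command for inspection; drop the leading `echo`
# to actually start VTOrc with these arguments.
echo vtorc "${vtorc_args[@]}"
```

Printing the command first makes it easy to confirm that every flag expands as intended before starting the process.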
Options
The following command-line options apply to VTOrc:
| Name | Type | Definition | 
|---|---|---|
| --alsologtostderr | boolean | log to standard error as well as files | 
| --audit-file-location | string | File location where the audit logs are to be stored | 
| --audit-purge-duration | duration | Duration for which audit logs are held before being purged. Should be in multiples of days (default 168h0m0s) | 
| --audit-to-backend | boolean | Whether to store the audit log in the VTOrc database | 
| --audit-to-syslog | boolean | Whether to store the audit log in the syslog | 
| --catch-sigpipe | boolean | catch and ignore SIGPIPE on stdout and stderr if specified | 
| --clusters_to_watch | strings | Comma-separated list of keyspaces or keyspace/shards that this instance will monitor and repair. Defaults to all clusters in the topology. Example: "ks1,ks2/-80" | 
| --config | string | config file name | 
| --consul_auth_static_file | string | JSON File to read the topos/tokens from. | 
| --grpc_auth_static_client_creds | string | When using grpc_static_auth in the server, this file provides the credentials to use to authenticate with server. | 
| --grpc_compression | string | Which protocol to use for compressing gRPC. Default: nothing. Supported: snappy | 
| --grpc_enable_tracing | boolean | Enable gRPC tracing. | 
| --grpc_initial_conn_window_size | int | gRPC initial connection window size | 
| --grpc_initial_window_size | int | gRPC initial window size | 
| --grpc_keepalive_time | duration | After a duration of this time, if the client doesn't see any activity, it pings the server to see if the transport is still alive. (default 10s) | 
| --grpc_keepalive_timeout | duration | After having pinged for keepalive check, the client waits for a duration of Timeout and if no activity is seen even after that the connection is closed. (default 10s) | 
| --grpc_max_message_size | int | Maximum allowed RPC message size. Larger messages will be rejected by gRPC with the error 'exceeding the max size'. (default 16777216) | 
| --grpc_prometheus | boolean | Enable gRPC monitoring with Prometheus. | 
| -h, --help | boolean | display usage and exit | 
| --instance-poll-time | duration | Timer duration on which VTOrc refreshes MySQL information (default 5s) | 
| --keep_logs | duration | keep logs for this long (using ctime) (zero to keep forever) | 
| --keep_logs_by_mtime | duration | keep logs for this long (using mtime) (zero to keep forever) | 
| --lameduck-period | duration | keep running at least this long after SIGTERM before stopping (default 50ms) | 
| --lock-timeout | duration | Maximum time for which a shard/keyspace lock can be acquired (default 45s) | 
| --log_backtrace_at | traceLocation | when logging hits line file:N, emit a stack trace (default :0) | 
| --log_dir | string | If non-empty, write log files in this directory | 
| --log_err_stacks | boolean | log stack traces for errors | 
| --log_rotate_max_size | uint | size in bytes at which logs are rotated (glog.MaxSize) (default 1887436800) | 
| --logtostderr | boolean | log to standard error instead of files | 
| --onclose_timeout | duration | wait no more than this for OnClose handlers before stopping (default 10s) | 
| --onterm_timeout | duration | wait no more than this for OnTermSync handlers before stopping (default 10s) | 
| --pid_file | string | If set, the process will write its pid to the named file, and delete it on graceful shutdown. | 
| --port | int | port for the server | 
| --pprof | strings | enable profiling | 
| --prevent-cross-cell-failover | boolean | Prevent VTOrc from promoting a primary in a different cell than the current primary in case of a failover | 
| --purge_logs_interval | duration | how often to try to remove old logs (default 1h0m0s) | 
| --reasonable-replication-lag | duration | Maximum replication lag on replicas which is deemed to be acceptable (default 10s) | 
| --recovery-period-block-duration | duration | Duration for which a new recovery is blocked on an instance after running a recovery (default 30s) | 
| --recovery-poll-duration | duration | Timer duration on which VTOrc polls its database to run a recovery (default 1s) | 
| --remote_operation_timeout | duration | time to wait for a remote operation (default 15s) | 
| --security_policy | string | the name of a registered security policy to use for controlling access to URLs - empty means allow all for anyone (built-in policies: deny-all, read-only) | 
| --shutdown_wait_time | duration | Maximum time to wait for VTOrc to release all the locks that it is holding before shutting down on SIGTERM (default 30s) | 
| --snapshot-topology-interval | duration | Timer duration on which VTOrc takes a snapshot of the current MySQL information it has in the database. Should be in multiples of hours | 
| --sqlite-data-file | string | SQLite Datafile to use as VTOrc's database (default "file::memory:?mode=memory&cache=shared") | 
| --stderrthreshold | severity | logs at or above this threshold go to stderr (default 1) | 
| --tablet_manager_grpc_ca | string | the server ca to use to validate servers when connecting | 
| --tablet_manager_grpc_cert | string | the cert to use to connect | 
| --tablet_manager_grpc_concurrency | int | concurrency to use to talk to a vttablet server for performance-sensitive RPCs (like ExecuteFetchAs{Dba,AllPrivs,App}) (default 8) | 
| --tablet_manager_grpc_connpool_size | int | number of tablets to keep tmclient connections open to (default 100) | 
| --tablet_manager_grpc_crl | string | the server crl to use to validate server certificates when connecting | 
| --tablet_manager_grpc_key | string | the key to use to connect | 
| --tablet_manager_grpc_server_name | string | the server name to use to validate server certificate | 
| --tablet_manager_protocol | string | Protocol to use to make tabletmanager RPCs to vttablets. (default "grpc") | 
| --topo-information-refresh-duration | duration | Timer duration on which VTOrc refreshes the keyspace and vttablet records from the topology server (default 15s) | 
| --topo_consul_lock_delay | duration | LockDelay for consul session. (default 15s) | 
| --topo_consul_lock_session_checks | string | List of checks for consul session. (default "serfHealth") | 
| --topo_consul_lock_session_ttl | string | TTL for consul session. | 
| --topo_consul_watch_poll_duration | duration | time of the long poll for watch queries. (default 30s) | 
| --topo_etcd_lease_ttl | int | Lease TTL for locks and leader election. The client will use KeepAlive to keep the lease going. (default 30) | 
| --topo_etcd_tls_ca | string | path to the ca to use to validate the server cert when connecting to the etcd topo server | 
| --topo_etcd_tls_cert | string | path to the client cert to use to connect to the etcd topo server, requires topo_etcd_tls_key, enables TLS | 
| --topo_etcd_tls_key | string | path to the client key to use to connect to the etcd topo server, enables TLS | 
| --topo_global_root | string | the path of the global topology data in the global topology server | 
| --topo_global_server_address | string | the address of the global topology server | 
| --topo_implementation | string | the topology implementation to use | 
| --topo_k8s_context | string | The kubeconfig context to use, overrides the 'current-context' from the config | 
| --topo_k8s_kubeconfig | string | Path to a valid kubeconfig file. When running as a k8s pod inside the same cluster you wish to use as the topo, you may omit this and the below arguments, and Vitess is capable of auto-discovering the correct values. https://kubernetes.io/docs/tasks/access-application-cluster/access-cluster/#accessing-the-api-from-a-pod | 
| --topo_k8s_namespace | string | The kubernetes namespace to use for all objects. Default comes from the context or in-cluster config | 
| --topo_zk_auth_file | string | auth to use when connecting to the zk topo server, file contents should be | 
| --topo_zk_base_timeout | duration | zk base timeout (see zk.Connect) (default 30s) | 
| --topo_zk_max_concurrency | int | maximum number of pending requests to send to a Zookeeper server. (default 64) | 
| --topo_zk_tls_ca | string | the server ca to use to validate servers when connecting to the zk topo server | 
| --topo_zk_tls_cert | string | the cert to use to connect to the zk topo server, requires topo_zk_tls_key, enables TLS | 
| --topo_zk_tls_key | string | the key to use to connect to the zk topo server, enables TLS | 
| --v | value | log level for V logs | 
| --version | boolean | print binary version | 
| --vmodule | value | comma-separated list of pattern=N settings for file-filtered logging | 
| --wait-replicas-timeout | duration | Duration for which to wait for replicas to respond when issuing RPCs (default 30s) | 
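
As an illustration of how several of the options above combine, the following sketch restricts VTOrc to a subset of keyspaces, stores its state in an on-disk SQLite file instead of the in-memory default, and prevents promotion of a primary in a different cell. The keyspace list is taken from the `--clusters_to_watch` example in the table; the SQLite path, the topology address, and the array name `watch_args` are assumptions chosen for illustration, not recommended values. The command is printed rather than executed:

```shell
#!/usr/bin/env bash
# Sketch: combine several options from the table above.
VTDATAROOT="/tmp"
# Hypothetical on-disk location; by default VTOrc uses an in-memory database.
SQLITE_FILE="$VTDATAROOT/vtorc.sqlite3"

watch_args=(
  --topo_implementation etcd2
  --topo_global_server_address localhost:2379
  --topo_global_root /vitess/global
  --port 15000
  # Monitor all of ks1 and only shard -80 of ks2.
  --clusters_to_watch "ks1,ks2/-80"
  --sqlite-data-file "$SQLITE_FILE"
  # Boolean flags take no value.
  --prevent-cross-cell-failover
  --logtostderr
)

# Print the full command for inspection; drop the leading `echo`
# to actually start VTOrc with these arguments.
echo vtorc "${watch_args[@]}"
```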