Flags

vttablet flags related to VReplication functionality

There are several flags that can be specified when vttablet is launched that are related to the VReplication functionality. Some of the flags are relevant when tablets are acting as targets and others when tablets are acting as sources in a VReplication workflow.

relay_log_max_size #

Type integer
Unit bytes
Default 250000
Applicable on target

The target tablet receives events from the source and applies the corresponding DML to the underlying MySQL database. Depending on the load on the target the query execution times can change. Also during the copy phase, we are doing bulk inserts. For both of these reasons VReplication introduces a buffer between receiving the events and applying them: the relay log.

The relay log buffers events on the target as they are received from the source. This is done in a separate thread concurrently with the thread that applies the events.

relay_log_max_size defines the maximum buffer size (in bytes). As events arrive they are stored in the relay log. The apply thread consumes these events as fast as it can. When the relay log fills up we no longer pull events from the source until some events are consumed. If single rows are larger than the specified buffer size, a single row is buffered at a time.

vreplication_copy_phase_duration #

Type duration
Default 1h (1 hour)
Applicable on target

When copying the contents of a table we go through 1+ cycles of copying rows, catching up on changes made (binlog events) while we copied rows, and applying those changes to the rows that have been copied (copy,catchup,fastforward). This flag determines at most how long we copy rows before moving through the other stages in the cycle. These cycles will continue until all of the rows have been copied.

  • You can see metrics related to the copy phase in the following values at the /debug/vars vttablet endpoint: VReplicationPhaseTimings, VReplicationPhaseTimingsCounts, VReplicationPhaseTimingsTotal, VReplicationCopyLoopCount, VReplicationCopyLoopCountTotal, VReplicationCopyRowCount, VReplicationCopyRowCountTotal, VStreamerPhaseTiming, and VStreamerErrors
You should not generally need to change this. But, you may want to increase this duration if the source has little to no write traffic occurring during the copy phase (to speed things along) and you may want to decrease it if the write rate is very high on the source during the copy phase (to ensure we can stay caught up with changes that are happening).

vstream_packet_size #

Type integer
Unit bytes
Default 250000
Applicable on source

On the source, events are buffered where applicable, to minimize network overhead. For example, multiple row events in a transaction or the set of begin/dml/commit event sets are buffered and sent together. Commits, DDLs, and synthetic events generated by VReplication like heartbeats, resharding journals cause the events buffered on the source to be sent immediately.

vstream_packet_size specifies the suggested packet size for VReplication streamer. This is used only as a recommendation. The actual packet size may be more or less than this amount depending on the number and type of events yet to be sent on the source.

vreplication_heartbeat_update_interval #

Type integer
Unit seconds
Default 1
Maximum 60 (one minute)
Applicable on target

For an idle source shard, the source vstreamer sends a heartbeat. Currently, that is once per second. On receiving the heartbeat the target VReplication module updates the time_updated column of the relevant row of \_vt.vreplication. For some setups this is a problem, for example:

  • if there are too many streams the extra write QPS or CPU load due to these updates are unacceptable
  • if there are too many streams and/or a large source field (lot of participating tables) which generates unacceptable increase in the binlog size
  • even for a single stream, if the server is of a lower configuration, then the resulting increase in the QPS or binlog increase may become significant as a percentage of resources

vreplicationHeartbeatUpdateInterval determines how often the time_updated column is updated if there is no activity on the source and the source vstream is only sending heartbeats. Use a low value if you expect a high QPS or you are monitoring this column to alert about potential outages. Keep this high if:

  • you have too many streams and the extra write QPS or CPU load due to these updates is unacceptable OR
  • you have too many streams and/or a large binlogsource field (i.e., there are a lot of participating tables) which generates unacceptable increase in your binlog size

Some internal processes (like online ddl) depend on the heartbeat updates for operating properly. Hence there is an upper limit on this interval, which is 60 seconds.

watch_replication_stream #

Type bool
Default false
Applicable on source

By default vttablets reload their schema every queryserver-config-schema-reload-time seconds (default 30 minutes). This can cause a problem while streaming events if DDLs are applied on the source and streaming is started after the DDL was applied but before vttablet refreshed its schema. This is alleviated by the watcher.

When enabled, vttablet will start the watcher which streams the MySQL replication stream from the local database, and uses it to proactively update its schema when it encounters a DDL.

track_schema_versions #

Type bool
Default false
Applicable on source

All vstreams on a tablet share a common engine. vstreams that are lagging might see a newer (and hence incorrect) version of the schema in case DDLs were applied in between. Also, reloading schemas is an expensive operation. If there are multiple vstreams, each of them will separately receive a DDL event resulting in multiple reloads for the same DDL. The tracker addresses these issues.

When enabled, vttablet will start the tracker which runs a separate vstream that monitors DDLs and stores the version of the schema at the position that a DDL is applied in the schema version table. So if we are streaming events from the past we can get the corresponding schema and interpret the fields from the event correctly.

vreplication_retry_delay #

Type integer
Unit seconds
Default 5
Applicable on target

The target might encounter connection failures during a workflow. VReplication automatically retries stalled streams after vreplication_retry_delay seconds

vreplication_tablet_type #

Type string
Default PRIMARY,REPLICA
Applicable on target

This parameter specifies the default tablet_types that will be used by the tablet picker to find sources for a VReplication stream. It can be overridden, per workflow, by passing a different list to the workflow commands like MoveTables and Reshard.

vreplication_experimental_flags #

Type bitmask
Default 0
Applicable on target

Features that are not field-tested, that are not backward-compatible, or need to be proven in production environments are put behind vreplication_experimental_flags. These features are temporary and will either be made permanent, removed, or put behind a separate vttablet option. Currently, the only experimental features are expected to be performance improvements.

This will be a bit-mask for each such feature. The ones currently defined:

bitmask: 0x1 => If set then we optimize the catchup phase by not sending inserts for rows that are outside the range of primary keys already copied. More details at https://github.com/vitessio/vitess/pull/7708


Flags