VDiff

Compare the source and target in a workflow to ensure integrity

Command #

VDiff  [-source_cell=<cell>] [-target_cell=<cell>] [-tablet_types=replica]
       [-limit=<max rows to diff>] [-table=<table list>] [-format=json]
       [-filtered_replication_wait_time=30s] <keyspace.workflow>

Description #

VDiff does a row by row comparison of all tables associated with the workflow, diffing the source keyspace and the target keyspace and reporting counts of missing/extra/unmatched rows.

It is highly recommended that you do this before you finalize a workflow with SwitchWrites.

Parameters #

-source_cell #

optional
default all

VDiff will choose a tablet from this cell to diff the source table(s) with the target tables

-target_cell #

optional
default all

VDiff will choose a tablet from this cell to diff the source table(s) with the target tables

-tablet_types #

optional
default replica

A comma separated list of tablet types that are used while picking a tablet for sourcing data. One or more from MASTER, REPLICA, RDONLY.

-filtered_replication_wait_time #

optional
default 30s

VDiff finds the current position of the source master and then waits for the target replication to reach that position for _filtered_replication_wait_time_. If the target is much behind the source or if there is a high write qps on the source then this time will need to be increased.

-limit #

optional
default none

Maximum number of rows to run vdiff on (across all tables specified). This limit is usually set while diffing a large table as a quick consistency check.

-tables #

optional
default all tables in the workflow

A comma separated list of tables to run vdiff on.

-format #

optional
default unstructured text output

Only other format supported is json
Example: #
[{
  "ProcessedRows": 5,
  "MatchingRows": 5,
  "MismatchedRows": 0,
  "ExtraRowsSource": 0,
  "ExtraRowsTarget": 0
},{
  "ProcessedRows": 3,
  "MatchingRows": 3,
  "MismatchedRows": 0,
  "ExtraRowsSource": 0,
  "ExtraRowsTarget": 0
}]

keyspace.workflow #

mandatory

Name of target keyspace and the associated workflow to run VDiff on.

Example #

$ vtctlclient VDiff customer.commerce2customer

Summary for corder: {ProcessedRows:10 MatchingRows:10 MismatchedRows:0 ExtraRowsSource:0 ExtraRowsTarget:0}
Summary for customer: {ProcessedRows:11 MatchingRows:11 MismatchedRows:0 ExtraRowsSource:0 ExtraRowsTarget:0}

Notes #

  • You can follow the progress of the command by tailing the vtctld logs
  • VDiff can take very long (hours/days) for huge tables, so this needs to be taken into account. If VDiff takes more than an hour and you use vtctlclient then it will hit the grpc/http default timeout of 1 hour. In that case you can use vtctl (the bundled vctlclient + vtctld) instead.
  • There is no throttling, so you might see an increased lag in the replica used as the source.

VReplication and VDiff performance improvements as well as freno-style throttling support are on the roadmap!