VDiff does a row by row comparison of all tables associated with the workflow, diffing the
source keyspace and the target keyspace and reporting counts of missing/extra/unmatched rows.
It is highly recommended that you do this before you finalize a workflow with SwitchTraffic and complete.
VDiff supports diffing tables without a defined Primary Key (PK)
and it will use a Primary Key equivalent (PKE: index on non-NULL unique column(s)) if one exists.
However, be aware of the additional overhead and time required to do the comparison in these
cases, particularly if there is no PK or PKE as diffing the table will then require a full table
scan to read every row and a filesort to sort all of them before the diff can start (and this will
have to be done every time it's restarted/resumed). If the table is of any significant size then
it's strongly recommended that you define a PK for the table.
The create action schedules a VDiff to run on the primary tablet of each target shard to verify the subset of data that will live on the given shard. If you do not pass a specific UUID then one will be generated.
Each scheduled VDiff has an associated UUID which is returned by the create action. You can use it
to monitor progress. Example:
$ vtctldclient --server=localhost:15999 VDiff --target-keyspace customer --workflow commerce2customer create
VDiff a35b0006-e6d9-416e-bea9-917795dc5bf3 scheduled on target shards, use show to view progress
The resume action allows you to resume a previously completed VDiff, picking up where it left off and comparing the records where the Primary Key column(s) are greater than the last record processed — with the progress and other status information saved when the run ends. This allows you to do approximate rolling or differential VDiffs (e.g. done after MoveTables finishes the initial copy phase and then again just before SwitchTraffic).
$ vtctldclient --server=localhost:15999 VDiff --target-keyspace customer --workflow commerce2customer resume 4c664dc2-eba9-11ec-9ef7-920702940ee0
VDiff 4c664dc2-eba9-11ec-9ef7-920702940ee0 resumed on target shards, use show to view progress
We cannot guarantee accurate results for resume when different collations are used for a table between the source and target keyspaces (more details can be seen here).
show all lists all VDiffs created for the specified keyspace and workflow.
It is too expensive to get exact real-time row counts for tables, using e.g. SELECT COUNT(*).
So we instead use the statistics available in the
to approximate the number of rows in each table when initializing a VDiff on the target
primary tablet(s). This data is then used in the progress report and it can be significantly
off (up to 50-60+%) depending on the utilization of the underlying MySQL server resources and
the age of the tables. You can specify the create--update-table-stats flag so that VDiff
will run ANALYZE TABLE to update the
statistics for the tables involved on the target primary tablet(s) before executing the
VDiff in order to improve the accuracy of the progress report.
The stop action allows you to stop a running VDiff for any reason — for example, the load on the system(s) may be too high at the moment and you want to postpone the work until off hours. You can then later use the resume action to start the VDiff again from where it left off. Example: