This guide follows on from the Get Started guides. Please make sure that you have an Operator, local or Helm installation ready. Make sure you have only run the “101” step of the examples, for example 101_initial_cluster.sh in the local example. The commands in this guide also assumes you have setup the shell aliases from the example, e.g. env.sh in the local example.
Materialize is a new VReplication workflow in Vitess 6. It can be used as a more general way to achieve something similar to MoveTables, or as a way to generate materialized views of a table (or set of tables) in the same or different keyspace from the source table (or set of tables). In general, it can be used to create and maintain continually updated materialized views in Vitess, without having to resort to manual or trigger-based population of the view content.
Since Materialize uses VReplication, the view can be kept up-to-date very close to real-time, which enables use-cases like creating copies of the same table sharded different ways for the purposes of certain types of queries that would otherwise be prohibitively expensive on the original table. Materialize is also flexible enough to allow for you to pre-create the schema and vschema for the copied table, allowing you to, for example, maintain a copy of a table without some of the source table’s MySQL indexes. Alternatively, you could use Materialize to do certain schema changes (e.g. change the type of a table column) without having to use other tools like gh-ost.
In our example, we will be using Materialize to perform something similar to the MoveTables user guide, which will cover just the basics of what is possible using Materialize.
Let’s start by simulating this situation by loading sample data:
# On helm andlocal installs:
mysql < /usr/local/vitess/examples/common/insert_commerce_data.sql
mysql --table < insert_commerce_data.sql
In this scenario, we are going to make two copies of the corder table in the same keyspace using a different tablenames of corder_view and corder_view_redacted. The first copy will be identical to the source table, but for the corder_view_redacted copy, we will use the opportunity to drop the price column from the copy. Since we are doing the Materialize to the same keyspace, we do not need to create a new keyspace or tablets as we did for the MoveTables user guide.
In the case where we using Materialize to copy tables between keyspaces, we can use the "create_ddl": "copy" option in the Materializejson_spectable_settings to create the target table for us (similar to what MoveTables does). However, in our case where we are using Materialize with a target table name different from the source table name, we need to manually create the target tables. Let’s go ahead and do that:
We will run two Materialize operations, one for each copy/view of the corder table we will be creating. We could combine these two operations into a single Materialize operation, but we will keep them separate for clarity.
Note that the target table is just a normal table, there is nothing that prevents you from writing to it directly. While you might not want to do that in this “view” use-case, in certain other use-cases, it might be completely acceptable to write to the table, as long as you don’t end up altering or removing rows in a fashion that would break the “replication” part of VReplication (e.g. removing a row in the target table directly that is later updated in the source table).
While we can also see and manipulate the underlying VReplication streams
created by Materialize; there are commands to show, stop, start
and delete the operations associated with a Materialize workflow.
For example, once we have started the Materialize command above,
we can observe the status of the VReplication stream doing the
materialization via the vtctlclient Workflow command:
Note that deleting the workflow will not drop the target table for the
Materialize workflow, or any of the data already copied. The data
in the target table will remain as it was at the moment the workflow
was deleted (or previously stopped).
As with MoveTables, a VReplication stream was formed for each of the Materialize workflows we executed. We can see these by inspecting the VReplication table on the target keyspace master tablet, e.g. in this case:
It is important to use the vtctlclient VReplicationExec command to inspect this table, since some of the fields are binary and might not render properly in a MySQL client (at least with default options). In the above output, you can see a summary of the VReplication streams that were setup (and are still Running) to copy and then do continuous replication of the source table (corder) to the two different target tables.
As seen earlier, you can easily use the vtctlclient Workflow ... delete
command to clean up a materialize operation. If you like, you can also
instead use the VReplicationExec command to temporarily stop the replication
streams for the VReplication streams that make up the Materialize process.
For example, to stop both streams, you can do:
Any changes to the source tables will now not be applied to the target tables until you update the state column back to Running.
Lastly, you can clean up the Materialize process by just using VReplicationExec to delete the rows in the _vt.vreplication table. This will do the necessary runtime cleanup as well. E.g.:
$ vtctlclient VReplicationExec zone1-0000000100 'delete from _vt.vreplication where id in (1,2)'
$ vtctlclient VReplicationExec zone1-0000000100 'select * from _vt.vreplication'
| id | workflow | source | pos | stop_pos | max_tps | max_replication_lag | cell | tablet_types | time_updated | transaction_timestamp | state | message | db_name |
Note that this just cleans up the VReplication streams; the actual source and target tables are left untouched and in the same state they were at the moment the VReplication streams were stopped or deleted.
As mentioned at the beginning, Materialize gives you finer control over the VReplication process without having to form VReplication rules completely by hand. For the ultimate flexibility, that is still possible, but you should be able to use Materialize together with other Vitess features like routing rules to cover a large set of potential migration and data maintenance use-cases without resorting to creating VReplication rules directly.