Materialize
101_initial_cluster.sh
in the
local example. The commands in this guide also assume you have setup the shell
aliases from the example, e.g. env.sh
in the local example.Materialize
is a VReplication command/workflow. It can be used as a more
general way to achieve something similar to MoveTables or as a way to generate
materialized views of a table (or set of tables) in the same or
different keyspace from the source table (or set of tables). In general, it can be used to create and maintain
continually updated materialized views in Vitess, without having
to resort to manual or trigger-based population of the view content.
Since Materialize
uses VReplication, the view can be kept up-to-date
in very close to real-time which enables use-cases like creating copies of the same table that are sharded in
different ways for the purposes of avoiding expensive cross-shard queries. Materialize
is also flexible enough to
allow for you to pre-create the schema and VSchema for the copied table, allowing you
to for example maintain a copy of a table without some of the source table's MySQL indexes.
All of the command options and parameters are listed in our reference page for the Materialize
command. In our examples to follow we will only touch on what is possible using
Materialize
.
Let's start by loading some sample data:
mysql < ../common/insert_commerce_data.sql
We can look at what we just inserted:
$ mysql --table < ../common/select_commerce_data.sql
Using commerce
Customer
+-------------+--------------------+
| customer_id | email |
+-------------+--------------------+
| 1 | alice@domain.com |
| 2 | bob@domain.com |
| 3 | charlie@domain.com |
| 4 | dan@domain.com |
| 5 | eve@domain.com |
+-------------+--------------------+
Product
+----------+-------------+-------+
| sku | description | price |
+----------+-------------+-------+
| SKU-1001 | Monitor | 100 |
| SKU-1002 | Keyboard | 30 |
+----------+-------------+-------+
COrder
+----------+-------------+----------+-------+
| order_id | customer_id | sku | price |
+----------+-------------+----------+-------+
| 1 | 1 | SKU-1001 | 100 |
| 2 | 2 | SKU-1002 | 30 |
| 3 | 3 | SKU-1002 | 30 |
| 4 | 4 | SKU-1002 | 30 |
| 5 | 5 | SKU-1002 | 30 |
+----------+-------------+----------+-------+
Note that we are using commerce
keyspace.
Planning to Use Materialize #
In this scenario, we are going to make two copies of the corder
table in the same commerce
keyspace using
the table names of corder_view
and corder_view_redacted
. The first copy will be identical to the source table, but
for the corder_view_redacted
copy we will use the opportunity to drop or redact the price
column.
Create the Destination Tables #
In the case where we are using Materialize
to copy tables between or across keyspaces we can use the
"create_ddl": "copy"
option in the
Materialize --table-settings
flag
to create the target table for us (similar to what MoveTables
does). However, in our case where we are using
Materialize
within a single keyspace (commerce
) so we need to manually create the target tables. Let's go ahead
and do that:
$ cat <<EOF | mysql commerce
CREATE TABLE corder_view (
order_id bigint NOT NULL,
customer_id bigint DEFAULT NULL,
sku varchar(128) DEFAULT NULL,
price bigint DEFAULT NULL,
PRIMARY KEY (order_id)
);
CREATE TABLE corder_view_redacted (
order_id bigint NOT NULL,
customer_id bigint DEFAULT NULL,
sku varchar(128) DEFAULT NULL,
PRIMARY KEY (order_id)
);
EOF
And now we can proceed to the Materialize
step(s).
Start the Simple Copy Materialization #
We will run two Materialize
operations, one for each copy/view of the corder
table we will be creating. We could
combine these two operations into a single Materialize
operation, but we will keep them separate for clarity.
$ vtctldclient Materialize --workflow copy_corder_1 --target-keyspace commerce create --source-keyspace commerce --table-settings '[{"target_table": "corder_view", "source_expression": "select * from corder"}]'
Materialization workflow copy_corder_1 successfully created in the commerce keyspace. Use show to view the status.
Now, we should see the materialized view table corder_view
:
$ mysql --binary-as-hex=false commerce -e "select * from corder_view"
+----------+-------------+----------+-------+
| order_id | customer_id | sku | price |
+----------+-------------+----------+-------+
| 1 | 1 | SKU-1001 | 100 |
| 2 | 2 | SKU-1002 | 30 |
| 3 | 3 | SKU-1002 | 30 |
| 4 | 4 | SKU-1002 | 30 |
| 5 | 5 | SKU-1002 | 30 |
+----------+-------------+----------+-------+
And if we insert a row into the source table, it will be replicated to the materialized view:
$ mysql commerce -e "insert into corder (order_id, customer_id, sku, price) values (6, 6, 'SKU-1002', 30)"
$ mysql --binary-as-hex=false commerce -e "select * from corder_view"
+----------+-------------+----------+-------+
| order_id | customer_id | sku | price |
+----------+-------------+----------+-------+
| 1 | 1 | SKU-1001 | 100 |
| 2 | 2 | SKU-1002 | 30 |
| 3 | 3 | SKU-1002 | 30 |
| 4 | 4 | SKU-1002 | 30 |
| 5 | 5 | SKU-1002 | 30 |
| 6 | 6 | SKU-1002 | 30 |
+----------+-------------+----------+-------+
Note that the target table is just a normal table, there is nothing that prevents you from writing to it directly. While you might not want to do that in this in the "materialized view" use-case, in certain other use-cases it might be completely acceptable to write to the table. Doing so is completedly fine as long as you don't end up altering or removing rows in a fashion that would break the "replication" part of the VReplication workflow (e.g. removing a row in the target table directly that is later updated in the source table).
Viewing the Workflow While in Progress #
Materialize
has actions or sub-commands to show
, stop
, start
and cancel
the
Materialize
workflow. For example, once we have started the Materialize
command above, we can observe the
status of the VReplication workflow using the show
sub-command:
$ vtctldclient Workflow --keyspace commerce list
[
"copy_corder_1"
]
$ vtctldclient Materialize --target-keyspace commerce show --workflow copy_corder_1 --include-logs=false
{
"workflows": [
{
"name": "copy_corder_1",
"source": {
"keyspace": "commerce",
"shards": [
"0"
]
},
"target": {
"keyspace": "commerce",
"shards": [
"0"
]
},
"max_v_replication_lag": "0",
"shard_streams": {
"0/zone1-0000000101": {
"streams": [
{
"id": "1",
"shard": "0",
"tablet": {
"cell": "zone1",
"uid": 101
},
"binlog_source": {
"keyspace": "commerce",
"shard": "0",
"tablet_type": "UNKNOWN",
"key_range": null,
"tables": [],
"filter": {
"rules": [
{
"match": "corder_view",
"filter": "select * from corder",
"convert_enum_to_text": {},
"convert_charset": {},
"source_unique_key_columns": "",
"target_unique_key_columns": "",
"source_unique_key_target_columns": "",
"convert_int_to_enum": {}
}
],
"field_event_mode": "ERR_ON_MISMATCH",
"workflow_type": "0",
"workflow_name": ""
},
"on_ddl": "IGNORE",
"external_mysql": "",
"stop_after_copy": false,
"external_cluster": "",
"source_time_zone": "",
"target_time_zone": ""
},
"position": "fd3ac16e-6880-11ee-be79-13f9a85fe24e:1-241",
"stop_position": "",
"state": "Running",
"db_name": "vt_commerce",
"transaction_timestamp": {
"seconds": "1697061520",
"nanoseconds": 0
},
"time_updated": {
"seconds": "1697061521",
"nanoseconds": 0
},
"message": "",
"copy_states": [],
"logs": [],
"log_fetch_error": "",
"tags": [],
"rows_copied": "5",
"throttler_status": {
"component_throttled": "",
"time_throttled": {
"seconds": "0",
"nanoseconds": 0
}
}
}
],
"tablet_controls": [],
"is_primary_serving": true
}
},
"workflow_type": "Materialize",
"workflow_sub_type": "None",
"max_v_replication_transaction_lag": "0",
"defer_secondary_keys": false
}
]
}
We can now also use the stop
and start
actions to temporarily stop the materialization workflow. For
example:
$ vtctldclient Materialize --target-keyspace commerce stop --workflow copy_corder_1
{
"summary": "Successfully updated the copy_corder_1 workflow on (1) target primary tablets in the commerce keyspace",
"details": [
{
"tablet": {
"cell": "zone1",
"uid": 101
},
"changed": true
}
]
}
And start
to start the workflow again and continue with the materialization:
$ vtctldclient Materialize --target-keyspace commerce start --workflow copy_corder_1
{
"summary": "Successfully updated the copy_corder_1 workflow on (1) target primary tablets in the commerce keyspace",
"details": [
{
"tablet": {
"cell": "zone1",
"uid": 101
},
"changed": true
}
]
}
If at some point, when the initial copy is done and we have fully materialized all of the (initial) data, we do not
want to continue replicating changes from the source, we can cancel
the workflow:
$ vtctldclient Materialize --target-keyspace commerce cancel --workflow copy_corder_1
Successfully cancelled the copy_corder_1 workflow in the commerce keyspace
Note that deleting the workflow will not DROP
the target table of the Materialize
workflow or DELETE
any of the
data already copied. The data in the target table will remain as it was at the moment the workflow was deleted
(or stopped).
Start the Redacted Price Materialization #
Now we can perform the materialization of the corder_view_redacted
table we created earlier. Remember that we created
this table without a price column so we will not be copying that column in our query either:
$ vtctldclient Materialize --target-keyspace commerce create --workflow copy_corder_2 --source-keyspace commerce --table-settings '[{"target_table": "corder_view_redacted", "source_expression": "select order_id, customer_id, sku from corder"}]'
Materialization workflow copy_corder_2 successfully created in the commerce keyspace. Use show to view the status.
Again, looking the target table will show all the source table rows, this time without the price
column:
$ mysql commerce --binary-as-hex=false -e "select * from corder_view_redacted"
+----------+-------------+----------+
| order_id | customer_id | sku |
+----------+-------------+----------+
| 1 | 1 | SKU-1001 |
| 2 | 2 | SKU-1002 |
| 3 | 3 | SKU-1002 |
| 4 | 4 | SKU-1002 |
| 5 | 5 | SKU-1002 |
| 6 | 6 | SKU-1002 |
+----------+-------------+----------+
Again, we can add a row to the source table, and see it replicated into the target table:
$ mysql commerce -e "insert into corder (order_id, customer_id, sku, price) values (7, 7, 'SKU-1002', 30)"
$ mysql commerce --binary-as-hex=false -e "select * from corder_view_redacted"
+----------+-------------+----------+
| order_id | customer_id | sku |
+----------+-------------+----------+
| 1 | 1 | SKU-1001 |
| 2 | 2 | SKU-1002 |
| 3 | 3 | SKU-1002 |
| 4 | 4 | SKU-1002 |
| 5 | 5 | SKU-1002 |
| 6 | 6 | SKU-1002 |
| 7 | 7 | SKU-1002 |
+----------+-------------+----------+
What Happened Under the Covers #
As with MoveTables
, a VReplication stream was formed for each of the Materialize
workflows we created. We can see these by inspecting the internal _vt.vreplication
table on the
target keyspace's primary tablet, e.g. in this case:
# We want to connect directly to the primary mysqld
$ SOCKETPATH=${VTDATAROOT}/$(vtctldclient GetTablets --keyspace=commerce --tablet-type=primary | awk '$1 sub(/zone1-/, "vt_") {print $1}')
$ mysql -u root -h localhost --socket=${SOCKETPATH}/mysql.sock --binary-as-hex=false -e "select * from _vt.vreplication\G"
*************************** 1. row ***************************
id: 2
workflow: copy_corder_2
source: keyspace:"commerce" shard:"0" filter:{rules:{match:"corder_view_redacted" filter:"select order_id, customer_id, sku from corder"}}
pos: MySQL56/4c89eede-8c68-11ed-a40a-6f1a36c22987:1-4764
stop_pos: NULL
max_tps: 9223372036854775807
max_replication_lag: 9223372036854775807
cell:
tablet_types:
time_updated: 1672865504
transaction_timestamp: 1672865502
state: Running
message:
db_name: vt_commerce
rows_copied: 6
tags:
time_heartbeat: 1672865504
workflow_type: 0
time_throttled: 0
component_throttled:
workflow_sub_type: 0
Cleanup #
As seen earlier, you can easily use the Workflow delete
or Materialize cancel
command to clean up a Materialize
workflow when it's no longer needed.
Materialize
VReplication stream, the actual source and target tables are left unchanged
and in the same state they were at the moment the VReplication stream was deleted.