Materialize
Materialize the results of a query into a table
Description #
Materialize
is a lower level vreplication command that allows for generalized materialization of tables. The target tables
can be copies, aggregations, or views. The target tables are kept in sync in near-realtime.
You can specify multiple tables to materialize using the create
sub-command's --table-settings
flag.
There are limitations, however, to the queries which can be used with Materialize
:
- The query must be a
SELECT
statement - Only the following operators are supported:
=
,<
,<=
,>
,>=
,<>
,!=
(e.g. noIN
,OR
, orLIKE
) - The query must be against a single table (so no
JOIN
s) - The query cannot use
DISTINCT
- The query cannot use a derived table
- Expressions in the query must have an alias, e.g.
select hour(c1) as c1_hour from t1
- The
GROUP BY
expression cannot reference an aggregate expression such asMAX
orCOUNT
INSTANT ADD COLUMN
feature in MySQL 8.0+ with materialization source tables as this can cause the vreplication based materialization workflow to break.The Basic Materialize Workflow Lifecycle #
- Initiate the migration using
Materialize
- Monitor the workflow using
show
orstatus
Materialize --target-keyspace <target-keyspace> show --workflow <workflow>
Materialize --target-keyspace <target-keyspace> status --workflow <workflow>
- Start accessing your views once the workflow has started Replicating
Command #
Please see the Materialize
command reference for a full list of sub-commands and their flags.
Example #
vtctldclient --server localhost:15999 Materialize --workflow product_sales --target-keyspace commerce create --source-keyspace commerce --table-settings '[{"target_table": "sales_by_sku", "create_ddl": "create table sales_by_sku (sku varchar(128) not null primary key, orders bigint, revenue bigint)", "source_expression": "select sku, count(*) as orders, sum(price) as revenue from corder group by sku"}]' --cells zone1 --cells zone2 --tablet-types replica
Parameters #
Action #
Materialize
is an "umbrella" command. The action
or sub-command defines the operation on the workflow.
Options #
Each action
or sub-command has additional options/parameters that can be used to modify its behavior. Please see the command's reference docs for the full list of command options or flags. Below we will add additional information for a subset of key options.
--cells #
optional
default local cell
A comma-separated list of cell names or cell aliases. This list is used by VReplication to determine which
cells should be used to pick a tablet for selecting data from the source keyspace.
Uses #
- Improve performance by using picking a tablet in cells in network proximity with the target
- To reduce bandwidth costs by skipping cells that are in different availability zones
- Select cells where replica lags are lower
--tablet-types #
optional
default "in_order:REPLICA,PRIMARY"
string
Source tablet types to replicate from (e.g. PRIMARY, REPLICA, RDONLY). The value specified impacts tablet selection for the workflow.
Uses #
- To reduce the load on PRIMARY tablets by using REPLICAs or RDONLYs
- Reducing lag by pointing to PRIMARY
--table-settings #
required
JSON
This is a JSON array where each value must contain two key/value pairs. The first required key is 'target_table' and it is the name of the table in the target-keyspace to store the results in. The second required key is 'source_expression' and its value is the select query to run against the source table. An optional key/value pair can also be specified for 'create_ddl' which provides the DDL to create the target table if it does not exist – you can alternatively specify a value of 'copy' if the target table schema should be copied as-is from the source keyspace. Here's an example value for table-settings:
[
{
"target_table": "customer_one_email",
"source_expression": "select email from customer where customer_id = 1"
},
{
"target_table": "states",
"source_expression": "select * from states",
"create_ddl": "copy"
},
{
"target_table": "sales_by_sku",
"source_expression": "select sku, count(*) as orders, sum(price) as revenue from corder group by sku",
"create_ddl": "create table sales_by_sku (sku varchar(128) not null primary key, orders bigint, revenue bigint)"
}
]
Notes #
There are special commands to perform common materialization tasks and you should prefer them
to using Materialize
directly.
- If you just want to copy tables to a different keyspace use MoveTables
- If you want to change sharding strategies use Reshard instead