Materialize

Materialize the results of a query into a table

Description #

Materialize is a lower level vreplication command that allows for generalized materialization of tables. The target tables can be copies, aggregations, or views. The target tables are kept in sync in near-realtime.

You can specify multiple tables to materialize using the create sub-command's --table-settings flag. There are limitations, however, to the queries which can be used with Materialize:

  • The query must be a SELECT statement
  • Only the following operators are supported: =, <, <=, >, >=, <>, != (e.g. no IN, OR, or LIKE)
  • The query must be against a single table (so no JOINs)
  • The query cannot use DISTINCT
  • The query cannot use a derived table
  • Expressions in the query must have an alias, e.g. select hour(c1) as c1_hour from t1
  • The GROUP BY expression cannot reference an aggregate expression such as MAX or COUNT
Be careful to avoid using the INSTANT ADD COLUMN feature in MySQL 8.0+ with materialization source tables as this can cause the vreplication based materialization workflow to break.

The Basic Materialize Workflow Lifecycle #

  1. Initiate the migration using Materialize
  2. Monitor the workflow using show or status
    Materialize --target-keyspace <target-keyspace> show --workflow <workflow>
    Materialize --target-keyspace <target-keyspace> status --workflow <workflow>
  3. Start accessing your views once the workflow has started Replicating

Command #

Please see the Materialize command reference for a full list of sub-commands and their flags.

Example #

vtctldclient --server localhost:15999 Materialize --workflow product_sales --target-keyspace commerce create --source-keyspace commerce --table-settings '[{"target_table": "sales_by_sku", "create_ddl": "create table sales_by_sku (sku varchar(128) not null primary key, orders bigint, revenue bigint)", "source_expression": "select sku, count(*) as orders, sum(price) as revenue from corder group by sku"}]' --cells zone1 --cells zone2 --tablet-types replica

Parameters #

Action #

Materialize is an "umbrella" command. The action or sub-command defines the operation on the workflow.

Options #

Each action or sub-command has additional options/parameters that can be used to modify its behavior. Please see the command's reference docs for the full list of command options or flags. Below we will add additional information for a subset of key options.

--cells #

optional
default local cell

A comma-separated list of cell names or cell aliases. This list is used by VReplication to determine which cells should be used to pick a tablet for selecting data from the source keyspace.

Uses #
  • Improve performance by using picking a tablet in cells in network proximity with the target
  • To reduce bandwidth costs by skipping cells that are in different availability zones
  • Select cells where replica lags are lower

--tablet-types #

optional
default "in_order:REPLICA,PRIMARY"
string

Source tablet types to replicate from (e.g. PRIMARY, REPLICA, RDONLY). The value specified impacts tablet selection for the workflow.

Uses #
  • To reduce the load on PRIMARY tablets by using REPLICAs or RDONLYs
  • Reducing lag by pointing to PRIMARY

--table-settings #

required
JSON

This is a JSON array where each value must contain two key/value pairs. The first required key is 'target_table' and it is the name of the table in the target-keyspace to store the results in. The second required key is 'source_expression' and its value is the select query to run against the source table. An optional key/value pair can also be specified for 'create_ddl' which provides the DDL to create the target table if it does not exist – you can alternatively specify a value of 'copy' if the target table schema should be copied as-is from the source keyspace. Here's an example value for table-settings:

[
  {
    "target_table": "customer_one_email",
    "source_expression": "select email from customer where customer_id = 1"
  },
  {
    "target_table": "states",
    "source_expression": "select * from states",
    "create_ddl": "copy"
  },
  {
    "target_table": "sales_by_sku",
    "source_expression": "select sku, count(*) as orders, sum(price) as revenue from corder group by sku",
    "create_ddl": "create table sales_by_sku (sku varchar(128) not null primary key, orders bigint, revenue bigint)"
  }
]

Notes #

There are special commands to perform common materialization tasks and you should prefer them to using Materialize directly.

  • If you just want to copy tables to a different keyspace use MoveTables
  • If you want to change sharding strategies use Reshard instead