Collations and Character sets

MySQL is a Unicode-aware database, and as explained on the MySQL documentation, it supports using many different character sets to store textual data in databases, and many different collations which are used to compare and sort this textual data.

Likewise, Vitess is also Unicode-aware, and it supports the vast majority of the collations and charsets in the underlying MySQL server. On a basic level, this support means that Vitess handles gracefully textual Unicode columns and queries, and relays this information to MySQL clients without losing or corrupting the encoding of the data. On top of this, newer versions of Vitess are also capable of performing textual comparison and sorting operations in SQL queries directly on VTGate instances, greatly speeding up complex operations such as cross-shard joins.

Supported collations #

The collation environment (i.e. the set of support collations and charsets) of a Vitess cluster is defined by the MySQL server version flag (-mysql_server_version) provided to the VTGate and VTTablet instances in the cluster. Higher (newer) MySQL versions will enable built-in support for more collations.

The following table lists all of the supported collations in the current release of Vitess:

Legend
Vitess has full support for this collation.
⚠️The underlying MySQL engine supports this collation, but Vitess does not.
Neither Vitess nor the underlying MySQL engine supports this collation.

Using collations that are not supported by Vitess but implemented in the underlying MySQL instance can lead to unpredictable behavior.

CollationCharsetMySQL 8.0MySQL 5.7MySQL 5.6MariaDB 10.3MariaDB 10.2MariaDB 10.1MariaDB 10.0
big5_chinese_cibig5⚠️⚠️⚠️⚠️⚠️⚠️⚠️
latin2_czech_cslatin2⚠️⚠️⚠️⚠️⚠️⚠️⚠️
dec8_swedish_cidec8
cp850_general_cicp850
latin1_german1_cilatin1
hp8_english_cihp8
koi8r_general_cikoi8r
latin1_swedish_cilatin1
latin2_general_cilatin2
swe7_swedish_ciswe7
ascii_general_ciascii
ujis_japanese_ciujis
sjis_japanese_cisjis
cp1251_bulgarian_cicp1251
latin1_danish_cilatin1
hebrew_general_cihebrew
tis620_thai_citis620⚠️⚠️⚠️⚠️⚠️⚠️⚠️
euckr_korean_cieuckr
latin7_estonian_cslatin7
latin2_hungarian_cilatin2
koi8u_general_cikoi8u
cp1251_ukrainian_cicp1251
gb2312_chinese_cigb2312
greek_general_cigreek
cp1250_general_cicp1250
latin2_croatian_cilatin2
gbk_chinese_cigbk⚠️⚠️⚠️⚠️⚠️⚠️⚠️
cp1257_lithuanian_cicp1257
latin5_turkish_cilatin5
latin1_german2_cilatin1⚠️⚠️⚠️⚠️⚠️⚠️⚠️
armscii8_general_ciarmscii8
utf8_general_ciutf8
cp1250_czech_cscp1250⚠️⚠️⚠️⚠️⚠️⚠️⚠️
ucs2_general_ciucs2
cp866_general_cicp866
keybcs2_general_cikeybcs2
macce_general_cimacce
macroman_general_cimacroman
cp852_general_cicp852
latin7_general_cilatin7
latin7_general_cslatin7
macce_binmacce
cp1250_croatian_cicp1250
utf8mb4_general_ciutf8mb4
utf8mb4_binutf8mb4
latin1_binlatin1
latin1_general_cilatin1
latin1_general_cslatin1
cp1251_bincp1251
cp1251_general_cicp1251
cp1251_general_cscp1251
macroman_binmacroman
utf16_general_ciutf16
utf16_binutf16
utf16le_general_ciutf16le
cp1256_general_cicp1256
cp1257_bincp1257
cp1257_general_cicp1257
utf32_general_ciutf32
utf32_binutf32
utf16le_binutf16le
binarybinary
armscii8_binarmscii8
ascii_binascii
cp1250_bincp1250
cp1256_bincp1256
cp866_bincp866
dec8_bindec8
greek_bingreek
hebrew_binhebrew
hp8_binhp8
keybcs2_binkeybcs2
koi8r_binkoi8r
koi8u_binkoi8u
utf8_tolower_ciutf8⚠️
latin2_binlatin2
latin5_binlatin5
latin7_binlatin7
cp850_bincp850
cp852_bincp852
swe7_binswe7
utf8_binutf8
big5_binbig5⚠️⚠️⚠️⚠️⚠️⚠️⚠️
euckr_bineuckr
gb2312_bingb2312
gbk_bingbk⚠️⚠️⚠️⚠️⚠️⚠️⚠️
sjis_binsjis
tis620_bintis620⚠️⚠️⚠️⚠️⚠️⚠️⚠️
ucs2_binucs2
ujis_binujis
geostd8_general_cigeostd8
geostd8_bingeostd8
latin1_spanish_cilatin1
cp932_japanese_cicp932
cp932_bincp932
eucjpms_japanese_cieucjpms
eucjpms_bineucjpms
cp1250_polish_cicp1250
utf16_unicode_ciutf16
utf16_icelandic_ciutf16
utf16_latvian_ciutf16
utf16_romanian_ciutf16
utf16_slovenian_ciutf16
utf16_polish_ciutf16
utf16_estonian_ciutf16
utf16_spanish_ciutf16
utf16_swedish_ciutf16
utf16_turkish_ciutf16
utf16_czech_ciutf16
utf16_danish_ciutf16
utf16_lithuanian_ciutf16
utf16_slovak_ciutf16
utf16_spanish2_ciutf16
utf16_roman_ciutf16
utf16_persian_ciutf16
utf16_esperanto_ciutf16
utf16_hungarian_ciutf16
utf16_sinhala_ciutf16
utf16_german2_ciutf16
utf16_croatian_ciutf16
utf16_unicode_520_ciutf16
utf16_vietnamese_ciutf16
ucs2_unicode_ciucs2
ucs2_icelandic_ciucs2
ucs2_latvian_ciucs2
ucs2_romanian_ciucs2
ucs2_slovenian_ciucs2
ucs2_polish_ciucs2
ucs2_estonian_ciucs2
ucs2_spanish_ciucs2
ucs2_swedish_ciucs2
ucs2_turkish_ciucs2
ucs2_czech_ciucs2
ucs2_danish_ciucs2
ucs2_lithuanian_ciucs2
ucs2_slovak_ciucs2
ucs2_spanish2_ciucs2
ucs2_roman_ciucs2
ucs2_persian_ciucs2
ucs2_esperanto_ciucs2
ucs2_hungarian_ciucs2
ucs2_sinhala_ciucs2
ucs2_german2_ciucs2
ucs2_croatian_ciucs2
ucs2_unicode_520_ciucs2
ucs2_vietnamese_ciucs2
ucs2_general_mysql500_ciucs2⚠️⚠️⚠️⚠️⚠️⚠️⚠️
utf32_unicode_ciutf32
utf32_icelandic_ciutf32
utf32_latvian_ciutf32
utf32_romanian_ciutf32
utf32_slovenian_ciutf32
utf32_polish_ciutf32
utf32_estonian_ciutf32
utf32_spanish_ciutf32
utf32_swedish_ciutf32
utf32_turkish_ciutf32
utf32_czech_ciutf32
utf32_danish_ciutf32
utf32_lithuanian_ciutf32
utf32_slovak_ciutf32
utf32_spanish2_ciutf32
utf32_roman_ciutf32
utf32_persian_ciutf32
utf32_esperanto_ciutf32
utf32_hungarian_ciutf32
utf32_sinhala_ciutf32
utf32_german2_ciutf32
utf32_croatian_ciutf32
utf32_unicode_520_ciutf32
utf32_vietnamese_ciutf32
utf8_unicode_ciutf8
utf8_icelandic_ciutf8
utf8_latvian_ciutf8
utf8_romanian_ciutf8
utf8_slovenian_ciutf8
utf8_polish_ciutf8
utf8_estonian_ciutf8
utf8_spanish_ciutf8
utf8_swedish_ciutf8
utf8_turkish_ciutf8
utf8_czech_ciutf8
utf8_danish_ciutf8
utf8_lithuanian_ciutf8
utf8_slovak_ciutf8
utf8_spanish2_ciutf8
utf8_roman_ciutf8
utf8_persian_ciutf8
utf8_esperanto_ciutf8
utf8_hungarian_ciutf8
utf8_sinhala_ciutf8
utf8_german2_ciutf8
utf8_croatian_ciutf8
utf8_unicode_520_ciutf8
utf8_vietnamese_ciutf8
utf8_general_mysql500_ciutf8⚠️⚠️⚠️⚠️⚠️⚠️⚠️
utf8mb4_unicode_ciutf8mb4
utf8mb4_icelandic_ciutf8mb4
utf8mb4_latvian_ciutf8mb4
utf8mb4_romanian_ciutf8mb4
utf8mb4_slovenian_ciutf8mb4
utf8mb4_polish_ciutf8mb4
utf8mb4_estonian_ciutf8mb4
utf8mb4_spanish_ciutf8mb4
utf8mb4_swedish_ciutf8mb4
utf8mb4_turkish_ciutf8mb4
utf8mb4_czech_ciutf8mb4
utf8mb4_danish_ciutf8mb4
utf8mb4_lithuanian_ciutf8mb4
utf8mb4_slovak_ciutf8mb4
utf8mb4_spanish2_ciutf8mb4
utf8mb4_roman_ciutf8mb4
utf8mb4_persian_ciutf8mb4
utf8mb4_esperanto_ciutf8mb4
utf8mb4_hungarian_ciutf8mb4
utf8mb4_sinhala_ciutf8mb4
utf8mb4_german2_ciutf8mb4
utf8mb4_croatian_ciutf8mb4
utf8mb4_unicode_520_ciutf8mb4
utf8mb4_vietnamese_ciutf8mb4
gb18030_chinese_cigb18030⚠️⚠️
gb18030_bingb18030⚠️⚠️
gb18030_unicode_520_cigb18030
utf8mb4_0900_ai_ciutf8mb4
utf8mb4_de_pb_0900_ai_ciutf8mb4
utf8mb4_is_0900_ai_ciutf8mb4
utf8mb4_lv_0900_ai_ciutf8mb4
utf8mb4_ro_0900_ai_ciutf8mb4
utf8mb4_sl_0900_ai_ciutf8mb4
utf8mb4_pl_0900_ai_ciutf8mb4
utf8mb4_et_0900_ai_ciutf8mb4
utf8mb4_es_0900_ai_ciutf8mb4
utf8mb4_sv_0900_ai_ciutf8mb4
utf8mb4_tr_0900_ai_ciutf8mb4
utf8mb4_cs_0900_ai_ciutf8mb4
utf8mb4_da_0900_ai_ciutf8mb4
utf8mb4_lt_0900_ai_ciutf8mb4
utf8mb4_sk_0900_ai_ciutf8mb4
utf8mb4_es_trad_0900_ai_ciutf8mb4
utf8mb4_la_0900_ai_ciutf8mb4
utf8mb4_eo_0900_ai_ciutf8mb4
utf8mb4_hu_0900_ai_ciutf8mb4
utf8mb4_hr_0900_ai_ciutf8mb4
utf8mb4_vi_0900_ai_ciutf8mb4
utf8mb4_0900_as_csutf8mb4
utf8mb4_de_pb_0900_as_csutf8mb4
utf8mb4_is_0900_as_csutf8mb4
utf8mb4_lv_0900_as_csutf8mb4
utf8mb4_ro_0900_as_csutf8mb4
utf8mb4_sl_0900_as_csutf8mb4
utf8mb4_pl_0900_as_csutf8mb4
utf8mb4_et_0900_as_csutf8mb4
utf8mb4_es_0900_as_csutf8mb4
utf8mb4_sv_0900_as_csutf8mb4
utf8mb4_tr_0900_as_csutf8mb4
utf8mb4_cs_0900_as_csutf8mb4
utf8mb4_da_0900_as_csutf8mb4
utf8mb4_lt_0900_as_csutf8mb4
utf8mb4_sk_0900_as_csutf8mb4
utf8mb4_es_trad_0900_as_csutf8mb4
utf8mb4_la_0900_as_csutf8mb4
utf8mb4_eo_0900_as_csutf8mb4
utf8mb4_hu_0900_as_csutf8mb4
utf8mb4_hr_0900_as_csutf8mb4
utf8mb4_vi_0900_as_csutf8mb4
utf8mb4_ja_0900_as_csutf8mb4
utf8mb4_ja_0900_as_cs_ksutf8mb4
utf8mb4_0900_as_ciutf8mb4
utf8mb4_ru_0900_ai_ciutf8mb4
utf8mb4_ru_0900_as_csutf8mb4
utf8mb4_zh_0900_as_csutf8mb4
utf8mb4_0900_binutf8mb4
utf8_croatian_ciutf8⚠️⚠️⚠️⚠️
utf8_myanmar_ciutf8⚠️⚠️⚠️⚠️
utf8_thai_520_w2utf8⚠️⚠️⚠️
utf8mb4_croatian_ciutf8mb4⚠️⚠️⚠️⚠️
utf8mb4_myanmar_ciutf8mb4⚠️⚠️⚠️⚠️
utf8mb4_thai_520_w2utf8mb4⚠️⚠️⚠️
ucs2_croatian_ciucs2⚠️⚠️⚠️⚠️
ucs2_myanmar_ciucs2⚠️⚠️⚠️⚠️
ucs2_thai_520_w2ucs2⚠️⚠️⚠️
utf16_croatian_ciutf16⚠️⚠️⚠️⚠️
utf16_myanmar_ciutf16⚠️⚠️⚠️⚠️
utf16_thai_520_w2utf16⚠️⚠️⚠️
utf32_croatian_ciutf32⚠️⚠️⚠️⚠️
utf32_myanmar_ciutf32⚠️⚠️⚠️⚠️
utf32_thai_520_w2utf32⚠️⚠️⚠️
big5_chinese_nopad_cibig5⚠️⚠️
dec8_swedish_nopad_cidec8⚠️⚠️
cp850_general_nopad_cicp850⚠️⚠️
hp8_english_nopad_cihp8⚠️⚠️
koi8r_general_nopad_cikoi8r⚠️⚠️
latin1_swedish_nopad_cilatin1⚠️⚠️
latin2_general_nopad_cilatin2⚠️⚠️
swe7_swedish_nopad_ciswe7⚠️⚠️
ascii_general_nopad_ciascii⚠️⚠️
ujis_japanese_nopad_ciujis⚠️⚠️
sjis_japanese_nopad_cisjis⚠️⚠️
hebrew_general_nopad_cihebrew⚠️⚠️
tis620_thai_nopad_citis620⚠️⚠️
euckr_korean_nopad_cieuckr⚠️⚠️
koi8u_general_nopad_cikoi8u⚠️⚠️
gb2312_chinese_nopad_cigb2312⚠️⚠️
greek_general_nopad_cigreek⚠️⚠️
cp1250_general_nopad_cicp1250⚠️⚠️
gbk_chinese_nopad_cigbk⚠️⚠️
latin5_turkish_nopad_cilatin5⚠️⚠️
armscii8_general_nopad_ciarmscii8⚠️⚠️
utf8_general_nopad_ciutf8⚠️⚠️
ucs2_general_nopad_ciucs2⚠️⚠️
cp866_general_nopad_cicp866⚠️⚠️
keybcs2_general_nopad_cikeybcs2⚠️⚠️
macce_general_nopad_cimacce⚠️⚠️
macroman_general_nopad_cimacroman⚠️⚠️
cp852_general_nopad_cicp852⚠️⚠️
latin7_general_nopad_cilatin7⚠️⚠️
macce_nopad_binmacce⚠️⚠️
utf8mb4_general_nopad_ciutf8mb4⚠️⚠️
utf8mb4_nopad_binutf8mb4⚠️⚠️
latin1_nopad_binlatin1⚠️⚠️
cp1251_nopad_bincp1251⚠️⚠️
cp1251_general_nopad_cicp1251⚠️⚠️
macroman_nopad_binmacroman⚠️⚠️
utf16_general_nopad_ciutf16⚠️⚠️
utf16_nopad_binutf16⚠️⚠️
utf16le_general_nopad_ciutf16le⚠️⚠️
cp1256_general_nopad_cicp1256⚠️⚠️
cp1257_nopad_bincp1257⚠️⚠️
cp1257_general_nopad_cicp1257⚠️⚠️
utf32_general_nopad_ciutf32⚠️⚠️
utf32_nopad_binutf32⚠️⚠️
utf16le_nopad_binutf16le⚠️⚠️
armscii8_nopad_binarmscii8⚠️⚠️
ascii_nopad_binascii⚠️⚠️
cp1250_nopad_bincp1250⚠️⚠️
cp1256_nopad_bincp1256⚠️⚠️
cp866_nopad_bincp866⚠️⚠️
dec8_nopad_bindec8⚠️⚠️
greek_nopad_bingreek⚠️⚠️
hebrew_nopad_binhebrew⚠️⚠️
hp8_nopad_binhp8⚠️⚠️
keybcs2_nopad_binkeybcs2⚠️⚠️
koi8r_nopad_binkoi8r⚠️⚠️
koi8u_nopad_binkoi8u⚠️⚠️
latin2_nopad_binlatin2⚠️⚠️
latin5_nopad_binlatin5⚠️⚠️
latin7_nopad_binlatin7⚠️⚠️
cp850_nopad_bincp850⚠️⚠️
cp852_nopad_bincp852⚠️⚠️
swe7_nopad_binswe7⚠️⚠️
utf8_nopad_binutf8⚠️⚠️
big5_nopad_binbig5⚠️⚠️
euckr_nopad_bineuckr⚠️⚠️
gb2312_nopad_bingb2312⚠️⚠️
gbk_nopad_bingbk⚠️⚠️
sjis_nopad_binsjis⚠️⚠️
tis620_nopad_bintis620⚠️⚠️
ucs2_nopad_binucs2⚠️⚠️
ujis_nopad_binujis⚠️⚠️
geostd8_general_nopad_cigeostd8⚠️⚠️
geostd8_nopad_bingeostd8⚠️⚠️
cp932_japanese_nopad_cicp932⚠️⚠️
cp932_nopad_bincp932⚠️⚠️
eucjpms_japanese_nopad_cieucjpms⚠️⚠️
eucjpms_nopad_bineucjpms⚠️⚠️
utf16_unicode_nopad_ciutf16⚠️⚠️
utf16_unicode_520_nopad_ciutf16⚠️⚠️
ucs2_unicode_nopad_ciucs2⚠️⚠️
ucs2_unicode_520_nopad_ciucs2⚠️⚠️
utf32_unicode_nopad_ciutf32⚠️⚠️
utf32_unicode_520_nopad_ciutf32⚠️⚠️
utf8_unicode_nopad_ciutf8⚠️⚠️
utf8_unicode_520_nopad_ciutf8⚠️⚠️
utf8mb4_unicode_nopad_ciutf8mb4⚠️⚠️
utf8mb4_unicode_520_nopad_ciutf8mb4⚠️⚠️

Configuring the default connection charset for a Vitess cluster #

The default connection charset for a Vitess cluster is configured in your VTTablet instances via the -db_charset flag. This flag modifies the behavior of the connections that the tablet creates, not the underlying MySQL instance: it defines the charset that VTTablet uses when opening connections to MySQL. If the -db_charset flag is left empty, VTTablet will default to an utf8mb4 charset based on the underlying MySQL version. For instance, MySQL 5.x will default to utf8mb4_general_ci, while MySQL 8.x defaults to utf8mb4_0900_ai_ci. Because the handshake packet of MySQL leaves only 1 byte to reference the charset ID, only charset IDs <= 255 are supported.

The @character_set_client of a VTTablet is automatically propagated to all the VTGates that connect to it, and hence to all the MySQL clients connected to the VTGates. It is a configuration error to deploy several VTTablets in the same Vitess cluster with different connection charset: it will cause warning messages in the VTGates and lead to inconsistent behaviors.

The @character_set_client of a Vitess cluster is constant: it cannot be changed at runtime via SQL (e.g. by issuing a SET character_set_client = utf8 statement). VTGates will reject such queries.

Do note that, as the MySQL documentation explains, the @character_set_client setting only applies when performing character comparisons in SQL queries that don't have explicit collation information. For instance, a query such as SELECT 'foo' = 'FOO' will use the default @character_set_client for the cluster when picking the charset and collation for the two string literals, because these literals contain no explicit charset or collation information. A query such as SELECT c FROM t WHERE c = 'foo' will not use the @character_set_client; it will use the charset and collation that was defined for column c when creating table t.

Vitess has extensive support for creating collation-aware tables and columns in your SQL database, using any charset, and performing collation-aware operations in SQL queries. Any of the supported collations and charsets in Vitess can be used when declaring a table or a row in your database -- it does not need to match the specific value of @character_set_client. Generally speaking, setting @character_set_client to a charset not based on utf8mb4 is a mistake. We strongly encourage you to leave the -db_charset flag to its default value.