10.4 Connection Character Sets and Collations

A 'connection' is what a client program makes when it connects to the server, to begin a session within which it interacts with the server. The client sends SQL statements, such as queries, over the session connection. The server sends responses, such as result sets or error messages, over the connection back to the client.

Connection Character Set and Collation System Variables

Several character set and collation system variables relate to a client's interaction with the server. Some of these have been mentioned in earlier sections:

Additional character set and collation system variables are involved in handling traffic for the connection between a client and the server. Every client has session-specific connection-related character set and collation system variables. These session system variable values are initialized at connect time, but can be changed within the session.

Several questions about character set and collation handling for client connections can be answered in terms of system variables:

To see the values of the character set and collation system variables that apply to the current session, use this statement:

 SELECT * FROM performance_schema.session_variables
 WHERE VARIABLE_NAME IN (
 'character_set_client', 'character_set_connection',
 'character_set_results', 'collation_connection'
 ) ORDER BY VARIABLE_NAME;

The following simpler statements also display the connection variables, but include other related variables as well. They can be useful to see all character set and collation system variables:

 SHOW SESSION VARIABLES LIKE 'character\_set\_%';
 SHOW SESSION VARIABLES LIKE 'collation\_%';

Clients can fine-tune the settings for these variables, or depend on the defaults (in which case, you can skip the rest of this section). If you do not use the defaults, you must change the character settings for each connection to the server.

Impermissible Client Character Sets

The 'character_set_client' system variable cannot be set to certain character sets:

 ucs2
 utf16
 utf16le
 utf32

Attempting to use any of those character sets as the client character set produces an error:

 mysql> SET character_set_client = 'ucs2';
 ERROR 1231 (42000): Variable 'character_set_client'
 can't be set to the value of 'ucs2'

The same error occurs if any of those character sets are used in the following contexts, all of which result in an attempt to set 'character_set_client' to the named character set:

Client Program Connection Character Set Configuration

When a client connects to the server, it indicates which character set it wants to use for communication with the server. (Actually, the client indicates the default collation for that character set, from which the server can determine the character set.) The server uses this information to set the 'character_set_client', 'character_set_results', 'character_set_connection' system variables to the character set, and 'collation_connection' to the character set default collation. In effect, the server performs the equivalent of a *note 'SET NAMES': set-names. operation.

If the server does not support the requested character set or collation, it falls back to using the server character set and collation to configure the connection. For additional detail about this fallback behavior, see *note charset-connection-error-handling::.

The note 'mysql': mysql, note 'mysqladmin': mysqladmin, note 'mysqlcheck': mysqlcheck, note 'mysqlimport': mysqlimport, and *note 'mysqlshow': mysqlshow. client programs determine the default character set to use as follows:

With the note 'mysql': mysql. client, to use a character set different from the default, you could explicitly execute a note 'SET NAMES': set-names. statement every time you connect to the server (see note charset-connection-client-configuration::). To accomplish the same result more easily, specify the character set in your option file. For example, the following option file setting changes the three connection-related character set system variables set to 'koi8r' each time you invoke note 'mysql': mysql.:

 [mysql]
 default-character-set=koi8r

If you are using the note 'mysql': mysql. client with auto-reconnect enabled (which is not recommended), it is preferable to use the 'charset' command rather than note 'SET NAMES': set-names. For example:

 mysql> charset koi8r
 Charset changed

The 'charset' command issues a note 'SET NAMES': set-names. statement, and also changes the default character set that note 'mysql': mysql. uses when it reconnects after the connection has dropped.

When configuration client programs, you must also consider the environment within which they execute. See *note charset-applications::.

SQL Statements for Connection Character Set Configuration

After a connection has been established, clients can change the character set and collation system variables for the current session. These variables can be changed individually using *note 'SET': set-statement. statements, but two more convenient statements affect the connection-related character set sytem variables as a group:

Note:

Some character sets cannot be used as the client character set. Attempting to use them with note 'SET NAMES': set-names. or note 'SET CHARACTER SET': set-character-set. produces an error. See *note charset-connection-impermissible-client-charset::.

Example: Suppose that 'column1' is defined as 'CHAR(5) CHARACTER SET latin2'. If you do not say note 'SET NAMES': set-names. or note 'SET CHARACTER SET': set-character-set, then for 'SELECT column1 FROM t', the server sends back all the values for 'column1' using the character set that the client specified when it connected. On the other hand, if you say 'SET NAMES 'latin1'' or 'SET CHARACTER SET 'latin1'' before issuing the *note 'SELECT': select. statement, the server converts the 'latin2' values to 'latin1' just before sending results back. Conversion may be lossy for characters that are not in both character sets.

Connection Character Set Error Handling

Attempts to use an inappropriate connection character set or collation can produce an error, or cause the server to fall back to its default character set and collation for a given connection. This section describes problems that can occur when configuring the connection character set. These problems can occur when establishing a connection or when changing the character set within an established connection.

Connect-Time Error Handling

Some character sets cannot be used as the client character set; see *note charset-connection-impermissible-client-charset::. If you specify a character set that is valid but not permitted as a client character set, the server returns an error:

 $> mysql --default-character-set=ucs2
 ERROR 1231 (42000): Variable 'character_set_client' can't be set to
 the value of 'ucs2'

If you specify a character set that the client does not recognize, it produces an error:

 $> mysql --default-character-set=bogus
 mysql: Character set 'bogus' is not a compiled character set and is
 not specified in the '/usr/local/mysql/share/charsets/Index.xml' file
 ERROR 2019 (HY000): Can't initialize character set bogus
 (path: /usr/local/mysql/share/charsets/)

If you specify a character set that the client recognizes but the server does not, the server falls back to its default character set and collation. Suppose that the server is configured to use 'latin1' and 'latin1_swedish_ci' as its defaults, and that it does not recognize 'gb18030' as a valid character set. A client that specifies '--default-character-set=gb18030' is able to connect to the server, but the resulting character set is not what the client wants:

 mysql> SHOW SESSION VARIABLES LIKE 'character\_set\_%';
 +--------------------------+--------+
 | Variable_name            | Value  |
 +--------------------------+--------+
 | character_set_client     | latin1 |
 | character_set_connection | latin1 |
 ...
 | character_set_results    | latin1 |
 ...
 +--------------------------+--------+
 mysql> SHOW SESSION VARIABLES LIKE 'collation_connection';
 +----------------------+-------------------+
 | Variable_name        | Value             |
 +----------------------+-------------------+
 | collation_connection | latin1_swedish_ci |
 +----------------------+-------------------+

You can see that the connection system variables have been set to reflect a character set and collation of 'latin1' and 'latin1_swedish_ci'. This occurs because the server cannot satisfy the client character set request and falls back to its defaults.

In this case, the client cannot use the character set that it wants because the server does not support it. The client must either be willing to use a different character set, or connect to a different server that supports the desired character set.

The same problem occurs in a more subtle context: When the client tells the server to use a character set that the server recognizes, but the default collation for that character set on the client side is not known on the server side. This occurs, for example, when a MySQL 8.0 client wants to connect to a MySQL 5.7 server using 'utf8mb4' as the client character set. A client that specifies '--default-character-set=utf8mb4' is able to connect to the server. However, as in the previous example, the server falls back to its default character set and collation, not what the client requested:

 mysql> SHOW SESSION VARIABLES LIKE 'character\_set\_%';
 +--------------------------+--------+
 | Variable_name            | Value  |
 +--------------------------+--------+
 | character_set_client     | latin1 |
 | character_set_connection | latin1 |
 ...
 | character_set_results    | latin1 |
 ...
 +--------------------------+--------+
 mysql> SHOW SESSION VARIABLES LIKE 'collation_connection';
 +----------------------+-------------------+
 | Variable_name        | Value             |
 +----------------------+-------------------+
 | collation_connection | latin1_swedish_ci |
 +----------------------+-------------------+

Why does this occur? After all, 'utf8mb4' is known to the 8.0 client and the 5.7 server, so both of them recognize it. To understand this behavior, it is necessary to understand that when the client tells the server which character set it wants to use, it really tells the server the default collation for that character set. Therefore, the aforementioned behavior occurs due to a combination of factors:

In this case, the client can still use 'utf8mb4' by issuing a 'SET NAMES 'utf8mb4'' statement after connecting. The resulting collation is the 5.7 default 'utf8mb4' collation; that is, 'utf8mb4_general_ci'. If the client additionally wants a collation of 'utf8mb4_0900_ai_ci', it cannot achieve that because the server does not recognize that collation. The client must either be willing to use a different 'utf8mb4' collation, or connect to a server from MySQL 8.0 or higher.

Runtime Error Handling

Within an established connection, the client can request a change of connection character set and collation with note 'SET NAMES': set-names. or note 'SET CHARACTER SET': set-character-set.

Some character sets cannot be used as the client character set; see *note charset-connection-impermissible-client-charset::. If you specify a character set that is valid but not permitted as a client character set, the server returns an error:

 mysql> SET NAMES 'ucs2';
 ERROR 1231 (42000): Variable 'character_set_client' can't be set to
 the value of 'ucs2'

If the server does not recognize the character set (or the collation), it produces an error:

 mysql> SET NAMES 'bogus';
 ERROR 1115 (42000): Unknown character set: 'bogus'

 mysql> SET NAMES 'utf8mb4' COLLATE 'bogus';
 ERROR 1273 (HY000): Unknown collation: 'bogus'

Tip:

A client that wants to verify whether its requested character set was honored by the server can execute the following statement after connecting and checking that the result is the expected character set:

 SELECT @@character_set_client;

 File: manual.info.tmp, Node: charset-applications, Next: charset-errors, Prev: charset-connection, Up: charset