25.10 Performance Schema Statement Digests

The MySQL server is capable of maintaining statement digest information. The digesting process converts each SQL statement to normalized form (the statement digest) and computes an MD5 hash value (the digest hash value) from the normalized result. Normalization permits statements that are similar to be grouped and summarized to expose information about the types of statements the server is executing and how often they occur. This section describes how statement digesting occurs and how it can be useful.

Digesting occurs in the parser regardless of whether the Performance Schema is available, so that other server components such as MySQL Enterprise Firewall and query rewrite plugins have access to statement digests.

Statement Digest General Concepts

When the parser receives an SQL statement, it computes a statement digest if that digest is needed, which is true if any of the following conditions are true:

The 'max_digest_length' system variable value determines the maximum number of bytes available per session for computation of normalized statement digests. Once that amount of space is used during digest computation, truncation occurs: no further tokens from a parsed statement are collected or figure into its digest value. Statements that differ only after that many bytes of parsed tokens produce the same normalized statement digest and are considered identical if compared or if aggregated for digest statistics.

Warning:

Setting the 'max_digest_length' system variable to zero disables digest production, which also disables server functionality that requires digests.

After the normalized statement has been computed, an MD5 hash value is computed from it. In addition:

Statement normalization transforms the statement text to a more standardized digest string representation that preserves the general statement structure while removing information not essential to the structure:

Consider these statements:

 SELECT * FROM orders WHERE customer_id=10 AND quantity>20
 SELECT * FROM orders WHERE customer_id = 20 AND quantity > 100

To normalize these statements, the parser replaces data values by '?' and adjusts whitespace. Both statements yield the same normalized form and thus are considered 'the same':

 SELECT * FROM orders WHERE customer_id = ? AND quantity > ?

The normalized statement contains less information but is still representative of the original statement. Other similar statements that have different data values have the same normalized form.

Now consider these statements:

 SELECT * FROM customers WHERE customer_id = 1000
 SELECT * FROM orders WHERE customer_id = 1000

In this case, the normalized statements differ because the object identifiers differ:

 SELECT * FROM customers WHERE customer_id = ?
 SELECT * FROM orders WHERE customer_id = ?

If normalization produces a statement that exceeds the space available in the digest buffer (as determined by 'max_digest_length'), truncation occurs and the text ends with '...'. Long normalized statements that differ only in the part that occurs following the '...' are considered the same. Consider these statements:

 SELECT * FROM mytable WHERE cola = 10 AND colb = 20
 SELECT * FROM mytable WHERE cola = 10 AND colc = 20

If the cutoff happens to be right after the 'AND', both statements have this normalized form:

 SELECT * FROM mytable WHERE cola = ? AND ...

In this case, the difference in the second column name is lost and both statements are considered the same.

Statement Digests in the Performance Schema

In the Performance Schema, statement digesting involves these elements:

The statement event tables also have an 'SQL_TEXT' column that contains the original SQL statement. The maximum space available for statement display is 1024 bytes by default. To change this value, set the 'performance_schema_max_sql_text_length' system variable at server startup.

The 'performance_schema_max_digest_length' system variable determines the maximum number of bytes available per statement for digest value storage in the Performance Schema. However, the display length of statement digests may be longer than the available buffer size due to internal encoding of statement elements such as keywords and literal values. Consequently, values selected from the 'DIGEST_TEXT' column of statement event tables may appear to exceed the 'performance_schema_max_digest_length' value.

The *note 'events_statements_summary_by_digest': performance-schema-statement-summary-tables. summary table provides a profile of the statements executed by the server. It shows what kinds of statements an application is executing and how often. An application developer can use this information together with other information in the table to assess the application's performance characteristics. For example, table columns that show wait times, lock times, or index use may highlight types of queries that are inefficient. This gives the developer insight into which parts of the application need attention.

The *note 'events_statements_summary_by_digest': performance-schema-statement-summary-tables. summary table has a fixed size. By default the Performance Schema estimates the size to use at startup. To specify the table size explicitly, set the 'performance_schema_digests_size' system variable at server startup. If the table becomes full, the Performance Schema groups statements that have 'SCHEMA_NAME' and 'DIGEST' values not matching existing values in the table in a special row with 'SCHEMA_NAME' and 'DIGEST' set to 'NULL'. This permits all statements to be counted. However, if the special row accounts for a significant percentage of the statements executed, it might be desirable to increase the summary table size by increasing 'performance_schema_digests_size'.

Statement Digest Memory Use

For applications that generate very long statements that differ only at the end, increasing 'max_digest_length' enables computation of digests that distinguish statements that would otherwise aggregate to the same digest. Conversely, decreasing 'max_digest_length' causes the server to devote less memory to digest storage but increases the likelihood of longer statements aggregating to the same digest. Administrators should keep in mind that larger values result in correspondingly increased memory requirements, particularly for workloads that involve large numbers of simultaneous sessions (the server allocates 'max_digest_length' bytes per session).

As described previously, normalized statement digests as computed by the parser are constrained to a maximum of 'max_digest_length' bytes, whereas normalized statement digests stored in the Performance Schema use 'performance_schema_max_digest_length' bytes. The following memory-use considerations apply regarding the relative values of 'max_digest_length' and 'performance_schema_max_digest_length':

Because the Performance Schema statement event tables might store many digests, setting 'performance_schema_max_digest_length' smaller than 'max_digest_length' enables administrators to balance these factors:

The 'performance_schema_max_digest_length' setting is not per session, it is per statement, and a session can store multiple statements in the *note 'events_statements_history': performance-schema-events-statements-history-table. table. A typical number of statements in this table is 10 per session, so each session consumes 10 times the memory indicated by the 'performance_schema_max_digest_length' value, for this table alone.

Also, there are many statements (and digests) collected globally, most notably in the *note 'events_statements_history_long': performance-schema-events-statements-history-long-table. table. Here, too, N statements stored consume N times the memory indicated by the 'performance_schema_max_digest_length' value.

To assess the amount of memory used for SQL statement storage and digest computation, use the *note 'SHOW ENGINE PERFORMANCE_SCHEMA STATUS': show-engine. statement, or monitor these instruments:

 mysql> SELECT NAME
        FROM performance_schema.setup_instruments
        WHERE NAME LIKE '%.sqltext';
 +------------------------------------------------------------------+
 | NAME                                                             |
 +------------------------------------------------------------------+
 | memory/performance_schema/events_statements_history.sqltext      |
 | memory/performance_schema/events_statements_current.sqltext      |
 | memory/performance_schema/events_statements_history_long.sqltext |
 +------------------------------------------------------------------+

 mysql> SELECT NAME
        FROM performance_schema.setup_instruments
        WHERE NAME LIKE 'memory/performance_schema/%.tokens';
 +----------------------------------------------------------------------+
 | NAME                                                                 |
 +----------------------------------------------------------------------+
 | memory/performance_schema/events_statements_history.tokens           |
 | memory/performance_schema/events_statements_current.tokens           |
 | memory/performance_schema/events_statements_summary_by_digest.tokens |
 | memory/performance_schema/events_statements_history_long.tokens      |
 +----------------------------------------------------------------------+

 File: manual.info.tmp, Node: performance-schema-table-characteristics, Next: performance-schema-table-descriptions, Prev: performance-schema-statement-digests, Up: performance-schema