8.8 Understanding the Query Execution Plan

Depending on the details of your tables, columns, indexes, and the conditions in your 'WHERE' clause, the MySQL optimizer considers many techniques to efficiently perform the lookups involved in an SQL query. A query on a huge table can be performed without reading all the rows; a join involving several tables can be performed without comparing every combination of rows. The set of operations that the optimizer chooses to perform the most efficient query is called the 'query execution plan', also known as the note 'EXPLAIN': explain. plan. Your goals are to recognize the aspects of the note 'EXPLAIN': explain. plan that indicate a query is optimized well, and to learn the SQL syntax and indexing techniques to improve the plan if you see some inefficient operations.

 File: manual.info.tmp, Node: using-explain, Next: explain-output, Prev: execution-plan-information, Up: execution-plan-information

8.8.1 Optimizing Queries with EXPLAIN

The *note 'EXPLAIN': explain. statement provides information about how MySQL executes statements:

With the help of note 'EXPLAIN': explain, you can see where you should add indexes to tables so that the statement executes faster by using indexes to find rows. You can also use note 'EXPLAIN': explain. to check whether the optimizer joins the tables in an optimal order. To give a hint to the optimizer to use a join order corresponding to the order in which the tables are named in a note 'SELECT': select. statement, begin the statement with 'SELECT STRAIGHT_JOIN' rather than just note 'SELECT': select. (See note select::.) However, 'STRAIGHT_JOIN' may prevent indexes from being used because it disables semijoin transformations. See note semijoins::.

The optimizer trace may sometimes provide information complementary to that of note 'EXPLAIN': explain. However, the optimizer trace format and content are subject to change between versions. For details, see note optimizer-tracing::.

If you have a problem with indexes not being used when you believe that they should be, run note 'ANALYZE TABLE': analyze-table. to update table statistics, such as cardinality of keys, that can affect the choices the optimizer makes. See note analyze-table::.

Note:

note 'EXPLAIN': explain. can also be used to obtain information about the columns in a table. note 'EXPLAIN TBL_NAME': explain. is synonymous with 'DESCRIBE TBL_NAME' and 'SHOW COLUMNS FROM TBL_NAME'. For more information, see note describe::, and note show-columns::.

 File: manual.info.tmp, Node: explain-output, Next: explain-extended, Prev: using-explain, Up: execution-plan-information

8.8.2 EXPLAIN Output Format

The note 'EXPLAIN': explain. statement provides information about how MySQL executes statements. note 'EXPLAIN': explain. works with note 'SELECT': select, note 'DELETE': delete, note 'INSERT': insert, note 'REPLACE': replace, and *note 'UPDATE': update. statements.

note 'EXPLAIN': explain. returns a row of information for each table used in the note 'SELECT': select. statement. It lists the tables in the output in the order that MySQL would read them while processing the statement. MySQL resolves all joins using a nested-loop join method. This means that MySQL reads a row from the first table, and then finds a matching row in the second table, the third table, and so on. When all tables are processed, MySQL outputs the selected columns and backtracks through the table list until a table is found for which there are more matching rows. The next row is read from this table and the process continues with the next table.

note 'EXPLAIN': explain. output includes partition information. Also, for note 'SELECT': select. statements, note 'EXPLAIN': explain. generates extended information that can be displayed with note 'SHOW WARNINGS': show-warnings. following the note 'EXPLAIN': explain. (see note explain-extended::).

Note:

In older MySQL releases, partition and extended information was produced using note 'EXPLAIN PARTITIONS': explain. and note 'EXPLAIN EXTENDED': explain. Those syntaxes are still recognized for backward compatibility but partition and extended output is now enabled by default, so the 'PARTITIONS' and 'EXTENDED' keywords are superfluous and deprecated. Their use results in a warning; expect them to be removed from *note 'EXPLAIN': explain. syntax in a future MySQL release.

You cannot use the deprecated 'PARTITIONS' and 'EXTENDED' keywords together in the same *note 'EXPLAIN': explain. statement. In addition, neither of these keywords can be used together with the 'FORMAT' option.

Note:

MySQL Workbench has a Visual Explain capability that provides a visual representation of *note 'EXPLAIN': explain. output. See Tutorial: Using Explain to Improve Query Performance (https://dev.mysql.com/doc/workbench/en/wb-tutorial-visual-explain-dbt3.html).

EXPLAIN Output Columns

This section describes the output columns produced by *note 'EXPLAIN': explain. Later sections provide additional information about the 'type' and 'Extra' columns.

Each output row from note 'EXPLAIN': explain. provides information about one table. Each row contains the values summarized in note explain-output-column-table::, and described in more detail following the table. Column names are shown in the table's first column; the second column provides the equivalent property name shown in the output when 'FORMAT=JSON' is used.

EXPLAIN Output Columns

Column JSON Name Meaning

'id'

'select_id' The 'SELECT' identifier

'select_type'

None The 'SELECT' type

'table'

'table_name' The table for the output row

'partitions'

'partitions' The matching partitions

'type'

'access_type' The join type

'possible_keys'

'possible_keys' The possible indexes to choose

'key'

'key' The index actually chosen

'key_len'

'key_length' The length of the chosen key

'ref'

'ref' The columns compared to the index

'rows'

'rows' Estimate of rows to be examined

'filtered'

'filtered' Percentage of rows filtered by table condition

'Extra'

None Additional information

Note:

JSON properties which are 'NULL' are not displayed in JSON-formatted 'EXPLAIN' output.

EXPLAIN Join Types

The 'type' column of *note 'EXPLAIN': explain. output describes how tables are joined. In JSON-formatted output, these are found as values of the 'access_type' property. The following list describes the join types, ordered from the best type to the worst:

EXPLAIN Extra Information

The 'Extra' column of *note 'EXPLAIN': explain. output contains additional information about how MySQL resolves the query. The following list explains the values that can appear in this column. Each item also indicates for JSON-formatted output which property displays the 'Extra' value. For some of these, there is a specific property. The others display as the text of the 'message' property.

If you want to make your queries as fast as possible, look out for 'Extra' column values of 'Using filesort' and 'Using temporary', or, in JSON-formatted 'EXPLAIN' output, for 'using_filesort' and 'using_temporary_table' properties equal to 'true'.

EXPLAIN Output Interpretation

You can get a good indication of how good a join is by taking the product of the values in the 'rows' column of the note 'EXPLAIN': explain. output. This should tell you roughly how many rows MySQL must examine to execute the query. If you restrict queries with the 'max_join_size' system variable, this row product also is used to determine which multiple-table note 'SELECT': select. statements to execute and which to abort. See *note server-configuration::.

The following example shows how a multiple-table join can be optimized progressively based on the information provided by *note 'EXPLAIN': explain.

Suppose that you have the note 'SELECT': select. statement shown here and that you plan to examine it using note 'EXPLAIN': explain.:

 EXPLAIN SELECT tt.TicketNumber, tt.TimeIn,
                tt.ProjectReference, tt.EstimatedShipDate,
                tt.ActualShipDate, tt.ClientID,
                tt.ServiceCodes, tt.RepetitiveID,
                tt.CurrentProcess, tt.CurrentDPPerson,
                tt.RecordVolume, tt.DPPrinted, et.COUNTRY,
                et_1.COUNTRY, do.CUSTNAME
         FROM tt, et, et AS et_1, do
         WHERE tt.SubmitTime IS NULL
           AND tt.ActualPC = et.EMPLOYID
           AND tt.AssignedPC = et_1.EMPLOYID
           AND tt.ClientID = do.CUSTNMBR;

For this example, make the following assumptions:

Initially, before any optimizations have been performed, the *note 'EXPLAIN': explain. statement produces the following information:

 table type possible_keys key  key_len ref  rows  Extra
 et    ALL  PRIMARY       NULL NULL    NULL 74
 do    ALL  PRIMARY       NULL NULL    NULL 2135
 et_1  ALL  PRIMARY       NULL NULL    NULL 74
 tt    ALL  AssignedPC,   NULL NULL    NULL 3872
            ClientID,
            ActualPC
       Range checked for each record (index map: 0x23)

Because 'type' is 'ALL' for each table, this output indicates that MySQL is generating a Cartesian product of all the tables; that is, every combination of rows. This takes quite a long time, because the product of the number of rows in each table must be examined. For the case at hand, this product is 74 x 2135 x 74 x 3872 = 45,268,558,720 rows. If the tables were bigger, you can only imagine how long it would take.

One problem here is that MySQL can use indexes on columns more efficiently if they are declared as the same type and size. In this context, note 'VARCHAR': char. and note 'CHAR': char. are considered the same if they are declared as the same size. 'tt.ActualPC' is declared as 'CHAR(10)' and 'et.EMPLOYID' is 'CHAR(15)', so there is a length mismatch.

To fix this disparity between column lengths, use *note 'ALTER TABLE': alter-table. to lengthen 'ActualPC' from 10 characters to 15 characters:

 mysql> ALTER TABLE tt MODIFY ActualPC VARCHAR(15);

Now 'tt.ActualPC' and 'et.EMPLOYID' are both 'VARCHAR(15)'. Executing the *note 'EXPLAIN': explain. statement again produces this result:

 table type   possible_keys key     key_len ref         rows    Extra
 tt    ALL    AssignedPC,   NULL    NULL    NULL        3872    Using
              ClientID,                                         where
              ActualPC
 do    ALL    PRIMARY       NULL    NULL    NULL        2135
       Range checked for each record (index map: 0x1)
 et_1  ALL    PRIMARY       NULL    NULL    NULL        74
       Range checked for each record (index map: 0x1)
 et    eq_ref PRIMARY       PRIMARY 15      tt.ActualPC 1

This is not perfect, but is much better: The product of the 'rows' values is less by a factor of 74. This version executes in a couple of seconds.

A second alteration can be made to eliminate the column length mismatches for the 'tt.AssignedPC = et_1.EMPLOYID' and 'tt.ClientID = do.CUSTNMBR' comparisons:

 mysql> ALTER TABLE tt MODIFY AssignedPC VARCHAR(15),
                       MODIFY ClientID   VARCHAR(15);

After that modification, *note 'EXPLAIN': explain. produces the output shown here:

 table type   possible_keys key      key_len ref           rows Extra
 et    ALL    PRIMARY       NULL     NULL    NULL          74
 tt    ref    AssignedPC,   ActualPC 15      et.EMPLOYID   52   Using
              ClientID,                                         where
              ActualPC
 et_1  eq_ref PRIMARY       PRIMARY  15      tt.AssignedPC 1
 do    eq_ref PRIMARY       PRIMARY  15      tt.ClientID   1

At this point, the query is optimized almost as well as possible. The remaining problem is that, by default, MySQL assumes that values in the 'tt.ActualPC' column are evenly distributed, and that is not the case for the 'tt' table. Fortunately, it is easy to tell MySQL to analyze the key distribution:

 mysql> ANALYZE TABLE tt;

With the additional index information, the join is perfect and *note 'EXPLAIN': explain. produces this result:

 table type   possible_keys key     key_len ref           rows Extra
 tt    ALL    AssignedPC    NULL    NULL    NULL          3872 Using
              ClientID,                                        where
              ActualPC
 et    eq_ref PRIMARY       PRIMARY 15      tt.ActualPC   1
 et_1  eq_ref PRIMARY       PRIMARY 15      tt.AssignedPC 1
 do    eq_ref PRIMARY       PRIMARY 15      tt.ClientID   1

The 'rows' column in the output from note 'EXPLAIN': explain. is an educated guess from the MySQL join optimizer. Check whether the numbers are even close to the truth by comparing the 'rows' product with the actual number of rows that the query returns. If the numbers are quite different, you might get better performance by using 'STRAIGHT_JOIN' in your note 'SELECT': select. statement and trying to list the tables in a different order in the 'FROM' clause. (However, 'STRAIGHT_JOIN' may prevent indexes from being used because it disables semijoin transformations. See *note semijoins::.)

It is possible in some cases to execute statements that modify data when note 'EXPLAIN SELECT': explain. is used with a subquery; for more information, see note derived-tables::.

 File: manual.info.tmp, Node: explain-extended, Next: explain-for-connection, Prev: explain-output, Up: execution-plan-information

8.8.3 Extended EXPLAIN Output Format

For note 'SELECT': select. statements, the note 'EXPLAIN': explain. statement produces extra ('extended') information that is not part of note 'EXPLAIN': explain. output but can be viewed by issuing a note 'SHOW WARNINGS': show-warnings. statement following note 'EXPLAIN': explain. The 'Message' value in note 'SHOW WARNINGS': show-warnings. output displays how the optimizer qualifies table and column names in the note 'SELECT': select. statement, what the note 'SELECT': select. looks like after the application of rewriting and optimization rules, and possibly other notes about the optimization process.

The extended information displayable with a note 'SHOW WARNINGS': show-warnings. statement following note 'EXPLAIN': explain. is produced only for note 'SELECT': select. statements. note 'SHOW WARNINGS': show-warnings. displays an empty result for other explainable statements (note 'DELETE': delete, note 'INSERT': insert, note 'REPLACE': replace, and note 'UPDATE': update.).

Note:

In older MySQL releases, extended information was produced using note 'EXPLAIN EXTENDED': explain. That syntax is still recognized for backward compatibility but extended output is now enabled by default, so the 'EXTENDED' keyword is superfluous and deprecated. Its use results in a warning; expect it to be removed from note 'EXPLAIN': explain. syntax in a future MySQL release.

Here is an example of extended *note 'EXPLAIN': explain. output:

 mysql> EXPLAIN
        SELECT t1.a, t1.a IN (SELECT t2.a FROM t2) FROM t1\G
 *************************** 1. row ***************************
            id: 1
   select_type: PRIMARY
         table: t1
          type: index
 possible_keys: NULL
           key: PRIMARY
       key_len: 4
           ref: NULL
          rows: 4
      filtered: 100.00
         Extra: Using index
 *************************** 2. row ***************************
            id: 2
   select_type: SUBQUERY
         table: t2
          type: index
 possible_keys: a
           key: a
       key_len: 5
           ref: NULL
          rows: 3
      filtered: 100.00
         Extra: Using index
 2 rows in set, 1 warning (0.00 sec)

 mysql> SHOW WARNINGS\G
 *************************** 1. row ***************************
   Level: Note
    Code: 1003
 Message: /* select#1 */ select `test`.`t1`.`a` AS `a`,
          <in_optimizer>(`test`.`t1`.`a`,`test`.`t1`.`a` in
          ( <materialize> (/* select#2 */ select `test`.`t2`.`a`
          from `test`.`t2` where 1 having 1 ),
          <primary_index_lookup>(`test`.`t1`.`a` in
          <temporary table> on <auto_key>
          where ((`test`.`t1`.`a` = `materialized-subquery`.`a`))))) AS `t1.a
          IN (SELECT t2.a FROM t2)` from `test`.`t1`
 1 row in set (0.00 sec)

Because the statement displayed by *note 'SHOW WARNINGS': show-warnings. may contain special markers to provide information about query rewriting or optimizer actions, the statement is not necessarily valid SQL and is not intended to be executed. The output may also include rows with 'Message' values that provide additional non-SQL explanatory notes about actions taken by the optimizer.

The following list describes special markers that can appear in the extended output displayed by *note 'SHOW WARNINGS': show-warnings.:

When some tables are of 'const' or 'system' type, expressions involving columns from these tables are evaluated early by the optimizer and are not part of the displayed statement. However, with 'FORMAT=JSON', some 'const' table accesses are displayed as a 'ref' access that uses a const value.

 File: manual.info.tmp, Node: explain-for-connection, Next: estimating-performance, Prev: explain-extended, Up: execution-plan-information

8.8.4 Obtaining Execution Plan Information for a Named Connection

To obtain the execution plan for an explainable statement executing in a named connection, use this statement:

 EXPLAIN [OPTIONS] FOR CONNECTION CONNECTION_ID;

note 'EXPLAIN FOR CONNECTION': explain-for-connection. returns the note 'EXPLAIN': explain. information that is currently being used to execute a query in a given connection. Because of changes to data (and supporting statistics) it may produce a different result from running note 'EXPLAIN': explain. on the equivalent query text. This difference in behavior can be useful in diagnosing more transient performance problems. For example, if you are running a statement in one session that is taking a long time to complete, using note 'EXPLAIN FOR CONNECTION': explain-for-connection. in another session may yield useful information about the cause of the delay.

CONNECTION_ID is the connection identifier, as obtained from the 'INFORMATION_SCHEMA' note 'PROCESSLIST': information-schema-processlist-table. table or the note 'SHOW PROCESSLIST': show-processlist. statement. If you have the 'PROCESS' privilege, you can specify the identifier for any connection. Otherwise, you can specify the identifier only for your own connections.

If the named connection is not executing a statement, the result is empty. Otherwise, 'EXPLAIN FOR CONNECTION' applies only if the statement being executed in the named connection is explainable. This includes note 'SELECT': select, note 'DELETE': delete, note 'INSERT': insert, note 'REPLACE': replace, and *note 'UPDATE': update. (However, 'EXPLAIN FOR CONNECTION' does not work for prepared statements, even prepared statements of those types.)

If the named connection is executing an explainable statement, the output is what you would obtain by using 'EXPLAIN' on the statement itself.

If the named connection is executing a statement that is not explainable, an error occurs. For example, you cannot name the connection identifier for your current session because 'EXPLAIN' is not explainable:

 mysql> SELECT CONNECTION_ID();
 +-----------------+
 | CONNECTION_ID() |
 +-----------------+
 |             373 |
 +-----------------+
 1 row in set (0.00 sec)

 mysql> EXPLAIN FOR CONNECTION 373;
 ERROR 1889 (HY000): EXPLAIN FOR CONNECTION command is supported
 only for SELECT/UPDATE/INSERT/DELETE/REPLACE

The 'Com_explain_other' status variable indicates the number of *note 'EXPLAIN FOR CONNECTION': explain. statements executed.

 File: manual.info.tmp, Node: estimating-performance, Prev: explain-for-connection, Up: execution-plan-information

8.8.5 Estimating Query Performance

In most cases, you can estimate query performance by counting disk seeks. For small tables, you can usually find a row in one disk seek (because the index is probably cached). For bigger tables, you can estimate that, using B-tree indexes, you need this many seeks to find a row: 'log(ROW_COUNT) / log(INDEX_BLOCK_LENGTH / 3 * 2 / (INDEX_LENGTH + DATA_POINTER_LENGTH)) + 1'.

In MySQL, an index block is usually 1,024 bytes and the data pointer is usually four bytes. For a 500,000-row table with a key value length of three bytes (the size of note 'MEDIUMINT': integer-types.), the formula indicates 'log(500,000)/log(1024/32/(3+4)) + 1' = '4' seeks.

This index would require storage of about 500,000 * 7 * 3/2 = 5.2MB (assuming a typical index buffer fill ratio of 2/3), so you probably have much of the index in memory and so need only one or two calls to read data to find the row.

For writes, however, you need four seek requests to find where to place a new index value and normally two seeks to update the index and write the row.

The preceding discussion does not mean that your application performance slowly degenerates by log N. As long as everything is cached by the OS or the MySQL server, things become only marginally slower as the table gets bigger. After the data gets too big to be cached, things start to go much slower until your applications are bound only by disk seeks (which increase by log N). To avoid this, increase the key cache size as the data grows. For 'MyISAM' tables, the key cache size is controlled by the 'key_buffer_size' system variable. See *note server-configuration::.

 File: manual.info.tmp, Node: controlling-optimizer, Next: buffering-caching, Prev: execution-plan-information, Up: optimization