8.2 Optimizing SQL Statements

The core logic of a database application is performed through SQL statements, whether issued directly through an interpreter or submitted behind the scenes through an API. The tuning guidelines in this section help to speed up all kinds of MySQL applications. The guidelines cover SQL operations that read and write data, the behind-the-scenes overhead for SQL operations in general, and operations used in specific scenarios such as database monitoring.

 File: manual.info.tmp, Node: select-optimization, Next: subquery-optimization, Prev: statement-optimization, Up: statement-optimization

8.2.1 Optimizing SELECT Statements

Queries, in the form of *note 'SELECT': select. statements, perform all the lookup operations in the database. Tuning these statements is a top priority, whether to achieve sub-second response times for dynamic web pages, or to chop hours off the time to generate huge overnight reports.

Besides *note 'SELECT': select. statements, the tuning techniques for queries also apply to constructs such as *note 'CREATE TABLE...AS SELECT': create-table-select, *note 'INSERT INTO...SELECT': insert-select, and 'WHERE' clauses in *note 'DELETE': delete. statements. Those statements have additional performance considerations because they combine write operations with the read-oriented query operations.

NDB Cluster supports a join pushdown optimization whereby a qualifying join is sent in its entirety to NDB Cluster data nodes, where it can be distributed among them and executed in parallel. For more information about this optimization, see *note ndb_join_pushdown-conditions::.

The main considerations for optimizing queries are:

 File: manual.info.tmp, Node: where-optimization, Next: range-optimization, Prev: select-optimization, Up: select-optimization

8.2.1.1 WHERE Clause Optimization

This section discusses optimizations that can be made for processing 'WHERE' clauses. The examples use *note 'SELECT': select. statements, but the same optimizations apply for 'WHERE' clauses in *note 'DELETE': delete. and *note 'UPDATE': update. statements.

Note:

Because work on the MySQL optimizer is ongoing, not all of the optimizations that MySQL performs are documented here.

You might be tempted to rewrite your queries to make arithmetic operations faster, while sacrificing readability. Because MySQL does similar optimizations automatically, you can often avoid this work, and leave the query in a more understandable and maintainable form. Some of the optimizations performed by MySQL follow:

Some examples of queries that are very fast:

 SELECT COUNT(*) FROM TBL_NAME;

 SELECT MIN(KEY_PART1),MAX(KEY_PART1) FROM TBL_NAME;

 SELECT MAX(KEY_PART2) FROM TBL_NAME
   WHERE KEY_PART1=CONSTANT;

 SELECT ... FROM TBL_NAME
   ORDER BY KEY_PART1,KEY_PART2,... LIMIT 10;

 SELECT ... FROM TBL_NAME
   ORDER BY KEY_PART1 DESC, KEY_PART2 DESC, ... LIMIT 10;

MySQL resolves the following queries using only the index tree, assuming that the indexed columns are numeric:

 SELECT KEY_PART1,KEY_PART2 FROM TBL_NAME WHERE KEY_PART1=VAL;

 SELECT COUNT(*) FROM TBL_NAME
   WHERE KEY_PART1=VAL1 AND KEY_PART2=VAL2;

 SELECT MAX(KEY_PART2) FROM TBL_NAME GROUP BY KEY_PART1;

The following queries use indexing to retrieve the rows in sorted order without a separate sorting pass:

 SELECT ... FROM TBL_NAME
   ORDER BY KEY_PART1,KEY_PART2,... ;

 SELECT ... FROM TBL_NAME
   ORDER BY KEY_PART1 DESC, KEY_PART2 DESC, ... ;

 File: manual.info.tmp, Node: range-optimization, Next: index-merge-optimization, Prev: where-optimization, Up: select-optimization

8.2.1.2 Range Optimization

The 'range' access method uses a single index to retrieve a subset of table rows that are contained within one or several index value intervals. It can be used for a single-part or multiple-part index. The following sections describe conditions under which the optimizer uses range access.

Range Access Method for Single-Part Indexes

For a single-part index, index value intervals can be conveniently represented by corresponding conditions in the 'WHERE' clause, denoted as range conditions rather than 'intervals.'

The definition of a range condition for a single-part index is as follows:

'Constant value' in the preceding descriptions means one of the following:

Here are some examples of queries with range conditions in the 'WHERE' clause:

 SELECT * FROM t1
   WHERE KEY_COL > 1
   AND KEY_COL < 10;

 SELECT * FROM t1
   WHERE KEY_COL = 1
   OR KEY_COL IN (15,18,20);

 SELECT * FROM t1
   WHERE KEY_COL LIKE 'ab%'
   OR KEY_COL BETWEEN 'bar' AND 'foo';

Some nonconstant values may be converted to constants during the optimizer constant propagation phase.

MySQL tries to extract range conditions from the 'WHERE' clause for each of the possible indexes. During the extraction process, conditions that cannot be used for constructing the range condition are dropped, conditions that produce overlapping ranges are combined, and conditions that produce empty ranges are removed.

Consider the following statement, where 'key1' is an indexed column and 'nonkey' is not indexed:

 SELECT * FROM t1 WHERE
   (key1 < 'abc' AND (key1 LIKE 'abcde%' OR key1 LIKE '%b')) OR
   (key1 < 'bar' AND nonkey = 4) OR
   (key1 < 'uux' AND key1 > 'z');

The extraction process for key 'key1' is as follows:

  1. Start with original 'WHERE' clause:

      (key1 < 'abc' AND (key1 LIKE 'abcde%' OR key1 LIKE '%b')) OR
      (key1 < 'bar' AND nonkey = 4) OR
      (key1 < 'uux' AND key1 > 'z')
  2. Remove 'nonkey = 4' and 'key1 LIKE '%b'' because they cannot be used for a range scan. The correct way to remove them is to replace them with 'TRUE', so that we do not miss any matching rows when doing the range scan. Replacing them with 'TRUE' yields:

      (key1 < 'abc' AND (key1 LIKE 'abcde%' OR TRUE)) OR
      (key1 < 'bar' AND TRUE) OR
      (key1 < 'uux' AND key1 > 'z')
  3. Collapse conditions that are always true or false:

    * '(key1 LIKE 'abcde%' OR TRUE)' is always true
    
    * '(key1 < 'uux' AND key1 > 'z')' is always false

    Replacing these conditions with constants yields:

      (key1 < 'abc' AND TRUE) OR (key1 < 'bar' AND TRUE) OR (FALSE)

    Removing unnecessary 'TRUE' and 'FALSE' constants yields:

      (key1 < 'abc') OR (key1 < 'bar')
  4. Combining overlapping intervals into one yields the final condition to be used for the range scan:

      (key1 < 'bar')

In general (and as demonstrated by the preceding example), the condition used for a range scan is less restrictive than the 'WHERE' clause. MySQL performs an additional check to filter out rows that satisfy the range condition but not the full 'WHERE' clause.

The range condition extraction algorithm can handle nested 'AND'/'OR' constructs of arbitrary depth, and its output does not depend on the order in which conditions appear in the 'WHERE' clause.
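The simplification in steps 2 through 3 can be sketched as ordinary boolean-tree rewriting. The following model is illustrative only (it is not MySQL source code): non-range leaves are assumed to have already been replaced with 'TRUE' and the empty range with 'FALSE', as in the example, and the sketch then collapses the constants.

```python
def simplify(node):
    """Simplify a condition tree of ('AND'|'OR', [children]); leaves are
    condition strings or the constants True/False."""
    if isinstance(node, tuple):
        op, children = node
        children = [simplify(c) for c in children]
        if op == 'AND':
            if any(c is False for c in children):
                return False                      # AND with FALSE is FALSE
            children = [c for c in children if c is not True]
            if not children:
                return True
            return children[0] if len(children) == 1 else ('AND', children)
        else:  # OR
            if any(c is True for c in children):
                return True                       # OR with TRUE is TRUE
            children = [c for c in children if c is not False]
            if not children:
                return False
            return children[0] if len(children) == 1 else ('OR', children)
    return node

# The example WHERE clause, with "nonkey = 4" and "key1 LIKE '%b'"
# already replaced by True (step 2), and the empty range
# "key1 < 'uux' AND key1 > 'z'" replaced by False (step 3):
tree = ('OR', [
    ('AND', ["key1 < 'abc'", ('OR', ["key1 LIKE 'abcde%'", True])]),
    ('AND', ["key1 < 'bar'", True]),
    False,
])
print(simplify(tree))  # ('OR', ["key1 < 'abc'", "key1 < 'bar'"])
```

The final step, merging the overlapping intervals into 'key1 < 'bar'', is interval arithmetic rather than boolean simplification and is omitted here.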

MySQL does not support merging multiple ranges for the 'range' access method for spatial indexes. To work around this limitation, you can use a *note 'UNION': union. with identical *note 'SELECT': select. statements, except that you put each spatial predicate in a different *note 'SELECT': select.

Range Access Method for Multiple-Part Indexes

Range conditions on a multiple-part index are an extension of range conditions for a single-part index. A range condition on a multiple-part index restricts index rows to lie within one or several key tuple intervals. Key tuple intervals are defined over a set of key tuples, using ordering from the index.

For example, consider a multiple-part index defined as 'key1(KEY_PART1, KEY_PART2, KEY_PART3)', and the following set of key tuples listed in key order:

 KEY_PART1  KEY_PART2  KEY_PART3
   NULL       1          'abc'
   NULL       1          'xyz'
   NULL       2          'foo'
    1         1          'abc'
    1         1          'xyz'
    1         2          'abc'
    2         1          'aaa'

The condition 'KEY_PART1 = 1' defines this interval:

 (1,-inf,-inf) <= (KEY_PART1,KEY_PART2,KEY_PART3) < (1,+inf,+inf)

The interval covers the 4th, 5th, and 6th tuples in the preceding data set and can be used by the range access method.

By contrast, the condition 'KEY_PART3 = 'abc'' does not define a single interval and cannot be used by the range access method.

The following descriptions indicate how range conditions work for multiple-part indexes in greater detail.

For a description of how optimizations are performed to combine or eliminate intervals for range conditions on a single-part index, see *note range-access-single-part::. Analogous steps are performed for range conditions on multiple-part indexes.
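The interval shown for 'KEY_PART1 = 1' can be checked mechanically. This toy model (not server internals) represents '-inf'/'+inf' with float infinities and NULL key parts as sorting below any value, then tests which of the sample tuples fall inside the interval:

```python
NEG, POS = float('-inf'), float('inf')

def key(t):
    # NULL (None) sorts before any value for KEY_PART1 in this sample,
    # so represent it as -inf to keep Python tuple comparison valid.
    p1, p2, p3 = t
    return (NEG if p1 is None else p1, p2, p3)

rows = [
    (None, 1, 'abc'), (None, 1, 'xyz'), (None, 2, 'foo'),
    (1, 1, 'abc'), (1, 1, 'xyz'), (1, 2, 'abc'),
    (2, 1, 'aaa'),
]

# (1,-inf,-inf) <= (KEY_PART1,KEY_PART2,KEY_PART3) < (1,+inf,+inf)
lower, upper = (1, NEG, NEG), (1, POS, POS)
selected = [i for i, t in enumerate(rows, start=1) if lower <= key(t) < upper]
print(selected)  # [4, 5, 6]
```

As the manual states, the interval covers exactly the 4th, 5th, and 6th tuples of the data set.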

Equality Range Optimization of Many-Valued Comparisons

Consider these expressions, where COL_NAME is an indexed column:

 COL_NAME IN(VAL1, ..., VALN)
 COL_NAME = VAL1 OR ... OR COL_NAME = VALN

Each expression is true if COL_NAME is equal to any of several values. These comparisons are equality range comparisons (where the 'range' is a single value). The optimizer estimates the cost of reading qualifying rows for equality range comparisons as follows:

With index dives, the optimizer makes a dive at each end of a range and uses the number of rows in the range as the estimate. For example, the expression 'COL_NAME IN (10, 20, 30)' has three equality ranges and the optimizer makes two dives per range to generate a row estimate. Each pair of dives yields an estimate of the number of rows that have the given value.

Index dives provide accurate row estimates, but as the number of comparison values in the expression increases, the optimizer takes longer to generate a row estimate. Use of index statistics is less accurate than index dives but permits faster row estimation for large value lists.

The 'eq_range_index_dive_limit' system variable enables you to configure the number of values at which the optimizer switches from one row estimation strategy to the other. To permit use of index dives for comparisons of up to N equality ranges, set 'eq_range_index_dive_limit' to N + 1. To disable use of statistics and always use index dives regardless of N, set 'eq_range_index_dive_limit' to 0.
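The switch between the two strategies reduces to a threshold comparison. The function below is a hedged sketch of that rule, not server code; 200 is the documented default for 'eq_range_index_dive_limit' in MySQL 5.7, used here as an illustrative assumption:

```python
def row_estimation_strategy(num_equality_ranges, eq_range_index_dive_limit=200):
    """Model of the strategy choice described above (illustrative only)."""
    if eq_range_index_dive_limit == 0:
        return 'index dives'        # statistics disabled entirely
    if num_equality_ranges < eq_range_index_dive_limit:
        return 'index dives'        # accurate: two dives per range
    return 'index statistics'       # faster for long IN() value lists

print(row_estimation_strategy(3))      # index dives
print(row_estimation_strategy(500))    # index statistics
print(row_estimation_strategy(500, eq_range_index_dive_limit=0))  # index dives
```

Note how the rule matches the text: setting the limit to N + 1 permits dives for up to N equality ranges, and 0 forces dives regardless of the list length.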

To update table index statistics for best estimates, use *note 'ANALYZE TABLE': analyze-table.

Even under conditions when index dives would otherwise be used, they are skipped for queries that satisfy all these conditions:

Those dive-skipping conditions apply only for single-table queries. Index dives are not skipped for multiple-table queries (joins).

Range Optimization of Row Constructor Expressions

The optimizer is able to apply the range scan access method to queries of this form:

 SELECT ... FROM t1 WHERE ( col_1, col_2 ) IN (( 'a', 'b' ), ( 'c', 'd' ));

Previously, for range scans to be used, it was necessary to write the query as:

 SELECT ... FROM t1 WHERE ( col_1 = 'a' AND col_2 = 'b' )
 OR ( col_1 = 'c' AND col_2 = 'd' );

For the optimizer to use a range scan, queries must satisfy these conditions:

For more information about the optimizer and row constructors, see *note row-constructor-optimization::.

Limiting Memory Use for Range Optimization

To control the memory available to the range optimizer, use the 'range_optimizer_max_mem_size' system variable:

For individual queries that exceed the available range optimization memory and for which the optimizer falls back to less optimal plans, increasing the 'range_optimizer_max_mem_size' value may improve performance.

To estimate the amount of memory needed to process a range expression, use these guidelines:

Before 5.7.11, the number of bytes per predicate combined with 'OR' was higher, approximately 700 bytes.
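As a worked example of this guideline, the arithmetic can be sketched as follows. The per-predicate figure of roughly 230 bytes for 5.7.11 and later is an assumption carried over from the current manual's guideline (the text above gives only the pre-5.7.11 figure of about 700 bytes), and 8MB is the default 'range_optimizer_max_mem_size'; treat both as approximate:

```python
def range_memory_estimate(num_or_predicates, bytes_per_predicate=230):
    """Rough memory needed for a range expression: each predicate
    combined with OR costs about bytes_per_predicate bytes."""
    return num_or_predicates * bytes_per_predicate

# An IN() list behaves like the equivalent OR-combined equality
# predicates, so 10,000 IN() values need roughly:
needed = range_memory_estimate(10_000)
print(needed)                         # 2300000 bytes, about 2.3MB
print(needed <= 8 * 1024 * 1024)      # True: fits the assumed 8MB default
```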

 File: manual.info.tmp, Node: index-merge-optimization, Next: engine-condition-pushdown-optimization, Prev: range-optimization, Up: select-optimization

8.2.1.3 Index Merge Optimization

The Index Merge access method retrieves rows with multiple 'range' scans and merges their results into one. This access method merges index scans from a single table only, not scans across multiple tables. The merge can produce unions, intersections, or unions-of-intersections of its underlying scans.

Example queries for which Index Merge may be used:

 SELECT * FROM TBL_NAME WHERE KEY1 = 10 OR KEY2 = 20;

 SELECT * FROM TBL_NAME
   WHERE (KEY1 = 10 OR KEY2 = 20) AND NON_KEY = 30;

 SELECT * FROM t1, t2
   WHERE (t1.KEY1 IN (1,2) OR t1.KEY2 LIKE 'VALUE%')
   AND t2.KEY1 = t1.SOME_COL;

 SELECT * FROM t1, t2
   WHERE t1.KEY1 = 1
   AND (t2.KEY1 = t1.SOME_COL OR t2.KEY2 = t1.SOME_COL2);

Note:

The Index Merge optimization algorithm has the following known limitations:

In *note 'EXPLAIN': explain. output, the Index Merge method appears as 'index_merge' in the 'type' column. In this case, the 'key' column contains a list of indexes used, and 'key_len' contains a list of the longest key parts for those indexes.

The Index Merge access method has several algorithms, which are displayed in the 'Extra' field of *note 'EXPLAIN': explain. output:

The following sections describe these algorithms in greater detail. The optimizer chooses between different possible Index Merge algorithms and other access methods based on cost estimates of the various available options.

Use of Index Merge is subject to the value of the 'index_merge', 'index_merge_intersection', 'index_merge_union', and 'index_merge_sort_union' flags of the 'optimizer_switch' system variable. See *note switchable-optimizations::. By default, all those flags are 'on'. To enable only certain algorithms, set 'index_merge' to 'off', and enable only such of the others as should be permitted.

Index Merge Intersection Access Algorithm

This access algorithm is applicable when a 'WHERE' clause is converted to several range conditions on different keys combined with 'AND', and each condition is one of the following:

Examples:

 SELECT * FROM INNODB_TABLE
   WHERE PRIMARY_KEY < 10 AND KEY_COL1 = 20;

 SELECT * FROM TBL_NAME
   WHERE KEY1_PART1 = 1 AND KEY1_PART2 = 2 AND KEY2 = 2;

The Index Merge intersection algorithm performs simultaneous scans on all used indexes and produces the intersection of row sequences that it receives from the merged index scans.

If all columns used in the query are covered by the used indexes, full table rows are not retrieved (*note 'EXPLAIN': explain. output contains 'Using index' in 'Extra' field in this case). Here is an example of such a query:

 SELECT COUNT(*) FROM t1 WHERE key1 = 1 AND key2 = 1;

If the used indexes do not cover all columns used in the query, full rows are retrieved only when the range conditions for all used keys are satisfied.

If one of the merged conditions is a condition over the primary key of an 'InnoDB' table, it is not used for row retrieval, but is used to filter out rows retrieved using other conditions.
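The intersection of simultaneous scans can be modeled as a multi-way merge of sorted row-ID sequences, keeping only IDs that every scan produces. This is a simplified sketch, not storage-engine code:

```python
def intersect_sorted(streams):
    """Intersect several ascending row-ID sequences."""
    iters = [iter(s) for s in streams]
    try:
        current = [next(it) for it in iters]
    except StopIteration:
        return []                       # an empty scan: empty intersection
    out = []
    while True:
        lo, hi = min(current), max(current)
        if lo == hi:                    # every scan agrees on this row ID
            out.append(lo)
            try:
                current = [next(it) for it in iters]
            except StopIteration:
                return out
        else:                           # advance every scan that is behind
            for i, it in enumerate(iters):
                try:
                    while current[i] < hi:
                        current[i] = next(it)
                except StopIteration:
                    return out

scan_key1 = [2, 5, 7, 9, 12]    # hypothetical row IDs where key1 = 1
scan_key2 = [1, 5, 9, 12, 20]   # hypothetical row IDs where key2 = 1
print(intersect_sorted([scan_key1, scan_key2]))  # [5, 9, 12]
```

Only the row IDs in the intersection need full-row retrieval (unless the indexes cover the query, as noted above).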

Index Merge Union Access Algorithm

The criteria for this algorithm are similar to those for the Index Merge intersection algorithm. The algorithm is applicable when the table's 'WHERE' clause is converted to several range conditions on different keys combined with 'OR', and each condition is one of the following:

Examples:

 SELECT * FROM t1
   WHERE KEY1 = 1 OR KEY2 = 2 OR KEY3 = 3;

 SELECT * FROM INNODB_TABLE
   WHERE (KEY1 = 1 AND KEY2 = 2)
      OR (KEY3 = 'foo' AND KEY4 = 'bar') AND KEY5 = 5;

Index Merge Sort-Union Access Algorithm

This access algorithm is applicable when the 'WHERE' clause is converted to several range conditions combined by 'OR', but the Index Merge union algorithm is not applicable.

Examples:

 SELECT * FROM TBL_NAME
   WHERE KEY_COL1 < 10 OR KEY_COL2 < 20;

 SELECT * FROM TBL_NAME
   WHERE (KEY_COL1 > 10 OR KEY_COL2 = 20) AND NONKEY_COL = 30;

The difference between the sort-union algorithm and the union algorithm is that the sort-union algorithm must first fetch row IDs for all rows and sort them before returning any rows.
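The distinction can be sketched with two toy functions (illustrative, not server code): the union algorithm merges streams that each already yield row IDs in sorted order, while sort-union receives row IDs in key order from its range scans and must buffer and sort them all first:

```python
import heapq

def index_merge_union(sorted_streams):
    """Streams already sorted by row ID: merge and deduplicate lazily,
    returning rows as soon as they are produced."""
    last = object()
    for rid in heapq.merge(*sorted_streams):
        if rid != last:
            yield rid
            last = rid

def index_merge_sort_union(unsorted_streams):
    """Row IDs arrive in key order, not row-ID order: all of them must
    be fetched, sorted, and deduplicated before any row is returned."""
    return sorted(set().union(*unsorted_streams))

print(list(index_merge_union([[1, 3, 8], [2, 3, 9]])))  # [1, 2, 3, 8, 9]
print(index_merge_sort_union([[8, 3, 1], [9, 2, 3]]))   # [1, 2, 3, 8, 9]
```

Both produce the same set of row IDs; the cost difference is that sort-union cannot stream its result.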

 File: manual.info.tmp, Node: engine-condition-pushdown-optimization, Next: index-condition-pushdown-optimization, Prev: index-merge-optimization, Up: select-optimization

8.2.1.4 Engine Condition Pushdown Optimization

This optimization improves the efficiency of direct comparisons between a nonindexed column and a constant. In such cases, the condition is 'pushed down' to the storage engine for evaluation. This optimization can be used only by the *note 'NDB': mysql-cluster. storage engine.

For NDB Cluster, this optimization can eliminate the need to send nonmatching rows over the network between the cluster's data nodes and the MySQL server that issued the query, and can speed up queries where it is used by a factor of 5 to 10 times over cases where condition pushdown could be but is not used.

Suppose that an NDB Cluster table is defined as follows:

 CREATE TABLE t1 (
     a INT,
     b INT,
     KEY(a)
 ) ENGINE=NDB;

Engine condition pushdown can be used with queries such as the one shown here, which includes a comparison between a nonindexed column and a constant:

 SELECT a, b FROM t1 WHERE b = 10;

The use of engine condition pushdown can be seen in the output of *note 'EXPLAIN': explain.:

 mysql> EXPLAIN SELECT a,b FROM t1 WHERE b = 10\G
 *************************** 1. row ***************************
            id: 1
   select_type: SIMPLE
         table: t1
          type: ALL
 possible_keys: NULL
           key: NULL
       key_len: NULL
           ref: NULL
          rows: 10
         Extra: Using where with pushed condition

However, engine condition pushdown cannot be used with either of these two queries:

 SELECT a,b FROM t1 WHERE a = 10;
 SELECT a,b FROM t1 WHERE b + 1 = 10;

Engine condition pushdown is not applicable to the first query because an index exists on column 'a'. (An index access method would be more efficient and so would be chosen in preference to condition pushdown.) Engine condition pushdown cannot be employed for the second query because the comparison involving the nonindexed column 'b' is indirect. (However, engine condition pushdown could be applied if you were to reduce 'b + 1 = 10' to 'b = 9' in the 'WHERE' clause.)

Engine condition pushdown may also be employed when an indexed column is compared with a constant using a '>' or '<' operator:

 mysql> EXPLAIN SELECT a, b FROM t1 WHERE a < 2\G
 *************************** 1. row ***************************
            id: 1
   select_type: SIMPLE
         table: t1
          type: range
 possible_keys: a
           key: a
       key_len: 5
           ref: NULL
          rows: 2
         Extra: Using where with pushed condition

Other supported comparisons for engine condition pushdown include the following:

In all of the cases in the preceding list, it is possible for the condition to be converted into the form of one or more direct comparisons between a column and a constant.

Engine condition pushdown is enabled by default. To disable it at server startup, set the 'optimizer_switch' system variable's 'engine_condition_pushdown' flag to 'off'. For example, in a 'my.cnf' file, use these lines:

 [mysqld]
 optimizer_switch=engine_condition_pushdown=off

At runtime, disable condition pushdown like this:

 SET optimizer_switch='engine_condition_pushdown=off';

Limitations

Engine condition pushdown is subject to the following limitations:

 File: manual.info.tmp, Node: index-condition-pushdown-optimization, Next: nested-loop-joins, Prev: engine-condition-pushdown-optimization, Up: select-optimization

8.2.1.5 Index Condition Pushdown Optimization

Index Condition Pushdown (ICP) is an optimization for the case where MySQL retrieves rows from a table using an index. Without ICP, the storage engine traverses the index to locate rows in the base table and returns them to the MySQL server, which evaluates the 'WHERE' condition for the rows. With ICP enabled, if parts of the 'WHERE' condition can be evaluated by using only columns from the index, the MySQL server pushes that part of the 'WHERE' condition down to the storage engine. The storage engine then evaluates the pushed index condition by using the index entry, and only if this condition is satisfied is the row read from the table. ICP can reduce the number of times the storage engine must access the base table and the number of times the MySQL server must access the storage engine.

Applicability of the Index Condition Pushdown optimization is subject to these conditions:

To understand how this optimization works, first consider how an index scan proceeds when Index Condition Pushdown is not used:

  1. Get the next row, first by reading the index tuple, and then by using the index tuple to locate and read the full table row.

  2. Test the part of the 'WHERE' condition that applies to this table. Accept or reject the row based on the test result.

Using Index Condition Pushdown, the scan proceeds like this instead:

  1. Get the next row's index tuple (but not the full table row).

  2. Test the part of the 'WHERE' condition that applies to this table and can be checked using only index columns. If the condition is not satisfied, proceed to the index tuple for the next row.

  3. If the condition is satisfied, use the index tuple to locate and read the full table row.

  4. Test the remaining part of the 'WHERE' condition that applies to this table. Accept or reject the row based on the test result.
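The four steps above can be sketched as a loop. This is an illustrative model, not server code; 'index_entries', 'fetch_row', and the condition callables are hypothetical names standing in for the index scan, the base-table read, and the pushed and residual parts of the 'WHERE' clause:

```python
def icp_scan(index_entries, fetch_row, pushed_cond, residual_cond):
    fetched = 0
    results = []
    for index_tuple, row_id in index_entries:   # step 1: next index tuple
        if not pushed_cond(index_tuple):        # step 2: index columns only
            continue                            # skip without touching table
        row = fetch_row(row_id)                 # step 3: read full row
        fetched += 1
        if residual_cond(row):                  # step 4: rest of WHERE
            results.append(row)
    return results, fetched

# Modeling the people example below: zipcode='95054' already bounds the
# scan, and the lastname test is pushed down to the index entries.
table = {1: ('95054', 'Petrunia', '12 Main Street'),
         2: ('95054', 'Smith',    '34 Main Street'),
         3: ('95054', 'Petrunia', '56 Elm Street')}
entries = [((zc, name), rid) for rid, (zc, name, addr) in table.items()]
rows, fetched = icp_scan(
    entries,
    table.__getitem__,
    pushed_cond=lambda t: 'etrunia' in t[1],        # lastname LIKE '%etrunia%'
    residual_cond=lambda r: 'Main Street' in r[2],  # address LIKE '%Main Street%'
)
print(rows)     # [('95054', 'Petrunia', '12 Main Street')]
print(fetched)  # 2: only rows whose lastname matched the index test were read
```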

*note 'EXPLAIN': explain. output shows 'Using index condition' in the 'Extra' column when Index Condition Pushdown is used. It does not show 'Using index' because that does not apply when full table rows must be read.

Suppose that a table contains information about people and their addresses and that the table has an index defined as 'INDEX (zipcode, lastname, firstname)'. If we know a person's 'zipcode' value but are not sure about the last name, we can search like this:

 SELECT * FROM people
   WHERE zipcode='95054'
   AND lastname LIKE '%etrunia%'
   AND address LIKE '%Main Street%';

MySQL can use the index to scan through people with 'zipcode='95054''. The second part ('lastname LIKE '%etrunia%'') cannot be used to limit the number of rows that must be scanned, so without Index Condition Pushdown, this query must retrieve full table rows for all people who have 'zipcode='95054''.

With Index Condition Pushdown, MySQL checks the 'lastname LIKE '%etrunia%'' part before reading the full table row. This avoids reading full rows corresponding to index tuples that match the 'zipcode' condition but not the 'lastname' condition.

Index Condition Pushdown is enabled by default. It can be controlled with the 'optimizer_switch' system variable by setting the 'index_condition_pushdown' flag:

 SET optimizer_switch = 'index_condition_pushdown=off';
 SET optimizer_switch = 'index_condition_pushdown=on';

See *note switchable-optimizations::.

 File: manual.info.tmp, Node: nested-loop-joins, Next: nested-join-optimization, Prev: index-condition-pushdown-optimization, Up: select-optimization

8.2.1.6 Nested-Loop Join Algorithms

MySQL executes joins between tables using a nested-loop algorithm or variations on it.

Nested-Loop Join Algorithm

A simple nested-loop join (NLJ) algorithm reads rows from the first table in a loop one at a time, passing each row to a nested loop that processes the next table in the join. This process is repeated as many times as there remain tables to be joined.

Assume that a join between three tables 't1', 't2', and 't3' is to be executed using the following join types:

 Table   Join Type
 t1      range
 t2      ref
 t3      ALL

If a simple NLJ algorithm is used, the join is processed like this:

 for each row in t1 matching range {
   for each row in t2 matching reference key {
     for each row in t3 {
       if row satisfies join conditions, send to client
     }
   }
 }

Because the NLJ algorithm passes rows one at a time from outer loops to inner loops, it typically reads tables processed in the inner loops many times.
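The three-loop plan above can be rendered as runnable code. This is a sketch only: real access methods perform index lookups and range scans rather than Python list scans, and the condition callables are hypothetical stand-ins for the access-method predicates:

```python
def simple_nlj(t1, t2, t3, range_cond, ref_cond, join_cond):
    out = []
    for r1 in t1:
        if not range_cond(r1):           # t1: range access
            continue
        for r2 in t2:
            if not ref_cond(r1, r2):     # t2: ref access
                continue
            for r3 in t3:                # t3: full scan (ALL)
                if join_cond(r1, r2, r3):
                    out.append((r1, r2, r3))
    return out

t1 = [1, 2, 3, 4]
t2 = [(1, 'a'), (2, 'b'), (3, 'c')]
t3 = ['a', 'b']
rows = simple_nlj(t1, t2, t3,
                  range_cond=lambda r1: r1 < 3,
                  ref_cond=lambda r1, r2: r2[0] == r1,
                  join_cond=lambda r1, r2, r3: r2[1] == r3)
print(rows)  # [(1, (1, 'a'), 'a'), (2, (2, 'b'), 'b')]
```

Notice that 't3' is scanned once for every qualifying '(t1, t2)' combination, which is exactly the repeated inner-table reading that join buffering addresses.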

Block Nested-Loop Join Algorithm

A Block Nested-Loop (BNL) join algorithm uses buffering of rows read in outer loops to reduce the number of times that tables in inner loops must be read. For example, if 10 rows are read into a buffer and the buffer is passed to the next inner loop, each row read in the inner loop can be compared against all 10 rows in the buffer. This reduces by an order of magnitude the number of times the inner table must be read.

MySQL join buffering has these characteristics:

For the example join described previously for the NLJ algorithm (without buffering), the join is done as follows using join buffering:

 for each row in t1 matching range {
   for each row in t2 matching reference key {
     store used columns from t1, t2 in join buffer
     if buffer is full {
       for each row in t3 {
         for each t1, t2 combination in join buffer {
           if row satisfies join conditions, send to client
         }
       }
       empty join buffer
     }
   }
 }

 if buffer is not empty {
   for each row in t3 {
     for each t1, t2 combination in join buffer {
       if row satisfies join conditions, send to client
     }
   }
 }

If S is the size of each stored 't1', 't2' combination in the join buffer and C is the number of combinations in the buffer, the number of times table 't3' is scanned is:

 (S * C)/join_buffer_size + 1

The number of 't3' scans decreases as the value of 'join_buffer_size' increases, up to the point when 'join_buffer_size' is large enough to hold all previous row combinations. At that point, no speed is gained by making it larger.
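Plugging numbers into the formula makes the effect concrete. The values here are made up for illustration; 256KB is the default 'join_buffer_size' in MySQL 5.7:

```python
def t3_scan_count(S, C, join_buffer_size):
    """Scans of t3 per the formula above: one scan per full buffer,
    plus one for the final partially filled buffer."""
    return (S * C) // join_buffer_size + 1

# 100-byte t1,t2 combinations, 50,000 of them, 256KB join buffer:
print(t3_scan_count(100, 50_000, 256 * 1024))   # 20 scans of t3
# Quadrupling the buffer cuts the scans roughly in proportion:
print(t3_scan_count(100, 50_000, 1024 * 1024))  # 5 scans of t3
```

Once the buffer holds all 50,000 combinations (about 5MB here), the count bottoms out at one scan and further increases gain nothing, as the text notes.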

 File: manual.info.tmp, Node: nested-join-optimization, Next: outer-join-optimization, Prev: nested-loop-joins, Up: select-optimization

8.2.1.7 Nested Join Optimization

The syntax for expressing joins permits nested joins. The following discussion refers to the join syntax described in *note join::.

The syntax of TABLE_FACTOR is extended in comparison with the SQL Standard. The latter accepts only TABLE_REFERENCE, not a list of them inside a pair of parentheses. This is a conservative extension if we consider each comma in a list of TABLE_REFERENCE items as equivalent to an inner join. For example:

 SELECT * FROM t1 LEFT JOIN (t2, t3, t4)
                  ON (t2.a=t1.a AND t3.b=t1.b AND t4.c=t1.c)

Is equivalent to:

 SELECT * FROM t1 LEFT JOIN (t2 CROSS JOIN t3 CROSS JOIN t4)
                  ON (t2.a=t1.a AND t3.b=t1.b AND t4.c=t1.c)

In MySQL, 'CROSS JOIN' is syntactically equivalent to 'INNER JOIN'; they can replace each other. In standard SQL, they are not equivalent. 'INNER JOIN' is used with an 'ON' clause; 'CROSS JOIN' is used otherwise.

In general, parentheses can be ignored in join expressions containing only inner join operations. Consider this join expression:

 t1 LEFT JOIN (t2 LEFT JOIN t3 ON t2.b=t3.b OR t2.b IS NULL)
    ON t1.a=t2.a

After removing parentheses and grouping operations to the left, that join expression transforms into this expression:

 (t1 LEFT JOIN t2 ON t1.a=t2.a) LEFT JOIN t3
     ON t2.b=t3.b OR t2.b IS NULL

Yet, the two expressions are not equivalent. To see this, suppose that the tables 't1', 't2', and 't3' have the following state:

    * Table 't1' contains rows '(1)', '(2)'

    * Table 't2' contains row '(1,101)'

    * Table 't3' contains row '(101)'

In this case, the first expression returns a result set including the rows '(1,1,101,101)', '(2,NULL,NULL,NULL)', whereas the second expression returns the rows '(1,1,101,101)', '(2,NULL,NULL,101)':

 mysql> SELECT *
        FROM t1
             LEFT JOIN
             (t2 LEFT JOIN t3 ON t2.b=t3.b OR t2.b IS NULL)
             ON t1.a=t2.a;
 +------+------+------+------+
 | a    | a    | b    | b    |
 +------+------+------+------+
 |    1 |    1 |  101 |  101 |
 |    2 | NULL | NULL | NULL |
 +------+------+------+------+

 mysql> SELECT *
        FROM (t1 LEFT JOIN t2 ON t1.a=t2.a)
             LEFT JOIN t3
             ON t2.b=t3.b OR t2.b IS NULL;
 +------+------+------+------+
 | a    | a    | b    | b    |
 +------+------+------+------+
 |    1 |    1 |  101 |  101 |
 |    2 | NULL | NULL |  101 |
 +------+------+------+------+

In the following example, an outer join operation is used together with an inner join operation:

 t1 LEFT JOIN (t2, t3) ON t1.a=t2.a

That expression cannot be transformed into the following expression:

 t1 LEFT JOIN t2 ON t1.a=t2.a, t3

For the given table states, the two expressions return different sets of rows:

 mysql> SELECT *
        FROM t1 LEFT JOIN (t2, t3) ON t1.a=t2.a;
 +------+------+------+------+
 | a    | a    | b    | b    |
 +------+------+------+------+
 |    1 |    1 |  101 |  101 |
 |    2 | NULL | NULL | NULL |
 +------+------+------+------+

 mysql> SELECT *
        FROM t1 LEFT JOIN t2 ON t1.a=t2.a, t3;
 +------+------+------+------+
 | a    | a    | b    | b    |
 +------+------+------+------+
 |    1 |    1 |  101 |  101 |
 |    2 | NULL | NULL |  101 |
 +------+------+------+------+

Therefore, if we omit parentheses in a join expression with outer join operators, we might change the result set for the original expression.

More exactly, we cannot ignore parentheses in the right operand of a left outer join operation or in the left operand of a right outer join operation. In other words, we cannot ignore parentheses for the inner table expressions of outer join operations. Parentheses for the other operand (the operand for the outer table) can be ignored.

The following expression:

 (t1,t2) LEFT JOIN t3 ON P(t2.b,t3.b)

Is equivalent to this expression for any tables 't1,t2,t3' and any condition 'P' over attributes 't2.b' and 't3.b':

 t1, t2 LEFT JOIN t3 ON P(t2.b,t3.b)

Whenever the order of execution of join operations in a join expression (JOINED_TABLE) is not from left to right, we talk about nested joins. Consider the following queries:

 SELECT * FROM t1 LEFT JOIN (t2 LEFT JOIN t3 ON t2.b=t3.b) ON t1.a=t2.a
   WHERE t1.a > 1

 SELECT * FROM t1 LEFT JOIN (t2, t3) ON t1.a=t2.a
   WHERE (t2.b=t3.b OR t2.b IS NULL) AND t1.a > 1

Those queries are considered to contain these nested joins:

 t2 LEFT JOIN t3 ON t2.b=t3.b
 t2, t3

In the first query, the nested join is formed with a left join operation. In the second query, it is formed with an inner join operation.

In the first query, the parentheses can be omitted: The grammatical structure of the join expression dictates the same order of execution for join operations. For the second query, the parentheses cannot be omitted, although the join expression here can be interpreted unambiguously without them. In our extended syntax, the parentheses in '(t2, t3)' of the second query are required, although theoretically the query could be parsed without them: We still would have unambiguous syntactical structure for the query because 'LEFT JOIN' and 'ON' play the role of the left and right delimiters for the expression '(t2,t3)'.

The preceding examples demonstrate these points:

Queries with nested outer joins are executed in the same pipeline manner as queries with inner joins. More exactly, a variation of the nested-loop join algorithm is exploited. Recall the algorithm by which the nested-loop join executes a query (see *note nested-loop-joins::). Suppose that a join query over 3 tables 'T1,T2,T3' has this form:

 SELECT * FROM T1 INNER JOIN T2 ON P1(T1,T2)
                  INNER JOIN T3 ON P2(T2,T3)
   WHERE P(T1,T2,T3)

Here, 'P1(T1,T2)' and 'P2(T2,T3)' are some join conditions (on expressions), whereas 'P(T1,T2,T3)' is a condition over columns of tables 'T1,T2,T3'.

The nested-loop join algorithm would execute this query in the following manner:

 FOR each row t1 in T1 {
   FOR each row t2 in T2 such that P1(t1,t2) {
     FOR each row t3 in T3 such that P2(t2,t3) {
       IF P(t1,t2,t3) {
          t:=t1||t2||t3; OUTPUT t;
       }
     }
   }
 }

The notation 't1||t2||t3' indicates a row constructed by concatenating the columns of rows 't1', 't2', and 't3'. In some of the following examples, 'NULL' where a table name appears means a row in which 'NULL' is used for each column of that table. For example, 't1||t2||NULL' indicates a row constructed by concatenating the columns of rows 't1' and 't2', and 'NULL' for each column of 't3'. Such a row is said to be 'NULL'-complemented.

Now consider a query with nested outer joins:

 SELECT * FROM T1 LEFT JOIN
               (T2 LEFT JOIN T3 ON P2(T2,T3))
               ON P1(T1,T2)
   WHERE P(T1,T2,T3)

For this query, modify the nested-loop pattern to obtain:

 FOR each row t1 in T1 {
   BOOL f1:=FALSE;
   FOR each row t2 in T2 such that P1(t1,t2) {
     BOOL f2:=FALSE;
     FOR each row t3 in T3 such that P2(t2,t3) {
       IF P(t1,t2,t3) {
         t:=t1||t2||t3; OUTPUT t;
       }
       f2=TRUE;
       f1=TRUE;
     }
     IF (!f2) {
       IF P(t1,t2,NULL) {
         t:=t1||t2||NULL; OUTPUT t;
       }
       f1=TRUE;
     }
   }
   IF (!f1) {
     IF P(t1,NULL,NULL) {
       t:=t1||NULL||NULL; OUTPUT t;
     }
   }
 }

In general, for any nested loop for the first inner table in an outer join operation, a flag is introduced that is turned off before the loop and checked after the loop. The flag is turned on when a match for the current row of the outer table is found in the table representing the inner operand. If at the end of the loop cycle the flag is still off, no match was found for the current row of the outer table. In this case, the row is complemented by 'NULL' values for the columns of the inner tables. The result row is passed to the final check for the output or into the next nested loop, but only if the row satisfies the join condition of all embedded outer joins.
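
The flag mechanism just described can be sketched in Python (hypothetical tables of scalar keys; a simplified illustration, not MySQL's implementation):

```python
# Left-join nested loops with match flags, following the pattern above.
# A flag per outer-join operation records whether the current outer row
# found any match; if not, a NULL-complemented row is emitted.

NULL = None

def nested_left_joins(T1, T2, T3, p1, p2, p):
    out = []
    for t1 in T1:
        f1 = False
        for t2 in T2:
            if not p1(t1, t2):
                continue
            f2 = False
            for t3 in T3:
                if not p2(t2, t3):
                    continue
                if p(t1, t2, t3):
                    out.append((t1, t2, t3))
                f2 = True
                f1 = True
            if not f2:                      # no T3 match: NULL-complement t3
                if p(t1, t2, NULL):
                    out.append((t1, t2, NULL))
                f1 = True
        if not f1:                          # no T2 match: NULL-complement t2, t3
            if p(t1, NULL, NULL):
                out.append((t1, NULL, NULL))
    return out

T1, T2, T3 = [1, 2], [1], [5]
rows = nested_left_joins(
    T1, T2, T3,
    p1=lambda a, b: a == b,          # P1(T1,T2)
    p2=lambda b, c: b == c,          # P2(T2,T3)
    p=lambda a, b, c: True,          # P(T1,T2,T3)
)
print(rows)  # [(1, 1, None), (2, None, None)]
```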

In the example, the outer join table expressed by the following expression is embedded:

 (T2 LEFT JOIN T3 ON P2(T2,T3))

For the query with inner joins, the optimizer could choose a different order of nested loops, such as this one:

 FOR each row t3 in T3 {
   FOR each row t2 in T2 such that P2(t2,t3) {
     FOR each row t1 in T1 such that P1(t1,t2) {
       IF P(t1,t2,t3) {
          t:=t1||t2||t3; OUTPUT t;
       }
     }
   }
 }

For queries with outer joins, the optimizer can choose only such an order where loops for outer tables precede loops for inner tables. Thus, for our query with outer joins, only one nesting order is possible. For the following query, the optimizer evaluates two different nestings. In both nestings, 'T1' must be processed in the outer loop because it is used in an outer join. 'T2' and 'T3' are used in an inner join, so that join must be processed in the inner loop. However, because the join is an inner join, 'T2' and 'T3' can be processed in either order.

 SELECT * FROM T1 LEFT JOIN (T2,T3) ON P1(T1,T2) AND P2(T1,T3)
   WHERE P(T1,T2,T3)

One nesting evaluates 'T2', then 'T3':

 FOR each row t1 in T1 {
   BOOL f1:=FALSE;
   FOR each row t2 in T2 such that P1(t1,t2) {
     FOR each row t3 in T3 such that P2(t1,t3) {
       IF P(t1,t2,t3) {
         t:=t1||t2||t3; OUTPUT t;
       }
       f1:=TRUE;
     }
   }
   IF (!f1) {
     IF P(t1,NULL,NULL) {
       t:=t1||NULL||NULL; OUTPUT t;
     }
   }
 }

The other nesting evaluates 'T3', then 'T2':

 FOR each row t1 in T1 {
   BOOL f1:=FALSE;
   FOR each row t3 in T3 such that P2(t1,t3) {
     FOR each row t2 in T2 such that P1(t1,t2) {
       IF P(t1,t2,t3) {
         t:=t1||t2||t3; OUTPUT t;
       }
       f1:=TRUE;
     }
   }
   IF (!f1) {
     IF P(t1,NULL,NULL) {
       t:=t1||NULL||NULL; OUTPUT t;
     }
   }
 }

When discussing the nested-loop algorithm for inner joins, we omitted some details whose impact on the performance of query execution may be huge. We did not mention so-called 'pushed-down' conditions. Suppose that our 'WHERE' condition 'P(T1,T2,T3)' can be represented by a conjunctive formula:

 P(T1,T2,T3) = C1(T1) AND C2(T2) AND C3(T3).

In this case, MySQL actually uses the following nested-loop algorithm for the execution of the query with inner joins:

 FOR each row t1 in T1 such that C1(t1) {
   FOR each row t2 in T2 such that P1(t1,t2) AND C2(t2)  {
     FOR each row t3 in T3 such that P2(t2,t3) AND C3(t3) {
       IF P(t1,t2,t3) {
          t:=t1||t2||t3; OUTPUT t;
       }
     }
   }
 }

Each of the conjuncts 'C1(T1)', 'C2(T2)', and 'C3(T3)' is pushed out of the innermost loop to the outermost loop where it can be evaluated. If 'C1(T1)' is a very restrictive condition, this condition pushdown may greatly reduce the number of rows from table 'T1' passed to the inner loops. As a result, the execution time for the query may improve immensely.
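
A small Python sketch (hypothetical tables and a restrictive condition) illustrates how pushing a conjunct to the outermost loop reduces inner-loop work; the counters compare the number of inner-loop iterations with and without the pushdown:

```python
# Effect of pushing conjunct C1(T1) to the outermost loop: rows of T1
# failing C1 never enter the inner loops.

def join_iterations(T1, T2, pushdown):
    inner = 0
    for t1 in T1:
        if pushdown and not c1(t1):
            continue                      # C1 evaluated in the outer loop
        for t2 in T2:
            inner += 1
            if not pushdown and not c1(t1):
                continue                  # C1 evaluated only in the inner loop
            # ... join condition and further loops would go here ...
    return inner

c1 = lambda t1: t1 % 100 == 0            # very restrictive condition C1(T1)
T1 = list(range(1000))                   # only 10 rows pass C1
T2 = list(range(50))

print(join_iterations(T1, T2, pushdown=False))  # 50000 inner iterations
print(join_iterations(T1, T2, pushdown=True))   # 500 inner iterations
```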

For a query with outer joins, the 'WHERE' condition is to be checked only after it has been found that the current row from the outer table has a match in the inner tables. Thus, the optimization of pushing conditions out of the inner nested loops cannot be applied directly to queries with outer joins. Here we must introduce conditional pushed-down predicates guarded by the flags that are turned on when a match has been encountered.

Recall this example with outer joins:

 P(T1,T2,T3)=C1(T1) AND C2(T2) AND C3(T3)

For that example, the nested-loop algorithm using guarded pushed-down conditions looks like this:

 FOR each row t1 in T1 such that C1(t1) {
   BOOL f1:=FALSE;
   FOR each row t2 in T2
       such that P1(t1,t2) AND (f1?C2(t2):TRUE) {
     BOOL f2:=FALSE;
     FOR each row t3 in T3
         such that P2(t2,t3) AND (f1&&f2?C3(t3):TRUE) {
       IF (f1&&f2?TRUE:(C2(t2) AND C3(t3))) {
         t:=t1||t2||t3; OUTPUT t;
       }
       f2=TRUE;
       f1=TRUE;
     }
     IF (!f2) {
       IF ((f1?TRUE:C2(t2)) && P(t1,t2,NULL)) {
         t:=t1||t2||NULL; OUTPUT t;
       }
       f1=TRUE;
     }
   }
   IF (!f1 && P(t1,NULL,NULL)) {
       t:=t1||NULL||NULL; OUTPUT t;
   }
 }

In general, pushed-down predicates can be extracted from join conditions such as 'P1(T1,T2)' and 'P2(T2,T3)'. In this case, a pushed-down predicate is also guarded by a flag that prevents checking the predicate for the 'NULL'-complemented row generated by the corresponding outer join operation.

Access by key from one inner table to another in the same nested join is prohibited if it is induced by a predicate from the 'WHERE' condition.

 File: manual.info.tmp, Node: outer-join-optimization, Next: outer-join-simplification, Prev: nested-join-optimization, Up: select-optimization

8.2.1.8 Outer Join Optimization ...............................

Outer joins include 'LEFT JOIN' and 'RIGHT JOIN'.

MySQL implements an 'A LEFT JOIN B JOIN_SPECIFICATION' as follows:

The 'RIGHT JOIN' implementation is analogous to that of 'LEFT JOIN' with the table roles reversed. Right joins are converted to equivalent left joins, as described in *note outer-join-simplification::.

For a 'LEFT JOIN', if the 'WHERE' condition is always false for the generated 'NULL' row, the 'LEFT JOIN' is changed to an inner join. For example, the 'WHERE' clause would be false in the following query if 't2.column1' were 'NULL':

 SELECT * FROM t1 LEFT JOIN t2 ON (t1.column1=t2.column1) WHERE t2.column2=5;

Therefore, it is safe to convert the query to an inner join:

 SELECT * FROM t1, t2 WHERE t2.column2=5 AND t1.column1=t2.column1;

Now the optimizer can use table 't2' before table 't1' if doing so would result in a better query plan. To provide a hint about the table join order, use 'STRAIGHT_JOIN'; see note select::. However, 'STRAIGHT_JOIN' may prevent indexes from being used because it disables semijoin transformations; see note semijoins::.

 File: manual.info.tmp, Node: outer-join-simplification, Next: mrr-optimization, Prev: outer-join-optimization, Up: select-optimization

8.2.1.9 Outer Join Simplification .................................

Table expressions in the 'FROM' clause of a query are simplified in many cases.

At the parser stage, queries with right outer join operations are converted to equivalent queries containing only left join operations. In the general case, the conversion is performed such that this right join:

 (T1, ...) RIGHT JOIN (T2, ...) ON P(T1, ..., T2, ...)

Becomes this equivalent left join:

 (T2, ...) LEFT JOIN (T1, ...) ON P(T1, ..., T2, ...)

All inner join expressions of the form 'T1 INNER JOIN T2 ON P(T1,T2)' are replaced by the list 'T1,T2', 'P(T1,T2)' being joined as a conjunct to the 'WHERE' condition (or to the join condition of the embedding join, if there is any).

When the optimizer evaluates plans for outer join operations, it takes into consideration only plans where, for each such operation, the outer tables are accessed before the inner tables. The optimizer choices are limited because only such plans enable outer joins to be executed using the nested-loop algorithm.

Consider a query of this form, where 'R(T2)' greatly narrows the number of matching rows from table 'T2':

 SELECT * FROM T1
   LEFT JOIN T2 ON P1(T1,T2)
   WHERE P(T1,T2) AND R(T2)

If the query is executed as written, the optimizer has no choice but to access the less-restricted table 'T1' before the more-restricted table 'T2', which may produce a very inefficient execution plan.

Instead, MySQL converts the query to a query with no outer join operation if the 'WHERE' condition is null-rejected. (That is, it converts the outer join to an inner join.) A condition is said to be null-rejected for an outer join operation if it evaluates to 'FALSE' or 'UNKNOWN' for any 'NULL'-complemented row generated for the operation.

Thus, for this outer join:

 T1 LEFT JOIN T2 ON T1.A=T2.A

Conditions such as these are null-rejected because they cannot be true for any 'NULL'-complemented row (with 'T2' columns set to 'NULL'):

 T2.B IS NOT NULL
 T2.B > 3
 T2.C <= T1.C
 T2.B < 2 OR T2.C > 1

Conditions such as these are not null-rejected because they might be true for a 'NULL'-complemented row:

 T2.B IS NULL
 T1.B < 3 OR T2.B IS NOT NULL
 T1.B < 3 OR T2.B > 3
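
A minimal Python sketch of SQL's three-valued logic (with 'UNKNOWN' modeled as 'None') shows why the first group of conditions is null-rejected and the second is not; the operator functions are simplified stand-ins, not MySQL internals:

```python
# Three-valued evaluation of conditions for a NULL-complemented row:
# any comparison with NULL is UNKNOWN (modeled as None). A condition is
# null-rejected if it is FALSE or UNKNOWN for every NULL-complemented row.

NULL = None

def gt(x, y):             # SQL x > y
    return None if x is NULL or y is NULL else x > y

def is_null(x):           # SQL x IS NULL (never UNKNOWN)
    return x is NULL

def sql_or(a, b):         # SQL three-valued OR
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False

# NULL-complemented row for T2 in "T1 LEFT JOIN T2": T2 columns are NULL.
t1_b, t2_b = 1, NULL

assert gt(t2_b, 3) is None                       # T2.B > 3 -> UNKNOWN: null-rejected
assert is_null(t2_b) is True                     # T2.B IS NULL -> TRUE: not null-rejected
assert sql_or(gt(3, t1_b), gt(t2_b, 3)) is True  # T1.B < 3 OR T2.B > 3 can be TRUE
```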

The general rules for checking whether a condition is null-rejected for an outer join operation are simple:

A condition can be null-rejected for one outer join operation in a query and not null-rejected for another. In this query, the 'WHERE' condition is null-rejected for the second outer join operation but is not null-rejected for the first one:

 SELECT * FROM T1 LEFT JOIN T2 ON T2.A=T1.A
                  LEFT JOIN T3 ON T3.B=T1.B
   WHERE T3.C > 0

If the 'WHERE' condition is null-rejected for an outer join operation in a query, the outer join operation is replaced by an inner join operation.

For example, in the preceding query, the second outer join is null-rejected and can be replaced by an inner join:

 SELECT * FROM T1 LEFT JOIN T2 ON T2.A=T1.A
                  INNER JOIN T3 ON T3.B=T1.B
   WHERE T3.C > 0

For the original query, the optimizer evaluates only plans compatible with the single table-access order 'T1,T2,T3'. For the rewritten query, it additionally considers the access order 'T3,T1,T2'.

A conversion of one outer join operation may trigger a conversion of another. Thus, the query:

 SELECT * FROM T1 LEFT JOIN T2 ON T2.A=T1.A
                  LEFT JOIN T3 ON T3.B=T2.B
   WHERE T3.C > 0

Is first converted to the query:

 SELECT * FROM T1 LEFT JOIN T2 ON T2.A=T1.A
                  INNER JOIN T3 ON T3.B=T2.B
   WHERE T3.C > 0

Which is equivalent to the query:

 SELECT * FROM (T1 LEFT JOIN T2 ON T2.A=T1.A), T3
   WHERE T3.C > 0 AND T3.B=T2.B

The remaining outer join operation can also be replaced by an inner join because the condition 'T3.B=T2.B' is null-rejected. This results in a query with no outer joins at all:

 SELECT * FROM (T1 INNER JOIN T2 ON T2.A=T1.A), T3
   WHERE T3.C > 0 AND T3.B=T2.B

Sometimes the optimizer succeeds in replacing an embedded outer join operation, but cannot convert the embedding outer join. The following query:

 SELECT * FROM T1 LEFT JOIN
               (T2 LEFT JOIN T3 ON T3.B=T2.B)
               ON T2.A=T1.A
   WHERE T3.C > 0

Is converted to:

 SELECT * FROM T1 LEFT JOIN
               (T2 INNER JOIN T3 ON T3.B=T2.B)
               ON T2.A=T1.A
   WHERE T3.C > 0

That can be rewritten only to the form still containing the embedding outer join operation:

 SELECT * FROM T1 LEFT JOIN
               (T2,T3)
               ON (T2.A=T1.A AND T3.B=T2.B)
   WHERE T3.C > 0

Any attempt to convert an embedded outer join operation in a query must take into account the join condition for the embedding outer join together with the 'WHERE' condition. In this query, the 'WHERE' condition is not null-rejected for the embedded outer join, but the join condition of the embedding outer join 'T2.A=T1.A AND T3.C=T1.C' is null-rejected:

 SELECT * FROM T1 LEFT JOIN
               (T2 LEFT JOIN T3 ON T3.B=T2.B)
               ON T2.A=T1.A AND T3.C=T1.C
   WHERE T3.D > 0 OR T1.D > 0

Consequently, the query can be converted to:

 SELECT * FROM T1 LEFT JOIN
               (T2, T3)
               ON T2.A=T1.A AND T3.C=T1.C AND T3.B=T2.B
   WHERE T3.D > 0 OR T1.D > 0

 File: manual.info.tmp, Node: mrr-optimization, Next: bnl-bka-optimization, Prev: outer-join-simplification, Up: select-optimization

8.2.1.10 Multi-Range Read Optimization ......................................

Reading rows using a range scan on a secondary index can result in many random disk accesses to the base table when the table is large and not stored in the storage engine's cache. With the Disk-Sweep Multi-Range Read (MRR) optimization, MySQL tries to reduce the number of random disk accesses for range scans by first scanning the index only and collecting the keys for the relevant rows. Then the keys are sorted and finally the rows are retrieved from the base table in the order of the primary key. The motivation for Disk-Sweep MRR is to reduce the number of random disk accesses and instead achieve a more sequential scan of the base table data.
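
The key-sorting idea can be sketched in Python (hypothetical index and table; a simplification of the actual Disk-Sweep MRR implementation):

```python
# Disk-sweep MRR sketch: collect matching (secondary-key, primary-key)
# pairs from the index, sort by primary key, then fetch base-table rows
# in primary-key order so access is closer to sequential.

# Hypothetical secondary index: sorted (key_value, primary_key) pairs.
secondary_index = [(10, 7), (10, 2), (20, 9), (20, 1), (30, 4)]
base_table = {pk: f"row{pk}" for pk in range(10)}  # clustered by primary key

def mrr_range_scan(index, low, high):
    # Step 1: index-only scan to collect primary keys in the range.
    pks = [pk for key, pk in index if low <= key <= high]
    # Step 2: sort the keys so base-table access is in primary-key order.
    pks.sort()
    # Step 3: fetch rows sequentially by primary key.
    return [base_table[pk] for pk in pks]

print(mrr_range_scan(secondary_index, 10, 20))  # ['row1', 'row2', 'row7', 'row9']
```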

The Multi-Range Read optimization provides these benefits:

The MRR optimization is not supported with secondary indexes created on virtual generated columns. 'InnoDB' supports secondary indexes on virtual generated columns.

The following scenarios illustrate when MRR optimization can be advantageous:

Scenario A: MRR can be used for 'InnoDB' and 'MyISAM' tables for index range scans and equi-join operations.

  1. A portion of the index tuples are accumulated in a buffer.

  2. The tuples in the buffer are sorted by their data row ID.

  3. Data rows are accessed according to the sorted index tuple sequence.

Scenario B: MRR can be used for *note 'NDB': mysql-cluster. tables for multiple-range index scans or when performing an equi-join by an attribute.

  1. A portion of ranges, possibly single-key ranges, is accumulated in a buffer on the central node where the query is submitted.

  2. The ranges are sent to the execution nodes that access data rows.

  3. The accessed rows are packed into packages and sent back to the central node.

  4. The received packages with data rows are placed in a buffer.

  5. Data rows are read from the buffer.

When MRR is used, the 'Extra' column in *note 'EXPLAIN': explain. output shows 'Using MRR'.

'InnoDB' and 'MyISAM' do not use MRR if full table rows need not be accessed to produce the query result. This is the case if results can be produced entirely on the basis of information in the index tuples (through a covering index); MRR then provides no benefit.

Two 'optimizer_switch' system variable flags provide an interface to the use of MRR optimization. The 'mrr' flag controls whether MRR is enabled. If 'mrr' is enabled ('on'), the 'mrr_cost_based' flag controls whether the optimizer attempts to make a cost-based choice between using and not using MRR ('on') or uses MRR whenever possible ('off'). By default, 'mrr' is 'on' and 'mrr_cost_based' is 'on'. See *note switchable-optimizations::.

For MRR, a storage engine uses the value of the 'read_rnd_buffer_size' system variable as a guideline for how much memory it can allocate for its buffer. The engine uses up to 'read_rnd_buffer_size' bytes and determines the number of ranges to process in a single pass.

 File: manual.info.tmp, Node: bnl-bka-optimization, Next: condition-filtering, Prev: mrr-optimization, Up: select-optimization

8.2.1.11 Block Nested-Loop and Batched Key Access Joins .......................................................

In MySQL, a Batched Key Access (BKA) Join algorithm is available that uses both index access to the joined table and a join buffer. The BKA algorithm supports inner join, outer join, and semijoin operations, including nested outer joins. Benefits of BKA include improved join performance due to more efficient table scanning. Also, the Block Nested-Loop (BNL) Join algorithm previously used only for inner joins is extended and can be employed for outer join and semijoin operations, including nested outer joins.

The following sections discuss the join buffer management that underlies the extension of the original BNL algorithm, the extended BNL algorithm, and the BKA algorithm. For information about semijoin strategies, see *note semijoins::.

Join Buffer Management for Block Nested-Loop and Batched Key Access Algorithms

MySQL can employ join buffers to execute not only inner joins without index access to the inner table, but also outer joins and semijoins that appear after subquery flattening. Moreover, a join buffer can be effectively used when there is an index access to the inner table.

The join buffer management code utilizes join buffer space more efficiently when storing the values of the interesting row columns: no additional bytes are allocated in buffers for a row column if its value is 'NULL', and the minimum number of bytes is allocated for any value of the *note 'VARCHAR': char. type.

The code supports two types of buffers, regular and incremental. Suppose that join buffer 'B1' is employed to join tables 't1' and 't2' and the result of this operation is joined with table 't3' using join buffer 'B2':

Incremental join buffers are always incremental relative to a join buffer from an earlier join operation, so the buffer from the first join operation is always a regular buffer. In the example just given, the buffer 'B1' used to join tables 't1' and 't2' must be a regular buffer.

Each row of the incremental buffer used for a join operation contains only the interesting columns of a row from the table to be joined. These columns are augmented with a reference to the interesting columns of the matched row from the table produced by the first join operand. Several rows in the incremental buffer can refer to the same row R whose columns are stored in the previous join buffers insofar as all these rows match row R.

Incremental buffers enable less frequent copying of columns from buffers used for previous join operations. This provides a savings in buffer space because in the general case a row produced by the first join operand can be matched by several rows produced by the second join operand. It is unnecessary to make several copies of a row from the first operand. Incremental buffers also provide a savings in processing time due to the reduction in copying time.
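
A Python sketch (hypothetical buffers and column values) of the reference mechanism:

```python
# Incremental join buffer sketch: instead of copying t1||t2 columns for
# every matching t3 row, the incremental buffer B2 stores only t3's
# columns plus an index referencing the matched row in buffer B1.

B1 = [("t1a", "t2a"), ("t1b", "t2b")]        # regular buffer: t1,t2 columns

# Several B2 rows may reference the same B1 row R (one per matching t3 row).
B2 = [(0, "t3x"), (0, "t3y"), (1, "t3z")]    # incremental: (ref into B1, t3 cols)

def materialize(b1, b2):
    # Reconstruct full t1||t2||t3 rows by following the references.
    return [b1[ref] + (t3,) for ref, t3 in b2]

print(materialize(B1, B2))
# [('t1a', 't2a', 't3x'), ('t1a', 't2a', 't3y'), ('t1b', 't2b', 't3z')]
```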

The 'block_nested_loop' and 'batched_key_access' flags of the 'optimizer_switch' system variable control how the optimizer uses the Block Nested-Loop and Batched Key Access join algorithms. By default, 'block_nested_loop' is 'on' and 'batched_key_access' is 'off'. See note switchable-optimizations::. Optimizer hints may also be applied; see note bnl-bka-optimizer-hints::.

For information about semijoin strategies, see *note semijoins::.

Block Nested-Loop Algorithm for Outer Joins and Semijoins

The original implementation of the MySQL BNL algorithm is extended to support outer join and semijoin operations.

When these operations are executed with a join buffer, each row put into the buffer is supplied with a match flag.

If an outer join operation is executed using a join buffer, each row of the table produced by the second operand is checked for a match against each row in the join buffer. When a match is found, a new extended row is formed (the original row plus columns from the second operand) and sent for further extensions by the remaining join operations. In addition, the match flag of the matched row in the buffer is enabled. After all rows of the table to be joined have been examined, the join buffer is scanned. Each row from the buffer that does not have its match flag enabled is extended by 'NULL' complements ('NULL' values for each column in the second operand) and sent for further extensions by the remaining join operations.
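
This match-flag mechanism can be sketched in Python for a two-table left join (block size and data are illustrative, not MySQL internals):

```python
# Block nested-loop outer join sketch: buffer a block of outer rows, scan
# the inner table once per block, set each buffered row's match flag on a
# join-condition match, then NULL-complement unmatched buffered rows.

def bnl_left_join(outer, inner, cond, block_size=2):
    out = []
    for start in range(0, len(outer), block_size):
        buf = [[row, False] for row in outer[start:start + block_size]]
        for irow in inner:                 # one inner-table scan per block
            for entry in buf:
                if cond(entry[0], irow):
                    out.append((entry[0], irow))
                    entry[1] = True        # enable the match flag
        for row, matched in buf:
            if not matched:
                out.append((row, None))    # NULL-complemented row
    return out

rows = bnl_left_join([1, 2, 3], [2, 3], cond=lambda a, b: a == b)
print(rows)  # [(2, 2), (1, None), (3, 3)]
```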

The 'block_nested_loop' flag of the 'optimizer_switch' system variable controls how the optimizer uses the Block Nested-Loop algorithm. By default, 'block_nested_loop' is 'on'. See note switchable-optimizations::. Optimizer hints may also be applied; see note bnl-bka-optimizer-hints::.

In *note 'EXPLAIN': explain. output, use of BNL for a table is signified when the 'Extra' value contains 'Using join buffer (Block Nested Loop)' and the 'type' value is 'ALL', 'index', or 'range'.

Some cases involving the combination of one or more subqueries with one or more left joins, particularly those returning many rows, may use BNL even though it is not ideal in such instances. This is a known issue that is fixed in MySQL 8.0. If upgrading MySQL is not immediately feasible, you may wish to disable BNL in the meantime by setting 'optimizer_switch='block_nested_loop=off'' or by employing the 'NO_BNL' optimizer hint to let the optimizer choose a better plan, by using one or more index hints (see *note index-hints::), or by some combination of these, to improve the performance of such queries.

For information about semijoin strategies, see *note semijoins::.

Batched Key Access Joins

MySQL implements a method of joining tables called the Batched Key Access (BKA) join algorithm. BKA can be applied when there is an index access to the table produced by the second join operand. Like the BNL join algorithm, the BKA join algorithm employs a join buffer to accumulate the interesting columns of the rows produced by the first operand of the join operation. Then the BKA algorithm builds keys to access the table to be joined for all rows in the buffer and submits these keys in a batch to the database engine for index lookups. The keys are submitted to the engine through the Multi-Range Read (MRR) interface (see *note mrr-optimization::). After submission of the keys, the MRR engine functions perform lookups in the index in an optimal way, fetch the rows of the joined table found by these keys, and start feeding the BKA join algorithm with matching rows. Each matching row is coupled with a reference to a row in the join buffer.

When BKA is used, the value of 'join_buffer_size' defines how large the batch of keys is in each request to the storage engine. The larger the buffer, the more sequential access is made to the right-hand table of a join operation, which can significantly improve performance.
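
A Python sketch of the batching idea (hypothetical index and buffer; the real interface is the storage engine's MRR API):

```python
# Batched Key Access sketch: build join keys for all buffered rows, submit
# them as one sorted batch (the MRR step), and couple each fetched inner
# row back to the join-buffer row it matches.

# Hypothetical index on the inner table: join key -> list of rows.
inner_index = {10: ["x1"], 20: ["y1", "y2"]}

def bka_join(join_buffer, key_of):
    # Build the batch of keys for every row in the join buffer.
    batch = sorted({key_of(row) for row in join_buffer})
    # One batched, ordered index lookup instead of one lookup per row.
    fetched = {k: inner_index.get(k, []) for k in batch}
    # Couple each matching inner row with its join-buffer row.
    return [(row, m) for row in join_buffer for m in fetched[key_of(row)]]

buffered = [{"k": 20}, {"k": 10}, {"k": 30}]
print(bka_join(buffered, key_of=lambda r: r["k"]))
# [({'k': 20}, 'y1'), ({'k': 20}, 'y2'), ({'k': 10}, 'x1')]
```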

For BKA to be used, the 'batched_key_access' flag of the 'optimizer_switch' system variable must be set to 'on'. BKA uses MRR, so the 'mrr' flag must also be 'on'. Currently, the cost estimation for MRR is too pessimistic. Hence, it is also necessary for 'mrr_cost_based' to be 'off' for BKA to be used. The following setting enables BKA:

 mysql> SET optimizer_switch='mrr=on,mrr_cost_based=off,batched_key_access=on';

There are two scenarios by which MRR functions execute:

With the first scenario, a portion of the join buffer is reserved to store row IDs (primary keys) selected by index lookups and passed as a parameter to the MRR functions.

There is no special buffer to store keys built for rows from the join buffer. Instead, a function that builds the key for the next row in the buffer is passed as a parameter to the MRR functions.

In *note 'EXPLAIN': explain. output, use of BKA for a table is signified when the 'Extra' value contains 'Using join buffer (Batched Key Access)' and the 'type' value is 'ref' or 'eq_ref'.

Optimizer Hints for Block Nested-Loop and Batched Key Access Algorithms

In addition to using the 'optimizer_switch' system variable to control optimizer use of the BNL and BKA algorithms session-wide, MySQL supports optimizer hints to influence the optimizer on a per-statement basis. See *note optimizer-hints::.

To use a BNL or BKA hint to enable join buffering for any inner table of an outer join, join buffering must be enabled for all inner tables of the outer join.

 File: manual.info.tmp, Node: condition-filtering, Next: is-null-optimization, Prev: bnl-bka-optimization, Up: select-optimization

8.2.1.12 Condition Filtering ............................

In join processing, prefix rows are those rows passed from one table in a join to the next. In general, the optimizer attempts to put tables with low prefix counts early in the join order to keep the number of row combinations from increasing rapidly. The more the optimizer can use information about conditions on rows selected from one table and passed to the next, the more accurately it can compute row estimates and choose the best execution plan.

Without condition filtering, the prefix row count for a table is based on the estimated number of rows selected by the 'WHERE' clause according to whichever access method the optimizer chooses. Condition filtering enables the optimizer to use other relevant conditions in the 'WHERE' clause not taken into account by the access method, and thus improve its prefix row count estimates. For example, even though there might be an index-based access method that can be used to select rows from the current table in a join, there might also be additional conditions for the table in the 'WHERE' clause that can filter (further restrict) the estimate for qualifying rows passed to the next table.

A condition contributes to the filtering estimate only if:

In *note 'EXPLAIN': explain. output, the 'rows' column indicates the row estimate for the chosen access method, and the 'filtered' column reflects the effect of condition filtering. 'filtered' values are expressed as percentages. The maximum value is 100, which means no filtering of rows occurred. Values decreasing from 100 indicate increasing amounts of filtering.

The prefix row count (the number of rows estimated to be passed from the current table in a join to the next) is the product of the 'rows' and 'filtered' values. That is, the prefix row count is the estimated row count, reduced by the estimated filtering effect. For example, if 'rows' is 1000 and 'filtered' is 20%, condition filtering reduces the estimated row count of 1000 to a prefix row count of 1000 x 20% = 1000 x .2 = 200.
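
The arithmetic, as a quick Python check:

```python
# Prefix row count = rows x filtered: the access-method row estimate
# reduced by the condition-filtering estimate.

rows = 1000         # EXPLAIN 'rows': estimate for the chosen access method
filtered = 20.0     # EXPLAIN 'filtered': percentage of rows that pass

prefix_rows = rows * filtered / 100
print(prefix_rows)  # 200.0
```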

Consider the following query:

 SELECT *
   FROM employee JOIN department ON employee.dept_no = department.dept_no
   WHERE employee.first_name = 'John'
   AND employee.hire_date BETWEEN '2018-01-01' AND '2018-06-01';

Suppose that the data set has these characteristics:

Without condition filtering, *note 'EXPLAIN': explain. produces output like this:

 +----+------------+--------+------------------+---------+---------+------+----------+
 | id | table      | type   | possible_keys    | key     | ref     | rows | filtered |
 +----+------------+--------+------------------+---------+---------+------+----------+
 | 1  | employee   | ref    | name,h_date,dept | name    | const   | 8    | 100.00   |
 | 1  | department | eq_ref | PRIMARY          | PRIMARY | dept_no | 1    | 100.00   |
 +----+------------+--------+------------------+---------+---------+------+----------+

For 'employee', the access method on the 'name' index picks up the 8 rows that match a name of ''John''. No filtering is done ('filtered' is 100%), so all rows are prefix rows for the next table: The prefix row count is 'rows' x 'filtered' = 8 x 100% = 8.

With condition filtering, the optimizer additionally takes into account conditions from the 'WHERE' clause not taken into account by the access method. In this case, the optimizer uses heuristics to estimate a filtering effect of 16.31% for the 'BETWEEN' condition on 'employee.hire_date'. As a result, *note 'EXPLAIN': explain. produces output like this:

 +----+------------+--------+------------------+---------+---------+------+----------+
 | id | table      | type   | possible_keys    | key     | ref     | rows | filtered |
 +----+------------+--------+------------------+---------+---------+------+----------+
 | 1  | employee   | ref    | name,h_date,dept | name    | const   | 8    | 16.31    |
 | 1  | department | eq_ref | PRIMARY          | PRIMARY | dept_no | 1    | 100.00   |
 +----+------------+--------+------------------+---------+---------+------+----------+

Now the prefix row count is 'rows' x 'filtered' = 8 x 16.31% = 1.3, which more closely reflects the actual data set.

Normally, the optimizer does not calculate the condition filtering effect (prefix row count reduction) for the last joined table because there is no next table to pass rows to. An exception occurs for *note 'EXPLAIN': explain.: To provide more information, the filtering effect is calculated for all joined tables, including the last one.

To control whether the optimizer considers additional filtering conditions, use the 'condition_fanout_filter' flag of the 'optimizer_switch' system variable (see *note switchable-optimizations::). This flag is enabled by default but can be disabled to suppress condition filtering (for example, if a particular query is found to yield better performance without it).

If the optimizer overestimates the effect of condition filtering, performance may be worse than if condition filtering is not used. In such cases, these techniques may help:

 File: manual.info.tmp, Node: is-null-optimization, Next: order-by-optimization, Prev: condition-filtering, Up: select-optimization

8.2.1.13 IS NULL Optimization .............................

MySQL can perform the same optimization on COL_NAME 'IS NULL' that it can use for COL_NAME '=' CONSTANT_VALUE. For example, MySQL can use indexes and ranges to search for 'NULL' with 'IS NULL'.

Examples:

 SELECT * FROM TBL_NAME WHERE KEY_COL IS NULL;

 SELECT * FROM TBL_NAME WHERE KEY_COL <=> NULL;

 SELECT * FROM TBL_NAME
   WHERE KEY_COL=CONST1 OR KEY_COL=CONST2 OR KEY_COL IS NULL;

If a 'WHERE' clause includes a COL_NAME 'IS NULL' condition for a column that is declared as 'NOT NULL', that expression is optimized away. This optimization does not occur in cases when the column might produce 'NULL' anyway (for example, if it comes from a table on the right side of a 'LEFT JOIN').

MySQL can also optimize the combination 'COL_NAME = EXPR OR COL_NAME IS NULL', a form that is common in resolved subqueries. *note 'EXPLAIN': explain. shows 'ref_or_null' when this optimization is used.

This optimization can handle one 'IS NULL' for any key part.

Some examples of queries that are optimized, assuming that there is an index on columns 'a' and 'b' of table 't2':

 SELECT * FROM t1 WHERE t1.a=EXPR OR t1.a IS NULL;

 SELECT * FROM t1, t2 WHERE t1.a=t2.a OR t2.a IS NULL;

 SELECT * FROM t1, t2
   WHERE (t1.a=t2.a OR t2.a IS NULL) AND t2.b=t1.b;

 SELECT * FROM t1, t2
   WHERE t1.a=t2.a AND (t2.b=t1.b OR t2.b IS NULL);

 SELECT * FROM t1, t2
   WHERE (t1.a=t2.a AND t2.a IS NULL AND ...)
   OR (t1.a=t2.a AND t2.a IS NULL AND ...);

'ref_or_null' works by first doing a read on the reference key, and then a separate search for rows with a 'NULL' key value.
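
A Python sketch of the two-pass 'ref_or_null' idea (hypothetical index structure mapping key values, including 'NULL', to rows):

```python
# ref_or_null sketch: one indexed read for rows matching the reference
# key, then a second read for rows whose key value is NULL.

index = {1: ["r1"], 2: ["r2a", "r2b"], None: ["rnull"]}  # key -> rows

def ref_or_null(idx, key):
    # First pass: rows with key_col = key; second pass: key_col IS NULL.
    return idx.get(key, []) + idx.get(None, [])

print(ref_or_null(index, 2))  # ['r2a', 'r2b', 'rnull']
```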

The optimization can handle only one 'IS NULL' level. In the following query, MySQL uses key lookups only on the expression '(t1.a=t2.a AND t2.a IS NULL)' and is not able to use the key part on 'b':

 SELECT * FROM t1, t2
   WHERE (t1.a=t2.a AND t2.a IS NULL)
   OR (t1.b=t2.b AND t2.b IS NULL);

 File: manual.info.tmp, Node: order-by-optimization, Next: group-by-optimization, Prev: is-null-optimization, Up: select-optimization

8.2.1.14 ORDER BY Optimization ..............................

This section describes when MySQL can use an index to satisfy an 'ORDER BY' clause, the 'filesort' operation used when an index cannot be used, and execution plan information available from the optimizer about 'ORDER BY'.

An 'ORDER BY' with and without 'LIMIT' may return rows in different orders, as discussed in *note limit-optimization::.

Use of Indexes to Satisfy ORDER BY

In some cases, MySQL may use an index to satisfy an 'ORDER BY' clause and avoid the extra sorting involved in performing a 'filesort' operation.

The index may also be used even if the 'ORDER BY' does not match the index exactly, as long as all unused portions of the index and all extra 'ORDER BY' columns are constants in the 'WHERE' clause. If the index does not contain all columns accessed by the query, the index is used only if index access is cheaper than other access methods.

Assuming that there is an index on '(KEY_PART1, KEY_PART2)', the following queries may use the index to resolve the 'ORDER BY' part. Whether the optimizer actually does so depends on whether reading the index is more efficient than a table scan if columns not in the index must also be read.
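
For example, queries of the following forms can qualify (a sketch of typical cases; whether the index is actually chosen depends on the cost estimates described above):

 SELECT * FROM t1
   ORDER BY KEY_PART1, KEY_PART2;

 SELECT * FROM t1
   WHERE KEY_PART1 = CONSTANT
   ORDER BY KEY_PART2;

 SELECT * FROM t1
   ORDER BY KEY_PART1 DESC, KEY_PART2 DESC;

 SELECT * FROM t1
   WHERE KEY_PART1 > CONSTANT
   ORDER BY KEY_PART1 ASC;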

In some cases, MySQL cannot use indexes to resolve the 'ORDER BY', although it may still use indexes to find the rows that match the 'WHERE' clause. Examples:
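
Some typical nonqualifying forms look like this (a sketch; the list is not exhaustive):

 -- The query mixes ASC and DESC:
 SELECT * FROM t1 ORDER BY KEY_PART1 DESC, KEY_PART2 ASC;

 -- The key used to fetch the rows differs from the one used in the ORDER BY:
 SELECT * FROM t1 WHERE KEY2 = CONSTANT ORDER BY KEY1;

 -- The ORDER BY applies an expression to the indexed column:
 SELECT * FROM t1 ORDER BY ABS(KEY_PART1);

 -- The ORDER BY does not begin with the leftmost part of the key:
 SELECT * FROM t1 ORDER BY KEY_PART2;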

Availability of an index for sorting may be affected by the use of column aliases. Suppose that the column 't1.a' is indexed. In this statement, the name of the column in the select list is 'a'. It refers to 't1.a', as does the reference to 'a' in the 'ORDER BY', so the index on 't1.a' can be used:

 SELECT a FROM t1 ORDER BY a;

In this statement, the name of the column in the select list is also 'a', but it is the alias name. It refers to 'ABS(a)', as does the reference to 'a' in the 'ORDER BY', so the index on 't1.a' cannot be used:

 SELECT ABS(a) AS a FROM t1 ORDER BY a;

In the following statement, the 'ORDER BY' refers to a name that is not the name of a column in the select list. But there is a column in 't1' named 'a', so the 'ORDER BY' refers to 't1.a' and the index on 't1.a' can be used. (The resulting sort order may be completely different from the order for 'ABS(a)', of course.)

 SELECT ABS(a) AS b FROM t1 ORDER BY a;

By default, MySQL sorts 'GROUP BY COL1, COL2, ...' queries as if you also included 'ORDER BY COL1, COL2, ...' in the query. If you include an explicit 'ORDER BY' clause that contains the same column list, MySQL optimizes it away without any speed penalty, although the sorting still occurs.

If a query includes 'GROUP BY' but you want to avoid the overhead of sorting the result, you can suppress sorting by specifying 'ORDER BY NULL'. For example:

 INSERT INTO foo
 SELECT a, COUNT(*) FROM bar GROUP BY a ORDER BY NULL;

The optimizer may still choose to use sorting to implement grouping operations. 'ORDER BY NULL' suppresses sorting of the result, not prior sorting done by grouping operations to determine the result.

Note:

'GROUP BY' implicitly sorts by default (that is, in the absence of 'ASC' or 'DESC' designators for 'GROUP BY' columns). However, relying on implicit 'GROUP BY' sorting, or on explicit sorting specified with 'ASC' or 'DESC' designators for 'GROUP BY' columns, is deprecated. To produce a given sort order, provide an 'ORDER BY' clause.

Use of filesort to Satisfy ORDER BY

If an index cannot be used to satisfy an 'ORDER BY' clause, MySQL performs a 'filesort' operation that reads table rows and sorts them. A 'filesort' constitutes an extra sorting phase in query execution.

To obtain memory for 'filesort' operations, the optimizer allocates a fixed amount of 'sort_buffer_size' bytes up front. Individual sessions can change the session value of this variable as desired to avoid excessive memory use, or to allocate more memory as necessary.
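
For example, a session expecting an unusually large sort might raise the limit for its own connection only (a sketch; the value shown is arbitrary):

 SET SESSION sort_buffer_size = 4 * 1024 * 1024;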

A 'filesort' operation uses temporary disk files as necessary if the result set is too large to fit in memory. Some types of queries are particularly suited to completely in-memory 'filesort' operations. For example, the optimizer can use 'filesort' to handle the 'ORDER BY' operation entirely in memory, without temporary files, for queries (and subqueries) of the following form:

 SELECT ... FROM SINGLE_TABLE ... ORDER BY NON_INDEX_COLUMN [DESC] LIMIT [M,]N;

Such queries are common in web applications that display only a few rows from a larger result set. Examples:

 SELECT col1, ... FROM t1 ... ORDER BY name LIMIT 10;
 SELECT col1, ... FROM t1 ... ORDER BY RAND() LIMIT 15;

Influencing ORDER BY Optimization

For slow 'ORDER BY' queries for which 'filesort' is not used, try lowering the 'max_length_for_sort_data' system variable to a value that is appropriate to trigger a 'filesort'. (A symptom of setting the value of this variable too high is a combination of high disk activity and low CPU activity.)

To increase 'ORDER BY' speed, check whether you can get MySQL to use indexes rather than an extra sorting phase. If this is not possible, try the following strategies:

ORDER BY Execution Plan Information Available

With *note 'EXPLAIN': explain. (see *note using-explain::), you can check whether MySQL can use indexes to resolve an 'ORDER BY' clause:
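
For example (a sketch, assuming the index '(KEY_PART1, KEY_PART2)' discussed earlier): if the 'Extra' column of the output contains 'Using filesort', MySQL could not use an index for the sort; if it is absent, the 'ORDER BY' was resolved by index.

 EXPLAIN SELECT * FROM t1
   WHERE KEY_PART1 = CONSTANT
   ORDER BY KEY_PART2\G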

In addition, if a 'filesort' is performed, optimizer trace output includes a 'filesort_summary' block. For example:

 "filesort_summary": {
   "rows": 100,
   "examined_rows": 100,
   "number_of_tmp_files": 0,
   "sort_buffer_size": 25192,
   "sort_mode": "<sort_key, packed_additional_fields>"
 }

The 'sort_mode' value provides information about the contents of tuples in the sort buffer:

*note 'EXPLAIN': explain. does not distinguish whether the optimizer does or does not perform a 'filesort' in memory. Use of an in-memory 'filesort' can be seen in optimizer trace output. Look for 'filesort_priority_queue_optimization'. For information about the optimizer trace, see *note optimizer-tracing::.

 File: manual.info.tmp, Node: group-by-optimization, Next: distinct-optimization, Prev: order-by-optimization, Up: select-optimization

8.2.1.15 GROUP BY Optimization
..............................

The most general way to satisfy a 'GROUP BY' clause is to scan the whole table and create a new temporary table where all rows from each group are consecutive, and then use this temporary table to discover groups and apply aggregate functions (if any). In some cases, MySQL is able to do much better than that and avoid creation of temporary tables by using index access.

The most important preconditions for using indexes for 'GROUP BY' are that all 'GROUP BY' columns reference attributes from the same index, and that the index stores its keys in order (as is true, for example, for a 'BTREE' index, but not for a 'HASH' index). Whether use of temporary tables can be replaced by index access also depends on which parts of an index are used in a query, the conditions specified for these parts, and the selected aggregate functions.

There are two ways to execute a 'GROUP BY' query through index access, as detailed in the following sections. The first method applies the grouping operation together with all range predicates (if any). The second method first performs a range scan, and then groups the resulting tuples.

In MySQL, 'GROUP BY' is used for sorting, so the server may also apply 'ORDER BY' optimizations to grouping. However, relying on implicit or explicit 'GROUP BY' sorting is deprecated. See *note order-by-optimization::.

Loose Index Scan

The most efficient way to process 'GROUP BY' is when an index is used to directly retrieve the grouping columns. With this access method, MySQL uses the property of some index types that the keys are ordered (for example, 'BTREE'). This property enables use of lookup groups in an index without having to consider all keys in the index that satisfy all 'WHERE' conditions. This access method considers only a fraction of the keys in an index, so it is called a Loose Index Scan. When there is no 'WHERE' clause, a Loose Index Scan reads as many keys as the number of groups, which may be a much smaller number than that of all keys. If the 'WHERE' clause contains range predicates (see the discussion of the 'range' join type in *note using-explain::), a Loose Index Scan looks up the first key of each group that satisfies the range conditions, and again reads the smallest possible number of keys. This is possible under the following conditions:

If Loose Index Scan is applicable to a query, the *note 'EXPLAIN': explain. output shows 'Using index for group-by' in the 'Extra' column.

Assume that there is an index 'idx(c1,c2,c3)' on table 't1(c1,c2,c3,c4)'. The Loose Index Scan access method can be used for the following queries:

 SELECT c1, c2 FROM t1 GROUP BY c1, c2;
 SELECT DISTINCT c1, c2 FROM t1;
 SELECT c1, MIN(c2) FROM t1 GROUP BY c1;
 SELECT c1, c2 FROM t1 WHERE c1 < CONST GROUP BY c1, c2;
 SELECT MAX(c3), MIN(c3), c1, c2 FROM t1 WHERE c2 > CONST GROUP BY c1, c2;
 SELECT c2 FROM t1 WHERE c1 < CONST GROUP BY c1, c2;
 SELECT c1, c2 FROM t1 WHERE c3 = CONST GROUP BY c1, c2;

The following queries cannot be executed with this quick select method, for the reasons given:

The Loose Index Scan access method can be applied to other forms of aggregate function references in the select list, in addition to the 'MIN()' and 'MAX()' references already supported:

Assume that there is an index 'idx(c1,c2,c3)' on table 't1(c1,c2,c3,c4)'. The Loose Index Scan access method can be used for the following queries:

 SELECT COUNT(DISTINCT c1), SUM(DISTINCT c1) FROM t1;

 SELECT COUNT(DISTINCT c1, c2), COUNT(DISTINCT c2, c1) FROM t1;

Tight Index Scan

A Tight Index Scan may be either a full index scan or a range index scan, depending on the query conditions.

When the conditions for a Loose Index Scan are not met, it still may be possible to avoid creation of temporary tables for 'GROUP BY' queries. If there are range conditions in the 'WHERE' clause, this method reads only the keys that satisfy these conditions. Otherwise, it performs an index scan. Because this method reads all keys in each range defined by the 'WHERE' clause, or scans the whole index if there are no range conditions, it is called a Tight Index Scan. With a Tight Index Scan, the grouping operation is performed only after all keys that satisfy the range conditions have been found.

For this method to work, it is sufficient that there be a constant equality condition for all columns in a query referring to parts of the key coming before or in between parts of the 'GROUP BY' key. The constants from the equality conditions fill in any 'gaps' in the search keys so that it is possible to form complete prefixes of the index. These index prefixes then can be used for index lookups. If the 'GROUP BY' result requires sorting, and it is possible to form search keys that are prefixes of the index, MySQL also avoids extra sorting operations because searching with prefixes in an ordered index already retrieves all the keys in order.

Assume that there is an index 'idx(c1,c2,c3)' on table 't1(c1,c2,c3,c4)'. The following queries do not work with the Loose Index Scan access method described previously, but still work with the Tight Index Scan access method.
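
For example (illustrative sketches): in the first query there is a gap in the 'GROUP BY' key parts, and in the second the 'GROUP BY' does not begin with the first key part; in both cases, the equality condition in the 'WHERE' clause fills the gap so that complete index prefixes can be formed:

 SELECT c1, c2, c3 FROM t1 WHERE c2 = 'a' GROUP BY c1, c3;

 SELECT c1, c2, c3 FROM t1 WHERE c1 = 'a' GROUP BY c2, c3;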

 File: manual.info.tmp, Node: distinct-optimization, Next: limit-optimization, Prev: group-by-optimization, Up: select-optimization

8.2.1.16 DISTINCT Optimization
..............................

'DISTINCT' combined with 'ORDER BY' needs a temporary table in many cases.

Because 'DISTINCT' may use 'GROUP BY', learn how MySQL works with columns in 'ORDER BY' or 'HAVING' clauses that are not part of the selected columns. See *note group-by-handling::.

In most cases, a 'DISTINCT' clause can be considered as a special case of 'GROUP BY'. For example, the following two queries are equivalent:

 SELECT DISTINCT c1, c2, c3 FROM t1
 WHERE c1 > CONST;

 SELECT c1, c2, c3 FROM t1
 WHERE c1 > CONST GROUP BY c1, c2, c3;

Due to this equivalence, the optimizations applicable to 'GROUP BY' queries can be also applied to queries with a 'DISTINCT' clause. Thus, for more details on the optimization possibilities for 'DISTINCT' queries, see *note group-by-optimization::.

When combining 'LIMIT ROW_COUNT' with 'DISTINCT', MySQL stops as soon as it finds ROW_COUNT unique rows.

If you do not use columns from all tables named in a query, MySQL stops scanning any unused tables as soon as it finds the first match. In the following case, assuming that 't1' is used before 't2' (which you can check with *note 'EXPLAIN': explain.), MySQL stops reading from 't2' (for any particular row in 't1') when it finds the first row in 't2':

 SELECT DISTINCT t1.a FROM t1, t2 WHERE t1.a=t2.a;

 File: manual.info.tmp, Node: limit-optimization, Next: function-optimization, Prev: distinct-optimization, Up: select-optimization

8.2.1.17 LIMIT Query Optimization
.................................

If you need only a specified number of rows from a result set, use a 'LIMIT' clause in the query, rather than fetching the whole result set and throwing away the extra data.

MySQL sometimes optimizes a query that has a 'LIMIT ROW_COUNT' clause and no 'HAVING' clause:

If multiple rows have identical values in the 'ORDER BY' columns, the server is free to return those rows in any order, and may do so differently depending on the overall execution plan. In other words, the sort order of those rows is nondeterministic with respect to the nonordered columns.

One factor that affects the execution plan is 'LIMIT', so an 'ORDER BY' query with and without 'LIMIT' may return rows in different orders. Consider this query, which is sorted by the 'category' column but nondeterministic with respect to the 'id' and 'rating' columns:

 mysql> SELECT * FROM ratings ORDER BY category;
 +----+----------+--------+
 | id | category | rating |
 +----+----------+--------+
 |  1 |        1 |    4.5 |
 |  5 |        1 |    3.2 |
 |  3 |        2 |    3.7 |
 |  4 |        2 |    3.5 |
 |  6 |        2 |    3.5 |
 |  2 |        3 |    5.0 |
 |  7 |        3 |    2.7 |
 +----+----------+--------+

Including 'LIMIT' may affect order of rows within each 'category' value. For example, this is a valid query result:

 mysql> SELECT * FROM ratings ORDER BY category LIMIT 5;
 +----+----------+--------+
 | id | category | rating |
 +----+----------+--------+
 |  1 |        1 |    4.5 |
 |  5 |        1 |    3.2 |
 |  4 |        2 |    3.5 |
 |  3 |        2 |    3.7 |
 |  6 |        2 |    3.5 |
 +----+----------+--------+

In each case, the rows are sorted by the 'ORDER BY' column, which is all that is required by the SQL standard.

If it is important to ensure the same row order with and without 'LIMIT', include additional columns in the 'ORDER BY' clause to make the order deterministic. For example, if 'id' values are unique, you can make rows for a given 'category' value appear in 'id' order by sorting like this:

 mysql> SELECT * FROM ratings ORDER BY category, id;
 +----+----------+--------+
 | id | category | rating |
 +----+----------+--------+
 |  1 |        1 |    4.5 |
 |  5 |        1 |    3.2 |
 |  3 |        2 |    3.7 |
 |  4 |        2 |    3.5 |
 |  6 |        2 |    3.5 |
 |  2 |        3 |    5.0 |
 |  7 |        3 |    2.7 |
 +----+----------+--------+

 mysql> SELECT * FROM ratings ORDER BY category, id LIMIT 5;
 +----+----------+--------+
 | id | category | rating |
 +----+----------+--------+
 |  1 |        1 |    4.5 |
 |  5 |        1 |    3.2 |
 |  3 |        2 |    3.7 |
 |  4 |        2 |    3.5 |
 |  6 |        2 |    3.5 |
 +----+----------+--------+

For a query with an 'ORDER BY' or 'GROUP BY' and a 'LIMIT' clause, the optimizer tries to choose an ordered index by default when it appears doing so would speed up query execution. Prior to MySQL 5.7.33, there was no way to override this behavior, even in cases where using some other optimization might be faster. Beginning with MySQL 5.7.33, it is possible to turn off this optimization by setting the 'optimizer_switch' system variable's 'prefer_ordering_index' flag to 'off'.

Example: First we create and populate a table 't' as shown here:

 # Create and populate a table t:

 mysql> CREATE TABLE t (
     ->     id1 BIGINT NOT NULL,
     ->     id2 BIGINT NOT NULL,
     ->     c1 VARCHAR(50) NOT NULL,
     ->     c2 VARCHAR(50) NOT NULL,
     ->  PRIMARY KEY (id1),
     ->  INDEX i (id2, c1)
     -> );

 # [Insert some rows into table t - not shown]

Verify that the 'prefer_ordering_index' flag is enabled:

 mysql> SELECT @@optimizer_switch LIKE '%prefer_ordering_index=on%';
 +------------------------------------------------------+
 | @@optimizer_switch LIKE '%prefer_ordering_index=on%' |
 +------------------------------------------------------+
 |                                                    1 |
 +------------------------------------------------------+

Since the following query has a 'LIMIT' clause, we expect it to use an ordered index, if possible. In this case, as we can see from the *note 'EXPLAIN': explain. output, it uses the table's primary key.

 mysql> EXPLAIN SELECT c2 FROM t
     ->     WHERE id2 > 3
     ->     ORDER BY id1 ASC LIMIT 2\G
 *************************** 1. row ***************************
            id: 1
   select_type: SIMPLE
         table: t
    partitions: NULL
          type: index
 possible_keys: i
           key: PRIMARY
       key_len: 8
           ref: NULL
          rows: 2
      filtered: 70.00
         Extra: Using where

Now we disable the 'prefer_ordering_index' flag, and re-run the same query; this time it uses the index 'i' (which includes the 'id2' column used in the 'WHERE' clause), and a filesort:

 mysql> SET optimizer_switch = "prefer_ordering_index=off";

 mysql> EXPLAIN SELECT c2 FROM t
     ->     WHERE id2 > 3
     ->     ORDER BY id1 ASC LIMIT 2\G
 *************************** 1. row ***************************
            id: 1
   select_type: SIMPLE
         table: t
    partitions: NULL
          type: range
 possible_keys: i
           key: i
       key_len: 8
           ref: NULL
          rows: 14
      filtered: 100.00
         Extra: Using index condition; Using filesort

See also *note switchable-optimizations::.

 File: manual.info.tmp, Node: function-optimization, Next: row-constructor-optimization, Prev: limit-optimization, Up: select-optimization

8.2.1.18 Function Call Optimization
...................................

MySQL functions are tagged internally as deterministic or nondeterministic. A function is nondeterministic if, given fixed values for its arguments, it can return different results for different invocations. Examples of nondeterministic functions: 'RAND()', 'UUID()'.

If a function is tagged nondeterministic, a reference to it in a 'WHERE' clause is evaluated for every row (when selecting from one table) or combination of rows (when selecting from a multiple-table join).

MySQL also determines when to evaluate functions based on types of arguments, whether the arguments are table columns or constant values. A deterministic function that takes a table column as argument must be evaluated whenever that column changes value.

Nondeterministic functions may affect query performance. For example, some optimizations may not be available, or more locking might be required. The following discussion uses 'RAND()' but applies to other nondeterministic functions as well.

Suppose that a table 't' has this definition:

 CREATE TABLE t (id INT NOT NULL PRIMARY KEY, col_a VARCHAR(100));

Consider these two queries:

 SELECT * FROM t WHERE id = POW(1,2);
 SELECT * FROM t WHERE id = FLOOR(1 + RAND() * 49);

Both queries appear to use a primary key lookup because of the equality comparison against the primary key, but that is true only for the first of them: 'POW(1,2)' is a constant, deterministic expression that is evaluated once, so the lookup can use the index, whereas 'FLOOR(1 + RAND() * 49)' is nondeterministic and must be evaluated for every row, so the second query performs a full table scan.

The effects of nondeterminism are not limited to *note 'SELECT': select. statements. This *note 'UPDATE': update. statement uses a nondeterministic function to select rows to be modified:

 UPDATE t SET col_a = SOME_EXPR WHERE id = FLOOR(1 + RAND() * 49);

Presumably the intent is to update at most a single row for which the primary key matches the expression. However, it might update zero, one, or multiple rows, depending on the 'id' column values and the values in the 'RAND()' sequence.

The behavior just described has implications for performance and replication:

The difficulties stem from the fact that the 'RAND()' function is evaluated once for every row of the table. To avoid multiple function evaluations, use one of these techniques:
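
One such technique is to move the nondeterministic expression out of the 'WHERE' clause and evaluate it once ahead of time; for example, by assigning it to a user-defined variable (a sketch):

 SET @keyval = FLOOR(1 + RAND() * 49);
 UPDATE t SET col_a = SOME_EXPR WHERE id = @keyval;

Here 'RAND()' is evaluated exactly once, when the variable is assigned, so the 'UPDATE' can use a primary key lookup and modifies at most one row.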

As mentioned previously, a nondeterministic expression in the 'WHERE' clause might prevent optimizations and result in a table scan. However, it may be possible to partially optimize the 'WHERE' clause if other expressions are deterministic. For example:

 SELECT * FROM t WHERE partial_key=5 AND some_column=RAND();

If the optimizer can use 'partial_key' to reduce the set of rows selected, 'RAND()' is executed fewer times, which diminishes the effect of nondeterminism on optimization.

 File: manual.info.tmp, Node: row-constructor-optimization, Next: table-scan-avoidance, Prev: function-optimization, Up: select-optimization

8.2.1.19 Row Constructor Expression Optimization
................................................

Row constructors permit simultaneous comparisons of multiple values. For example, these two statements are semantically equivalent:

 SELECT * FROM t1 WHERE (column1,column2) = (1,1);
 SELECT * FROM t1 WHERE column1 = 1 AND column2 = 1;

In addition, the optimizer handles both expressions the same way.

The optimizer is less likely to use available indexes if the row constructor columns do not cover the prefix of an index. Consider the following table, which has a primary key on '(c1, c2, c3)':

 CREATE TABLE t1 (
   c1 INT, c2 INT, c3 INT, c4 CHAR(100),
   PRIMARY KEY(c1,c2,c3)
 );

In this query, the 'WHERE' clause uses all columns in the index. However, the row constructor itself does not cover an index prefix, with the result that the optimizer uses only 'c1' ('key_len=4', the size of 'c1'):

 mysql> EXPLAIN SELECT * FROM t1
        WHERE c1=1 AND (c2,c3) > (1,1)\G
 *************************** 1. row ***************************
            id: 1
   select_type: SIMPLE
         table: t1
    partitions: NULL
          type: ref
 possible_keys: PRIMARY
           key: PRIMARY
       key_len: 4
           ref: const
          rows: 3
      filtered: 100.00
         Extra: Using where

In such cases, rewriting the row constructor expression using an equivalent nonconstructor expression may result in more complete index use. For the given query, the row constructor and equivalent nonconstructor expressions are:

 (c2,c3) > (1,1)
 c2 > 1 OR ((c2 = 1) AND (c3 > 1))

Rewriting the query to use the nonconstructor expression results in the optimizer using all three columns in the index ('key_len=12'):

 mysql> EXPLAIN SELECT * FROM t1
        WHERE c1 = 1 AND (c2 > 1 OR ((c2 = 1) AND (c3 > 1)))\G
 *************************** 1. row ***************************
            id: 1
   select_type: SIMPLE
         table: t1
    partitions: NULL
          type: range
 possible_keys: PRIMARY
           key: PRIMARY
       key_len: 12
           ref: NULL
          rows: 3
      filtered: 100.00
         Extra: Using where

Thus, for better results, avoid mixing row constructors with 'AND'/'OR' expressions. Use one or the other.

Under certain conditions, the optimizer can apply the range access method to 'IN()' expressions that have row constructor arguments. See *note row-constructor-range-optimization::.

 File: manual.info.tmp, Node: table-scan-avoidance, Prev: row-constructor-optimization, Up: select-optimization

8.2.1.20 Avoiding Full Table Scans
..................................

The output from *note 'EXPLAIN': explain. shows 'ALL' in the 'type' column when MySQL uses a full table scan to resolve a query. This usually happens under the following conditions:

For small tables, a table scan often is appropriate and the performance impact is negligible. For large tables, try the following techniques to avoid having the optimizer incorrectly choose a table scan:
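
For example, 'ANALYZE TABLE' updates the key distribution statistics on which the optimizer's choice is based, and an index hint can override a scan decision outright (a sketch; 'IDX_NAME' is a hypothetical index on the column being compared):

 ANALYZE TABLE t1;

 SELECT * FROM t1 FORCE INDEX (IDX_NAME)
   WHERE INDEXED_COL = CONSTANT;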

 File: manual.info.tmp, Node: subquery-optimization, Next: information-schema-optimization, Prev: select-optimization, Up: statement-optimization

8.2.2 Optimizing Subqueries, Derived Tables, and View References

The MySQL query optimizer has different strategies available to evaluate subqueries:

For derived tables, the optimizer has these choices (which also apply to view references):

The following discussion provides more information about the preceding optimization strategies.

Note:

A limitation on *note 'UPDATE': update. and *note 'DELETE': delete. statements that use a subquery to modify a single table is that the optimizer does not use semijoin or materialization subquery optimizations. As a workaround, try rewriting them as multiple-table *note 'UPDATE': update. and *note 'DELETE': delete. statements that use a join rather than a subquery.

 File: manual.info.tmp, Node: semijoins, Next: subquery-materialization, Prev: subquery-optimization, Up: subquery-optimization

8.2.2.1 Optimizing Subqueries, Derived Tables, and View References with Semijoin Transformations
................................................................................................

A semijoin is a preparation-time transformation that enables multiple execution strategies such as table pullout, duplicate weedout, first match, loose scan, and materialization. The optimizer uses semijoin strategies to improve subquery execution, as described in this section.

For an inner join between two tables, the join returns a row from one table as many times as there are matches in the other table. But for some questions, the only information that matters is whether there is a match, not the number of matches. Suppose that there are tables named 'class' and 'roster' that list classes in a course curriculum and class rosters (students enrolled in each class), respectively. To list the classes that actually have students enrolled, you could use this join:

 SELECT class.class_num, class.class_name
 FROM class INNER JOIN roster
 WHERE class.class_num = roster.class_num;

However, the result lists each class once for each enrolled student. For the question being asked, this is unnecessary duplication of information.

Assuming that 'class_num' is a primary key in the 'class' table, duplicate suppression is possible by using *note 'SELECT DISTINCT': select, but it is inefficient to generate all matching rows first only to eliminate duplicates later.

The same duplicate-free result can be obtained by using a subquery:

 SELECT class_num, class_name
 FROM class
 WHERE class_num IN (SELECT class_num FROM roster);

Here, the optimizer can recognize that the 'IN' clause requires the subquery to return only one instance of each class number from the 'roster' table. In this case, the query can use a semijoin; that is, an operation that returns only one instance of each row in 'class' that is matched by rows in 'roster'.

Outer join and inner join syntax is permitted in the outer query specification, and table references may be base tables, derived tables, or view references.

In MySQL, a subquery must satisfy these criteria to be handled as a semijoin:

The subquery may be correlated or uncorrelated. 'DISTINCT' is permitted, as is 'LIMIT' unless 'ORDER BY' is also used.

If a subquery meets the preceding criteria, MySQL converts it to a semijoin and makes a cost-based choice from these strategies:

Each of these strategies can be enabled or disabled using the following 'optimizer_switch' system variable flags:

These flags are enabled by default. See *note switchable-optimizations::.

The optimizer minimizes differences in handling of views and derived tables. This affects queries that use the 'STRAIGHT_JOIN' modifier and a view with an 'IN' subquery that can be converted to a semijoin. The following query illustrates this because the change in processing causes a change in transformation, and thus a different execution strategy:

 CREATE VIEW v AS
 SELECT *
 FROM t1
 WHERE a IN (SELECT b
            FROM t2);

 SELECT STRAIGHT_JOIN *
 FROM t3 JOIN v ON t3.x = v.a;

The optimizer first looks at the view and converts the 'IN' subquery to a semijoin, then checks whether it is possible to merge the view into the outer query. Because the 'STRAIGHT_JOIN' modifier in the outer query prevents semijoin, the optimizer refuses the merge, causing derived table evaluation using a materialized table.

*note 'EXPLAIN': explain. output indicates the use of semijoin strategies as follows:

 File: manual.info.tmp, Node: subquery-materialization, Next: subquery-optimization-with-exists, Prev: semijoins, Up: subquery-optimization

8.2.2.2 Optimizing Subqueries with Materialization
..................................................

The optimizer uses materialization to enable more efficient subquery processing. Materialization speeds up query execution by generating a subquery result as a temporary table, normally in memory. The first time MySQL needs the subquery result, it materializes that result into a temporary table. Any subsequent time the result is needed, MySQL refers again to the temporary table. The optimizer may index the table with a hash index to make lookups fast and inexpensive. The index contains unique values to eliminate duplicates and make the table smaller.

Subquery materialization uses an in-memory temporary table when possible, falling back to on-disk storage if the table becomes too large. See *note internal-temporary-tables::.

If materialization is not used, the optimizer sometimes rewrites a noncorrelated subquery as a correlated subquery. For example, the following 'IN' subquery is noncorrelated (WHERE_CONDITION involves only columns from 't2' and not 't1'):

 SELECT * FROM t1
 WHERE t1.a IN (SELECT t2.b FROM t2 WHERE WHERE_CONDITION);

The optimizer might rewrite this as an 'EXISTS' correlated subquery:

 SELECT * FROM t1
 WHERE EXISTS (SELECT t2.b FROM t2 WHERE WHERE_CONDITION AND t1.a=t2.b);

Subquery materialization using a temporary table avoids such rewrites and makes it possible to execute the subquery only once rather than once per row of the outer query.

For subquery materialization to be used in MySQL, the 'optimizer_switch' system variable 'materialization' flag must be enabled. (See *note switchable-optimizations::.) With the 'materialization' flag enabled, materialization applies to subquery predicates that appear anywhere (in the select list, 'WHERE', 'ON', 'GROUP BY', 'HAVING', or 'ORDER BY'), for predicates that fall into any of these use cases:

The following examples illustrate how the requirement for equivalence of 'UNKNOWN' and 'FALSE' predicate evaluation affects whether subquery materialization can be used. Assume that WHERE_CONDITION involves columns only from 't2' and not 't1' so that the subquery is noncorrelated.

This query is subject to materialization:

 SELECT * FROM t1
 WHERE t1.a IN (SELECT t2.b FROM t2 WHERE WHERE_CONDITION);

Here, it does not matter whether the 'IN' predicate returns 'UNKNOWN' or 'FALSE'. Either way, the row from 't1' is not included in the query result.

An example where subquery materialization is not used is the following query, where 't2.b' is a nullable column:

 SELECT * FROM t1
 WHERE (t1.a,t1.b) NOT IN (SELECT t2.a,t2.b FROM t2
                           WHERE WHERE_CONDITION);

The following restrictions apply to the use of subquery materialization:

Use of *note 'EXPLAIN': explain. with a query provides some indication of whether the optimizer uses subquery materialization:
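
For example (a sketch): when materialization is used, the subquery typically appears in the output with a 'select_type' of 'MATERIALIZED', and the outer query references its result through a table named '<subqueryN>':

 EXPLAIN SELECT * FROM t1
   WHERE t1.a IN (SELECT t2.b FROM t2 WHERE WHERE_CONDITION)\G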

 File: manual.info.tmp, Node: subquery-optimization-with-exists, Next: derived-table-optimization, Prev: subquery-materialization, Up: subquery-optimization

8.2.2.3 Optimizing Subqueries with the EXISTS Strategy
......................................................

Certain optimizations are applicable to comparisons that use the 'IN' (or '=ANY') operator to test subquery results. This section discusses these optimizations, particularly with regard to the challenges that 'NULL' values present. The last part of the discussion suggests how you can help the optimizer.

Consider the following subquery comparison:

 OUTER_EXPR IN (SELECT INNER_EXPR FROM ... WHERE SUBQUERY_WHERE)

MySQL evaluates queries 'from outside to inside.' That is, it first obtains the value of the outer expression OUTER_EXPR, and then runs the subquery and captures the rows that it produces.

A very useful optimization is to 'inform' the subquery that the only rows of interest are those where the inner expression INNER_EXPR is equal to OUTER_EXPR. This is done by pushing down an appropriate equality into the subquery's 'WHERE' clause to make it more restrictive. The converted comparison looks like this:

 EXISTS (SELECT 1 FROM ... WHERE SUBQUERY_WHERE AND OUTER_EXPR=INNER_EXPR)

After the conversion, MySQL can use the pushed-down equality to limit the number of rows it must examine to evaluate the subquery.
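As an illustration with hypothetical tables and columns, the optimizer can treat the first query below roughly like the second, so that only subquery rows matching the current outer value need be examined:

```sql
-- Original form:
SELECT * FROM orders
WHERE customer_id IN (SELECT customer_id FROM customers
                      WHERE region = 'EU');

-- Conceptual converted form with the pushed-down equality; an index
-- on customers.customer_id turns the subquery into an index lookup:
SELECT * FROM orders AS o
WHERE EXISTS (SELECT 1 FROM customers AS c
              WHERE c.region = 'EU'
                AND o.customer_id = c.customer_id);
```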

More generally, a comparison of N values to a subquery that returns N-value rows is subject to the same conversion. If OE_I and IE_I represent corresponding outer and inner expression values, this subquery comparison:

 (OE_1, ..., OE_N) IN
   (SELECT IE_1, ..., IE_N FROM ... WHERE SUBQUERY_WHERE)

Becomes:

 EXISTS (SELECT 1 FROM ... WHERE SUBQUERY_WHERE
                           AND OE_1 = IE_1
                           AND ...
                           AND OE_N = IE_N)

For simplicity, the following discussion assumes a single pair of outer and inner expression values.

The conversion just described has its limitations. It is valid only if we ignore possible 'NULL' values. That is, the 'pushdown' strategy works as long as both of these conditions are true:

   * OUTER_EXPR and INNER_EXPR cannot be 'NULL'.

   * You need not distinguish 'NULL' from 'FALSE' subquery results.

When either or both of those conditions do not hold, optimization is more complex.

Suppose that OUTER_EXPR is known to be a non-'NULL' value but the subquery does not produce a row such that OUTER_EXPR = INNER_EXPR. Then 'OUTER_EXPR IN (SELECT ...)' evaluates as follows:

   * 'NULL', if the *note 'SELECT': select. produces any row where INNER_EXPR is 'NULL'

   * 'FALSE', if the *note 'SELECT': select. produces only non-'NULL' values or produces nothing

In this situation, the approach of looking for rows with 'OUTER_EXPR = INNER_EXPR' is no longer valid. It is necessary to look for such rows, but if none are found, also look for rows where INNER_EXPR is 'NULL'. Roughly speaking, the subquery can be converted to something like this:

 EXISTS (SELECT 1 FROM ... WHERE SUBQUERY_WHERE AND
         (OUTER_EXPR=INNER_EXPR OR INNER_EXPR IS NULL))

The need to evaluate the extra 'IS NULL' condition is why MySQL has the 'ref_or_null' access method:

 mysql> EXPLAIN
        SELECT OUTER_EXPR IN (SELECT t2.maybe_null_key
                              FROM t2, t3 WHERE ...)
        FROM t1;
 *************************** 1. row ***************************
            id: 1
   select_type: PRIMARY
         table: t1
 ...
 *************************** 2. row ***************************
            id: 2
   select_type: DEPENDENT SUBQUERY
         table: t2
          type: ref_or_null
 possible_keys: maybe_null_key
           key: maybe_null_key
       key_len: 5
           ref: func
          rows: 2
         Extra: Using where; Using index
 ...

The 'unique_subquery' and 'index_subquery' subquery-specific access methods also have 'or NULL' variants.

The additional 'OR ... IS NULL' condition makes query execution slightly more complicated (and some optimizations within the subquery become inapplicable), but generally this is tolerable.

The situation is much worse when OUTER_EXPR can be 'NULL'. According to the SQL interpretation of 'NULL' as 'unknown value,' 'NULL IN (SELECT INNER_EXPR ...)' should evaluate to:

   * 'NULL', if the *note 'SELECT': select. produces any rows

   * 'FALSE', if the *note 'SELECT': select. produces no rows

For proper evaluation, it is necessary to be able to check whether the *note 'SELECT': select. has produced any rows at all, so 'OUTER_EXPR = INNER_EXPR' cannot be pushed down into the subquery. This is a problem because many real-world subqueries become very slow unless the equality can be pushed down.

Essentially, there must be different ways to execute the subquery depending on the value of OUTER_EXPR.

The optimizer chooses SQL compliance over speed, so it accounts for the possibility that OUTER_EXPR might be 'NULL':

   * If OUTER_EXPR is 'NULL', evaluating 'NULL IN (SELECT INNER_EXPR ...)' requires executing the *note 'SELECT': select. to determine whether it produces any rows, so the equality cannot be pushed down.

   * If OUTER_EXPR is not 'NULL', the comparison must still be converted to a pushed-down condition, because without the pushdown the subquery executes too slowly.

To solve the dilemma of whether or not to push down conditions into the subquery, the conditions are wrapped within 'trigger' functions. Thus, an expression of the following form:

 OUTER_EXPR IN (SELECT INNER_EXPR FROM ... WHERE SUBQUERY_WHERE)

Is converted into:

 EXISTS (SELECT 1 FROM ... WHERE SUBQUERY_WHERE
                           AND trigcond(OUTER_EXPR=INNER_EXPR))

More generally, if the subquery comparison is based on several pairs of outer and inner expressions, the conversion takes this comparison:

 (OE_1, ..., OE_N) IN (SELECT IE_1, ..., IE_N FROM ... WHERE SUBQUERY_WHERE)

And converts it to this expression:

 EXISTS (SELECT 1 FROM ... WHERE SUBQUERY_WHERE
                           AND trigcond(OE_1=IE_1)
                           AND ...
                           AND trigcond(OE_N=IE_N)
        )

Each 'trigcond(X)' is a special function that evaluates to the following values:

   * X, when the 'linked' outer expression OE_I is not 'NULL'

   * 'TRUE', when the 'linked' outer expression OE_I is 'NULL'

Note:

Trigger functions are not triggers of the kind that you create with *note 'CREATE TRIGGER': create-trigger.

Equalities that are wrapped within 'trigcond()' functions are not first-class predicates for the query optimizer. Most optimizations cannot deal with predicates that may be turned on and off at query execution time, so they assume any 'trigcond(X)' to be an unknown function and ignore it. Triggered equalities can be used by these optimizations:

   * Reference optimizations: 'trigcond(X=Y [OR Y IS NULL])' can be used to construct 'ref', 'eq_ref', or 'ref_or_null' table accesses.

   * Index lookup-based subquery execution engines: 'trigcond(X=Y)' can be used to construct 'unique_subquery' or 'index_subquery' accesses.

   * Table-condition generator: If the subquery is a join of several tables, the triggered condition is checked as soon as possible.

When the optimizer uses a triggered condition to create some kind of index lookup-based access (as for the first two items of the preceding list), it must have a fallback strategy for the case when the condition is turned off. This fallback strategy is always the same: Do a full table scan. In *note 'EXPLAIN': explain. output, the fallback shows up as 'Full scan on NULL key' in the 'Extra' column:

 mysql> EXPLAIN SELECT t1.col1,
        t1.col1 IN (SELECT t2.key1 FROM t2 WHERE t2.col2=t1.col2) FROM t1\G
 *************************** 1. row ***************************
            id: 1
   select_type: PRIMARY
         table: t1
         ...
 *************************** 2. row ***************************
            id: 2
   select_type: DEPENDENT SUBQUERY
         table: t2
          type: index_subquery
 possible_keys: key1
           key: key1
       key_len: 5
           ref: func
          rows: 2
         Extra: Using where; Full scan on NULL key

If you run *note 'EXPLAIN': explain. followed by *note 'SHOW WARNINGS': show-warnings, you can see the triggered condition:

 *************************** 1. row ***************************
   Level: Note
    Code: 1003
 Message: select `test`.`t1`.`col1` AS `col1`,
          <in_optimizer>(`test`.`t1`.`col1`,
          <exists>(<index_lookup>(<cache>(`test`.`t1`.`col1`) in t2
          on key1 checking NULL
          where (`test`.`t2`.`col2` = `test`.`t1`.`col2`) having
          trigcond(<is_not_null_test>(`test`.`t2`.`key1`))))) AS
          `t1.col1 IN (select t2.key1 from t2 where t2.col2=t1.col2)`
          from `test`.`t1`

The use of triggered conditions has some performance implications. A 'NULL IN (SELECT ...)' expression now may cause a full table scan (which is slow) when it previously did not. This is the price paid for correct results (the goal of the trigger-condition strategy is to improve compliance, not speed).

For multiple-table subqueries, execution of 'NULL IN (SELECT ...)' is particularly slow because the join optimizer does not optimize for the case where the outer expression is 'NULL'. It assumes that subquery evaluations with 'NULL' on the left side are very rare, even if there are statistics that indicate otherwise. On the other hand, if the outer expression might be 'NULL' but never actually is, there is no performance penalty.

To help the query optimizer better execute your queries, use these suggestions:

   * A column must be declared as 'NOT NULL' if it really is. (This also helps other aspects of the optimizer.)

   * If you need not distinguish a 'NULL' from 'FALSE' subquery result, you can easily avoid the slow execution path. Replace a comparison of the form 'OUTER_EXPR IN (SELECT INNER_EXPR FROM ...)' with the expression '(OUTER_EXPR IS NOT NULL) AND (OUTER_EXPR IN (SELECT INNER_EXPR FROM ...))'. Then 'NULL IN (SELECT ...)' is never evaluated because MySQL stops processing 'AND' parts as soon as the expression result is clear.
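For instance (hypothetical table and columns), declaring the compared column 'NOT NULL' lets the optimizer skip the 'NULL'-handling machinery, and an explicit 'IS NOT NULL' guard on the outer expression achieves a similar effect when the column must remain nullable:

```sql
-- If values in t2.b can never actually be NULL, say so:
ALTER TABLE t2 MODIFY b INT NOT NULL;

-- Alternatively, rule out a NULL outer value explicitly:
SELECT * FROM t1
WHERE t1.a IS NOT NULL
  AND t1.a IN (SELECT t2.b FROM t2);
```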

The 'subquery_materialization_cost_based' flag of the 'optimizer_switch' system variable enables control over the choice between subquery materialization and 'IN'-to-'EXISTS' subquery transformation. See *note switchable-optimizations::.

 File: manual.info.tmp, Node: derived-table-optimization, Prev: subquery-optimization-with-exists, Up: subquery-optimization

8.2.2.4 Optimizing Derived Tables and View References with Merging or Materialization
.....................................................................................

The optimizer can handle derived table references using two strategies (which also apply to view references):

   * Merge the derived table into the outer query block

   * Materialize the derived table to an internal temporary table

Example 1:

 SELECT * FROM (SELECT * FROM t1) AS derived_t1;

With merging of the derived table 'derived_t1', that query is executed like this:

 SELECT * FROM t1;

Example 2:

 SELECT *
   FROM t1 JOIN (SELECT t2.f1 FROM t2) AS derived_t2 ON t1.f2=derived_t2.f1
   WHERE t1.f1 > 0;

With merging of the derived table 'derived_t2', that query is executed like this:

 SELECT t1.*, t2.f1
   FROM t1 JOIN t2 ON t1.f2=t2.f1
   WHERE t1.f1 > 0;

With materialization, 'derived_t1' and 'derived_t2' are each treated as a separate table within their respective queries.

The optimizer handles derived tables and view references the same way: It avoids unnecessary materialization whenever possible, which enables pushing down conditions from the outer query to derived tables and produces more efficient execution plans. (For an example, see *note subquery-materialization::.)

If merging would result in an outer query block that references more than 61 base tables, the optimizer chooses materialization instead.

The optimizer propagates an 'ORDER BY' clause in a derived table or view reference to the outer query block if these conditions are all true:

Otherwise, the optimizer ignores the 'ORDER BY' clause.

The following means are available to influence whether the optimizer attempts to merge derived tables and view references into the outer query block:

   * The 'derived_merge' flag of the 'optimizer_switch' system variable controls whether the optimizer attempts merging. By default, the flag is enabled to permit merging.

   * Within a view definition, an 'ALGORITHM=TEMPTABLE' clause forces materialization and prevents merging.

The 'derived_merge' flag also applies to views that contain no 'ALGORITHM' clause. Thus, if an 'ER_UPDATE_TABLE_USED' (https://dev.mysql.com/doc/mysql-errors/5.7/en/server-error-reference.html#error_er_update_table_used) error occurs for a view reference that uses an expression equivalent to the subquery, adding 'ALGORITHM=TEMPTABLE' to the view definition prevents merging and takes precedence over the current 'derived_merge' value.
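For example (view and table names hypothetical), a view defined this way is always materialized, never merged:

```sql
CREATE ALGORITHM=TEMPTABLE VIEW v_t1 AS
  SELECT * FROM t1;

-- Queries against v_t1 use a temporary table for the view result,
-- regardless of the current derived_merge setting.
SELECT * FROM v_t1;
```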

If the optimizer chooses the materialization strategy rather than merging for a derived table, it handles the query as follows:

Consider the following *note 'EXPLAIN': explain. statement, for a *note 'SELECT': select. query that contains a derived table:

 EXPLAIN SELECT * FROM (SELECT * FROM t1) AS derived_t1;

The optimizer avoids materializing the derived table by delaying it until the result is needed during *note 'SELECT': select. execution. In this case, the query is not executed (because it occurs in an *note 'EXPLAIN': explain. statement), so the result is never needed.

Even for queries that are executed, delay of derived table materialization may enable the optimizer to avoid materialization entirely. When this happens, query execution is quicker by the time needed to perform materialization. Consider the following query, which joins the result of a derived table to another table:

 SELECT *
   FROM t1 JOIN (SELECT t2.f1 FROM t2) AS derived_t2
           ON t1.f2=derived_t2.f1
   WHERE t1.f1 > 0;

If the optimizer processes 't1' first and the 'WHERE' clause produces an empty result, the join must necessarily be empty and the derived table need not be materialized.

For cases when a derived table requires materialization, the optimizer may add an index to the materialized table to speed up access to it. If such an index enables 'ref' access to the table, it can greatly reduce the amount of data read during query execution. Consider the following query:

 SELECT *
  FROM t1 JOIN (SELECT DISTINCT f1 FROM t2) AS derived_t2
          ON t1.f1=derived_t2.f1;

The optimizer constructs an index over column 'f1' from 'derived_t2' if doing so would enable use of 'ref' access for the lowest cost execution plan. After adding the index, the optimizer can treat the materialized derived table the same as a regular table with an index, and it benefits similarly from the generated index. The overhead of index creation is negligible compared to the cost of query execution without the index. If 'ref' access would result in higher cost than some other access method, the optimizer creates no index and loses nothing.

For optimizer trace output, a merged derived table or view reference is not shown as a node. Only its underlying tables appear in the top query's plan.

 File: manual.info.tmp, Node: information-schema-optimization, Next: data-change-optimization, Prev: subquery-optimization, Up: statement-optimization

8.2.3 Optimizing INFORMATION_SCHEMA Queries

Applications that monitor databases may make frequent use of 'INFORMATION_SCHEMA' tables. Certain types of queries for 'INFORMATION_SCHEMA' tables can be optimized to execute more quickly. The goal is to minimize file operations (for example, scanning a directory or opening a table file) to collect the information that makes up these dynamic tables.

Note:

Comparison behavior for database and table names in 'INFORMATION_SCHEMA' queries might differ from what you expect. For details, see *note charset-collation-information-schema::.

1) Try to use constant lookup values for database and table names in the 'WHERE' clause

You can take advantage of this principle as follows:

This principle applies to the 'INFORMATION_SCHEMA' tables shown in the following table, which shows the columns for which a constant lookup value enables the server to avoid a directory scan. For example, if you are selecting from *note 'TABLES': information-schema-tables-table, using a constant lookup value for 'TABLE_SCHEMA' in the 'WHERE' clause enables a data directory scan to be avoided.

For each table in the following list, the first column shown is the one to specify as a constant to avoid a data directory scan, and the second is the one to specify to avoid a database directory scan.

*note 'COLUMNS': information-schema-columns-table.

'TABLE_SCHEMA' 'TABLE_NAME'

*note 'KEY_COLUMN_USAGE': information-schema-key-column-usage-table.

'TABLE_SCHEMA' 'TABLE_NAME'

*note 'PARTITIONS': information-schema-partitions-table.

'TABLE_SCHEMA' 'TABLE_NAME'

*note 'REFERENTIAL_CONSTRAINTS': information-schema-referential-constraints-table.

'CONSTRAINT_SCHEMA' 'TABLE_NAME'

*note 'STATISTICS': information-schema-statistics-table.

'TABLE_SCHEMA' 'TABLE_NAME'

*note 'TABLES': information-schema-tables-table.

'TABLE_SCHEMA' 'TABLE_NAME'

*note 'TABLE_CONSTRAINTS': information-schema-table-constraints-table.

'TABLE_SCHEMA' 'TABLE_NAME'

*note 'TRIGGERS': information-schema-triggers-table.

'EVENT_OBJECT_SCHEMA' 'EVENT_OBJECT_TABLE'

*note 'VIEWS': information-schema-views-table.

'TABLE_SCHEMA' 'TABLE_NAME'

The benefit of a query that is limited to a specific constant database name is that checks need be made only for the named database directory. Example:

 SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES
 WHERE TABLE_SCHEMA = 'test';

Use of the literal database name 'test' enables the server to check only the 'test' database directory, regardless of how many databases there might be. By contrast, the following query is less efficient because it requires a scan of the data directory to determine which database names match the pattern ''test%'':

 SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES
 WHERE TABLE_SCHEMA LIKE 'test%';

For a query that is limited to a specific constant table name, checks need be made only for the named table within the corresponding database directory. Example:

 SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES
 WHERE TABLE_SCHEMA = 'test' AND TABLE_NAME = 't1';

Use of the literal table name 't1' enables the server to check only the files for the 't1' table, regardless of how many tables there might be in the 'test' database. By contrast, the following query requires a scan of the 'test' database directory to determine which table names match the pattern ''t%'':

 SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES
 WHERE TABLE_SCHEMA = 'test' AND TABLE_NAME LIKE 't%';

The following query requires a scan of the database directory to determine matching database names for the pattern ''test%'', and for each matching database, it requires a scan of the database directory to determine matching table names for the pattern ''t%'':

 SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES
 WHERE TABLE_SCHEMA LIKE 'test%' AND TABLE_NAME LIKE 't%';

2) Write queries that minimize the number of table files that must be opened

For queries that refer to certain 'INFORMATION_SCHEMA' table columns, several optimizations are available that minimize the number of table files that must be opened. Example:

 SELECT TABLE_NAME, ENGINE FROM INFORMATION_SCHEMA.TABLES
 WHERE TABLE_SCHEMA = 'test';

In this case, after the server has scanned the database directory to determine the names of the tables in the database, those names become available with no further file system lookups. Thus, 'TABLE_NAME' requires no files to be opened. The 'ENGINE' (storage engine) value can be determined by opening the table's '.frm' file, without touching other table files such as the '.MYD' or '.MYI' file.

Some values, such as 'INDEX_LENGTH' for 'MyISAM' tables, require opening the '.MYD' or '.MYI' file as well.

The file-opening optimization types are denoted thus:

   * 'SKIP_OPEN_TABLE': Table files do not need to be opened. The information has already become available within the query by scanning the database directory.

   * 'OPEN_FRM_ONLY': Only the table's '.frm' file need be opened.

   * 'OPEN_TRIGGER_ONLY': Only the table's '.TRG' file need be opened.

   * 'OPEN_FULL_TABLE': The unoptimized information lookup. The '.frm', '.MYD', and '.MYI' files must be opened.

The following list indicates how the preceding optimization types apply to 'INFORMATION_SCHEMA' table columns. For tables and columns not named, none of the optimizations apply.

3) Use *note 'EXPLAIN': explain. to determine whether the server can use 'INFORMATION_SCHEMA' optimizations for a query

This applies particularly for 'INFORMATION_SCHEMA' queries that search for information from more than one database, which might take a long time and impact performance. The 'Extra' value in *note 'EXPLAIN': explain. output indicates which, if any, of the optimizations described earlier the server can use to evaluate 'INFORMATION_SCHEMA' queries. The following examples demonstrate the kinds of information you can expect to see in the 'Extra' value.

 mysql> EXPLAIN SELECT TABLE_NAME FROM INFORMATION_SCHEMA.VIEWS WHERE
        TABLE_SCHEMA = 'test' AND TABLE_NAME = 'v1'\G
 *************************** 1. row ***************************
            id: 1
   select_type: SIMPLE
         table: VIEWS
          type: ALL
 possible_keys: NULL
           key: TABLE_SCHEMA,TABLE_NAME
       key_len: NULL
           ref: NULL
          rows: NULL
         Extra: Using where; Open_frm_only; Scanned 0 databases

Use of constant database and table lookup values enables the server to avoid directory scans. For references to 'VIEWS.TABLE_NAME', only the '.frm' file need be opened.

 mysql> EXPLAIN SELECT TABLE_NAME, ROW_FORMAT FROM INFORMATION_SCHEMA.TABLES\G
 *************************** 1. row ***************************
            id: 1
   select_type: SIMPLE
         table: TABLES
          type: ALL
 possible_keys: NULL
           key: NULL
       key_len: NULL
           ref: NULL
          rows: NULL
         Extra: Open_full_table; Scanned all databases

No lookup values are provided (there is no 'WHERE' clause), so the server must scan the data directory and each database directory. For each table thus identified, the table name and row format are selected. 'TABLE_NAME' requires no further table files to be opened (the 'SKIP_OPEN_TABLE' optimization applies). 'ROW_FORMAT' requires all table files to be opened ('OPEN_FULL_TABLE' applies). *note 'EXPLAIN': explain. reports 'OPEN_FULL_TABLE' because it is more expensive than 'SKIP_OPEN_TABLE'.

 mysql> EXPLAIN SELECT TABLE_NAME, TABLE_TYPE FROM INFORMATION_SCHEMA.TABLES
        WHERE TABLE_SCHEMA = 'test'\G
 *************************** 1. row ***************************
            id: 1
   select_type: SIMPLE
         table: TABLES
          type: ALL
 possible_keys: NULL
           key: TABLE_SCHEMA
       key_len: NULL
           ref: NULL
          rows: NULL
         Extra: Using where; Open_frm_only; Scanned 1 database

No table name lookup value is provided, so the server must scan the 'test' database directory. For the 'TABLE_NAME' and 'TABLE_TYPE' columns, the 'SKIP_OPEN_TABLE' and 'OPEN_FRM_ONLY' optimizations apply, respectively. *note 'EXPLAIN': explain. reports 'OPEN_FRM_ONLY' because it is more expensive.

 mysql> EXPLAIN SELECT B.TABLE_NAME
        FROM INFORMATION_SCHEMA.TABLES AS A, INFORMATION_SCHEMA.COLUMNS AS B
        WHERE A.TABLE_SCHEMA = 'test'
        AND A.TABLE_NAME = 't1'
        AND B.TABLE_NAME = A.TABLE_NAME\G
 *************************** 1. row ***************************
            id: 1
   select_type: SIMPLE
         table: A
          type: ALL
 possible_keys: NULL
           key: TABLE_SCHEMA,TABLE_NAME
       key_len: NULL
           ref: NULL
          rows: NULL
         Extra: Using where; Skip_open_table; Scanned 0 databases
 *************************** 2. row ***************************
            id: 1
   select_type: SIMPLE
         table: B
          type: ALL
 possible_keys: NULL
           key: NULL
       key_len: NULL
           ref: NULL
          rows: NULL
         Extra: Using where; Open_frm_only; Scanned all databases;
                Using join buffer

For the first *note 'EXPLAIN': explain. output row: Constant database and table lookup values enable the server to avoid directory scans for 'TABLES' values. References to 'TABLES.TABLE_NAME' require no further table files.

For the second *note 'EXPLAIN': explain. output row: All *note 'COLUMNS': information-schema-columns-table. table values are 'OPEN_FRM_ONLY' lookups, so 'COLUMNS.TABLE_NAME' requires the '.frm' file to be opened.

 mysql> EXPLAIN SELECT * FROM INFORMATION_SCHEMA.COLLATIONS\G
 *************************** 1. row ***************************
            id: 1
   select_type: SIMPLE
         table: COLLATIONS
          type: ALL
 possible_keys: NULL
           key: NULL
       key_len: NULL
           ref: NULL
          rows: NULL
         Extra:

In this case, no optimizations apply because *note 'COLLATIONS': information-schema-collations-table. is not one of the 'INFORMATION_SCHEMA' tables for which optimizations are available.

 File: manual.info.tmp, Node: data-change-optimization, Next: permission-optimization, Prev: information-schema-optimization, Up: statement-optimization

8.2.4 Optimizing Data Change Statements

This section explains how to speed up data change statements: *note 'INSERT': insert, *note 'UPDATE': update, and *note 'DELETE': delete. Traditional OLTP applications and modern web applications typically do many small data change operations, where concurrency is vital. Data analysis and reporting applications typically run data change operations that affect many rows at once, where the main consideration is the I/O to write large amounts of data and keep indexes up to date. For inserting and updating large volumes of data (known in the industry as ETL, for 'extract-transform-load'), sometimes you use other SQL statements or external commands that mimic the effects of *note 'INSERT': insert, *note 'UPDATE': update, and *note 'DELETE': delete. statements.

 File: manual.info.tmp, Node: insert-optimization, Next: update-optimization, Prev: data-change-optimization, Up: data-change-optimization

8.2.4.1 Optimizing INSERT Statements
....................................

To optimize insert speed, combine many small operations into a single large operation. Ideally, you make a single connection, send the data for many new rows at once, and delay all index updates and consistency checking until the very end.
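For example, a single multiple-row *note 'INSERT': insert. statement replaces several single-row statements, saving client/server round trips and batching index maintenance (table and values hypothetical):

```sql
-- One statement instead of three:
INSERT INTO t1 (a, b) VALUES
  (1, 'x'),
  (2, 'y'),
  (3, 'z');
```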

The time required for inserting a row is determined by the following factors, where the numbers indicate approximate proportions:

   * Connecting: (3)

   * Sending query to server: (2)

   * Parsing query: (2)

   * Inserting row: (1 x size of row)

   * Inserting indexes: (1 x number of indexes)

   * Closing: (1)

This does not take into consideration the initial overhead to open tables, which is done once for each concurrently running query.

The size of the table slows down the insertion of indexes by log N, assuming B-tree indexes.

You can use the following methods to speed up inserts:

 File: manual.info.tmp, Node: update-optimization, Next: delete-optimization, Prev: insert-optimization, Up: data-change-optimization

8.2.4.2 Optimizing UPDATE Statements
....................................

An update statement is optimized like a *note 'SELECT': select. query with the additional overhead of a write. The speed of the write depends on the amount of data being updated and the number of indexes that are updated. Indexes that are not changed do not get updated.

Another way to get fast updates is to delay updates and then do many updates in a row later. Performing multiple updates together is much quicker than doing one at a time if you lock the table.
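A sketch of this technique for a nontransactional table (names and values hypothetical):

```sql
LOCK TABLES t1 WRITE;
UPDATE t1 SET b = b + 1 WHERE a = 1;
UPDATE t1 SET b = b + 1 WHERE a = 2;
UPDATE t1 SET b = b + 1 WHERE a = 3;
UNLOCK TABLES;
```

With the table locked, the batched updates avoid repeated lock acquisition, and for 'MyISAM' the key cache need be flushed only once, after 'UNLOCK TABLES'.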

For a 'MyISAM' table that uses dynamic row format, updating a row to a longer total length may split the row. If you do this often, it is very important to use *note 'OPTIMIZE TABLE': optimize-table. occasionally. See *note optimize-table::.

 File: manual.info.tmp, Node: delete-optimization, Prev: update-optimization, Up: data-change-optimization

8.2.4.3 Optimizing DELETE Statements
....................................

The time required to delete individual rows in a 'MyISAM' table is exactly proportional to the number of indexes. To delete rows more quickly, you can increase the size of the key cache by increasing the 'key_buffer_size' system variable. See *note server-configuration::.
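For example, to enlarge the key cache at runtime (the size here is illustrative; choose one appropriate to your available memory):

```sql
SET GLOBAL key_buffer_size = 512 * 1024 * 1024;

-- Confirm the new setting:
SHOW VARIABLES LIKE 'key_buffer_size';
```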

To delete all rows from a 'MyISAM' table, 'TRUNCATE TABLE TBL_NAME' is faster than 'DELETE FROM TBL_NAME'. Truncate operations are not transaction-safe; an error occurs when attempting one in the course of an active transaction or active table lock. See *note truncate-table::.

 File: manual.info.tmp, Node: permission-optimization, Next: miscellaneous-optimization-tips, Prev: data-change-optimization, Up: statement-optimization

8.2.5 Optimizing Database Privileges

The more complex your privilege setup, the more overhead applies to all SQL statements. Simplifying the privileges established by *note 'GRANT': grant. statements enables MySQL to reduce permission-checking overhead when clients execute statements. For example, if you do not grant any table-level or column-level privileges, the server need not ever check the contents of the 'tables_priv' and 'columns_priv' tables. Similarly, if you place no resource limits on any accounts, the server does not have to perform resource counting. If you have a very high statement-processing load, consider using a simplified grant structure to reduce permission-checking overhead.
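For example (account and database names hypothetical), one database-level grant keeps the 'tables_priv' and 'columns_priv' tables empty, whereas the equivalent per-table grants populate 'tables_priv' and force extra checks on every statement:

```sql
-- One database-level grant:
GRANT SELECT, INSERT ON mydb.* TO 'app'@'localhost';

-- ... rather than many table-level grants such as:
--   GRANT SELECT ON mydb.t1 TO 'app'@'localhost';
--   GRANT SELECT ON mydb.t2 TO 'app'@'localhost';
```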

 File: manual.info.tmp, Node: miscellaneous-optimization-tips, Prev: permission-optimization, Up: statement-optimization

8.2.6 Other Optimization Tips

This section lists a number of miscellaneous tips for improving query processing speed:

 File: manual.info.tmp, Node: optimization-indexes, Next: optimizing-database-structure, Prev: statement-optimization, Up: optimization