8.3 Optimization and Indexes
The best way to improve the performance of *note 'SELECT': select. operations is to create indexes on one or more of the columns that are tested in the query. The index entries act like pointers to the table rows, allowing the query to quickly determine which rows match a condition in the 'WHERE' clause, and retrieve the other column values for those rows. All MySQL data types can be indexed.
Although it can be tempting to create an indexes for every possible column used in a query, unnecessary indexes waste space and waste time for MySQL to determine which indexes to use. Indexes also add to the cost of inserts, updates, and deletes because each index must be updated. You must find the right balance to achieve fast queries using the optimal set of indexes.
File: manual.info.tmp, Node: mysql-indexes, Next: primary-key-optimization, Prev: optimization-indexes, Up: optimization-indexes
8.3.1 How MySQL Uses Indexes
Indexes are used to find rows with specific column values quickly. Without an index, MySQL must begin with the first row and then read through the entire table to find the relevant rows. The larger the table, the more this costs. If the table has an index for the columns in question, MySQL can quickly determine the position to seek to in the middle of the data file without having to look at all the data. This is much faster than reading every row sequentially.
Most MySQL indexes ('PRIMARY KEY', 'UNIQUE', 'INDEX', and 'FULLTEXT') are stored in B-trees. Exceptions: Indexes on spatial data types use R-trees; 'MEMORY' tables also support hash indexes; 'InnoDB' uses inverted lists for 'FULLTEXT' indexes.
In general, indexes are used as described in the following discussion. Characteristics specific to hash indexes (as used in 'MEMORY' tables) are described in *note index-btree-hash::.
MySQL uses indexes for these operations:
To find the rows matching a 'WHERE' clause quickly.
To eliminate rows from consideration. If there is a choice between multiple indexes, MySQL normally uses the index that finds the smallest number of rows (the most selective index).
If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to look up rows. For example, if you have a three-column index on '(col1, col2, col3)', you have indexed search capabilities on '(col1)', '(col1, col2)', and '(col1, col2, col3)'. For more information, see *note multiple-column-indexes::.
To retrieve rows from other tables when performing joins. MySQL can use indexes on columns more efficiently if they are declared as the same type and size. In this context, note 'VARCHAR': char. and note 'CHAR': char. are considered the same if they are declared as the same size. For example, 'VARCHAR(10)' and 'CHAR(10)' are the same size, but 'VARCHAR(10)' and 'CHAR(15)' are not.
For comparisons between nonbinary string columns, both columns should use the same character set. For example, comparing a 'utf8' column with a 'latin1' column precludes use of an index.
Comparison of dissimilar columns (comparing a string column to a temporal or numeric column, for example) may prevent use of indexes if values cannot be compared directly without conversion. For a given value such as '1' in the numeric column, it might compare equal to any number of values in the string column such as ''1'', '' 1'', ''00001'', or ''01.e1''. This rules out use of any indexes for the string column.
To find the 'MIN()' or 'MAX()' value for a specific indexed column KEY_COL. This is optimized by a preprocessor that checks whether you are using 'WHERE KEY_PART_N = CONSTANT' on all key parts that occur before KEY_COL in the index. In this case, MySQL does a single key lookup for each 'MIN()' or 'MAX()' expression and replaces it with a constant. If all expressions are replaced with constants, the query returns at once. For example:
SELECT MIN(KEY_PART2),MAX(KEY_PART2)
FROM TBL_NAME WHERE KEY_PART1=10;
To sort or group a table if the sorting or grouping is done on a leftmost prefix of a usable index (for example, 'ORDER BY KEY_PART1, KEY_PART2'). If all key parts are followed by 'DESC', the key is read in reverse order. See note order-by-optimization::, and note group-by-optimization::.
In some cases, a query can be optimized to retrieve values without consulting the data rows. (An index that provides all the necessary results for a query is called a covering index.) If a query uses from a table only columns that are included in some index, the selected values can be retrieved from the index tree for greater speed:
SELECT KEY_PART3 FROM TBL_NAME
WHERE KEY_PART1=1
Indexes are less important for queries on small tables, or big tables where report queries process most or all of the rows. When a query needs to access most of the rows, reading sequentially is faster than working through an index. Sequential reads minimize disk seeks, even if not all the rows are needed for the query. See *note table-scan-avoidance:: for details.
File: manual.info.tmp, Node: primary-key-optimization, Next: foreign-key-optimization, Prev: mysql-indexes, Up: optimization-indexes
8.3.2 Primary Key Optimization
The primary key for a table represents the column or set of columns that you use in your most vital queries. It has an associated index, for fast query performance. Query performance benefits from the 'NOT NULL' optimization, because it cannot include any 'NULL' values. With the 'InnoDB' storage engine, the table data is physically organized to do ultra-fast lookups and sorts based on the primary key column or columns.
If your table is big and important, but does not have an obvious column or set of columns to use as a primary key, you might create a separate column with auto-increment values to use as the primary key. These unique IDs can serve as pointers to corresponding rows in other tables when you join tables using foreign keys.
File: manual.info.tmp, Node: foreign-key-optimization, Next: column-indexes, Prev: primary-key-optimization, Up: optimization-indexes
8.3.3 Foreign Key Optimization
If a table has many columns, and you query many different combinations of columns, it might be efficient to split the less-frequently used data into separate tables with a few columns each, and relate them back to the main table by duplicating the numeric ID column from the main table. That way, each small table can have a primary key for fast lookups of its data, and you can query just the set of columns that you need using a join operation. Depending on how the data is distributed, the queries might perform less I/O and take up less cache memory because the relevant columns are packed together on disk. (To maximize performance, queries try to read as few data blocks as possible from disk; tables with only a few columns can fit more rows in each data block.)
File: manual.info.tmp, Node: column-indexes, Next: multiple-column-indexes, Prev: foreign-key-optimization, Up: optimization-indexes
8.3.4 Column Indexes
The most common type of index involves a single column, storing copies of the values from that column in a data structure, allowing fast lookups for the rows with the corresponding column values. The B-tree data structure lets the index quickly find a specific value, a set of values, or a range of values, corresponding to operators such as '=', '>', '<=', 'BETWEEN', 'IN', and so on, in a 'WHERE' clause.
The maximum number of indexes per table and the maximum index length is defined per storage engine. See note innodb-storage-engine::, and note storage-engines::. All storage engines support at least 16 indexes per table and a total index length of at least 256 bytes. Most storage engines have higher limits.
For additional information about column indexes, see *note create-index::.
*note column-indexes-prefix::
*note column-indexes-fulltext::
*note column-indexes-spatial::
*note column-indexes-memory-storage-engine::
Index Prefixes
With 'COL_NAME(N)' syntax in an index specification for a string column, you can create an index that uses only the first N characters of the column. Indexing only a prefix of column values in this way can make the index file much smaller. When you index a note 'BLOB': blob. or note 'TEXT': blob. column, you must specify a prefix length for the index. For example:
CREATE TABLE test (blob_col BLOB, INDEX(blob_col(10)));
Prefixes can be up to 1000 bytes long (767 bytes for 'InnoDB' tables, unless you have 'innodb_large_prefix' set).
Note:
Prefix limits are measured in bytes, whereas the prefix length in note 'CREATE TABLE': create-table, note 'ALTER TABLE': alter-table, and note 'CREATE INDEX': create-index. statements is interpreted as number of characters for nonbinary string types (note 'CHAR': char, note 'VARCHAR': char, note 'TEXT': blob.) and number of bytes for binary string types (note 'BINARY': binary-varbinary, note 'VARBINARY': binary-varbinary, *note 'BLOB': blob.). Take this into account when specifying a prefix length for a nonbinary string column that uses a multibyte character set.
If a search term exceeds the index prefix length, the index is used to exclude non-matching rows, and the remaining rows are examined for possible matches.
For additional information about index prefixes, see *note create-index::.
FULLTEXT Indexes
'FULLTEXT' indexes are used for full-text searches. Only the note 'InnoDB': innodb-storage-engine. and note 'MyISAM': myisam-storage-engine. storage engines support 'FULLTEXT' indexes and only for note 'CHAR': char, note 'VARCHAR': char, and note 'TEXT': blob. columns. Indexing always takes place over the entire column and column prefix indexing is not supported. For details, see note fulltext-search::.
Optimizations are applied to certain kinds of 'FULLTEXT' queries against single 'InnoDB' tables. Queries with these characteristics are particularly efficient:
'FULLTEXT' queries that only return the document ID, or the document ID and the search rank.
'FULLTEXT' queries that sort the matching rows in descending order of score and apply a 'LIMIT' clause to take the top N matching rows. For this optimization to apply, there must be no 'WHERE' clauses and only a single 'ORDER BY' clause in descending order.
'FULLTEXT' queries that retrieve only the 'COUNT(*)' value of rows matching a search term, with no additional 'WHERE' clauses. Code the 'WHERE' clause as 'WHERE MATCH(TEXT) AGAINST ('OTHER_TEXT')', without any '> 0' comparison operator.
For queries that contain full-text expressions, MySQL evaluates those expressions during the optimization phase of query execution. The optimizer does not just look at full-text expressions and make estimates, it actually evaluates them in the process of developing an execution plan.
An implication of this behavior is that *note 'EXPLAIN': explain. for full-text queries is typically slower than for non-full-text queries for which no expression evaluation occurs during the optimization phase.
*note 'EXPLAIN': explain. for full-text queries may show 'Select tables optimized away' in the 'Extra' column due to matching occurring during optimization; in this case, no table access need occur during later execution.
Spatial Indexes
You can create indexes on spatial data types. 'MyISAM' and 'InnoDB' support R-tree indexes on spatial types. Other storage engines use B-trees for indexing spatial types (except for 'ARCHIVE', which does not support spatial type indexing).
Indexes in the MEMORY Storage Engine
The 'MEMORY' storage engine uses 'HASH' indexes by default, but also supports 'BTREE' indexes.
File: manual.info.tmp, Node: multiple-column-indexes, Next: verifying-index-usage, Prev: column-indexes, Up: optimization-indexes
8.3.5 Multiple-Column Indexes
MySQL can create composite indexes (that is, indexes on multiple columns). An index may consist of up to 16 columns. For certain data types, you can index a prefix of the column (see *note column-indexes::).
MySQL can use multiple-column indexes for queries that test all the columns in the index, or queries that test just the first column, the first two columns, the first three columns, and so on. If you specify the columns in the right order in the index definition, a single composite index can speed up several kinds of queries on the same table.
A multiple-column index can be considered a sorted array, the rows of which contain values that are created by concatenating the values of the indexed columns.
Note:
As an alternative to a composite index, you can introduce a column that is 'hashed' based on information from other columns. If this column is short, reasonably unique, and indexed, it might be faster than a 'wide' index on many columns. In MySQL, it is very easy to use this extra column:
SELECT * FROM TBL_NAME
WHERE HASH_COL=MD5(CONCAT(VAL1,VAL2))
AND COL1=VAL1 AND COL2=VAL2;
Suppose that a table has the following specification:
CREATE TABLE test (
id INT NOT NULL,
last_name CHAR(30) NOT NULL,
first_name CHAR(30) NOT NULL,
PRIMARY KEY (id),
INDEX name (last_name,first_name)
);
The 'name' index is an index over the 'last_name' and 'first_name' columns. The index can be used for lookups in queries that specify values in a known range for combinations of 'last_name' and 'first_name' values. It can also be used for queries that specify just a 'last_name' value because that column is a leftmost prefix of the index (as described later in this section). Therefore, the 'name' index is used for lookups in the following queries:
SELECT * FROM test WHERE last_name='Jones';
SELECT * FROM test
WHERE last_name='Jones' AND first_name='John';
SELECT * FROM test
WHERE last_name='Jones'
AND (first_name='John' OR first_name='Jon');
SELECT * FROM test
WHERE last_name='Jones'
AND first_name >='M' AND first_name < 'N';
However, the 'name' index is not used for lookups in the following queries:
SELECT * FROM test WHERE first_name='John';
SELECT * FROM test
WHERE last_name='Jones' OR first_name='John';
Suppose that you issue the following *note 'SELECT': select. statement:
SELECT * FROM TBL_NAME
WHERE col1=VAL1 AND col2=VAL2;
If a multiple-column index exists on 'col1' and 'col2', the appropriate rows can be fetched directly. If separate single-column indexes exist on 'col1' and 'col2', the optimizer attempts to use the Index Merge optimization (see *note index-merge-optimization::), or attempts to find the most restrictive index by deciding which index excludes more rows and using that index to fetch the rows.
If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to look up rows. For example, if you have a three-column index on '(col1, col2, col3)', you have indexed search capabilities on '(col1)', '(col1, col2)', and '(col1, col2, col3)'.
MySQL cannot use the index to perform lookups if the columns do not form a leftmost prefix of the index. Suppose that you have the *note 'SELECT': select. statements shown here:
SELECT * FROM TBL_NAME WHERE col1=VAL1;
SELECT * FROM TBL_NAME WHERE col1=VAL1 AND col2=VAL2;
SELECT * FROM TBL_NAME WHERE col2=VAL2;
SELECT * FROM TBL_NAME WHERE col2=VAL2 AND col3=VAL3;
If an index exists on '(col1, col2, col3)', only the first two queries use the index. The third and fourth queries do involve indexed columns, but do not use an index to perform lookups because '(col2)' and '(col2, col3)' are not leftmost prefixes of '(col1, col2, col3)'.
File: manual.info.tmp, Node: verifying-index-usage, Next: index-statistics, Prev: multiple-column-indexes, Up: optimization-indexes
8.3.6 Verifying Index Usage
Always check whether all your queries really use the indexes that you have created in the tables. Use the note 'EXPLAIN': explain. statement, as described in note using-explain::.
File: manual.info.tmp, Node: index-statistics, Next: index-btree-hash, Prev: verifying-index-usage, Up: optimization-indexes
8.3.7 InnoDB and MyISAM Index Statistics Collection
Storage engines collect statistics about tables for use by the optimizer. Table statistics are based on value groups, where a value group is a set of rows with the same key prefix value. For optimizer purposes, an important statistic is the average value group size.
MySQL uses the average value group size in the following ways:
To estimate how many rows must be read for each 'ref' access
To estimate how many rows a partial join produces; that is, the number of rows that an operation of this form produces:
(...) JOIN TBL_NAME ON TBL_NAME.KEY = EXPR
As the average value group size for an index increases, the index is less useful for those two purposes because the average number of rows per lookup increases: For the index to be good for optimization purposes, it is best that each index value target a small number of rows in the table. When a given index value yields a large number of rows, the index is less useful and MySQL is less likely to use it.
The average value group size is related to table cardinality, which is the number of value groups. The *note 'SHOW INDEX': show-index. statement displays a cardinality value based on N/S, where N is the number of rows in the table and S is the average value group size. That ratio yields an approximate number of value groups in the table.
For a join based on the '<=>' comparison operator, 'NULL' is not treated differently from any other value: 'NULL <=> NULL', just as 'N <=> N' for any other N.
However, for a join based on the '=' operator, 'NULL' is different from non-'NULL' values: 'EXPR1 = EXPR2' is not true when EXPR1 or EXPR2 (or both) are 'NULL'. This affects 'ref' accesses for comparisons of the form 'TBL_NAME.KEY = EXPR': MySQL does not access the table if the current value of EXPR is 'NULL', because the comparison cannot be true.
For '=' comparisons, it does not matter how many 'NULL' values are in the table. For optimization purposes, the relevant value is the average size of the non-'NULL' value groups. However, MySQL does not currently enable that average size to be collected or used.
For 'InnoDB' and 'MyISAM' tables, you have some control over collection of table statistics by means of the 'innodb_stats_method' and 'myisam_stats_method' system variables, respectively. These variables have three possible values, which differ as follows:
When the variable is set to 'nulls_equal', all 'NULL' values are treated as identical (that is, they all form a single value group).
If the 'NULL' value group size is much higher than the average non-'NULL' value group size, this method skews the average value group size upward. This makes index appear to the optimizer to be less useful than it really is for joins that look for non-'NULL' values. Consequently, the 'nulls_equal' method may cause the optimizer not to use the index for 'ref' accesses when it should.
When the variable is set to 'nulls_unequal', 'NULL' values are not considered the same. Instead, each 'NULL' value forms a separate value group of size 1.
If you have many 'NULL' values, this method skews the average value group size downward. If the average non-'NULL' value group size is large, counting 'NULL' values each as a group of size 1 causes the optimizer to overestimate the value of the index for joins that look for non-'NULL' values. Consequently, the 'nulls_unequal' method may cause the optimizer to use this index for 'ref' lookups when other methods may be better.
When the variable is set to 'nulls_ignored', 'NULL' values are ignored.
If you tend to use many joins that use '<=>' rather than '=', 'NULL' values are not special in comparisons and one 'NULL' is equal to another. In this case, 'nulls_equal' is the appropriate statistics method.
The 'innodb_stats_method' system variable has a global value; the 'myisam_stats_method' system variable has both global and session values. Setting the global value affects statistics collection for tables from the corresponding storage engine. Setting the session value affects statistics collection only for the current client connection. This means that you can force a table's statistics to be regenerated with a given method without affecting other clients by setting the session value of 'myisam_stats_method'.
To regenerate 'MyISAM' table statistics, you can use any of the following methods:
Execute *note 'myisamchk --stats_method=METHOD_NAME --analyze': myisamchk.
Change the table to cause its statistics to go out of date (for example, insert a row and then delete it), and then set 'myisam_stats_method' and issue an *note 'ANALYZE TABLE': analyze-table. statement
Some caveats regarding the use of 'innodb_stats_method' and 'myisam_stats_method':
You can force table statistics to be collected explicitly, as just described. However, MySQL may also collect statistics automatically. For example, if during the course of executing statements for a table, some of those statements modify the table, MySQL may collect statistics. (This may occur for bulk inserts or deletes, or some *note 'ALTER TABLE': alter-table. statements, for example.) If this happens, the statistics are collected using whatever value 'innodb_stats_method' or 'myisam_stats_method' has at the time. Thus, if you collect statistics using one method, but the system variable is set to the other method when a table's statistics are collected automatically later, the other method is used.
There is no way to tell which method was used to generate statistics for a given table.
These variables apply only to 'InnoDB' and 'MyISAM' tables. Other storage engines have only one method for collecting table statistics. Usually it is closer to the 'nulls_equal' method.
File: manual.info.tmp, Node: index-btree-hash, Next: index-extensions, Prev: index-statistics, Up: optimization-indexes
8.3.8 Comparison of B-Tree and Hash Indexes
Understanding the B-tree and hash data structures can help predict how different queries perform on different storage engines that use these data structures in their indexes, particularly for the 'MEMORY' storage engine that lets you choose B-tree or hash indexes.
B-Tree Index Characteristics
A B-tree index can be used for column comparisons in expressions that use the '=', '>', '>=', '<', '<=', or 'BETWEEN' operators. The index also can be used for 'LIKE' comparisons if the argument to 'LIKE' is a constant string that does not start with a wildcard character. For example, the following *note 'SELECT': select. statements use indexes:
SELECT * FROM TBL_NAME WHERE KEY_COL LIKE 'Patrick%';
SELECT * FROM TBL_NAME WHERE KEY_COL LIKE 'Pat%_ck%';
In the first statement, only rows with ''Patrick' <= KEY_COL < 'Patricl'' are considered. In the second statement, only rows with ''Pat' <= KEY_COL < 'Pau'' are considered.
The following *note 'SELECT': select. statements do not use indexes:
SELECT * FROM TBL_NAME WHERE KEY_COL LIKE '%Patrick%';
SELECT * FROM TBL_NAME WHERE KEY_COL LIKE OTHER_COL;
In the first statement, the 'LIKE' value begins with a wildcard character. In the second statement, the 'LIKE' value is not a constant.
If you use '... LIKE '%STRING%'' and STRING is longer than three characters, MySQL uses the Turbo Boyer-Moore algorithm to initialize the pattern for the string and then uses this pattern to perform the search more quickly.
A search using 'COL_NAME IS NULL' employs indexes if COL_NAME is indexed.
Any index that does not span all 'AND' levels in the 'WHERE' clause is not used to optimize the query. In other words, to be able to use an index, a prefix of the index must be used in every 'AND' group.
The following 'WHERE' clauses use indexes:
... WHERE INDEX_PART1=1 AND INDEX_PART2=2 AND OTHER_COLUMN=3
/* INDEX = 1 OR INDEX = 2 */
... WHERE INDEX=1 OR A=10 AND INDEX=2
/* optimized like "INDEX_PART1='hello'" */
... WHERE INDEX_PART1='hello' AND INDEX_PART3=5
/* Can use index on INDEX1 but not on INDEX2 or INDEX3 */
... WHERE INDEX1=1 AND INDEX2=2 OR INDEX1=3 AND INDEX3=3;
These 'WHERE' clauses do not use indexes:
/* INDEX_PART1 is not used */
... WHERE INDEX_PART2=1 AND INDEX_PART3=2
/* Index is not used in both parts of the WHERE clause */
... WHERE INDEX=1 OR A=10
/* No index spans all rows */
... WHERE INDEX_PART1=1 OR INDEX_PART2=10
Sometimes MySQL does not use an index, even if one is available. One circumstance under which this occurs is when the optimizer estimates that using the index would require MySQL to access a very large percentage of the rows in the table. (In this case, a table scan is likely to be much faster because it requires fewer seeks.) However, if such a query uses 'LIMIT' to retrieve only some of the rows, MySQL uses an index anyway, because it can much more quickly find the few rows to return in the result.
Hash Index Characteristics
Hash indexes have somewhat different characteristics from those just discussed:
They are used only for equality comparisons that use the '=' or '<=>' operators (but are very fast). They are not used for comparison operators such as '<' that find a range of values. Systems that rely on this type of single-value lookup are known as 'key-value stores'; to use MySQL for such applications, use hash indexes wherever possible.
The optimizer cannot use a hash index to speed up 'ORDER BY' operations. (This type of index cannot be used to search for the next entry in order.)
MySQL cannot determine approximately how many rows there are between two values (this is used by the range optimizer to decide which index to use). This may affect some queries if you change a 'MyISAM' or 'InnoDB' table to a hash-indexed 'MEMORY' table.
Only whole keys can be used to search for a row. (With a B-tree index, any leftmost prefix of the key can be used to find rows.)
File: manual.info.tmp, Node: index-extensions, Next: generated-column-index-optimizations, Prev: index-btree-hash, Up: optimization-indexes
8.3.9 Use of Index Extensions
*note 'InnoDB': innodb-storage-engine. automatically extends each secondary index by appending the primary key columns to it. Consider this table definition:
CREATE TABLE t1 (
i1 INT NOT NULL DEFAULT 0,
i2 INT NOT NULL DEFAULT 0,
d DATE DEFAULT NULL,
PRIMARY KEY (i1, i2),
INDEX k_d (d)
) ENGINE = InnoDB;
This table defines the primary key on columns '(i1, i2)'. It also defines a secondary index 'k_d' on column '(d)', but internally 'InnoDB' extends this index and treats it as columns '(d, i1, i2)'.
The optimizer takes into account the primary key columns of the extended secondary index when determining how and whether to use that index. This can result in more efficient query execution plans and better performance.
The optimizer can use extended secondary indexes for 'ref', 'range', and 'index_merge' index access, for Loose Index Scan access, for join and sorting optimization, and for 'MIN()'/'MAX()' optimization.
The following example shows how execution plans are affected by whether the optimizer uses extended secondary indexes. Suppose that 't1' is populated with these rows:
INSERT INTO t1 VALUES
(1, 1, '1998-01-01'), (1, 2, '1999-01-01'),
(1, 3, '2000-01-01'), (1, 4, '2001-01-01'),
(1, 5, '2002-01-01'), (2, 1, '1998-01-01'),
(2, 2, '1999-01-01'), (2, 3, '2000-01-01'),
(2, 4, '2001-01-01'), (2, 5, '2002-01-01'),
(3, 1, '1998-01-01'), (3, 2, '1999-01-01'),
(3, 3, '2000-01-01'), (3, 4, '2001-01-01'),
(3, 5, '2002-01-01'), (4, 1, '1998-01-01'),
(4, 2, '1999-01-01'), (4, 3, '2000-01-01'),
(4, 4, '2001-01-01'), (4, 5, '2002-01-01'),
(5, 1, '1998-01-01'), (5, 2, '1999-01-01'),
(5, 3, '2000-01-01'), (5, 4, '2001-01-01'),
(5, 5, '2002-01-01');
Now consider this query:
EXPLAIN SELECT COUNT(*) FROM t1 WHERE i1 = 3 AND d = '2000-01-01'
The execution plan depends on whether the extended index is used.
When the optimizer does not consider index extensions, it treats the index 'k_d' as only '(d)'. *note 'EXPLAIN': explain. for the query produces this result:
mysql> EXPLAIN SELECT COUNT(*) FROM t1 WHERE i1 = 3 AND d = '2000-01-01'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t1
type: ref
possible_keys: PRIMARY,k_d
key: k_d
key_len: 4
ref: const
rows: 5
Extra: Using where; Using index
When the optimizer takes index extensions into account, it treats 'k_d' as '(d, i1, i2)'. In this case, it can use the leftmost index prefix '(d, i1)' to produce a better execution plan:
mysql> EXPLAIN SELECT COUNT(*) FROM t1 WHERE i1 = 3 AND d = '2000-01-01'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t1
type: ref
possible_keys: PRIMARY,k_d
key: k_d
key_len: 8
ref: const,const
rows: 1
Extra: Using index
In both cases, 'key' indicates that the optimizer uses secondary index 'k_d' but the *note 'EXPLAIN': explain. output shows these improvements from using the extended index:
'key_len' goes from 4 bytes to 8 bytes, indicating that key lookups use columns 'd' and 'i1', not just 'd'.
The 'ref' value changes from 'const' to 'const,const' because the key lookup uses two key parts, not one.
The 'rows' count decreases from 5 to 1, indicating that 'InnoDB' should need to examine fewer rows to produce the result.
The 'Extra' value changes from 'Using where; Using index' to 'Using index'. This means that rows can be read using only the index, without consulting columns in the data row.
Differences in optimizer behavior for use of extended indexes can also be seen with *note 'SHOW STATUS': show-status.:
FLUSH TABLE t1;
FLUSH STATUS;
SELECT COUNT(*) FROM t1 WHERE i1 = 3 AND d = '2000-01-01';
SHOW STATUS LIKE 'handler_read%'
The preceding statements include 'FLUSH TABLES' and 'FLUSH STATUS' to flush the table cache and clear the status counters.
Without index extensions, *note 'SHOW STATUS': show-status. produces this result:
+-----------------------+-------+
| Variable_name | Value |
+-----------------------+-------+
| Handler_read_first | 0 |
| Handler_read_key | 1 |
| Handler_read_last | 0 |
| Handler_read_next | 5 |
| Handler_read_prev | 0 |
| Handler_read_rnd | 0 |
| Handler_read_rnd_next | 0 |
+-----------------------+-------+
With index extensions, *note 'SHOW STATUS': show-status. produces this result. The 'Handler_read_next' value decreases from 5 to 1, indicating more efficient use of the index:
+-----------------------+-------+
| Variable_name | Value |
+-----------------------+-------+
| Handler_read_first | 0 |
| Handler_read_key | 1 |
| Handler_read_last | 0 |
| Handler_read_next | 1 |
| Handler_read_prev | 0 |
| Handler_read_rnd | 0 |
| Handler_read_rnd_next | 0 |
+-----------------------+-------+
The 'use_index_extensions' flag of the 'optimizer_switch' system variable permits control over whether the optimizer takes the primary key columns into account when determining how to use an 'InnoDB' table's secondary indexes. By default, 'use_index_extensions' is enabled. To check whether disabling use of index extensions improves performance, use this statement:
SET optimizer_switch = 'use_index_extensions=off';
Use of index extensions by the optimizer is subject to the usual limits on the number of key parts in an index (16) and the maximum key length (3072 bytes).
File: manual.info.tmp, Node: generated-column-index-optimizations, Next: timestamp-lookups, Prev: index-extensions, Up: optimization-indexes
8.3.10 Optimizer Use of Generated Column Indexes
MySQL supports indexes on generated columns. For example:
CREATE TABLE t1 (f1 INT, gc INT AS (f1 + 1) STORED, INDEX (gc));
The generated column, 'gc', is defined as the expression 'f1 + 1'. The column is also indexed and the optimizer can take that index into account during execution plan construction. In the following query, the 'WHERE' clause refers to 'gc' and the optimizer considers whether the index on that column yields a more efficient plan:
SELECT * FROM t1 WHERE gc > 9;
The optimizer can use indexes on generated columns to generate execution plans, even in the absence of direct references in queries to those columns by name. This occurs if the 'WHERE', 'ORDER BY', or 'GROUP BY' clause refers to an expression that matches the definition of some indexed generated column. The following query does not refer directly to 'gc' but does use an expression that matches the definition of 'gc':
SELECT * FROM t1 WHERE f1 + 1 > 9;
The optimizer recognizes that the expression 'f1 + 1' matches the definition of 'gc' and that 'gc' is indexed, so it considers that index during execution plan construction. You can see this using *note 'EXPLAIN': explain.:
mysql> EXPLAIN SELECT * FROM t1 WHERE f1 + 1 > 9\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: t1
partitions: NULL
type: range
possible_keys: gc
key: gc
key_len: 5
ref: NULL
rows: 1
filtered: 100.00
Extra: Using index condition
In effect, the optimizer has replaced the expression 'f1 + 1' with the name of the generated column that matches the expression. That is also apparent in the rewritten query available in the extended note 'EXPLAIN': explain. information displayed by note 'SHOW WARNINGS': show-warnings.:
mysql> SHOW WARNINGS\G
*************************** 1. row ***************************
Level: Note
Code: 1003
Message: /* select#1 */ select `test`.`t1`.`f1` AS `f1`,`test`.`t1`.`gc`
AS `gc` from `test`.`t1` where (`test`.`t1`.`gc` > 9)
The following restrictions and conditions apply to the optimizer's use of generated column indexes:
For a query expression to match a generated column definition, the expression must be identical and it must have the same result type. For example, if the generated column expression is 'f1 + 1', the optimizer does not recognize a match if the query uses '1 + f1', or if 'f1 + 1' (an integer expression) is compared with a string.
The optimization applies to these operators: '=', '<', '<=', '>', '>=', 'BETWEEN', and 'IN()'.
For operators other than 'BETWEEN' and 'IN()', either operand can be replaced by a matching generated column. For 'BETWEEN' and 'IN()', only the first argument can be replaced by a matching generated column, and the other arguments must have the same result type. 'BETWEEN' and 'IN()' are not yet supported for comparisons involving JSON values.
The generated column must be defined as an expression that contains at least a function call or one of the operators mentioned in the preceding item. The expression cannot consist of a simple reference to another column. For example, 'gc INT AS (f1) STORED' consists only of a column reference, so indexes on 'gc' are not considered.
For comparisons of strings to indexed generated columns that compute a value from a JSON function that returns a quoted string, 'JSON_UNQUOTE()' is needed in the column definition to remove the extra quotes from the function value. (For direct comparison of a string to the function result, the JSON comparator handles quote removal, but this does not occur for index lookups.) For example, instead of writing a column definition like this:
doc_name TEXT AS (JSON_EXTRACT(jdoc, '$.name')) STORED
Write it like this:
doc_name TEXT AS (JSON_UNQUOTE(JSON_EXTRACT(jdoc, '$.name'))) STORED
With the latter definition, the optimizer can detect a match for both of these comparisons:
... WHERE JSON_EXTRACT(jdoc, '$.name') = 'SOME_STRING' ...
... WHERE JSON_UNQUOTE(JSON_EXTRACT(jdoc, '$.name')) = 'SOME_STRING' ...
Without 'JSON_UNQUOTE()' in the column definition, the optimizer detects a match only for the first of those comparisons.
If the optimizer fails to choose the desired index, an index hint can be used to force the optimizer to make a different choice.
File: manual.info.tmp, Node: timestamp-lookups, Prev: generated-column-index-optimizations, Up: optimization-indexes
8.3.11 Indexed Lookups from TIMESTAMP Columns
Temporal values are stored in note 'TIMESTAMP': datetime. columns as UTC values, and values inserted into and retrieved from note 'TIMESTAMP': datetime. columns are converted between the session time zone and UTC. (This is the same type of conversion performed by the 'CONVERT_TZ()' function. If the session time zone is UTC, there is effectively no time zone conversion.)
Due to conventions for local time zone changes such as Daylight Saving Time (DST), conversions between UTC and non-UTC time zones are not one-to-one in both directions. UTC values that are distinct may not be distinct in another time zone. The following example shows distinct UTC values that become identical in a non-UTC time zone:
mysql> CREATE TABLE tstable (ts TIMESTAMP);
mysql> SET time_zone = 'UTC'; -- insert UTC values
mysql> INSERT INTO tstable VALUES
('2018-10-28 00:30:00'),
('2018-10-28 01:30:00');
mysql> SELECT ts FROM tstable;
+---------------------+
| ts |
+---------------------+
| 2018-10-28 00:30:00 |
| 2018-10-28 01:30:00 |
+---------------------+
mysql> SET time_zone = 'MET'; -- retrieve non-UTC values
mysql> SELECT ts FROM tstable;
+---------------------+
| ts |
+---------------------+
| 2018-10-28 02:30:00 |
| 2018-10-28 02:30:00 |
+---------------------+
Note:
To use named time zones such as ''MET'' or ''Europe/Amsterdam'', the time zone tables must be properly set up. For instructions, see *note time-zone-support::.
You can see that the two distinct UTC values are the same when converted to the ''MET'' time zone. This phenomenon can lead to different results for a given *note 'TIMESTAMP': datetime. column query, depending on whether the optimizer uses an index to execute the query.
Suppose that a query selects values from the table shown earlier using a 'WHERE' clause to search the 'ts' column for a single specific value such as a user-provided timestamp literal:
SELECT ts FROM tstable
WHERE ts = 'LITERAL';
Suppose further that the query executes under these conditions:
The session time zone is not UTC and has a DST shift. For example:
SET time_zone = 'MET';
Unique UTC values stored in the *note 'TIMESTAMP': datetime. column are not unique in the session time zone due to DST shifts. (The example shown earlier illustrates how this can occur.)
The query specifies a search value that is within the hour of entry into DST in the session time zone.
Under those conditions, the comparison in the 'WHERE' clause occurs in different ways for nonindexed and indexed lookups and leads to different results:
If there is no index or the optimizer cannot use it, comparisons occur in the session time zone. The optimizer performs a table scan in which it retrieves each 'ts' column value, converts it from UTC to the session time zone, and compares it to the search value (also interpreted in the session time zone):
mysql> SELECT ts FROM tstable
WHERE ts = '2018-10-28 02:30:00';
+---------------------+
| ts |
+---------------------+
| 2018-10-28 02:30:00 |
| 2018-10-28 02:30:00 |
+---------------------+
Because the stored 'ts' values are converted to the session time zone, it is possible for the query to return two timestamp values that are distinct as UTC values but equal in the session time zone: One value that occurs before the DST shift when clocks are changed, and one value that was occurs after the DST shift.
If there is a usable index, comparisons occur in UTC. The optimizer performs an index scan, first converting the search value from the session time zone to UTC, then comparing the result to the UTC index entries:
mysql> ALTER TABLE tstable ADD INDEX (ts);
mysql> SELECT ts FROM tstable
WHERE ts = '2018-10-28 02:30:00';
+---------------------+
| ts |
+---------------------+
| 2018-10-28 02:30:00 |
+---------------------+
In this case, the (converted) search value is matched only to index entries, and because the index entries for the distinct stored UTC values are also distinct, the search value can match only one of them.
Due to different optimizer operation for nonindexed and indexed lookups, the query produces different results in each case. The result from the nonindexed lookup returns all values that match in the session time zone. The indexed lookup cannot do so:
It is performed within the storage engine, which knows only about UTC values.
For the two distinct session time zone values that map to the same UTC value, the indexed lookup matches only the corresponding UTC index entry and returns only a single row.
In the preceding discussion, the data set stored in 'tstable' happens to consist of distinct UTC values. In such cases, all index-using queries of the form shown match at most one index entry.
If the index is not 'UNIQUE', it is possible for the table (and the index) to store multiple instances of a given UTC value. For example, the 'ts' column might contain multiple instances of the UTC value ''2018-10-28 00:30:00''. In this case, the index-using query would return each of them (converted to the MET value ''2018-10-28 02:30:00'' in the result set). It remains true that index-using queries match the converted search value to a single value in the UTC index entries, rather than matching multiple UTC values that convert to the search value in the session time zone.
If it is important to return all 'ts' values that match in the session time zone, the workaround is to suppress use of the index with an 'IGNORE INDEX' hint:
mysql> SELECT ts FROM tstable
IGNORE INDEX (ts)
WHERE ts = '2018-10-28 02:30:00';
+---------------------+
| ts |
+---------------------+
| 2018-10-28 02:30:00 |
| 2018-10-28 02:30:00 |
+---------------------+
The same lack of one-to-one mapping for time zone conversions in both directions occurs in other contexts as well, such as conversions performed with the 'FROM_UNIXTIME()' and 'UNIX_TIMESTAMP()' functions. See *note date-and-time-functions::.
File: manual.info.tmp, Node: optimizing-database-structure, Next: optimizing-innodb, Prev: optimization-indexes, Up: optimization