14.6 InnoDB On-Disk Structures

Menu:
innodb-tables:: Tables
innodb-indexes:: Indexes
innodb-tablespace:: Tablespaces
innodb-data-dictionary:: InnoDB Data Dictionary
innodb-doublewrite-buffer:: Doublewrite Buffer
innodb-redo-log:: Redo Log
innodb-undo-logs:: Undo Logs

This section describes 'InnoDB' on-disk structures and related topics.

File: manual.info.tmp, Node: innodb-tables, Next: innodb-indexes, Prev: innodb-on-disk-structures, Up: innodb-on-disk-structures

14.6.1 Tables

Menu:
using-innodb-tables:: Creating InnoDB Tables
innodb-create-table-external:: Creating Tables Externally
innodb-table-import:: Importing InnoDB Tables
innodb-migration:: Moving or Copying InnoDB Tables
converting-tables-to-innodb:: Converting Tables from MyISAM to InnoDB
innodb-auto-increment-handling:: AUTO_INCREMENT Handling in InnoDB

This section covers topics related to 'InnoDB' tables.

File: manual.info.tmp, Node: using-innodb-tables, Next: innodb-create-table-external, Prev: innodb-tables, Up: innodb-tables

14.6.1.1 Creating InnoDB Tables ...............................

'InnoDB' tables are created using the *note 'CREATE TABLE': create-table. statement; for example:

 CREATE TABLE t1 (a INT, b CHAR (20), PRIMARY KEY (a)) ENGINE=InnoDB;

The 'ENGINE=InnoDB' clause is not required when 'InnoDB' is defined as the default storage engine, which it is by default. However, the 'ENGINE' clause is useful if the *note 'CREATE TABLE': create-table. statement is to be replayed on a different MySQL Server instance where the default storage engine is not 'InnoDB' or is unknown. You can determine the default storage engine on a MySQL Server instance by issuing the following statement:

 mysql> SELECT @@default_storage_engine;
 +--------------------------+
 | @@default_storage_engine |
 +--------------------------+
 | InnoDB                   |
 +--------------------------+

'InnoDB' tables are created in file-per-table tablespaces by default. To create an 'InnoDB' table in the 'InnoDB' system tablespace, disable the 'innodb_file_per_table' variable before creating the table. To create an 'InnoDB' table in a general tablespace, use note 'CREATE TABLE ... TABLESPACE': create-table. syntax. For more information, see note innodb-tablespace::.

.frm Files

MySQL stores data dictionary information for tables in .frm files in database directories. Unlike other MySQL storage engines, 'InnoDB' also encodes information about the table in its own internal data dictionary inside the system tablespace. When MySQL drops a table or a database, it deletes one or more '.frm' files as well as the corresponding entries inside the 'InnoDB' data dictionary. You cannot move 'InnoDB' tables between databases simply by moving the '.frm' files. For information about moving 'InnoDB' tables, see *note innodb-migration::.

Row Formats

The row format of an 'InnoDB' table determines how its rows are physically stored on disk. 'InnoDB' supports four row formats, each with different storage characteristics. Supported row formats include 'REDUNDANT', 'COMPACT', 'DYNAMIC', and 'COMPRESSED'. The 'DYNAMIC' row format is the default. For information about row format characteristics, see *note innodb-row-format::.

The 'innodb_default_row_format' variable defines the default row format. The row format of a table can also be defined explicitly using the 'ROW_FORMAT' table option in a 'CREATE TABLE' or 'ALTER TABLE' statement. See *note innodb-row-format-defining::.

Primary Keys

It is recommended that you define a primary key for each table that you create. When selecting primary key columns, choose columns with the following characteristics:

Columns that are referenced by the most important queries.
Columns that are never left blank.
Columns that never have duplicate values.
Columns that rarely if ever change value once inserted.

For example, in a table containing information about people, you would not create a primary key on '(firstname, lastname)' because more than one person can have the same name, a name column may be left blank, and sometimes people change their names. With so many constraints, often there is not an obvious set of columns to use as a primary key, so you create a new column with a numeric ID to serve as all or part of the primary key. You can declare an auto-increment column so that ascending values are filled in automatically as rows are inserted:

 # The value of ID can act like a pointer between related items in different tables.
 CREATE TABLE t5 (id INT AUTO_INCREMENT, b CHAR (20), PRIMARY KEY (id));

 # The primary key can consist of more than one column. Any autoinc column must come first.
 CREATE TABLE t6 (id INT AUTO_INCREMENT, a INT, b CHAR (20), PRIMARY KEY (id,a));

For more information about auto-increment columns, see *note innodb-auto-increment-handling::.

Although a table works correctly without defining a primary key, the primary key is involved with many aspects of performance and is a crucial design aspect for any large or frequently used table. It is recommended that you always specify a primary key in the note 'CREATE TABLE': create-table. statement. If you create the table, load data, and then run note 'ALTER TABLE': alter-table. to add a primary key later, that operation is much slower than defining the primary key when creating the table. For more information about primary keys, see *note innodb-index-types::.

Viewing InnoDB Table Properties

To view the properties of an 'InnoDB' table, issue a *note 'SHOW TABLE STATUS': show-table-status. statement:

 mysql> SHOW TABLE STATUS FROM test LIKE 't%' \G;
 *************************** 1. row ***************************
            Name: t1
          Engine: InnoDB
         Version: 10
      Row_format: Dynamic
            Rows: 0
  Avg_row_length: 0
     Data_length: 16384
 Max_data_length: 0
    Index_length: 0
       Data_free: 0
  Auto_increment: NULL
     Create_time: 2021-02-18 12:18:28
     Update_time: NULL
      Check_time: NULL
       Collation: utf8mb4_0900_ai_ci
        Checksum: NULL
  Create_options:
         Comment:

For information about note 'SHOW TABLE STATUS': show-table-status. output, see note show-table-status::.

You can also access 'InnoDB' table properties by querying the 'InnoDB' Information Schema system tables:

 mysql> SELECT * FROM INFORMATION_SCHEMA.INNODB_SYS_TABLES WHERE NAME='test/t1' \G
 *************************** 1. row ***************************
      TABLE_ID: 45
          NAME: test/t1
          FLAG: 1
        N_COLS: 5
         SPACE: 35
   FILE_FORMAT: Barracuda
    ROW_FORMAT: Dynamic
 ZIP_PAGE_SIZE: 0
    SPACE_TYPE: Single

For more information, see *note innodb-information-schema-system-tables::.

File: manual.info.tmp, Node: innodb-create-table-external, Next: innodb-table-import, Prev: using-innodb-tables, Up: innodb-tables

14.6.1.2 Creating Tables Externally ...................................

There are different reasons for creating 'InnoDB' tables externally; that is, creating tables outside of the data directory. Those reasons might include space management, I/O optimization, or placing tables on a storage device with particular performance or capacity characteristics, for example.

'InnoDB' supports the following methods for creating tables externally:

*note innodb-create-table-external-data-directory::
*note innodb-create-table-external-tablespace-syntax::
*note innodb-create-table-external-tablespace::

Using the DATA DIRECTORY Clause

You can create an 'InnoDB' table in an external directory by specifying a 'DATA DIRECTORY' clause in the 'CREATE TABLE' statement.

 CREATE TABLE t1 (c1 INT PRIMARY KEY) DATA DIRECTORY = '/EXTERNAL/DIRECTORY';

The 'DATA DIRECTORY' clause is supported for tables created in file-per-table tablespaces. Tables are implicitly created in file-per-table tablespaces when the 'innodb_file_per_table' variable is enabled, which it is by default.

 mysql> SELECT @@innodb_file_per_table;
 +-------------------------+
 | @@innodb_file_per_table |
 +-------------------------+
 |                       1 |
 +-------------------------+

For more information about file-per-table tablespaces, see *note innodb-file-per-table-tablespaces::.

Be sure of the directory location you choose, as the 'DATA DIRECTORY' clause cannot be used with *note 'ALTER TABLE': alter-table. to change the location later.

When you specify a 'DATA DIRECTORY' clause in a 'CREATE TABLE' statement, the table's data file ('TABLE_NAME.ibd') is created in a schema directory under the specified directory, and an '.isl' file ('TABLE_NAME.isl') that contains the data file path is created in the schema directory under the MySQL data directory. An '.isl' file is similar in function to a symbolic link. (Actual *note symbolic links: symbolic-links. are not supported for use with 'InnoDB' data files.)

The following example demonstrates creating a table in an external directory using the 'DATA DIRECTORY' clause. It is assumed that the 'innodb_file_per_table' variable is enabled.

 mysql> USE test;
 Database changed

 mysql> CREATE TABLE t1 (c1 INT PRIMARY KEY) DATA DIRECTORY = '/EXTERNAL/DIRECTORY';

 # MySQL creates the table's data file in a schema directory
 # under the external directory

 $> cd /external/directory/test
 $> ls
 t1.ibd

 # An .isl file that contains the data file path is created
 # in the schema directory under the MySQL data directory

 $> cd /PATH/TO/MYSQL/data/test
 $> ls
 db.opt  t1.frm  t1.isl

Usage Notes:

MySQL initially holds the tablespace data file open, preventing you from dismounting the device, but might eventually close the file if the server is busy. Be careful not to accidentally dismount an external device while MySQL is running, or start MySQL while the device is disconnected. Attempting to access a table when the associated data file is missing causes a serious error that requires a server restart.

A server restart might fail if the data file is not found at the expected path. In this case, manually remove the '.isl' file from the schema directory. After restarting, drop the table to remove the '.frm' file and the information about the table from the data dictionary.
Before placing a table on an NFS-mounted volume, review potential issues outlined in *note disk-issues-nfs::.
If using an LVM snapshot, file copy, or other file-based mechanism to back up the table's data file, always use the 'FLUSH TABLES ... FOR EXPORT' statement first to ensure that all changes buffered in memory are flushed to disk before the backup occurs.
Using the 'DATA DIRECTORY' clause to create a table in an external directory is an alternative to using *note symbolic links: symbolic-links, which 'InnoDB' does not support.
The 'DATA DIRECTORY' clause is not supported in a replication environment where the source and replica reside on the same host. The 'DATA DIRECTORY' clause requires a full directory path. Replicating the path in this case would cause the source and replica to create the table in same location.

Using CREATE TABLE ... TABLESPACE Syntax

*note 'CREATE TABLE ... TABLESPACE': create-table. syntax can be used in combination with the 'DATA DIRECTORY' clause to create a table in an external directory. To do so, specify 'innodb_file_per_table' as the tablespace name.

 mysql> CREATE TABLE t2 (c1 INT PRIMARY KEY) TABLESPACE = innodb_file_per_table
        DATA DIRECTORY = '/external/directory';

This method is supported only for tables created in file-per-table tablespaces, but does not require the 'innodb_file_per_table' variable to be enabled. In all other respects, this method is equivalent to the 'CREATE TABLE ... DATA DIRECTORY' method described above. The same usage notes apply.

Creating a Table in an External General Tablespace

You can create a table in a general tablespace that resides in an external directory.

For information about creating a general tablespace in an external directory, see *note general-tablespaces-creating::.
For information about creating a table in a general tablespace, see *note general-tablespaces-adding-tables::.

File: manual.info.tmp, Node: innodb-table-import, Next: innodb-migration, Prev: innodb-create-table-external, Up: innodb-tables

14.6.1.3 Importing InnoDB Tables ................................

This section describes how to import tables using the Transportable Tablespaces feature, which permits importing tables, partitioned tables, or individual table partitions that reside in file-per-table tablespaces. There are many reasons why you might want to import tables:

To run reports on a non-production MySQL server instance to avoid placing extra load on a production server.
To copy data to a new replica server.
To restore a table from a backed-up tablespace file.
As a faster way of moving data than importing a dump file, which requires reinserting data and rebuilding indexes.
To move a data to a server with storage media that is better suited to your storage requirements. For example, you might move busy tables to an SSD device, or move large tables to a high-capacity HDD device.

The Transportable Tablespaces feature is described under the following topics in this section:

*note innodb-table-import-prerequsites::
*note innodb-table-import-example::
*note innodb-table-import-partitioned-table::
*note innodb-table-import-partitions::
*note innodb-table-import-limitations::
*note innodb-table-import-usage-notes::
*note innodb-table-import-internals::

Prerequisites

The 'innodb_file_per_table' variable must be enabled, which it is by default.
The page size of the tablespace must match the page size of the destination MySQL server instance. 'InnoDB' page size is defined by the 'innodb_page_size' variable, which is configured when initializing a MySQL server instance.
If the table has a foreign key relationship, 'foreign_key_checks' must be disabled before executing 'DISCARD TABLESPACE'. Also, you should export all foreign key related tables at the same logical point in time, as *note 'ALTER TABLE ... IMPORT TABLESPACE': alter-table. does not enforce foreign key constraints on imported data. To do so, stop updating the related tables, commit all transactions, acquire shared locks on the tables, and perform the export operations.
When importing a table from another MySQL server instance, both MySQL server instances must have General Availability (GA) status and must be the same version. Otherwise, the table must be created on the same MySQL server instance into which it is being imported.
If the table was created in an external directory by specifying the 'DATA DIRECTORY' clause in the note 'CREATE TABLE': create-table. statement, the table that you replace on the destination instance must be defined with the same 'DATA DIRECTORY' clause. A schema mismatch error is reported if the clauses do not match. To determine if the source table was defined with a 'DATA DIRECTORY' clause, use note 'SHOW CREATE TABLE': show-create-table. to view the table definition. For information about using the 'DATA DIRECTORY' clause, see *note innodb-create-table-external::.
If a 'ROW_FORMAT' option is not defined explicitly in the table definition or 'ROW_FORMAT=DEFAULT' is used, the 'innodb_default_row_format' setting must be the same on the source and destination instances. Otherwise, a schema mismatch error is reported when you attempt the import operation. Use note 'SHOW CREATE TABLE': show-create-table. to check the table definition. Use note 'SHOW VARIABLES': show-variables. to check the 'innodb_default_row_format' setting. For related information, see *note innodb-row-format-defining::.

Importing Tables

This example demonstrates how to import a regular non-partitioned table that resides in a file-per-table tablespace.

On the destination instance, create a table with the same definition as the table you intend to import. (You can obtain the table definition using *note 'SHOW CREATE TABLE': show-create-table. syntax.) If the table definition does not match, a schema mismatch error is reported when you attempt the import operation.
```
  mysql> USE test;
  mysql> CREATE TABLE t1 (c1 INT) ENGINE=INNODB;
```
On the destination instance, discard the tablespace of the table that you just created. (Before importing, you must discard the tablespace of the receiving table.)
```
  mysql> ALTER TABLE t1 DISCARD TABLESPACE;
```
On the source instance, run 'FLUSH TABLES ... FOR EXPORT' to quiesce the table you intend to import. When a table is quiesced, only read-only transactions are permitted on the table.
```
  mysql> USE test;
  mysql> FLUSH TABLES t1 FOR EXPORT;
```
'FLUSH TABLES ... FOR EXPORT' ensures that changes to the named table are flushed to disk so that a binary table copy can be made while the server is running. When 'FLUSH TABLES ... FOR EXPORT' is run, 'InnoDB' generates a '.cfg' metadata file in the schema directory of the table. The '.cfg' file contains metadata that is used for schema verification during the import operation.

Note:

The connection executing 'FLUSH TABLES ... FOR EXPORT' must remain open while the operation is running; otherwise, the '.cfg' file is removed as locks are released upon connection closure.
Copy the '.ibd' file and '.cfg' metadata file from the source instance to the destination instance. For example:
```
  $> scp /PATH/TO/DATADIR/test/t1.{ibd,cfg} destination-server:/PATH/TO/DATADIR/test
```
The '.ibd' file and '.cfg' file must be copied before releasing the shared locks, as described in the next step.

Note:

If you are importing a table from an encrypted tablespace, 'InnoDB' generates a '.cfp' file in addition to a '.cfg' metadata file. The '.cfp' file must be copied to the destination instance together with the '.cfg' file. The '.cfp' file contains a transfer key and an encrypted tablespace key. On import, 'InnoDB' uses the transfer key to decrypt the tablespace key. For related information, see *note innodb-data-encryption::.
On the source instance, use *note 'UNLOCK TABLES': lock-tables. to release the locks acquired by the 'FLUSH TABLES ... FOR EXPORT' statement:
```
  mysql> USE test;
  mysql> UNLOCK TABLES;
```
The *note 'UNLOCK TABLES': lock-tables. operation also removes the '.cfg' file.

On the destination instance, import the tablespace:

  mysql> USE test;
  mysql> ALTER TABLE t1 IMPORT TABLESPACE;

Importing Partitioned Tables

This example demonstrates how to import a partitioned table, where each table partition resides in a file-per-table tablespace.

On the destination instance, create a partitioned table with the same definition as the partitioned table that you want to import. (You can obtain the table definition using *note 'SHOW CREATE TABLE': show-create-table. syntax.) If the table definition does not match, a schema mismatch error is reported when you attempt the import operation.
```
  mysql> USE test;
  mysql> CREATE TABLE t1 (i int) ENGINE = InnoDB PARTITION BY KEY (i) PARTITIONS 3;
```
In the '/DATADIR/test' directory, there is a tablespace '.ibd' file for each of the three partitions.
```
  mysql> \! ls /PATH/TO/DATADIR/test/
  db.opt  t1.frm  t1#P#p0.ibd  t1#P#p1.ibd  t1#P#p2.ibd
```
On the destination instance, discard the tablespace for the partitioned table. (Before the import operation, you must discard the tablespace of the receiving table.)
```
  mysql> ALTER TABLE t1 DISCARD TABLESPACE;
```
The three tablespace '.ibd' files of the partitioned table are discarded from the '/DATADIR/test' directory, leaving the following files:
```
  mysql> \! ls /PATH/TO/DATADIR/test/
  db.opt  t1.frm
```
On the source instance, run 'FLUSH TABLES ... FOR EXPORT' to quiesce the partitioned table that you intend to import. When a table is quiesced, only read-only transactions are permitted on the table.
```
  mysql> USE test;
  mysql> FLUSH TABLES t1 FOR EXPORT;
```
'FLUSH TABLES ... FOR EXPORT' ensures that changes to the named table are flushed to disk so that binary table copy can be made while the server is running. When 'FLUSH TABLES ... FOR EXPORT' is run, 'InnoDB' generates '.cfg' metadata files in the schema directory of the table for each of the table's tablespace files.
```
  mysql> \! ls /PATH/TO/DATADIR/test/
  db.opt t1#P#p0.ibd  t1#P#p1.ibd  t1#P#p2.ibd
  t1.frm  t1#P#p0.cfg  t1#P#p1.cfg  t1#P#p2.cfg
```
The '.cfg' files contain metadata that is used for schema verification when importing the tablespace. 'FLUSH TABLES ... FOR EXPORT' can only be run on the table, not on individual table partitions.
Copy the '.ibd' and '.cfg' files from the source instance schema directory to the destination instance schema directory. For example:
```
  $>scp /PATH/TO/DATADIR/test/t1*.{ibd,cfg} destination-server:/PATH/TO/DATADIR/test
```
The '.ibd' and '.cfg' files must be copied before releasing the shared locks, as described in the next step.

Note:

If you are importing a table from an encrypted tablespace, 'InnoDB' generates a '.cfp' files in addition to a '.cfg' metadata files. The '.cfp' files must be copied to the destination instance together with the '.cfg' files. The '.cfp' files contain a transfer key and an encrypted tablespace key. On import, 'InnoDB' uses the transfer key to decrypt the tablespace key. For related information, see *note innodb-data-encryption::.
On the source instance, use *note 'UNLOCK TABLES': lock-tables. to release the locks acquired by 'FLUSH TABLES ... FOR EXPORT':
```
  mysql> USE test;
  mysql> UNLOCK TABLES;
```
On the destination instance, import the tablespace of the partitioned table:
```
  mysql> USE test;
  mysql> ALTER TABLE t1 IMPORT TABLESPACE;
```

Importing Table Partitions

This example demonstrates how to import individual table partitions, where each partition resides in a file-per-table tablespace file.

In the following example, two partitions ('p2' and 'p3') of a four-partition table are imported.

On the destination instance, create a partitioned table with the same definition as the partitioned table that you want to import partitions from. (You can obtain the table definition using *note 'SHOW CREATE TABLE': show-create-table. syntax.) If the table definition does not match, a schema mismatch error is reported when you attempt the import operation.
```
  mysql> USE test;
  mysql> CREATE TABLE t1 (i int) ENGINE = InnoDB PARTITION BY KEY (i) PARTITIONS 4;
```
In the '/DATADIR/test' directory, there is a tablespace '.ibd' file for each of the four partitions.
```
  mysql> \! ls /PATH/TO/DATADIR/test/
  db.opt  t1.frm  t1#P#p0.ibd  t1#P#p1.ibd  t1#P#p2.ibd t1#P#p3.ibd
```
On the destination instance, discard the partitions that you intend to import from the source instance. (Before importing partitions, you must discard the corresponding partitions from the receiving partitioned table.)
```
  mysql> ALTER TABLE t1 DISCARD PARTITION p2, p3 TABLESPACE;
```
The tablespace '.ibd' files for the two discarded partitions are removed from the '/DATADIR/test' directory on the destination instance, leaving the following files:
```
  mysql> \! ls /PATH/TO/DATADIR/test/
  db.opt  t1.frm  t1#P#p0.ibd  t1#P#p1.ibd
```
Note:

When *note 'ALTER TABLE ... DISCARD PARTITION ... TABLESPACE': alter-table. is run on subpartitioned tables, both partition and subpartition table names are permitted. When a partition name is specified, subpartitions of that partition are included in the operation.
On the source instance, run 'FLUSH TABLES ... FOR EXPORT' to quiesce the partitioned table. When a table is quiesced, only read-only transactions are permitted on the table.
```
  mysql> USE test;
  mysql> FLUSH TABLES t1 FOR EXPORT;
```
'FLUSH TABLES ... FOR EXPORT' ensures that changes to the named table are flushed to disk so that binary table copy can be made while the instance is running. When 'FLUSH TABLES ... FOR EXPORT' is run, 'InnoDB' generates a '.cfg' metadata file for each of the table's tablespace files in the schema directory of the table.
```
  mysql> \! ls /PATH/TO/DATADIR/test/
  db.opt  t1#P#p0.ibd  t1#P#p1.ibd  t1#P#p2.ibd t1#P#p3.ibd
  t1.frm  t1#P#p0.cfg  t1#P#p1.cfg  t1#P#p2.cfg t1#P#p3.cfg
```
The '.cfg' files contain metadata that used for schema verification during the import operation. 'FLUSH TABLES ... FOR EXPORT' can only be run on the table, not on individual table partitions.
Copy the '.ibd' and '.cfg' files for partition 'p2' and partition 'p3' from the source instance schema directory to the destination instance schema directory.
```
  $> scp t1#P#p2.ibd t1#P#p2.cfg t1#P#p3.ibd t1#P#p3.cfg destination-server:/PATH/TO/DATADIR/test
```
The '.ibd' and '.cfg' files must be copied before releasing the shared locks, as described in the next step.

Note:

If you are importing partitions from an encrypted tablespace, 'InnoDB' generates a '.cfp' files in addition to a '.cfg' metadata files. The '.cfp' files must be copied to the destination instance together with the '.cfg' files. The '.cfp' files contain a transfer key and an encrypted tablespace key. On import, 'InnoDB' uses the transfer key to decrypt the tablespace key. For related information, see *note innodb-data-encryption::.
On the source instance, use *note 'UNLOCK TABLES': lock-tables. to release the locks acquired by 'FLUSH TABLES ... FOR EXPORT':
```
  mysql> USE test;
  mysql> UNLOCK TABLES;
```
On the destination instance, import table partitions 'p2' and 'p3':
```
  mysql> USE test;
  mysql> ALTER TABLE t1 IMPORT PARTITION p2, p3 TABLESPACE;
```
Note:

When *note 'ALTER TABLE ... IMPORT PARTITION ... TABLESPACE': alter-table. is run on subpartitioned tables, both partition and subpartition table names are permitted. When a partition name is specified, subpartitions of that partition are included in the operation.

Limitations

The Transportable Tablespaces feature is only supported for tables that reside in file-per-table tablespaces. It is not supported for the tables that reside in the system tablespace or general tablespaces. Tables in shared tablespaces cannot be quiesced.
'FLUSH TABLES ... FOR EXPORT' is not supported on tables with a 'FULLTEXT' index, as full-text search auxiliary tables cannot be flushed. After importing a table with a 'FULLTEXT' index, run *note 'OPTIMIZE TABLE': optimize-table. to rebuild the 'FULLTEXT' indexes. Alternatively, drop 'FULLTEXT' indexes before the export operation and recreate the indexes after importing the table on the destination instance.
Due to a '.cfg' metadata file limitation, schema mismatches are not reported for partition type or partition definition differences when importing a partitioned table. Column differences are reported.

Usage Notes

*note 'ALTER TABLE ... IMPORT TABLESPACE': alter-table. does not require a '.cfg' metadata file to import a table. However, metadata checks are not performed when importing without a '.cfg' file, and a warning similar to the following is issued:
```
  Message: InnoDB: IO Read error: (2, No such file or directory) Error opening '.\
  test\t.cfg', will attempt to import without schema verification
  1 row in set (0.00 sec)
```
Importing a table without a '.cfg' metadata file should only be considered if no schema mismatches are expected. The ability to import without a '.cfg' file could be useful in crash recovery scenarios where metadata is not accessible.
On Windows, 'InnoDB' stores database, tablespace, and table names internally in lowercase. To avoid import problems on case-sensitive operating systems such as Linux and Unix, create all databases, tablespaces, and tables using lowercase names. A convenient way to accomplish this is to add 'lower_case_table_names=1' to the '[mysqld]' section of your 'my.cnf' or 'my.ini' file before creating databases, tablespaces, or tables:
```
  [mysqld]
  lower_case_table_names=1
```
When running note 'ALTER TABLE ... DISCARD PARTITION ... TABLESPACE': alter-table. and note 'ALTER TABLE ... IMPORT PARTITION ... TABLESPACE': alter-table. on subpartitioned tables, both partition and subpartition table names are permitted. When a partition name is specified, subpartitions of that partition are included in the operation.

Internals

The following information describes internals and messages written to the error log during a table import procedure.

When *note 'ALTER TABLE ... DISCARD TABLESPACE': alter-table. is run on the destination instance:

The table is locked in X mode.
The tablespace is detached from the table.

When 'FLUSH TABLES ... FOR EXPORT' is run on the source instance:

The table being flushed for export is locked in shared mode.
The purge coordinator thread is stopped.
Dirty pages are synchronized to disk.
Table metadata is written to the binary '.cfg' file.

Expected error log messages for this operation:

 [Note] InnoDB: Sync to disk of '"test"."t1"' started.
 [Note] InnoDB: Stopping purge
 [Note] InnoDB: Writing table metadata to './test/t1.cfg'
 [Note] InnoDB: Table '"test"."t1"' flushed to disk

When *note 'UNLOCK TABLES': lock-tables. is run on the source instance:

The binary '.cfg' file is deleted.
The shared lock on the table or tables being imported is released and the purge coordinator thread is restarted.

Expected error log messages for this operation:

 [Note] InnoDB: Deleting the meta-data file './test/t1.cfg'
 [Note] InnoDB: Resuming purge

When *note 'ALTER TABLE ... IMPORT TABLESPACE': alter-table. is run on the destination instance, the import algorithm performs the following operations for each tablespace being imported:

Each tablespace page is checked for corruption.
The space ID and log sequence numbers (LSNs) on each page are updated.
Flags are validated and LSN updated for the header page.
Btree pages are updated.
The page state is set to dirty so that it is written to disk.

Expected error log messages for this operation:

 [Note] InnoDB: Importing tablespace for table 'test/t1' that was exported
 from host 'HOST_NAME'
 [Note] InnoDB: Phase I - Update all pages
 [Note] InnoDB: Sync to disk
 [Note] InnoDB: Sync to disk - done!
 [Note] InnoDB: Phase III - Flush changes to disk
 [Note] InnoDB: Phase IV - Flush complete

Note:

You may also receive a warning that a tablespace is discarded (if you discarded the tablespace for the destination table) and a message stating that statistics could not be calculated due to a missing '.ibd' file:

 [Warning] InnoDB: Table "test"."t1" tablespace is set as discarded.
 7f34d9a37700 InnoDB: cannot calculate statistics for table
 "test"."t1" because the .ibd file is missing. For help, please refer to
 http://dev.mysql.com/doc/refman/5.7/en/innodb-troubleshooting.html

File: manual.info.tmp, Node: innodb-migration, Next: converting-tables-to-innodb, Prev: innodb-table-import, Up: innodb-tables

14.6.1.4 Moving or Copying InnoDB Tables ........................................

This section describes techniques for moving or copying some or all 'InnoDB' tables to a different server or instance. For example, you might move an entire MySQL instance to a larger, faster server; you might clone an entire MySQL instance to a new replica server; you might copy individual tables to another instance to develop and test an application, or to a data warehouse server to produce reports.

On Windows, 'InnoDB' always stores database and table names internally in lowercase. To move databases in a binary format from Unix to Windows or from Windows to Unix, create all databases and tables using lowercase names. A convenient way to accomplish this is to add the following line to the '[mysqld]' section of your 'my.cnf' or 'my.ini' file before creating any databases or tables:

 [mysqld]
 lower_case_table_names=1

Techniques for moving or copying 'InnoDB' tables include:

*note copy-tables-import::
*note copy-tables-meb::
*note copy-tables-cold-backup::
*note copy-tables-logical-backup::

Importing Tables

A table that resides in a file-per-table tablespace can be imported from another MySQL server instance or from a backup using the Transportable Tablespace feature. See *note innodb-table-import::.

MySQL Enterprise Backup

The MySQL Enterprise Backup product lets you back up a running MySQL database with minimal disruption to operations while producing a consistent snapshot of the database. When MySQL Enterprise Backup is copying tables, reads and writes can continue. In addition, MySQL Enterprise Backup can create compressed backup files, and back up subsets of tables. In conjunction with the MySQL binary log, you can perform point-in-time recovery. MySQL Enterprise Backup is included as part of the MySQL Enterprise subscription.

For more details about MySQL Enterprise Backup, see *note mysql-enterprise-backup::.

Copying Data Files (Cold Backup Method)

You can move an 'InnoDB' database simply by copying all the relevant files listed under "Cold Backups" in *note innodb-backup::.

'InnoDB' data and log files are binary-compatible on all platforms having the same floating-point number format. If the floating-point formats differ but you have not used note 'FLOAT': floating-point-types. or note 'DOUBLE': floating-point-types. data types in your tables, then the procedure is the same: simply copy the relevant files.

When you move or copy file-per-table '.ibd' files, the database directory name must be the same on the source and destination systems. The table definition stored in the 'InnoDB' shared tablespace includes the database name. The transaction IDs and log sequence numbers stored in the tablespace files also differ between databases.

To move an '.ibd' file and the associated table from one database to another, use a *note 'RENAME TABLE': rename-table. statement:

 RENAME TABLE DB1.TBL_NAME TO DB2.TBL_NAME;

If you have a 'clean' backup of an '.ibd' file, you can restore it to the MySQL installation from which it originated as follows:

The table must not have been dropped or truncated since you copied the '.ibd' file, because doing so changes the table ID stored inside the tablespace.
Issue this *note 'ALTER TABLE': alter-table. statement to delete the current '.ibd' file:
```
  ALTER TABLE TBL_NAME DISCARD TABLESPACE;
```
Copy the backup '.ibd' file to the proper database directory.
Issue this *note 'ALTER TABLE': alter-table. statement to tell 'InnoDB' to use the new '.ibd' file for the table:
```
  ALTER TABLE TBL_NAME IMPORT TABLESPACE;
```
Note:

The *note 'ALTER TABLE ... IMPORT TABLESPACE': alter-table. feature does not enforce foreign key constraints on imported data.

In this context, a 'clean' '.ibd' file backup is one for which the following requirements are satisfied:

There are no uncommitted modifications by transactions in the '.ibd' file.
There are no unmerged insert buffer entries in the '.ibd' file.
Purge has removed all delete-marked index records from the '.ibd' file.
*note 'mysqld': mysqld. has flushed all modified pages of the '.ibd' file from the buffer pool to the file.

You can make a clean backup '.ibd' file using the following method:

Stop all activity from the *note 'mysqld': mysqld. server and commit all transactions.
Wait until *note 'SHOW ENGINE INNODB STATUS': show-engine. shows that there are no active transactions in the database, and the main thread status of 'InnoDB' is 'Waiting for server activity'. Then you can make a copy of the '.ibd' file.

Another method for making a clean copy of an '.ibd' file is to use the MySQL Enterprise Backup product:

Use MySQL Enterprise Backup to back up the 'InnoDB' installation.
Start a second *note 'mysqld': mysqld. server on the backup and let it clean up the '.ibd' files in the backup.

Restoring from a Logical Backup

You can use a utility such as *note 'mysqldump': mysqldump. to perform a logical backup, which produces a set of SQL statements that can be executed to reproduce the original database object definitions and table data for transfer to another SQL server. Using this method, it does not matter whether the formats differ or if your tables contain floating-point data.

To improve the performance of this method, disable 'autocommit' when importing data. Perform a commit only after importing an entire table or segment of a table.

File: manual.info.tmp, Node: converting-tables-to-innodb, Next: innodb-auto-increment-handling, Prev: innodb-migration, Up: innodb-tables

14.6.1.5 Converting Tables from MyISAM to InnoDB ................................................

If you have note 'MyISAM': myisam-storage-engine. tables that you want to convert to note 'InnoDB': innodb-storage-engine. for better reliability and scalability, review the following guidelines and tips before converting.

*note innodb-convert-memory-usage::
*note innodb-convert-transactions::
*note innodb-convert-deadlock::
*note innodb-convert-plan-storage::
*note innodb-convert-convert::
*note innodb-convert-clone::
*note innodb-convert-transfer::
*note innodb-convert-storage-requirements::
*note innodb-convert-primary-key::
*note innodb-convert-application-performance::
*note innodb-convert-understand-files::

Adjusting Memory Usage for MyISAM and InnoDB

As you transition away from 'MyISAM' tables, lower the value of the 'key_buffer_size' configuration option to free memory no longer needed for caching results. Increase the value of the 'innodb_buffer_pool_size' configuration option, which performs a similar role of allocating cache memory for 'InnoDB' tables. The 'InnoDB' buffer pool caches both table data and index data, speeding up lookups for queries and keeping query results in memory for reuse. For guidance regarding buffer pool size configuration, see *note memory-use::.

On a busy server, run benchmarks with the query cache turned off. The 'InnoDB' buffer pool provides similar benefits, so the query cache might be tying up memory unnecessarily. For information about the query cache, see *note query-cache::.

Handling Too-Long Or Too-Short Transactions

Because 'MyISAM' tables do not support transactions, you might not have paid much attention to the 'autocommit' configuration option and the note 'COMMIT': commit. and note 'ROLLBACK': commit. statements. These keywords are important to allow multiple sessions to read and write 'InnoDB' tables concurrently, providing substantial scalability benefits in write-heavy workloads.

While a transaction is open, the system keeps a snapshot of the data as seen at the beginning of the transaction, which can cause substantial overhead if the system inserts, updates, and deletes millions of rows while a stray transaction keeps running. Thus, take care to avoid transactions that run for too long:

If you are using a note 'mysql': mysql. session for interactive experiments, always note 'COMMIT': commit. (to finalize the changes) or *note 'ROLLBACK': commit. (to undo the changes) when finished. Close down interactive sessions rather than leave them open for long periods, to avoid keeping transactions open for long periods by accident.
Make sure that any error handlers in your application also note 'ROLLBACK': commit. incomplete changes or note 'COMMIT': commit. completed changes.
note 'ROLLBACK': commit. is a relatively expensive operation, because note 'INSERT': insert, note 'UPDATE': update, and note 'DELETE': delete. operations are written to 'InnoDB' tables prior to the *note 'COMMIT': commit, with the expectation that most changes are committed successfully and rollbacks are rare. When experimenting with large volumes of data, avoid making changes to large numbers of rows and then rolling back those changes.
When loading large volumes of data with a sequence of note 'INSERT': insert. statements, periodically note 'COMMIT': commit. the results to avoid having transactions that last for hours. In typical load operations for data warehousing, if something goes wrong, you truncate the table (using note 'TRUNCATE TABLE': truncate-table.) and start over from the beginning rather than doing a note 'ROLLBACK': commit.

The preceding tips save memory and disk space that can be wasted during too-long transactions. When transactions are shorter than they should be, the problem is excessive I/O. With each *note 'COMMIT': commit, MySQL makes sure each change is safely recorded to disk, which involves some I/O.

For most operations on 'InnoDB' tables, you should use the setting 'autocommit=0'. From an efficiency perspective, this avoids unnecessary I/O when you issue large numbers of consecutive note 'INSERT': insert, note 'UPDATE': update, or note 'DELETE': delete. statements. From a safety perspective, this allows you to issue a note 'ROLLBACK': commit. statement to recover lost or garbled data if you make a mistake on the *note 'mysql': mysql. command line, or in an exception handler in your application.
'autocommit=1' is suitable for 'InnoDB' tables when running a sequence of queries for generating reports or analyzing statistics. In this situation, there is no I/O penalty related to note 'COMMIT': commit. or note 'ROLLBACK': commit, and 'InnoDB' can *note automatically optimize the read-only workload: innodb-performance-ro-txn.
If you make a series of related changes, finalize all the changes at once with a single note 'COMMIT': commit. at the end. For example, if you insert related pieces of information into several tables, do a single note 'COMMIT': commit. after making all the changes. Or if you run many consecutive note 'INSERT': insert. statements, do a single note 'COMMIT': commit. after all the data is loaded; if you are doing millions of note 'INSERT': insert. statements, perhaps split up the huge transaction by issuing a note 'COMMIT': commit. every ten thousand or hundred thousand records, so the transaction does not grow too large.
Remember that even a note 'SELECT': select. statement opens a transaction, so after running some report or debugging queries in an interactive note 'mysql': mysql. session, either issue a note 'COMMIT': commit. or close the note 'mysql': mysql. session.

For related information, see *note innodb-autocommit-commit-rollback::.

Handling Deadlocks

You might see warning messages referring to 'deadlocks' in the MySQL error log, or the output of *note 'SHOW ENGINE INNODB STATUS': show-engine. A deadlock is not a serious issue for 'InnoDB' tables, and often does not require any corrective action. When two transactions start modifying multiple tables, accessing the tables in a different order, they can reach a state where each transaction is waiting for the other and neither can proceed. When deadlock detection is enabled (the default), MySQL immediately detects this condition and cancels (rolls back) the 'smaller' transaction, allowing the other to proceed. If deadlock detection is disabled using the 'innodb_deadlock_detect' configuration option, 'InnoDB' relies on the 'innodb_lock_wait_timeout' setting to roll back transactions in case of a deadlock.

Either way, your applications need error-handling logic to restart a transaction that is forcibly cancelled due to a deadlock. When you re-issue the same SQL statements as before, the original timing issue no longer applies. Either the other transaction has already finished and yours can proceed, or the other transaction is still in progress and your transaction waits until it finishes.

If deadlock warnings occur constantly, you might review the application code to reorder the SQL operations in a consistent way, or to shorten the transactions. You can test with the 'innodb_print_all_deadlocks' option enabled to see all deadlock warnings in the MySQL error log, rather than only the last warning in the *note 'SHOW ENGINE INNODB STATUS': show-engine. output.

For more information, see *note innodb-deadlocks::.

Storage Layout

To get the best performance from 'InnoDB' tables, you can adjust a number of parameters related to storage layout.

When you convert 'MyISAM' tables that are large, frequently accessed, and hold vital data, investigate and consider the 'innodb_file_per_table', 'innodb_file_format', and 'innodb_page_size' variables, and the note 'ROW_FORMAT' and 'KEY_BLOCK_SIZE' clauses: innodb-row-format. of the note 'CREATE TABLE': create-table. statement.

During your initial experiments, the most important setting is 'innodb_file_per_table'. When this setting is enabled, which is the default as of MySQL 5.6.6, new 'InnoDB' tables are implicitly created in file-per-table tablespaces. In contrast with the 'InnoDB' system tablespace, file-per-table tablespaces allow disk space to be reclaimed by the operating system when a table is truncated or dropped. File-per-table tablespaces also support the Barracuda file format and associated features such as table compression, efficient off-page storage for long variable-length columns, and large index prefixes. For more information, see *note innodb-file-per-table-tablespaces::.

You can also store 'InnoDB' tables in a shared general tablespace. General tablespaces support the Barracuda file format and can contain multiple tables. For more information, see *note general-tablespaces::.

Converting an Existing Table

To convert a non-'InnoDB' table to use 'InnoDB' use *note 'ALTER TABLE': alter-table.:

 ALTER TABLE TABLE_NAME ENGINE=InnoDB;

Warning:

Do not convert MySQL system tables in the 'mysql' database from 'MyISAM' to 'InnoDB' tables. This is an unsupported operation. If you do this, MySQL does not restart until you restore the old system tables from a backup or regenerate them by reinitializing the data directory (see *note data-directory-initialization::).

Cloning the Structure of a Table

You might make an 'InnoDB' table that is a clone of a MyISAM table, rather than using *note 'ALTER TABLE': alter-table. to perform conversion, to test the old and new table side-by-side before switching.

Create an empty 'InnoDB' table with identical column and index definitions. Use 'SHOW CREATE TABLE TABLE_NAME' to see the full *note 'CREATE TABLE': create-table. statement to use. Change the 'ENGINE' clause to 'ENGINE=INNODB'.

Transferring Data

To transfer a large volume of data into an empty 'InnoDB' table created as shown in the previous section, insert the rows with 'INSERT INTO INNODB_TABLE SELECT * FROM MYISAM_TABLE ORDER BY PRIMARY_KEY_COLUMNS'.

You can also create the indexes for the 'InnoDB' table after inserting the data. Historically, creating new secondary indexes was a slow operation for 'InnoDB', but now you can create the indexes after the data is loaded with relatively little overhead from the index creation step.

If you have 'UNIQUE' constraints on secondary keys, you can speed up a table import by turning off the uniqueness checks temporarily during the import operation:

 SET unique_checks=0;
 ... IMPORT OPERATION ...
 SET unique_checks=1;

For big tables, this saves disk I/O because 'InnoDB' can use its change buffer to write secondary index records as a batch. Be certain that the data contains no duplicate keys. 'unique_checks' permits but does not require storage engines to ignore duplicate keys.

For better control over the insertion process, you can insert big tables in pieces:

 INSERT INTO newtable SELECT * FROM oldtable
    WHERE yourkey > SOMETHING AND yourkey <= SOMETHINGELSE;

After all records are inserted, you can rename the tables.

During the conversion of big tables, increase the size of the 'InnoDB' buffer pool to reduce disk I/O. Typically, the recommended buffer pool size is 50 to 75 percent of system memory. You can also increase the size of 'InnoDB' log files.

Storage Requirements

If you intend to make several temporary copies of your data in 'InnoDB' tables during the conversion process, it is recommended that you create the tables in file-per-table tablespaces so that you can reclaim the disk space when you drop the tables. When the 'innodb_file_per_table' configuration option is enabled (the default), newly created 'InnoDB' tables are implicitly created in file-per-table tablespaces.

Whether you convert the 'MyISAM' table directly or create a cloned 'InnoDB' table, make sure that you have sufficient disk space to hold both the old and new tables during the process. 'InnoDB' tables require more disk space than 'MyISAM' tables. If an *note 'ALTER TABLE': alter-table. operation runs out of space, it starts a rollback, and that can take hours if it is disk-bound. For inserts, 'InnoDB' uses the insert buffer to merge secondary index records to indexes in batches. That saves a lot of disk I/O. For rollback, no such mechanism is used, and the rollback can take 30 times longer than the insertion.

In the case of a runaway rollback, if you do not have valuable data in your database, it may be advisable to kill the database process rather than wait for millions of disk I/O operations to complete. For the complete procedure, see *note forcing-innodb-recovery::.

Defining Primary Keys

The 'PRIMARY KEY' clause is a critical factor affecting the performance of MySQL queries and the space usage for tables and indexes. The primary key uniquely identifies a row in a table. Every row in the table should have a primary key value, and no two rows can have the same primary key value.

These are guidelines for the primary key, followed by more detailed explanations.

Declare a 'PRIMARY KEY' for each table. Typically, it is the most important column that you refer to in 'WHERE' clauses when looking up a single row.
Declare the 'PRIMARY KEY' clause in the original note 'CREATE TABLE': create-table. statement, rather than adding it later through an note 'ALTER TABLE': alter-table. statement.
Choose the column and its data type carefully. Prefer numeric columns over character or string ones.
Consider using an auto-increment column if there is not another stable, unique, non-null, numeric column to use.
An auto-increment column is also a good choice if there is any doubt whether the value of the primary key column could ever change. Changing the value of a primary key column is an expensive operation, possibly involving rearranging data within the table and within each secondary index.

Consider adding a primary key to any table that does not already have one. Use the smallest practical numeric type based on the maximum projected size of the table. This can make each row slightly more compact, which can yield substantial space savings for large tables. The space savings are multiplied if the table has any secondary indexes, because the primary key value is repeated in each secondary index entry. In addition to reducing data size on disk, a small primary key also lets more data fit into the buffer pool, speeding up all kinds of operations and improving concurrency.

If the table already has a primary key on some longer column, such as a 'VARCHAR', consider adding a new unsigned 'AUTO_INCREMENT' column and switching the primary key to that, even if that column is not referenced in queries. This design change can produce substantial space savings in the secondary indexes. You can designate the former primary key columns as 'UNIQUE NOT NULL' to enforce the same constraints as the 'PRIMARY KEY' clause, that is, to prevent duplicate or null values across all those columns.

If you spread related information across multiple tables, typically each table uses the same column for its primary key. For example, a personnel database might have several tables, each with a primary key of employee number. A sales database might have some tables with a primary key of customer number, and other tables with a primary key of order number. Because lookups using the primary key are very fast, you can construct efficient join queries for such tables.

If you leave the 'PRIMARY KEY' clause out entirely, MySQL creates an invisible one for you. It is a 6-byte value that might be longer than you need, thus wasting space. Because it is hidden, you cannot refer to it in queries.

Application Performance Considerations

The reliability and scalability features of 'InnoDB' require more disk storage than equivalent 'MyISAM' tables. You might change the column and index definitions slightly, for better space utilization, reduced I/O and memory consumption when processing result sets, and better query optimization plans making efficient use of index lookups.

If you set up a numeric ID column for the primary key, use that value to cross-reference with related values in any other tables, particularly for join queries. For example, rather than accepting a country name as input and doing queries searching for the same name, do one lookup to determine the country ID, then do other queries (or a single join query) to look up relevant information across several tables. Rather than storing a customer or catalog item number as a string of digits, potentially using up several bytes, convert it to a numeric ID for storing and querying. A 4-byte unsigned note 'INT': integer-types. column can index over 4 billion items (with the US meaning of billion: 1000 million). For the ranges of the different integer types, see note integer-types::.

Understanding Files Associated with InnoDB Tables

'InnoDB' files require more care and planning than 'MyISAM' files do.

You must not delete the ibdata files that represent the 'InnoDB' system tablespace.
Methods of moving or copying 'InnoDB' tables to a different server are described in *note innodb-migration::.

File: manual.info.tmp, Node: innodb-auto-increment-handling, Prev: converting-tables-to-innodb, Up: innodb-tables

14.6.1.6 AUTO_INCREMENT Handling in InnoDB ..........................................

'InnoDB' provides a configurable locking mechanism that can significantly improve scalability and performance of SQL statements that add rows to tables with 'AUTO_INCREMENT' columns. To use the 'AUTO_INCREMENT' mechanism with an 'InnoDB' table, an 'AUTO_INCREMENT' column must be defined as the first or only column of some index such that it is possible to perform the equivalent of an indexed 'SELECT MAX(AI_COL)' lookup on the table to obtain the maximum column value. The index is not required to be a 'PRIMARY KEY' or 'UNIQUE', but to avoid duplicate values in the 'AUTO_INCREMENT' column, those index types are recommended.

This section describes the 'AUTO_INCREMENT' lock modes, usage implications of different 'AUTO_INCREMENT' lock mode settings, and how 'InnoDB' initializes the 'AUTO_INCREMENT' counter.

*note innodb-auto-increment-lock-modes::
*note innodb-auto-increment-lock-mode-usage-implications::
*note innodb-auto-increment-initialization::
*note innodb-auto-increment-notes::

InnoDB AUTO_INCREMENT Lock Modes

This section describes the 'AUTO_INCREMENT' lock modes used to generate auto-increment values, and how each lock mode affects replication. The auto-increment lock mode is configured at startup using the 'innodb_autoinc_lock_mode' variable.

The following terms are used in describing 'innodb_autoinc_lock_mode' settings:

'*note 'INSERT': insert.-like' statements

All statements that generate new rows in a table, including note 'INSERT': insert, note 'INSERT ... SELECT': insert-select, note 'REPLACE': replace, note 'REPLACE ... SELECT': replace, and *note 'LOAD DATA': load-data. Includes 'simple-inserts', 'bulk-inserts', and 'mixed-mode' inserts.
'Simple inserts'

Statements for which the number of rows to be inserted can be determined in advance (when the statement is initially processed). This includes single-row and multiple-row note 'INSERT': insert. and note 'REPLACE': replace. statements that do not have a nested subquery, but not *note 'INSERT ... ON DUPLICATE KEY UPDATE': insert-on-duplicate.
'Bulk inserts'

Statements for which the number of rows to be inserted (and the number of required auto-increment values) is not known in advance. This includes note 'INSERT ... SELECT': insert-select, note 'REPLACE ... SELECT': replace, and *note 'LOAD DATA': load-data. statements, but not plain 'INSERT'. 'InnoDB' assigns new values for the 'AUTO_INCREMENT' column one at a time as each row is processed.
'Mixed-mode inserts'

These are 'simple insert' statements that specify the auto-increment value for some (but not all) of the new rows. An example follows, where 'c1' is an 'AUTO_INCREMENT' column of table 't1':
```
  INSERT INTO t1 (c1,c2) VALUES (1,'a'), (NULL,'b'), (5,'c'), (NULL,'d');
```
Another type of 'mixed-mode insert' is note 'INSERT ... ON DUPLICATE KEY UPDATE': insert-on-duplicate, which in the worst case is in effect an note 'INSERT': insert. followed by a *note 'UPDATE': update, where the allocated value for the 'AUTO_INCREMENT' column may or may not be used during the update phase.

There are three possible settings for the 'innodb_autoinc_lock_mode' variable. The settings are 0, 1, or 2, for 'traditional', 'consecutive', or 'interleaved' lock mode, respectively.

'innodb_autoinc_lock_mode = 0' ('traditional' lock mode)

The traditional lock mode provides the same behavior that existed before the 'innodb_autoinc_lock_mode' variable was introduced. The traditional lock mode option is provided for backward compatibility, performance testing, and working around issues with "mixed-mode inserts", due to possible differences in semantics.

In this lock mode, all 'INSERT-like' statements obtain a special table-level 'AUTO-INC' lock for inserts into tables with 'AUTO_INCREMENT' columns. This lock is normally held to the end of the statement (not to the end of the transaction) to ensure that auto-increment values are assigned in a predictable and repeatable order for a given sequence of *note 'INSERT': insert. statements, and to ensure that auto-increment values assigned by any given statement are consecutive.

In the case of statement-based replication, this means that when an SQL statement is replicated on a replica server, the same values are used for the auto-increment column as on the source server. The result of execution of multiple note 'INSERT': insert. statements is deterministic, and the replica reproduces the same data as on the source. If auto-increment values generated by multiple note 'INSERT': insert. statements were interleaved, the result of two concurrent *note 'INSERT': insert. statements would be nondeterministic, and could not reliably be propagated to a replica server using statement-based replication.

To make this clear, consider an example that uses this table:
```
  CREATE TABLE t1 (
    c1 INT(11) NOT NULL AUTO_INCREMENT,
    c2 VARCHAR(10) DEFAULT NULL,
    PRIMARY KEY (c1)
  ) ENGINE=InnoDB;
```
Suppose that there are two transactions running, each inserting rows into a table with an 'AUTO_INCREMENT' column. One transaction is using an note 'INSERT ... SELECT': insert-select. statement that inserts 1000 rows, and another is using a simple note 'INSERT': insert. statement that inserts one row:
```
  Tx1: INSERT INTO t1 (c2) SELECT 1000 rows from another table ...
  Tx2: INSERT INTO t1 (c2) VALUES ('xxx');
```
'InnoDB' cannot tell in advance how many rows are retrieved from the note 'SELECT': select. in the note 'INSERT': insert. statement in Tx1, and it assigns the auto-increment values one at a time as the statement proceeds. With a table-level lock, held to the end of the statement, only one note 'INSERT': insert. statement referring to table 't1' can execute at a time, and the generation of auto-increment numbers by different statements is not interleaved. The auto-increment values generated by the Tx1 note 'INSERT ... SELECT': insert-select. statement are consecutive, and the (single) auto-increment value used by the *note 'INSERT': insert. statement in Tx2 is either smaller or larger than all those used for Tx1, depending on which statement executes first.

As long as the SQL statements execute in the same order when replayed from the binary log (when using statement-based replication, or in recovery scenarios), the results are the same as they were when Tx1 and Tx2 first ran. Thus, table-level locks held until the end of a statement make *note 'INSERT': insert. statements using auto-increment safe for use with statement-based replication. However, those table-level locks limit concurrency and scalability when multiple transactions are executing insert statements at the same time.

In the preceding example, if there were no table-level lock, the value of the auto-increment column used for the note 'INSERT': insert. in Tx2 depends on precisely when the statement executes. If the note 'INSERT': insert. of Tx2 executes while the note 'INSERT': insert. of Tx1 is running (rather than before it starts or after it completes), the specific auto-increment values assigned by the two note 'INSERT': insert. statements are nondeterministic, and may vary from run to run.

Under the consecutive lock mode, 'InnoDB' can avoid using table-level 'AUTO-INC' locks for 'simple insert' statements where the number of rows is known in advance, and still preserve deterministic execution and safety for statement-based replication.

If you are not using the binary log to replay SQL statements as part of recovery or replication, the interleaved lock mode can be used to eliminate all use of table-level 'AUTO-INC' locks for even greater concurrency and performance, at the cost of permitting gaps in auto-increment numbers assigned by a statement and potentially having the numbers assigned by concurrently executing statements interleaved.
'innodb_autoinc_lock_mode = 1' ('consecutive' lock mode)

This is the default lock mode. In this mode, 'bulk inserts' use the special 'AUTO-INC' table-level lock and hold it until the end of the statement. This applies to all note 'INSERT ... SELECT': insert-select, note 'REPLACE ... SELECT': replace, and *note 'LOAD DATA': load-data. statements. Only one statement holding the 'AUTO-INC' lock can execute at a time. If the source table of the bulk insert operation is different from the target table, the 'AUTO-INC' lock on the target table is taken after a shared lock is taken on the first row selected from the source table. If the source and target of the bulk insert operation are the same table, the 'AUTO-INC' lock is taken after shared locks are taken on all selected rows.

'Simple inserts' (for which the number of rows to be inserted is known in advance) avoid table-level 'AUTO-INC' locks by obtaining the required number of auto-increment values under the control of a mutex (a light-weight lock) that is only held for the duration of the allocation process, not until the statement completes. No table-level 'AUTO-INC' lock is used unless an 'AUTO-INC' lock is held by another transaction. If another transaction holds an 'AUTO-INC' lock, a 'simple insert' waits for the 'AUTO-INC' lock, as if it were a 'bulk insert'.

This lock mode ensures that, in the presence of note 'INSERT': insert. statements where the number of rows is not known in advance (and where auto-increment numbers are assigned as the statement progresses), all auto-increment values assigned by any 'note 'INSERT': insert.-like' statement are consecutive, and operations are safe for statement-based replication.

Simply put, this lock mode significantly improves scalability while being safe for use with statement-based replication. Further, as with 'traditional' lock mode, auto-increment numbers assigned by any given statement are consecutive. There is no change in semantics compared to 'traditional' mode for any statement that uses auto-increment, with one important exception.

The exception is for 'mixed-mode inserts', where the user provides explicit values for an 'AUTO_INCREMENT' column for some, but not all, rows in a multiple-row 'simple insert'. For such inserts, 'InnoDB' allocates more auto-increment values than the number of rows to be inserted. However, all values automatically assigned are consecutively generated (and thus higher than) the auto-increment value generated by the most recently executed previous statement. 'Excess' numbers are lost.
'innodb_autoinc_lock_mode = 2' ('interleaved' lock mode)

In this lock mode, no '*note 'INSERT': insert.-like' statements use the table-level 'AUTO-INC' lock, and multiple statements can execute at the same time. This is the fastest and most scalable lock mode, but it is not safe when using statement-based replication or recovery scenarios when SQL statements are replayed from the binary log.

In this lock mode, auto-increment values are guaranteed to be unique and monotonically increasing across all concurrently executing '*note 'INSERT': insert.-like' statements. However, because multiple statements can be generating numbers at the same time (that is, allocation of numbers is interleaved across statements), the values generated for the rows inserted by any given statement may not be consecutive.

If the only statements executing are 'simple inserts' where the number of rows to be inserted is known ahead of time, there are no gaps in the numbers generated for a single statement, except for 'mixed-mode inserts'. However, when 'bulk inserts' are executed, there may be gaps in the auto-increment values assigned by any given statement.

InnoDB AUTO_INCREMENT Lock Mode Usage Implications

Using auto-increment with replication

If you are using statement-based replication, set 'innodb_autoinc_lock_mode' to 0 or 1 and use the same value on the source and its replicas. Auto-increment values are not ensured to be the same on the replicas as on the source if you use 'innodb_autoinc_lock_mode' = 2 ('interleaved') or configurations where the source and replicas do not use the same lock mode.

If you are using row-based or mixed-format replication, all of the auto-increment lock modes are safe, since row-based replication is not sensitive to the order of execution of the SQL statements (and the mixed format uses row-based replication for any statements that are unsafe for statement-based replication).
'Lost' auto-increment values and sequence gaps

In all lock modes (0, 1, and 2), if a transaction that generated auto-increment values rolls back, those auto-increment values are 'lost'. Once a value is generated for an auto-increment column, it cannot be rolled back, whether or not the '*note 'INSERT': insert.-like' statement is completed, and whether or not the containing transaction is rolled back. Such lost values are not reused. Thus, there may be gaps in the values stored in an 'AUTO_INCREMENT' column of a table.
Specifying NULL or 0 for the 'AUTO_INCREMENT' column

In all lock modes (0, 1, and 2), if a user specifies NULL or 0 for the 'AUTO_INCREMENT' column in an *note 'INSERT': insert, 'InnoDB' treats the row as if the value was not specified and generates a new value for it.
Assigning a negative value to the 'AUTO_INCREMENT' column

In all lock modes (0, 1, and 2), the behavior of the auto-increment mechanism is undefined if you assign a negative value to the 'AUTO_INCREMENT' column.
If the 'AUTO_INCREMENT' value becomes larger than the maximum integer for the specified integer type

In all lock modes (0, 1, and 2), the behavior of the auto-increment mechanism is undefined if the value becomes larger than the maximum integer that can be stored in the specified integer type.
Gaps in auto-increment values for 'bulk inserts'

With 'innodb_autoinc_lock_mode' set to 0 ('traditional') or 1 ('consecutive'), the auto-increment values generated by any given statement are consecutive, without gaps, because the table-level 'AUTO-INC' lock is held until the end of the statement, and only one such statement can execute at a time.

With 'innodb_autoinc_lock_mode' set to 2 ('interleaved'), there may be gaps in the auto-increment values generated by 'bulk inserts,' but only if there are concurrently executing '*note 'INSERT': insert.-like' statements.

For lock modes 1 or 2, gaps may occur between successive statements because for bulk inserts the exact number of auto-increment values required by each statement may not be known and overestimation is possible.
Auto-increment values assigned by 'mixed-mode inserts'

Consider a 'mixed-mode insert,' where a 'simple insert' specifies the auto-increment value for some (but not all) resulting rows. Such a statement behaves differently in lock modes 0, 1, and 2. For example, assume 'c1' is an 'AUTO_INCREMENT' column of table 't1', and that the most recent automatically generated sequence number is 100.
```
  mysql> CREATE TABLE t1 (
      -> c1 INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
      -> c2 CHAR(1)
      -> ) ENGINE = INNODB;
```
Now, consider the following 'mixed-mode insert' statement:
```
  mysql> INSERT INTO t1 (c1,c2) VALUES (1,'a'), (NULL,'b'), (5,'c'), (NULL,'d');
```
With 'innodb_autoinc_lock_mode' set to 0 ('traditional'), the four new rows are:
```
  mysql> SELECT c1, c2 FROM t1 ORDER BY c2;
  +-----+------+
  | c1  | c2   |
  +-----+------+
  |   1 | a    |
  | 101 | b    |
  |   5 | c    |
  | 102 | d    |
  +-----+------+
```
The next available auto-increment value is 103 because the auto-increment values are allocated one at a time, not all at once at the beginning of statement execution. This result is true whether or not there are concurrently executing '*note 'INSERT': insert.-like' statements (of any type).

With 'innodb_autoinc_lock_mode' set to 1 ('consecutive'), the four new rows are also:
```
  mysql> SELECT c1, c2 FROM t1 ORDER BY c2;
  +-----+------+
  | c1  | c2   |
  +-----+------+
  |   1 | a    |
  | 101 | b    |
  |   5 | c    |
  | 102 | d    |
  +-----+------+
```
However, in this case, the next available auto-increment value is 105, not 103 because four auto-increment values are allocated at the time the statement is processed, but only two are used. This result is true whether or not there are concurrently executing '*note 'INSERT': insert.-like' statements (of any type).

With 'innodb_autoinc_lock_mode' set to 2 ('interleaved'), the four new rows are:
```
  mysql> SELECT c1, c2 FROM t1 ORDER BY c2;
  +-----+------+
  | c1  | c2   |
  +-----+------+
  |   1 | a    |
  |   X | b    |
  |   5 | c    |
  |   Y | d    |
  +-----+------+
```
The values of X and Y are unique and larger than any previously generated rows. However, the specific values of X and Y depend on the number of auto-increment values generated by concurrently executing statements.

Finally, consider the following statement, issued when the most-recently generated sequence number is 100:
```
  mysql> INSERT INTO t1 (c1,c2) VALUES (1,'a'), (NULL,'b'), (101,'c'), (NULL,'d');
```
With any 'innodb_autoinc_lock_mode' setting, this statement generates a duplicate-key error 23000 ('Can't write; duplicate key in table') because 101 is allocated for the row '(NULL, 'b')' and insertion of the row '(101, 'c')' fails.
Modifying 'AUTO_INCREMENT' column values in the middle of a sequence of *note 'INSERT': insert. statements

In all lock modes (0, 1, and 2), modifying an 'AUTO_INCREMENT' column value in the middle of a sequence of note 'INSERT': insert. statements could lead to 'Duplicate entry' errors. For example, if you perform an note 'UPDATE': update. operation that changes an 'AUTO_INCREMENT' column value to a value larger than the current maximum auto-increment value, subsequent *note 'INSERT': insert. operations that do not specify an unused auto-increment value could encounter 'Duplicate entry' errors. This behavior is demonstrated in the following example.
```
  mysql> CREATE TABLE t1 (
      -> c1 INT NOT NULL AUTO_INCREMENT,
      -> PRIMARY KEY (c1)
      ->  ) ENGINE = InnoDB;

  mysql> INSERT INTO t1 VALUES(0), (0), (3);

  mysql> SELECT c1 FROM t1;
  +----+
  | c1 |
  +----+
  |  1 |
  |  2 |
  |  3 |
  +----+

  mysql> UPDATE t1 SET c1 = 4 WHERE c1 = 1;

  mysql> SELECT c1 FROM t1;
  +----+
  | c1 |
  +----+
  |  2 |
  |  3 |
  |  4 |
  +----+

  mysql> INSERT INTO t1 VALUES(0);
  ERROR 1062 (23000): Duplicate entry '4' for key 'PRIMARY'
```

InnoDB AUTO_INCREMENT Counter Initialization

This section describes how 'InnoDB' initializes 'AUTO_INCREMENT' counters.

If you specify an 'AUTO_INCREMENT' column for an 'InnoDB' table, the table handle in the 'InnoDB' data dictionary contains a special counter called the auto-increment counter that is used in assigning new values for the column. This counter is stored only in main memory, not on disk.

To initialize an auto-increment counter after a server restart, 'InnoDB' executes the equivalent of the following statement on the first insert into a table containing an 'AUTO_INCREMENT' column.

 SELECT MAX(ai_col) FROM TABLE_NAME FOR UPDATE;

'InnoDB' increments the value retrieved by the statement and assigns it to the column and to the auto-increment counter for the table. By default, the value is incremented by 1. This default can be overridden by the 'auto_increment_increment' configuration setting.

If the table is empty, 'InnoDB' uses the value '1'. This default can be overridden by the 'auto_increment_offset' configuration setting.

If a *note 'SHOW TABLE STATUS': show-table-status. statement examines the table before the auto-increment counter is initialized, 'InnoDB' initializes but does not increment the value. The value is stored for use by later inserts. This initialization uses a normal exclusive-locking read on the table and the lock lasts to the end of the transaction. 'InnoDB' follows the same procedure for initializing the auto-increment counter for a newly created table.

After the auto-increment counter has been initialized, if you do not explicitly specify a value for an 'AUTO_INCREMENT' column, 'InnoDB' increments the counter and assigns the new value to the column. If you insert a row that explicitly specifies the column value, and the value is greater than the current counter value, the counter is set to the specified column value.

'InnoDB' uses the in-memory auto-increment counter as long as the server runs. When the server is stopped and restarted, 'InnoDB' reinitializes the counter for each table for the first *note 'INSERT': insert. to the table, as described earlier.

A server restart also cancels the effect of the 'AUTO_INCREMENT = N' table option in note 'CREATE TABLE': create-table. and note 'ALTER TABLE': alter-table. statements, which you can use with 'InnoDB' tables to set the initial counter value or alter the current counter value.

Notes

When an 'AUTO_INCREMENT' integer column runs out of values, a subsequent 'INSERT' operation returns a duplicate-key error. This is general MySQL behavior.
When you restart the MySQL server, 'InnoDB' may reuse an old value that was generated for an 'AUTO_INCREMENT' column but never stored (that is, a value that was generated during an old transaction that was rolled back).

File: manual.info.tmp, Node: innodb-indexes, Next: innodb-tablespace, Prev: innodb-tables, Up: innodb-on-disk-structures

14.6.2 Indexes

Menu:
innodb-index-types:: Clustered and Secondary Indexes
innodb-physical-structure:: The Physical Structure of an InnoDB Index
sorted-index-builds:: Sorted Index Builds
innodb-fulltext-index:: InnoDB Full-Text Indexes

This section covers topics related to 'InnoDB' indexes.

File: manual.info.tmp, Node: innodb-index-types, Next: innodb-physical-structure, Prev: innodb-indexes, Up: innodb-indexes

14.6.2.1 Clustered and Secondary Indexes ........................................

Each 'InnoDB' table has a special index called the clustered index that stores row data. Typically, the clustered index is synonymous with the primary key. To get the best performance from queries, inserts, and other database operations, it is important to understand how 'InnoDB' uses the clustered index to optimize the common lookup and DML operations.

When you define a 'PRIMARY KEY' on a table, 'InnoDB' uses it as the clustered index. A primary key should be defined for each table. If there is no logical unique and non-null column or set of columns to use a the primary key, add an auto-increment column. Auto-increment column values are unique and are added automatically as new rows are inserted.
If you do not define a 'PRIMARY KEY' for a table, 'InnoDB' uses the first 'UNIQUE' index with all key columns defined as 'NOT NULL' as the clustered index.
If a table has no 'PRIMARY KEY' or suitable 'UNIQUE' index, 'InnoDB' generates a hidden clustered index named 'GEN_CLUST_INDEX' on a synthetic column that contains row ID values. The rows are ordered by the row ID that 'InnoDB' assigns. The row ID is a 6-byte field that increases monotonically as new rows are inserted. Thus, the rows ordered by the row ID are physically in order of insertion.

How the Clustered Index Speeds Up Queries

Accessing a row through the clustered index is fast because the index search leads directly to the page that contains the row data. If a table is large, the clustered index architecture often saves a disk I/O operation when compared to storage organizations that store row data using a different page from the index record.

How Secondary Indexes Relate to the Clustered Index

Indexes other than the clustered index are known as secondary indexes. In 'InnoDB', each record in a secondary index contains the primary key columns for the row, as well as the columns specified for the secondary index. 'InnoDB' uses this primary key value to search for the row in the clustered index.

If the primary key is long, the secondary indexes use more space, so it is advantageous to have a short primary key.

For guidelines to take advantage of 'InnoDB' clustered and secondary indexes, see *note optimization-indexes::.

File: manual.info.tmp, Node: innodb-physical-structure, Next: sorted-index-builds, Prev: innodb-index-types, Up: innodb-indexes

14.6.2.2 The Physical Structure of an InnoDB Index ..................................................

With the exception of spatial indexes, 'InnoDB' indexes are B-tree data structures. Spatial indexes use R-trees, which are specialized data structures for indexing multi-dimensional data. Index records are stored in the leaf pages of their B-tree or R-tree data structure. The default size of an index page is 16KB. The page size is determined by the 'innodb_page_size' setting when the MySQL instance is initialized. See *note innodb-init-startup-configuration::.

When new records are inserted into an 'InnoDB' clustered index, 'InnoDB' tries to leave 1/16 of the page free for future insertions and updates of the index records. If index records are inserted in a sequential order (ascending or descending), the resulting index pages are about 15/16 full. If records are inserted in a random order, the pages are from 1/2 to 15/16 full.

'InnoDB' performs a bulk load when creating or rebuilding B-tree indexes. This method of index creation is known as a sorted index build. The 'innodb_fill_factor' variable defines the percentage of space on each B-tree page that is filled during a sorted index build, with the remaining space reserved for future index growth. Sorted index builds are not supported for spatial indexes. For more information, see *note sorted-index-builds::. An 'innodb_fill_factor' setting of 100 leaves 1/16 of the space in clustered index pages free for future index growth.

If the fill factor of an 'InnoDB' index page drops below the 'MERGE_THRESHOLD', which is 50% by default if not specified, 'InnoDB' tries to contract the index tree to free the page. The 'MERGE_THRESHOLD' setting applies to both B-tree and R-tree indexes. For more information, see *note index-page-merge-threshold::.

File: manual.info.tmp, Node: sorted-index-builds, Next: innodb-fulltext-index, Prev: innodb-physical-structure, Up: innodb-indexes

14.6.2.3 Sorted Index Builds ............................

'InnoDB' performs a bulk load instead of inserting one index record at a time when creating or rebuilding indexes. This method of index creation is also known as a sorted index build. Sorted index builds are not supported for spatial indexes.

There are three phases to an index build. In the first phase, the clustered index is scanned, and index entries are generated and added to the sort buffer. When the sort buffer becomes full, entries are sorted and written out to a temporary intermediate file. This process is also known as a 'run'. In the second phase, with one or more runs written to the temporary intermediate file, a merge sort is performed on all entries in the file. In the third and final phase, the sorted entries are inserted into the B-tree.

Prior to the introduction of sorted index builds, index entries were inserted into the B-tree one record at a time using insert APIs. This method involved opening a B-tree cursor to find the insert position and then inserting entries into a B-tree page using an optimistic insert. If an insert failed due to a page being full, a pessimistic insert would be performed, which involves opening a B-tree cursor and splitting and merging B-tree nodes as necessary to find space for the entry. The drawbacks of this 'top-down' method of building an index are the cost of searching for an insert position and the constant splitting and merging of B-tree nodes.

Sorted index builds use a 'bottom-up' approach to building an index. With this approach, a reference to the right-most leaf page is held at all levels of the B-tree. The right-most leaf page at the necessary B-tree depth is allocated and entries are inserted according to their sorted order. Once a leaf page is full, a node pointer is appended to the parent page and a sibling leaf page is allocated for the next insert. This process continues until all entries are inserted, which may result in inserts up to the root level. When a sibling page is allocated, the reference to the previously pinned leaf page is released, and the newly allocated leaf page becomes the right-most leaf page and new default insert location.

Reserving B-tree Page Space for Future Index Growth

To set aside space for future index growth, you can use the 'innodb_fill_factor' variable to reserve a percentage of B-tree page space. For example, setting 'innodb_fill_factor' to 80 reserves 20 percent of the space in B-tree pages during a sorted index build. This setting applies to both B-tree leaf and non-leaf pages. It does not apply to external pages used for note 'TEXT': blob. or note 'BLOB': blob. entries. The amount of space that is reserved may not be exactly as configured, as the 'innodb_fill_factor' value is interpreted as a hint rather than a hard limit.

Sorted Index Builds and Full-Text Index Support

Sorted index builds are supported for fulltext indexes. Previously, SQL was used to insert entries into a fulltext index.

Sorted Index Builds and Compressed Tables

For compressed tables, the previous index creation method appended entries to both compressed and uncompressed pages. When the modification log (representing free space on the compressed page) became full, the compressed page would be recompressed. If compression failed due to a lack of space, the page would be split. With sorted index builds, entries are only appended to uncompressed pages. When an uncompressed page becomes full, it is compressed. Adaptive padding is used to ensure that compression succeeds in most cases, but if compression fails, the page is split and compression is attempted again. This process continues until compression is successful. For more information about compression of B-Tree pages, see *note innodb-compression-internals::.

Sorted Index Builds and Redo Logging

Redo logging is disabled during a sorted index build. Instead, there is a checkpoint to ensure that the index build can withstand an unexpected exit or failure. The checkpoint forces a write of all dirty pages to disk. During a sorted index build, the page cleaner thread is signaled periodically to flush dirty pages to ensure that the checkpoint operation can be processed quickly. Normally, the page cleaner thread flushes dirty pages when the number of clean pages falls below a set threshold. For sorted index builds, dirty pages are flushed promptly to reduce checkpoint overhead and to parallelize I/O and CPU activity.

Sorted Index Builds and Optimizer Statistics

Sorted index builds may result in optimizer statistics that differ from those generated by the previous method of index creation. The difference in statistics, which is not expected to affect workload performance, is due to the different algorithm used to populate the index.

File: manual.info.tmp, Node: innodb-fulltext-index, Prev: sorted-index-builds, Up: innodb-indexes

14.6.2.4 InnoDB Full-Text Indexes .................................

Full-text indexes are created on text-based columns (note 'CHAR': char, note 'VARCHAR': char, or *note 'TEXT': blob. columns) to speed up queries and DML operations on data contained within those columns.

A full-text index is defined as part of a note 'CREATE TABLE': create-table. statement or added to an existing table using note 'ALTER TABLE': alter-table. or *note 'CREATE INDEX': create-index.

Full-text search is performed using 'MATCH() ... AGAINST' syntax. For usage information, see *note fulltext-search::.

'InnoDB' full-text indexes are described under the following topics in this section:

*note innodb-fulltext-index-design::
*note innodb-fulltext-index-tables::
*note innodb-fulltext-index-cache::
*note innodb-fulltext-index-docid::
*note innodb-fulltext-index-deletion::
*note innodb-fulltext-index-transaction::
*note innodb-fulltext-index-monitoring::

InnoDB Full-Text Index Design

'InnoDB' full-text indexes have an inverted index design. Inverted indexes store a list of words, and for each word, a list of documents that the word appears in. To support proximity search, position information for each word is also stored, as a byte offset.

InnoDB Full-Text Index Tables

When an 'InnoDB' full-text index is created, a set of index tables is created, as shown in the following example:

 mysql> CREATE TABLE opening_lines (
        id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
        opening_line TEXT(500),
        author VARCHAR(200),
        title VARCHAR(200),
        FULLTEXT idx (opening_line)
        ) ENGINE=InnoDB;

 mysql> SELECT table_id, name, space from INFORMATION_SCHEMA.INNODB_SYS_TABLES
        WHERE name LIKE 'test/%';
 +----------+----------------------------------------------------+-------+
 | table_id | name                                               | space |
 +----------+----------------------------------------------------+-------+
 |      333 | test/FTS_0000000000000147_00000000000001c9_INDEX_1 |   289 |
 |      334 | test/FTS_0000000000000147_00000000000001c9_INDEX_2 |   290 |
 |      335 | test/FTS_0000000000000147_00000000000001c9_INDEX_3 |   291 |
 |      336 | test/FTS_0000000000000147_00000000000001c9_INDEX_4 |   292 |
 |      337 | test/FTS_0000000000000147_00000000000001c9_INDEX_5 |   293 |
 |      338 | test/FTS_0000000000000147_00000000000001c9_INDEX_6 |   294 |
 |      330 | test/FTS_0000000000000147_BEING_DELETED            |   286 |
 |      331 | test/FTS_0000000000000147_BEING_DELETED_CACHE      |   287 |
 |      332 | test/FTS_0000000000000147_CONFIG                   |   288 |
 |      328 | test/FTS_0000000000000147_DELETED                  |   284 |
 |      329 | test/FTS_0000000000000147_DELETED_CACHE            |   285 |
 |      327 | test/opening_lines                                 |   283 |
 +----------+----------------------------------------------------+-------+

The first six index tables comprise the inverted index and are referred to as auxiliary index tables. When incoming documents are tokenized, the individual words (also referred to as 'tokens') are inserted into the index tables along with position information and an associated 'DOC_ID'. The words are fully sorted and partitioned among the six index tables based on the character set sort weight of the word's first character.

The inverted index is partitioned into six auxiliary index tables to support parallel index creation. By default, two threads tokenize, sort, and insert words and associated data into the index tables. The number of threads that perform this work is configurable using the 'innodb_ft_sort_pll_degree' variable. Consider increasing the number of threads when creating full-text indexes on large tables.

Auxiliary index table names are prefixed with 'fts_' and postfixed with 'index_#'. Each auxiliary index table is associated with the indexed table by a hex value in the auxiliary index table name that matches the 'table_id' of the indexed table. For example, the 'table_id' of the 'test/opening_lines' table is '327', for which the hex value is 0x147. As shown in the preceding example, the '147' hex value appears in the names of auxiliary index tables that are associated with the 'test/opening_lines' table.

A hex value representing the 'index_id' of the full-text index also appears in auxiliary index table names. For example, in the auxiliary table name 'test/FTS_0000000000000147_00000000000001c9_INDEX_1', the hex value '1c9' has a decimal value of 457. The index defined on the 'opening_lines' table ('idx') can be identified by querying the Information Schema *note 'INNODB_SYS_INDEXES': information-schema-innodb-sys-indexes-table. table for this value (457).

 mysql> SELECT index_id, name, table_id, space from INFORMATION_SCHEMA.INNODB_SYS_INDEXES
        WHERE index_id=457;
 +----------+------+----------+-------+
 | index_id | name | table_id | space |
 +----------+------+----------+-------+
 |      457 | idx  |      327 |   283 |
 +----------+------+----------+-------+

Index tables are stored in their own tablespace if the primary table is created in a file-per-table tablespace. Otherwise, index tables are stored in the tablespace where the indexed table resides.

The other index tables shown in the preceding example are referred to as common index tables and are used for deletion handling and storing the internal state of full-text indexes. Unlike the inverted index tables, which are created for each full-text index, this set of tables is common to all full-text indexes created on a particular table.

Common index tables are retained even if full-text indexes are dropped. When a full-text index is dropped, the 'FTS_DOC_ID' column that was created for the index is retained, as removing the 'FTS_DOC_ID' column would require rebuilding the previously indexed table. Common index tables are required to manage the 'FTS_DOC_ID' column.

'FTS_DELETED' and 'FTS_DELETED_CACHE'

Contain the document IDs (DOC_ID) for documents that are deleted but whose data is not yet removed from the full-text index. The 'FTS_DELETED_CACHE' is the in-memory version of the 'FTS_DELETED' table.
'FTS_BEING_DELETED' and 'FTS_BEING_DELETED_CACHE'

Contain the document IDs (DOC_ID) for documents that are deleted and whose data is currently in the process of being removed from the full-text index. The 'FTS_BEING_DELETED_CACHE' table is the in-memory version of the 'FTS_BEING_DELETED' table.
'FTS_*_CONFIG'

Stores information about the internal state of the full-text index. Most importantly, it stores the 'FTS_SYNCED_DOC_ID', which identifies documents that have been parsed and flushed to disk. In case of crash recovery, 'FTS_SYNCED_DOC_ID' values are used to identify documents that have not been flushed to disk so that the documents can be re-parsed and added back to the full-text index cache. To view the data in this table, query the Information Schema *note 'INNODB_FT_CONFIG': information-schema-innodb-ft-config-table. table.

InnoDB Full-Text Index Cache

When a document is inserted, it is tokenized, and the individual words and associated data are inserted into the full-text index. This process, even for small documents, can result in numerous small insertions into the auxiliary index tables, making concurrent access to these tables a point of contention. To avoid this problem, 'InnoDB' uses a full-text index cache to temporarily cache index table insertions for recently inserted rows. This in-memory cache structure holds insertions until the cache is full and then batch flushes them to disk (to the auxiliary index tables). You can query the Information Schema *note 'INNODB_FT_INDEX_CACHE': information-schema-innodb-ft-index-cache-table. table to view tokenized data for recently inserted rows.

The caching and batch flushing behavior avoids frequent updates to auxiliary index tables, which could result in concurrent access issues during busy insert and update times. The batching technique also avoids multiple insertions for the same word, and minimizes duplicate entries. Instead of flushing each word individually, insertions for the same word are merged and flushed to disk as a single entry, improving insertion efficiency while keeping auxiliary index tables as small as possible.

The 'innodb_ft_cache_size' variable is used to configure the full-text index cache size (on a per-table basis), which affects how often the full-text index cache is flushed. You can also define a global full-text index cache size limit for all tables in a given instance using the 'innodb_ft_total_cache_size' variable.

The full-text index cache stores the same information as auxiliary index tables. However, the full-text index cache only caches tokenized data for recently inserted rows. The data that is already flushed to disk (to the auxiliary index tables) is not brought back into the full-text index cache when queried. The data in auxiliary index tables is queried directly, and results from the auxiliary index tables are merged with results from the full-text index cache before being returned.

InnoDB Full-Text Index DOC_ID and FTS_DOC_ID Column

'InnoDB' uses a unique document identifier referred to as the 'DOC_ID' to map words in the full-text index to document records where the word appears. The mapping requires an 'FTS_DOC_ID' column on the indexed table. If an 'FTS_DOC_ID' column is not defined, 'InnoDB' automatically adds a hidden 'FTS_DOC_ID' column when the full-text index is created. The following example demonstrates this behavior.

The following table definition does not include an 'FTS_DOC_ID' column:

 mysql> CREATE TABLE opening_lines (
        id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
        opening_line TEXT(500),
        author VARCHAR(200),
        title VARCHAR(200)
        ) ENGINE=InnoDB;

When you create a full-text index on the table using 'CREATE FULLTEXT INDEX' syntax, a warning is returned which reports that 'InnoDB' is rebuilding the table to add the 'FTS_DOC_ID' column.

 mysql> CREATE FULLTEXT INDEX idx ON opening_lines(opening_line);
 Query OK, 0 rows affected, 1 warning (0.19 sec)
 Records: 0  Duplicates: 0  Warnings: 1

 mysql> SHOW WARNINGS;
 +---------+------+--------------------------------------------------+
 | Level   | Code | Message                                          |
 +---------+------+--------------------------------------------------+
 | Warning |  124 | InnoDB rebuilding table to add column FTS_DOC_ID |
 +---------+------+--------------------------------------------------+

The same warning is returned when using note 'ALTER TABLE': alter-table. to add a full-text index to a table that does not have an 'FTS_DOC_ID' column. If you create a full-text index at note 'CREATE TABLE': create-table. time and do not specify an 'FTS_DOC_ID' column, 'InnoDB' adds a hidden 'FTS_DOC_ID' column, without warning.

Defining an 'FTS_DOC_ID' column at *note 'CREATE TABLE': create-table. time is less expensive than creating a full-text index on a table that is already loaded with data. If an 'FTS_DOC_ID' column is defined on a table prior to loading data, the table and its indexes do not have to be rebuilt to add the new column. If you are not concerned with 'CREATE FULLTEXT INDEX' performance, leave out the 'FTS_DOC_ID' column to have 'InnoDB' create it for you. 'InnoDB' creates a hidden 'FTS_DOC_ID' column along with a unique index ('FTS_DOC_ID_INDEX') on the 'FTS_DOC_ID' column. If you want to create your own 'FTS_DOC_ID' column, the column must be defined as 'BIGINT UNSIGNED NOT NULL' and named 'FTS_DOC_ID' (all uppercase), as in the following example:

Note:

The 'FTS_DOC_ID' column does not need to be defined as an 'AUTO_INCREMENT' column, but doing so could make loading data easier.

 mysql> CREATE TABLE opening_lines (
        FTS_DOC_ID BIGINT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
        opening_line TEXT(500),
        author VARCHAR(200),
        title VARCHAR(200)
        ) ENGINE=InnoDB;

If you choose to define the 'FTS_DOC_ID' column yourself, you are responsible for managing the column to avoid empty or duplicate values. 'FTS_DOC_ID' values cannot be reused, which means 'FTS_DOC_ID' values must be ever increasing.

Optionally, you can create the required unique 'FTS_DOC_ID_INDEX' (all uppercase) on the 'FTS_DOC_ID' column.

 mysql> CREATE UNIQUE INDEX FTS_DOC_ID_INDEX on opening_lines(FTS_DOC_ID);

If you do not create the 'FTS_DOC_ID_INDEX', 'InnoDB' creates it automatically.

Before MySQL 5.7.13, the permitted gap between the largest used 'FTS_DOC_ID' value and new 'FTS_DOC_ID' value is 10000. In MySQL 5.7.13 and later, the permitted gap is 65535.

To avoid rebuilding the table, the 'FTS_DOC_ID' column is retained when dropping a full-text index.

InnoDB Full-Text Index Deletion Handling

Deleting a record that has a full-text index column could result in numerous small deletions in the auxiliary index tables, making concurrent access to these tables a point of contention. To avoid this problem, the 'DOC_ID' of a deleted document is logged in a special 'FTS_DELETED' table whenever a record is deleted from an indexed table, and the indexed record remains in the full-text index. Before returning query results, information in the 'FTS_DELETED' table is used to filter out deleted 'DOC_ID's. The benefit of this design is that deletions are fast and inexpensive. The drawback is that the size of the index is not immediately reduced after deleting records. To remove full-text index entries for deleted records, run 'OPTIMIZE TABLE' on the indexed table with 'innodb_optimize_fulltext_only=ON' to rebuild the full-text index. For more information, see *note fulltext-optimize::.

InnoDB Full-Text Index Transaction Handling

'InnoDB' full-text indexes have special transaction handling characteristics due its caching and batch processing behavior. Specifically, updates and insertions on a full-text index are processed at transaction commit time, which means that a full-text search can only see committed data. The following example demonstrates this behavior. The full-text search only returns a result after the inserted lines are committed.

 mysql> CREATE TABLE opening_lines (
        id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
        opening_line TEXT(500),
        author VARCHAR(200),
        title VARCHAR(200),
        FULLTEXT idx (opening_line)
        ) ENGINE=InnoDB;

 mysql> BEGIN;

 mysql> INSERT INTO opening_lines(opening_line,author,title) VALUES
        ('Call me Ishmael.','Herman Melville','Moby-Dick'),
        ('A screaming comes across the sky.','Thomas Pynchon','Gravity\'s Rainbow'),
        ('I am an invisible man.','Ralph Ellison','Invisible Man'),
        ('Where now? Who now? When now?','Samuel Beckett','The Unnamable'),
        ('It was love at first sight.','Joseph Heller','Catch-22'),
        ('All this happened, more or less.','Kurt Vonnegut','Slaughterhouse-Five'),
        ('Mrs. Dalloway said she would buy the flowers herself.','Virginia Woolf','Mrs. Dalloway'),
        ('It was a pleasure to burn.','Ray Bradbury','Fahrenheit 451');

 mysql> SELECT COUNT(*) FROM opening_lines WHERE MATCH(opening_line) AGAINST('Ishmael');
 +----------+
 | COUNT(*) |
 +----------+
 |        0 |
 +----------+

 mysql> COMMIT;

 mysql> SELECT COUNT(*) FROM opening_lines WHERE MATCH(opening_line) AGAINST('Ishmael');
 +----------+
 | COUNT(*) |
 +----------+
 |        1 |
 +----------+

Monitoring InnoDB Full-Text Indexes

You can monitor and examine the special text-processing aspects of 'InnoDB' full-text indexes by querying the following 'INFORMATION_SCHEMA' tables:

*note 'INNODB_FT_CONFIG': information-schema-innodb-ft-config-table.
*note 'INNODB_FT_INDEX_TABLE': information-schema-innodb-ft-index-table-table.
*note 'INNODB_FT_INDEX_CACHE': information-schema-innodb-ft-index-cache-table.
*note 'INNODB_FT_DEFAULT_STOPWORD': information-schema-innodb-ft-default-stopword-table.
*note 'INNODB_FT_DELETED': information-schema-innodb-ft-deleted-table.
*note 'INNODB_FT_BEING_DELETED': information-schema-innodb-ft-being-deleted-table.

You can also view basic information for full-text indexes and tables by querying note 'INNODB_SYS_INDEXES': information-schema-innodb-sys-indexes-table. and note 'INNODB_SYS_TABLES': information-schema-innodb-sys-tables-table.

For more information, see *note innodb-information-schema-fulltext_index-tables::.

File: manual.info.tmp, Node: innodb-tablespace, Next: innodb-data-dictionary, Prev: innodb-indexes, Up: innodb-on-disk-structures

14.6.3 Tablespaces

Menu:
innodb-system-tablespace:: The System Tablespace
innodb-file-per-table-tablespaces:: File-Per-Table Tablespaces
general-tablespaces:: General Tablespaces
innodb-undo-tablespaces:: Undo Tablespaces
innodb-temporary-tablespace:: The Temporary Tablespace

This section covers topics related to 'InnoDB' tablespaces.

File: manual.info.tmp, Node: innodb-system-tablespace, Next: innodb-file-per-table-tablespaces, Prev: innodb-tablespace, Up: innodb-tablespace

14.6.3.1 The System Tablespace ..............................

The system tablespace is the storage area for the 'InnoDB' data dictionary, the doublewrite buffer, the change buffer, and undo logs. It may also contain table and index data if tables are created in the system tablespace rather than file-per-table or general tablespaces.

The system tablespace can have one or more data files. By default, a single system tablespace data file, named 'ibdata1', is created in the data directory. The size and number of system tablespace data files is defined by the 'innodb_data_file_path' startup option. For configuration information, see *note innodb-startup-data-file-configuration::.

Additional information about the system tablespace is provided under the following topics in the section:

*note innodb-resize-system-tablespace::
*note innodb-raw-devices::

Resizing the System Tablespace

This section describes how to increase or decrease the size of the system tablespace.

Increasing the Size of the System Tablespace

The easiest way to increase the size of the system tablespace is to configure it to be auto-extending. To do so, specify the 'autoextend' attribute for the last data file in the 'innodb_data_file_path' setting, and restart the server. For example:

 innodb_data_file_path=ibdata1:10M:autoextend

When the 'autoextend' attribute is specified, the data file automatically increases in size by 8MB increments as space is required. The 'innodb_autoextend_increment' variable controls the increment size.

You can also increase system tablespace size by adding another data file. To do so:

Stop the MySQL server.
If the last data file in the 'innodb_data_file_path' setting is defined with the 'autoextend' attribute, remove it, and modify the size attribute to reflect the current data file size. To determine the appropriate data file size to specify, check your file system for the file size, and round that value down to the closest MB value, where a MB is equal to 1024 x 1024 bytes.
Append a new data file to the 'innodb_data_file_path' setting, optionally specifying the 'autoextend' attribute. The 'autoextend' attribute can be specified only for the last data file in the 'innodb_data_file_path' setting.
Start the MySQL server.

For example, this tablespace has one auto-extending data file:

 innodb_data_home_dir =
 innodb_data_file_path = /ibdata/ibdata1:10M:autoextend

Suppose that the data file has grown to 988MB over time. This is the 'innodb_data_file_path' setting after modifying the size attribute to reflect the current data file size, and after specifying a new 50MB auto-extending data file:

 innodb_data_home_dir =
 innodb_data_file_path = /ibdata/ibdata1:988M;/disk2/ibdata2:50M:autoextend

When adding a new data file, do not specify an existing file name. 'InnoDB' creates and initializes the new data file when you start the server.

Note:

You cannot increase the size of an existing system tablespace data file by changing its size attribute. For example, changing the 'innodb_data_file_path' setting from 'ibdata1:10M:autoextend' to 'ibdata1:12M:autoextend' produces the following error when starting the server:

 [ERROR] [MY-012263] [InnoDB] The Auto-extending innodb_system
 data file './ibdata1' is of a different size 640 pages (rounded down to MB) than
 specified in the .cnf file: initial 768 pages, max 0 (relevant if non-zero) pages!

The error indicates that the existing data file size (expressed in 'InnoDB' pages) is different from the data file size specified in the configuration file. If you encounter this error, restore the previous 'innodb_data_file_path' setting, and refer to the system tablespace resizing instructions.

Decreasing the Size of the InnoDB System Tablespace

You cannot remove a data file from the system tablespace. To decrease the system tablespace size, use this procedure:

Use *note 'mysqldump': mysqldump. to dump all of your 'InnoDB' tables, including 'InnoDB' tables located in the 'mysql' schema. Identify 'InnoDB' tables in the 'mysql' schema using the following query:

  mysql> SELECT TABLE_NAME from INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA='mysql' and ENGINE='InnoDB';
  +---------------------------+
  | TABLE_NAME                |
  +---------------------------+
  | engine_cost               |
  | gtid_executed             |
  | help_category             |
  | help_keyword              |
  | help_relation             |
  | help_topic                |
  | innodb_index_stats        |
  | innodb_table_stats        |
  | plugin                    |
  | server_cost               |
  | servers                   |
  | slave_master_info         |
  | slave_relay_log_info      |
  | slave_worker_info         |
  | time_zone                 |
  | time_zone_leap_second     |
  | time_zone_name            |
  | time_zone_transition      |
  | time_zone_transition_type |
  +---------------------------+

Stop the server.
Remove all of the existing tablespace files ('.ibd'), including the 'ibdata' and 'ib_log' files. Do not forget to remove '.ibd' files for tables located in the 'mysql' schema.
Remove any '.frm' files for 'InnoDB' tables.
Configure the data files for the new system tablespace. See *note innodb-startup-data-file-configuration::.
Restart the server.
Import the dump files.

Note:

If your databases only use the 'InnoDB' engine, it may be simpler to dump all databases, stop the server, remove all databases and 'InnoDB' log files, restart the server, and import the dump files.

To avoid a large system tablespace, consider using file-per-table tablespaces or general tablespaces for your data. File-per-table tablespaces are the default tablespace type and are used implicitly when creating an 'InnoDB' table. Unlike the system tablespace, file-per-table tablespaces return disk space to the operating system when they are truncated or dropped. For more information, see note innodb-file-per-table-tablespaces::. General tablespaces are multi-table tablespaces that can also be used as an alternative to the system tablespace. See note general-tablespaces::.

Using Raw Disk Partitions for the System Tablespace

Raw disk partitions can be used as system tablespace data files. This technique enables nonbuffered I/O on Windows and some Linux and Unix systems without file system overhead. Perform tests with and without raw partitions to verify whether they improve performance on your system.

When using a raw disk partition, ensure that the user ID that runs the MySQL server has read and write privileges for that partition. For example, if running the server as the 'mysql' user, the partition must be readable and writeable by 'mysql'. If running the server with the '--memlock' option, the server must be run as 'root', so the partition must be readable and writeable by 'root'.

The procedures described below involve option file modification. For additional information, see *note option-files::.

Allocating a Raw Disk Partition on Linux and Unix Systems

To use a raw device for a new server instance, first prepare the configuration file by setting 'innodb_data_file_path' with the 'raw' keyword. For example:
```
  [mysqld]
  innodb_data_home_dir=
  innodb_data_file_path=/dev/hdd1:3Graw;/dev/hdd2:2Graw
```
The partition must be at least as large as the size that you specify. Note that 1MB in 'InnoDB' is 1024 x 1024 bytes, whereas 1MB in disk specifications usually means 1,000,000 bytes.
Then initialize the server for the first time by using '--initialize' or '--initialize-insecure'. InnoDB notices the 'raw' keyword and initializes the new partition, and then it stops the server.
Now restart the server. 'InnoDB' now permits changes to be made.

Allocating a Raw Disk Partition on Windows

On Windows systems, the same steps and accompanying guidelines described for Linux and Unix systems apply except that the 'innodb_data_file_path' setting differs slightly on Windows. For example:

 [mysqld]
 innodb_data_home_dir=
 innodb_data_file_path=//./D::10Graw

The '//./' corresponds to the Windows syntax of '\.' for accessing physical drives. In the example above, 'D:' is the drive letter of the partition.

File: manual.info.tmp, Node: innodb-file-per-table-tablespaces, Next: general-tablespaces, Prev: innodb-system-tablespace, Up: innodb-tablespace

14.6.3.2 File-Per-Table Tablespaces ...................................

A file-per-table tablespace contains data and indexes for a single 'InnoDB' table, and is stored on the file system in a single data file.

File-per-table tablespace characteristics are described under the following topics in this section:

*note innodb-file-per-table-configuration::
*note innodb-file-per-table-data-files::
*note innodb-file-per-table-advantages::
*note innodb-file-per-table-disadvantages::

File-Per-Table Tablespace Configuration

'InnoDB' creates tables in file-per-table tablespaces by default. This behavior is controlled by the 'innodb_file_per_table' variable. Disabling 'innodb_file_per_table' causes 'InnoDB' to create tables in the system tablespace.

An 'innodb_file_per_table' setting can be specified in an option file or configured at runtime using a note 'SET GLOBAL': set-variable. statement. Changing the setting at runtime requires privileges sufficient to set global system variables. See note system-variable-privileges::.

Option file:

 [mysqld]
 innodb_file_per_table=ON

Using *note 'SET GLOBAL': set-variable. at runtime:

 mysql> SET GLOBAL innodb_file_per_table=ON;

'innodb_file_per_table' is enabled by default in MySQL 5.6 and higher. You might consider disabling it if backward compatibility with earlier versions of MySQL is a concern.

Warning:

Disabling 'innodb_file_per_table' prevents table-copying note 'ALTER TABLE': alter-table. operations from implicitly moving a table that resides in the system tablespace to a file-per-table tablespace. A table-copying note 'ALTER TABLE': alter-table. operation recreates the table using the current 'innodb_file_per_table' setting. This behavior does not apply when adding or dropping secondary indexes, nor does it apply to note 'ALTER TABLE': alter-table. operations that use the 'INPLACE' algorithm, or to tables added to the system tablespace using note 'CREATE TABLE ... TABLESPACE': create-table. or *note 'ALTER TABLE ... TABLESPACE': alter-table. syntax.

File-Per-Table Tablespace Data Files

A file-per-table tablespace is created in an '.ibd' data file in a schema directory under the MySQL data directory. The '.ibd' file is named for the table ('TABLE_NAME.ibd'). For example, the data file for table 'test.t1' is created in the 'test' directory under the MySQL data directory:

 mysql> USE test;

 mysql> CREATE TABLE t1 (
    id INT PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(100)
  ) ENGINE = InnoDB;

 $> cd /PATH/TO/MYSQL/data/test
 $> ls
 t1.ibd

You can use the 'DATA DIRECTORY' clause of the note 'CREATE TABLE': create-table. statement to implicitly create a file-per-table tablespace data file outside of the data directory. For more information, see note innodb-create-table-external::.

File-Per-Table Tablespace Advantages

File-per-table tablespaces have the following advantages over shared tablespaces such as the system tablespace or general tablespaces.

Disk space is returned to the operating system after truncating or dropping a table created in a file-per-table tablespace. Truncating or dropping a table stored in a shared tablespace creates free space within the shared tablespace data file, which can only be used for 'InnoDB' data. In other words, a shared tablespace data file does not shrink in size after a table is truncated or dropped.
A table-copying *note 'ALTER TABLE': alter-table. operation on a table that resides in a shared tablespace can increase the amount of disk space occupied by the tablespace. Such operations may require as much additional space as the data in the table plus indexes. This space is not released back to the operating system as it is for file-per-table tablespaces.
*note 'TRUNCATE TABLE': truncate-table. performance is better when executed on tables that reside in file-per-table tablespaces.
File-per-table tablespace data files can be created on separate storage devices for I/O optimization, space management, or backup purposes. See *note innodb-create-table-external::.
You can import a table that resides in a file-per-table tablespace from another MySQL instance. See *note innodb-table-import::.
Tables created in file-per-table tablespaces use the Barracuda file format. See note innodb-file-format::. The Barracuda file format enables features associated with 'DYNAMIC' and 'COMPRESSED' row formats. See note innodb-row-format::.
Tables stored in individual tablespace data files can save time and improve chances for a successful recovery when data corruption occurs, when backups or binary logs are unavailable, or when the MySQL server instance cannot be restarted.
You can backup or restore tables created in file-per-table tablespaces quickly using MySQL Enterprise Backup, without interrupting the use of other 'InnoDB' tables. This is beneficial for tables on varying backup schedules or that require backup less frequently. See Making a Partial Backup (https://dev.mysql.com/doc/mysql-enterprise-backup/4.1/en/partial.html) for details.
File-per-table tablespaces permit monitoring table size on the file system by monitoring the size of the tablespace data file.
Common Linux file systems do not permit concurrent writes to a single file such as a shared tablespace data file when 'innodb_flush_method' is set to 'O_DIRECT'. As a result, there are possible performance improvements when using file-per-table tablespaces in conjunction with this setting.
Tables in a shared tablespace are limited in size by the 64TB tablespace size limit. By comparison, each file-per-table tablespace has a 64TB size limit, which provides plenty of room for individual tables to grow in size.

File-Per-Table Tablespace Disadvantages

File-per-table tablespaces have the following disadvantages compared to shared tablespaces such as the system tablespace or general tablespaces.

With file-per-table tablespaces, each table may have unused space that can only be utilized by rows of the same table, which can lead to wasted space if not properly managed.
'fsync' operations are performed on multiple file-per-table data files instead of a single shared tablespace data file. Because 'fsync' operations are per file, write operations for multiple tables cannot be combined, which can result in a higher total number of 'fsync' operations.
*note 'mysqld': mysqld. must keep an open file handle for each file-per-table tablespace, which may impact performance if you have numerous tables in file-per-table tablespaces.
More file descriptors are required when each table has its own data file.
There is potential for more fragmentation, which can impede *note 'DROP TABLE': drop-table. and table scan performance. However, if fragmentation is managed, file-per-table tablespaces can improve performance for these operations.
The buffer pool is scanned when dropping a table that resides in a file-per-table tablespace, which can take several seconds for large buffer pools. The scan is performed with a broad internal lock, which may delay other operations.
The 'innodb_autoextend_increment' variable, which defines the increment size for extending the size of an auto-extending shared tablespace file when it becomes full, does not apply to file-per-table tablespace files, which are auto-extending regardless of the 'innodb_autoextend_increment' setting. Initial file-per-table tablespace extensions are by small amounts, after which extensions occur in increments of 4MB.

File: manual.info.tmp, Node: general-tablespaces, Next: innodb-undo-tablespaces, Prev: innodb-file-per-table-tablespaces, Up: innodb-tablespace

14.6.3.3 General Tablespaces ............................

A general tablespace is a shared 'InnoDB' tablespace that is created using *note 'CREATE TABLESPACE': create-tablespace. syntax. General tablespace capabilities and features are described under the following topics in this section:

*note general-tablespaces-capabilities::
*note general-tablespaces-creating::
*note general-tablespaces-adding-tables::
*note general-tablespaces-row-format-support::
*note general-tablespaces-moving-non-partitioned-tables::
*note general-tablespaces-dropping::
*note general-tablespaces-limitations::

General Tablespace Capabilities

General tablespaces provide the following capabilities:

Similar to the system tablespace, general tablespaces are shared tablespaces capable of storing data for multiple tables.
General tablespaces have a potential memory advantage over *note file-per-table tablespaces: innodb-file-per-table-tablespaces. The server keeps tablespace metadata in memory for the lifetime of a tablespace. Multiple tables in fewer general tablespaces consume less memory for tablespace metadata than the same number of tables in separate file-per-table tablespaces.
General tablespace data files can be placed in a directory relative to or independent of the MySQL data directory, which provides you with many of the data file and storage management capabilities of *note file-per-table tablespaces: innodb-file-per-table-tablespaces. As with file-per-table tablespaces, the ability to place data files outside of the MySQL data directory allows you to manage performance of critical tables separately, setup RAID or DRBD for specific tables, or bind tables to particular disks, for example.
General tablespaces support both Antelope and Barracuda file formats, and therefore support all table row formats and associated features. With support for both file formats, general tablespaces have no dependence on 'innodb_file_format' or 'innodb_file_per_table' settings, nor do these variables have any effect on general tablespaces.
The 'TABLESPACE' option can be used with *note 'CREATE TABLE': create-table. to create tables in a general tablespaces, file-per-table tablespace, or in the system tablespace.
The 'TABLESPACE' option can be used with *note 'ALTER TABLE': alter-table. to move tables between general tablespaces, file-per-table tablespaces, and the system tablespace.

Creating a General Tablespace

General tablespaces are created using *note 'CREATE TABLESPACE': create-tablespace. syntax.

 CREATE TABLESPACE TABLESPACE_NAME
     ADD DATAFILE 'FILE_NAME'
     [FILE_BLOCK_SIZE = VALUE]
         [ENGINE [=] ENGINE_NAME]

A general tablespace can be created in the data directory or outside of it. To avoid conflicts with implicitly created file-per-table tablespaces, creating a general tablespace in a subdirectory under the data directory is not supported. When creating a general tablespace outside of the data directory, the directory must exist prior to creating the tablespace.

An .isl file is created in the MySQL data directory when a general tablespace is created outside of the MySQL data directory.

Examples:

Creating a general tablespace in the data directory:

 mysql> CREATE TABLESPACE `ts1` ADD DATAFILE 'ts1.ibd' Engine=InnoDB;

Creating a general tablespace in a directory outside of the data directory:

 mysql> CREATE TABLESPACE `ts1` ADD DATAFILE '/my/tablespace/directory/ts1.ibd' Engine=InnoDB;

You can specify a path that is relative to the data directory as long as the tablespace directory is not under the data directory. In this example, the 'my_tablespace' directory is at the same level as the data directory:

 mysql> CREATE TABLESPACE `ts1` ADD DATAFILE '../my_tablespace/ts1.ibd' Engine=InnoDB;

Note:

The 'ENGINE = InnoDB' clause must be defined as part of the *note 'CREATE TABLESPACE': create-tablespace. statement, or 'InnoDB' must be defined as the default storage engine ('default_storage_engine=InnoDB').

Adding Tables to a General Tablespace

After creating a general tablespace, note 'CREATE TABLE TBL_NAME ... TABLESPACE [=] TABLESPACE_NAME': create-table. or note 'ALTER TABLE TBL_NAME TABLESPACE [=] TABLESPACE_NAME': alter-table. statements can be used to add tables to the tablespace, as shown in the following examples:

*note 'CREATE TABLE': create-table.:

 mysql> CREATE TABLE t1 (c1 INT PRIMARY KEY) TABLESPACE ts1;

*note 'ALTER TABLE': alter-table.:

 mysql> ALTER TABLE t2 TABLESPACE ts1;

Note:

Support for adding table partitions to shared tablespaces was deprecated in MySQL 5.7.24; expect it to be removed in a future version of MySQL. Shared tablespaces include the 'InnoDB' system tablespace and general tablespaces.

For detailed syntax information, see note 'CREATE TABLE': create-table. and note 'ALTER TABLE': alter-table.

General Tablespace Row Format Support

General tablespaces support all table row formats ('REDUNDANT', 'COMPACT', 'DYNAMIC', 'COMPRESSED') with the caveat that compressed and uncompressed tables cannot coexist in the same general tablespace due to different physical page sizes.

For a general tablespace to contain compressed tables ('ROW_FORMAT=COMPRESSED'), the 'FILE_BLOCK_SIZE' option must be specified, and the 'FILE_BLOCK_SIZE' value must be a valid compressed page size in relation to the 'innodb_page_size' value. Also, the physical page size of the compressed table ('KEY_BLOCK_SIZE') must be equal to 'FILE_BLOCK_SIZE/1024'. For example, if 'innodb_page_size=16KB' and 'FILE_BLOCK_SIZE=8K', the 'KEY_BLOCK_SIZE' of the table must be 8.

The following table shows permitted 'innodb_page_size', 'FILE_BLOCK_SIZE', and 'KEY_BLOCK_SIZE' combinations. 'FILE_BLOCK_SIZE' values may also be specified in bytes. To determine a valid 'KEY_BLOCK_SIZE' value for a given 'FILE_BLOCK_SIZE', divide the 'FILE_BLOCK_SIZE' value by 1024. Table compression is not support for 32K and 64K 'InnoDB' page sizes. For more information about 'KEY_BLOCK_SIZE', see note 'CREATE TABLE': create-table, and note innodb-compression-usage::.

Permitted Page Size, FILE_BLOCK_SIZE, and KEY_BLOCK_SIZE Combinations for Compressed Tables

InnoDB Page Size Permitted Permitted (innodb_page_size) FILE_BLOCK_SIZE Value KEY_BLOCK_SIZE Value

64KB

64K (65536) Compression is not supported

32KB

32K (32768) Compression is not supported

16KB

16K (16384) None. If 'innodb_page_size' is equal to 'FILE_BLOCK_SIZE', the tablespace cannot contain a compressed table.

16KB

8K (8192) 8

16KB

4K (4096) 4

16KB

2K (2048) 2

16KB

1K (1024) 1

8KB

8K (8192) None. If 'innodb_page_size' is equal to 'FILE_BLOCK_SIZE', the tablespace cannot contain a compressed table.

8KB

4K (4096) 4

8KB

2K (2048) 2

8KB

1K (1024) 1

4KB

4K (4096) None. If 'innodb_page_size' is equal to 'FILE_BLOCK_SIZE', the tablespace cannot contain a compressed table.

2K (2048) 2

4KB

1K (1024) 1

This example demonstrates creating a general tablespace and adding a compressed table. The example assumes a default 'innodb_page_size' of 16KB. The 'FILE_BLOCK_SIZE' of 8192 requires that the compressed table have a 'KEY_BLOCK_SIZE' of 8.

 mysql> CREATE TABLESPACE `ts2` ADD DATAFILE 'ts2.ibd' FILE_BLOCK_SIZE = 8192 Engine=InnoDB;

 mysql> CREATE TABLE t4 (c1 INT PRIMARY KEY) TABLESPACE ts2 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;

If you do not specify 'FILE_BLOCK_SIZE' when creating a general tablespace, 'FILE_BLOCK_SIZE' defaults to 'innodb_page_size'. When 'FILE_BLOCK_SIZE' is equal to 'innodb_page_size', the tablespace may only contain tables with an uncompressed row format ('COMPACT', 'REDUNDANT', and 'DYNAMIC' row formats).

Moving Tables Between Tablespaces Using ALTER TABLE

*note 'ALTER TABLE': alter-table. with the 'TABLESPACE' option can be used to move a table to an existing general tablespace, to a new file-per-table tablespace, or to the system tablespace.

Note:

Support for placing table partitions in shared tablespaces was deprecated in MySQL 5.7.24; expect it to be removed in a future version of MySQL. Shared tablespaces include the 'InnoDB' system tablespace and general tablespaces.

To move a table from a file-per-table tablespace or from the system tablespace to a general tablespace, specify the name of the general tablespace. The general tablespace must exist. See *note 'ALTER TABLESPACE': alter-tablespace. for more information.

 ALTER TABLE tbl_name TABLESPACE [=] TABLESPACE_NAME;

To move a table from a general tablespace or file-per-table tablespace to the system tablespace, specify 'innodb_system' as the tablespace name.

 ALTER TABLE tbl_name TABLESPACE [=] innodb_system;

To move a table from the system tablespace or a general tablespace to a file-per-table tablespace, specify 'innodb_file_per_table' as the tablespace name.

 ALTER TABLE tbl_name TABLESPACE [=] innodb_file_per_table;

'ALTER TABLE ... TABLESPACE' operations cause a full table rebuild, even if the 'TABLESPACE' attribute has not changed from its previous value.

'ALTER TABLE ... TABLESPACE' syntax does not support moving a table from a temporary tablespace to a persistent tablespace.

The 'DATA DIRECTORY' clause is permitted with 'CREATE TABLE ... TABLESPACE=innodb_file_per_table' but is otherwise not supported for use in combination with the 'TABLESPACE' option.

Restrictions apply when moving tables from encrypted tablespaces. See *note innodb-data-encryption-limitations::.

Dropping a General Tablespace

The *note 'DROP TABLESPACE': drop-tablespace. statement is used to drop an 'InnoDB' general tablespace.

All tables must be dropped from the tablespace prior to a note 'DROP TABLESPACE': drop-tablespace. operation. If the tablespace is not empty, note 'DROP TABLESPACE': drop-tablespace. returns an error.

Use a query similar to the following to identify tables in a general tablespace.

 mysql> SELECT a.NAME AS space_name, b.NAME AS table_name FROM INFORMATION_SCHEMA.INNODB_TABLESPACES a,
        INFORMATION_SCHEMA.INNODB_TABLES b WHERE a.SPACE=b.SPACE AND a.NAME LIKE 'ts1';
 +------------+------------+
 | space_name | table_name |
 +------------+------------+
 | ts1        | test/t1    |
 | ts1        | test/t2    |
 | ts1        | test/t3    |
 +------------+------------+

If a note 'DROP TABLESPACE': drop-tablespace. operation on an empty general tablespace returns an error, the tablespace may contain an orphan temporary or intermediate table that was left by an note 'ALTER TABLE': alter-table. operation that was interrupted by a server exit. For more information, see *note innodb-troubleshooting-datadict::.

A general 'InnoDB' tablespace is not deleted automatically when the last table in the tablespace is dropped. The tablespace must be dropped explicitly using *note 'DROP TABLESPACE TABLESPACE_NAME': drop-tablespace.

A general tablespace does not belong to any particular database. A note 'DROP DATABASE': drop-database. operation can drop tables that belong to a general tablespace but it cannot drop the tablespace, even if the note 'DROP DATABASE': drop-database. operation drops all tables that belong to the tablespace.

Similar to the system tablespace, truncating or dropping tables stored in a general tablespace creates free space internally in the general tablespace .ibd data file which can only be used for new 'InnoDB' data. Space is not released back to the operating system as it is when a file-per-table tablespace is deleted during a *note 'DROP TABLE': drop-table. operation.

This example demonstrates how to drop an 'InnoDB' general tablespace. The general tablespace 'ts1' is created with a single table. The table must be dropped before dropping the tablespace.

 mysql> CREATE TABLESPACE `ts1` ADD DATAFILE 'ts1.ibd' Engine=InnoDB;

 mysql> CREATE TABLE t1 (c1 INT PRIMARY KEY) TABLESPACE ts1 Engine=InnoDB;

 mysql> DROP TABLE t1;

 mysql> DROP TABLESPACE ts1;

Note:

'TABLESPACE_NAME' is a case-sensitive identifier in MySQL.

General Tablespace Limitations

A generated or existing tablespace cannot be changed to a general tablespace.
Creation of temporary general tablespaces is not supported.
General tablespaces do not support temporary tables.
Tables stored in a general tablespace may only be opened in MySQL releases that support general tablespaces.
Similar to the system tablespace, truncating or dropping tables stored in a general tablespace creates free space internally in the general tablespace .ibd data file which can only be used for new 'InnoDB' data. Space is not released back to the operating system as it is for file-per-table tablespaces.

Additionally, a table-copying note 'ALTER TABLE': alter-table. operation on table that resides in a shared tablespace (a general tablespace or the system tablespace) can increase the amount of space used by the tablespace. Such operations require as much additional space as the data in the table plus indexes. The additional space required for the table-copying note 'ALTER TABLE': alter-table. operation is not released back to the operating system as it is for file-per-table tablespaces.
note 'ALTER TABLE ... DISCARD TABLESPACE': alter-table. and note 'ALTER TABLE ...IMPORT TABLESPACE': alter-table. are not supported for tables that belong to a general tablespace.
Support for placing table partitions in general tablespaces was deprecated in MySQL 5.7.24; expect it to be removed in a future version of MySQL.
The 'ADD DATAFILE' clause is not supported in a replication environment where the source and replica reside on the same host, as it would cause the source and replica to create a tablespace of the same name in the same location.

File: manual.info.tmp, Node: innodb-undo-tablespaces, Next: innodb-temporary-tablespace, Prev: general-tablespaces, Up: innodb-tablespace

14.6.3.4 Undo Tablespaces .........................

Undo tablespaces contain undo logs, which are collections of records containing information about how to undo the latest change by a transaction to a clustered index record.

Undo logs are stored in the system tablespace by default but can be stored in one or more undo tablespaces instead. Using undo tablespaces can reducing the amount of space required for undo logs in any one tablespace. The I/O patterns for undo logs also make undo tablespaces good candidates for SSD storage.

The number of undo tablespaces used by 'InnoDB' is controlled by the 'innodb_undo_tablespaces' option. This option can only be configured when initializing the MySQL instance. It cannot be changed afterward.

Note:

The 'innodb_undo_tablespaces' option is deprecated; expect it to be removed in a future release.

Undo tablespaces and individual segments inside those tablespaces cannot be dropped. However, undo logs stored in undo tablespaces can be truncated. For more information, see *note truncate-undo-tablespace::.

Configuring Undo Tablespaces

This procedure describes how to configure undo tablespaces. When undo tablespaces are configured, undo logs are stored in the undo tablespaces instead of the system tablespace.

The number of undo tablespaces can only be configured when initializing a MySQL instance and is fixed for the life of the instance, so it is recommended that you perform the following procedure on a test instance with a representative workload before deploying the configuration to a production system.

To configure undo tablespaces:

Specify a directory location for undo tablespaces using the 'innodb_undo_directory' variable. If a directory location is not specified, undo tablespaces are created in the data directory.
Define the number of rollback segments using the 'innodb_rollback_segments' variable. Start with a relatively low value and increase it incrementally over time to examine the effect on performance. The default setting for 'innodb_rollback_segments' is 128, which is also the maximum value.
One rollback segment is always assigned to the system tablespace, and 32 rollback segments are reserved for the temporary tablespace ('ibtmp1'). Therefore, to allocate rollback segments to undo tablespaces, set 'innodb_rollback_segments' to a value greater than
1. For example, if you have two undo tablespaces, set 'innodb_rollback_segments' to 35 to assign one rollback segment to each of the two undo tablespaces. Rollback segments are distributed among undo tablespaces in a circular fashion.
When you add undo tablespaces, the rollback segment in the system tablespace is rendered inactive.
Define the number of undo tablespaces using the 'innodb_undo_tablespaces' option. The specified number of undo tablespaces is fixed for the life of the MySQL instance, so if you are uncertain about an optimal value, estimate on the high side.
Create a new MySQL test instance using the configuration settings you have chosen.
Use a realistic workload on your test instance with data volume similar to your production servers to test the configuration.
Benchmark the performance of I/O intensive workloads.
Periodically increase the value of 'innodb_rollback_segments' and rerun performance tests until there are no further improvements in I/O performance.

Truncating Undo Tablespaces

Truncating undo tablespaces requires that the MySQL instance have a minimum of two active undo tablespaces, which ensures that one undo tablespace remains active while the other is taken offline to be truncated. The number of undo tablespaces is defined by the 'innodb_undo_tablespaces' variable. The default value is 0. Use this statement to check the value of 'innodb_undo_tablespaces':

 mysql> SELECT @@innodb_undo_tablespaces;
 +---------------------------+
 | @@innodb_undo_tablespaces |
 +---------------------------+
 |                         2 |
 +---------------------------+

To have undo tablespaces truncated, enable the 'innodb_undo_log_truncate' variable. For example:

 mysql> SET GLOBAL innodb_undo_log_truncate=ON;

When the 'innodb_undo_log_truncate' variable is enabled, undo tablespaces that exceed the size limit defined by the 'innodb_max_undo_log_size' variable are subject to truncation. The 'innodb_max_undo_log_size' variable is dynamic and has a default value of 1073741824 bytes (1024 MiB).

 mysql> SELECT @@innodb_max_undo_log_size;
 +----------------------------+
 | @@innodb_max_undo_log_size |
 +----------------------------+
 |                 1073741824 |
 +----------------------------+

When the 'innodb_undo_log_truncate' variable is enabled:

Undo tablespaces that exceed the 'innodb_max_undo_log_size' setting are marked for truncation. Selection of an undo tablespace for truncation is performed in a circular fashion to avoid truncating the same undo tablespace each time.
Rollback segments residing in the selected undo tablespace are made inactive so that they are not assigned to new transactions. Existing transactions that are currently using rollback segments are permitted to finish.
The purge system empties rollback segments by freeing undo logs that are no longer in use.
After all rollback segments in the undo tablespace are freed, the truncate operation runs and truncates the undo tablespace to its initial size. The initial size of an undo tablespace depends on the 'innodb_page_size' value. For the default 16KB page size, the initial undo tablespace file size is 10MiB. For 4KB, 8KB, 32KB, and 64KB page sizes, the initial undo tablespace files sizes are 7MiB, 8MiB, 20MiB, and 40MiB, respectively.

The size of an undo tablespace after a truncate operation may be larger than the initial size due to immediate use following the completion of the operation.

The 'innodb_undo_directory' variable defines the location of undo tablespace files. If the 'innodb_undo_directory' variable is undefined, undo tablespaces reside in the data directory.
Rollback segments are reactivated so that they can be assigned to new transactions.

Expediting Truncation of Undo Tablespaces

The purge thread is responsible for emptying and truncating undo tablespaces. By default, the purge thread looks for undo tablespaces to truncate once every 128 times that purge is invoked. The frequency with which the purge thread looks for undo tablespaces to truncate is controlled by the 'innodb_purge_rseg_truncate_frequency' variable, which has a default setting of 128.

 mysql> SELECT @@innodb_purge_rseg_truncate_frequency;
 +----------------------------------------+
 | @@innodb_purge_rseg_truncate_frequency |
 +----------------------------------------+
 |                                    128 |
 +----------------------------------------+

To increase the frequency, decrease the 'innodb_purge_rseg_truncate_frequency' setting. For example, to have the purge thread look for undo tabespaces once every 32 timees that purge is invoked, set 'innodb_purge_rseg_truncate_frequency' to 32.

 mysql> SET GLOBAL innodb_purge_rseg_truncate_frequency=32;

When the purge thread finds an undo tablespace that requires truncation, the purge thread returns with increased frequency to quickly empty and truncate the undo tablespace.

Performance Impact of Truncating Undo Tablespace Files

When an undo tablespace is truncated, the rollback segments in the undo tablespace are deactivated. The active rollback segments in other undo tablespaces assume responsibility for the entire system load, which may result in a slight performance degradation. The extent to which performance is affected depends on a number of factors:

Number of undo tablespaces
Number of undo logs
Undo tablespace size
Speed of the I/O susbsystem
Existing long running transactions
System load

The easiest way to avoid the potential performance impact is to increase the number of undo tablespaces.

Also, two checkpoint operations are performed during an undo tablespace truncate operation. The first checkpoint operation removes the old undo tablespace pages from the buffer pool. The second checkpoint flushes the initial pages of the new undo tablespace to disk. On a busy system, the first checkpoint in particular can temporarily affect system performance if there is a large number of pages to remove.

Undo Tablespace Truncation Recovery

An undo tablespace truncate operation creates a temporary 'undo_SPACE_NUMBER_trunc.log' file in the server log directory. That log directory is defined by 'innodb_log_group_home_dir'. If a system failure occurs during the truncate operation, the temporary log file permits the startup process to identify undo tablespaces that were being truncated and to continue the operation.

File: manual.info.tmp, Node: innodb-temporary-tablespace, Prev: innodb-undo-tablespaces, Up: innodb-tablespace

14.6.3.5 The Temporary Tablespace .................................

Non-compressed, user-created temporary tables and on-disk internal temporary tables are created in a shared temporary tablespace. The 'innodb_temp_data_file_path' variable defines the relative path, name, size, and attributes for temporary tablespace data files. If no value is specified for 'innodb_temp_data_file_path', the default behavior is to create an auto-extending data file named 'ibtmp1' in the 'innodb_data_home_dir' directory that is slightly larger than 12MB.

Note:

In MySQL 5.6, non-compressed temporary tables are created in individual file-per-table tablespaces in the temporary file directory, or in the 'InnoDB' system tablespace in the data directory if 'innodb_file_per_table' is disabled. The introduction of a shared temporary tablespace in MySQL 5.7 removes performance costs associated with creating and removing a file-per-table tablespace for each temporary table. A dedicated temporary tablespace also means that it is no longer necessary to save temporary table metadata to the 'InnoDB' system tables.

Compressed temporary tables, which are temporary tables created using the 'ROW_FORMAT=COMPRESSED' attribute, are created in file-per-table tablespaces in the temporary file directory.

The temporary tablespace is removed on normal shutdown or on an aborted initialization, and is recreated each time the server is started. The temporary tablespace receives a dynamically generated space ID when it is created. Startup is refused if the temporary tablespace cannot be created. The temporary tablespace is not removed if the server halts unexpectedly. In this case, a database administrator can remove the temporary tablespace manually or restart the server, which removes and recreates the temporary tablespace automatically.

The temporary tablespace cannot reside on a raw device.

The Information Schema *note 'FILES': information-schema-files-table. table provides metadata about the 'InnoDB' temporary tablespace. Issue a query similar to this one to view temporary tablespace metadata:

 mysql> SELECT * FROM INFORMATION_SCHEMA.FILES WHERE TABLESPACE_NAME='innodb_temporary'\G

The Information Schema *note 'INNODB_TEMP_TABLE_INFO': information-schema-innodb-temp-table-info-table. table provides metadata about user-created temporary tables that are currently active within an 'InnoDB' instance.

Managing Temporary Tablespace Data File Size

By default, the temporary tablespace data file is autoextending and increases in size as necessary to accommodate on-disk temporary tables. For example, if an operation creates a temporary table that is 20MB in size, the temporary tablespace data file, which is 12MB in size by default when created, extends in size to accommodate it. When temporary tables are dropped, freed space can be reused for new temporary tables, but the data file remains at the extended size.

An autoextending temporary tablespace data file can become large in environments that use large temporary tables or that use temporary tables extensively. A large data file can also result from long running queries that use temporary tables.

To determine if a temporary tablespace data file is autoextending, check the 'innodb_temp_data_file_path' setting:

 mysql> SELECT @@innodb_temp_data_file_path;
 +------------------------------+
 | @@innodb_temp_data_file_path |
 +------------------------------+
 | ibtmp1:12M:autoextend        |
 +------------------------------+

To check the size of temporary tablespace data files, query the Information Schema *note 'FILES': information-schema-files-table. table using a query similar to this:

 mysql> SELECT FILE_NAME, TABLESPACE_NAME, ENGINE, INITIAL_SIZE, TOTAL_EXTENTS*EXTENT_SIZE
        AS TotalSizeBytes, DATA_FREE, MAXIMUM_SIZE FROM INFORMATION_SCHEMA.FILES
        WHERE TABLESPACE_NAME = 'innodb_temporary'\G
 *************************** 1. row ***************************
       FILE_NAME: ./ibtmp1
 TABLESPACE_NAME: innodb_temporary
          ENGINE: InnoDB
    INITIAL_SIZE: 12582912
  TotalSizeBytes: 12582912
       DATA_FREE: 6291456
    MAXIMUM_SIZE: NULL

The 'TotalSizeBytes' value reports the current size of the temporary tablespace data file. For information about other field values, see *note information-schema-files-table::.

Alternatively, check the temporary tablespace data file size on your operating system. By default, the temporary tablespace data file is located in the directory defined by the 'innodb_temp_data_file_path' configuration option. If a value was not specified for this option explicitly, a temporary tablespace data file named 'ibtmp1' is created in 'innodb_data_home_dir', which defaults to the MySQL data directory if unspecified.

To reclaim disk space occupied by a temporary tablespace data file, restart the MySQL server. Restarting the server removes and recreates the temporary tablespace data file according to the attributes defined by 'innodb_temp_data_file_path'.

To prevent the temporary data file from becoming too large, you can configure the 'innodb_temp_data_file_path' variable to specify a maximum file size. For example:

 [mysqld]
 innodb_temp_data_file_path=ibtmp1:12M:autoextend:max:500M

When the data file reaches the maximum size, queries fail with an error indicating that the table is full. Configuring 'innodb_temp_data_file_path' requires restarting the server.

Alternatively, configure the 'default_tmp_storage_engine' and 'internal_tmp_disk_storage_engine' variables, which define the storage engine to use for user-created and on-disk internal temporary tables, respectively. Both variables are set to 'InnoDB' by default. The 'MyISAM' storage engine uses an individual file for each temporary table, which is removed when the temporary table is dropped.

File: manual.info.tmp, Node: innodb-data-dictionary, Next: innodb-doublewrite-buffer, Prev: innodb-tablespace, Up: innodb-on-disk-structures

14.6.4 InnoDB Data Dictionary

The 'InnoDB' data dictionary is comprised of internal system tables that contain metadata used to keep track of objects such as tables, indexes, and table columns. The metadata is physically located in the 'InnoDB' system tablespace. For historical reasons, data dictionary metadata overlaps to some degree with information stored in 'InnoDB' table metadata files ('.frm' files).

File: manual.info.tmp, Node: innodb-doublewrite-buffer, Next: innodb-redo-log, Prev: innodb-data-dictionary, Up: innodb-on-disk-structures

14.6.5 Doublewrite Buffer

The doublewrite buffer is a storage area where 'InnoDB' writes pages flushed from the buffer pool before writing the pages to their proper positions in the 'InnoDB' data files. If there is an operating system, storage subsystem, or unexpected *note 'mysqld': mysqld. process exit in the middle of a page write, 'InnoDB' can find a good copy of the page from the doublewrite buffer during crash recovery.

Although data is written twice, the doublewrite buffer does not require twice as much I/O overhead or twice as many I/O operations. Data is written to the doublewrite buffer in a large sequential chunk, with a single 'fsync()' call to the operating system (except in the case that 'innodb_flush_method' is set to 'O_DIRECT_NO_FSYNC').

The doublewrite buffer is enabled by default in most cases. To disable the doublewrite buffer, set 'innodb_doublewrite' to 0.

If system tablespace files ('ibdata files') are located on Fusion-io devices that support atomic writes, doublewrite buffering is automatically disabled and Fusion-io atomic writes are used for all data files. Because the doublewrite buffer setting is global, doublewrite buffering is also disabled for data files residing on non-Fusion-io hardware. This feature is only supported on Fusion-io hardware and is only enabled for Fusion-io NVMFS on Linux. To take full advantage of this feature, an 'innodb_flush_method' setting of 'O_DIRECT' is recommended.

File: manual.info.tmp, Node: innodb-redo-log, Next: innodb-undo-logs, Prev: innodb-doublewrite-buffer, Up: innodb-on-disk-structures

14.6.6 Redo Log

The redo log is a disk-based data structure used during crash recovery to correct data written by incomplete transactions. During normal operations, the redo log encodes requests to change table data that result from SQL statements or low-level API calls. Modifications that did not finish updating the data files before an unexpected shutdown are replayed automatically during initialization, and before connections are accepted. For information about the role of the redo log in crash recovery, see *note innodb-recovery::.

By default, the redo log is physically represented on disk by two files named 'ib_logfile0' and 'ib_logfile1'. MySQL writes to the redo log files in a circular fashion. Data in the redo log is encoded in terms of records affected; this data is collectively referred to as redo. The passage of data through the redo log is represented by an ever-increasing LSN value.

Information and procedures related to redo logs are described under the following topics in the section:

*note innodb-redo-log-file-reconfigure::
*note innodb-redo-log-related-topics::

Changing the Number or Size of InnoDB Redo Log Files

To change the number or the size of your 'InnoDB' redo log files, perform the following steps:

Stop the MySQL server and make sure that it shuts down without errors.
Edit 'my.cnf' to change the log file configuration. To change the log file size, configure 'innodb_log_file_size'. To increase the number of log files, configure 'innodb_log_files_in_group'.
Start the MySQL server again.

If 'InnoDB' detects that the 'innodb_log_file_size' differs from the redo log file size, it writes a log checkpoint, closes and removes the old log files, creates new log files at the requested size, and opens the new log files.

Related Topics

*note innodb-startup-log-file-configuration::
*note optimizing-innodb-logging::

File: manual.info.tmp, Node: innodb-undo-logs, Prev: innodb-redo-log, Up: innodb-on-disk-structures

14.6.7 Undo Logs

An undo log is a collection of undo log records associated with a single read-write transaction. An undo log record contains information about how to undo the latest change by a transaction to a clustered index record. If another transaction needs to see the original data as part of a consistent read operation, the unmodified data is retrieved from undo log records. Undo logs exist within undo log segments, which are contained within rollback segments. Rollback segments reside in the system tablespace, in undo tablespaces, and in the temporary tablespace.

Undo logs that reside in the temporary tablespace are used for transactions that modify data in user-defined temporary tables. These undo logs are not redo-logged, as they are not required for crash recovery. They are used only for rollback while the server is running. This type of undo log benefits performance by avoiding redo logging I/O.

'InnoDB' supports a maximum of 128 rollback segments, 32 of which are allocated to the temporary tablespace. This leaves 96 rollback segments that can be assigned to transactions that modify data in regular tables. The 'innodb_rollback_segments' variable defines the number of rollback segments used by 'InnoDB'.

The number of transactions that a rollback segment supports depends on the number of undo slots in the rollback segment and the number of undo logs required by each transaction. The number of undo slots in a rollback segment differs according to 'InnoDB' page size.

InnoDB Page Size Number of Undo Slots in a Rollback Segment (InnoDB Page Size / 16)

'4096 (4KB)' '256'

'8192 (8KB)' '512'

'16384 (16KB)' '1024'

'32768 (32KB)' '2048'

'65536 (64KB)' '4096'

A transaction is assigned up to four undo logs, one for each of the following operation types:

*note 'INSERT': insert. operations on user-defined tables
note 'UPDATE': update. and note 'DELETE': delete. operations on user-defined tables
*note 'INSERT': insert. operations on user-defined temporary tables
note 'UPDATE': update. and note 'DELETE': delete. operations on user-defined temporary tables

Undo logs are assigned as needed. For example, a transaction that performs note 'INSERT': insert, note 'UPDATE': update, and note 'DELETE': delete. operations on regular and temporary tables requires a full assignment of four undo logs. A transaction that performs only note 'INSERT': insert. operations on regular tables requires a single undo log.

A transaction that performs operations on regular tables is assigned undo logs from an assigned system tablespace or undo tablespace rollback segment. A transaction that performs operations on temporary tables is assigned undo logs from an assigned temporary tablespace rollback segment.

An undo log assigned to a transaction remains attached to the transaction for its duration. For example, an undo log assigned to a transaction for an note 'INSERT': insert. operation on a regular table is used for all note 'INSERT': insert. operations on regular tables performed by that transaction.

Given the factors described above, the following formulas can be used to estimate the number of concurrent read-write transactions that 'InnoDB' is capable of supporting.

Note:

It is possible to encounter a concurrent transaction limit error before reaching the number of concurrent read-write transactions that 'InnoDB' is capable of supporting. This occurs when the rollback segment assigned to a transaction runs out of undo slots. In such cases, try rerunning the transaction.

When transactions perform operations on temporary tables, the number of concurrent read-write transactions that 'InnoDB' is capable of supporting is constrained by the number of rollback segments allocated to the temporary tablespace, which is 32.

If each transaction performs either an note 'INSERT': insert. or an note 'UPDATE': update. or *note 'DELETE': delete. operation, the number of concurrent read-write transactions that 'InnoDB' is capable of supporting is:
```
  (innodb_page_size / 16) * (innodb_rollback_segments - 32)
```
If each transaction performs an note 'INSERT': insert. and* an note 'UPDATE': update. or note 'DELETE': delete. operation, the number of concurrent read-write transactions that 'InnoDB' is capable of supporting is:
```
  (innodb_page_size / 16 / 2) * (innodb_rollback_segments - 32)
```
If each transaction performs an *note 'INSERT': insert. operation on a temporary table, the number of concurrent read-write transactions that 'InnoDB' is capable of supporting is:
```
  (innodb_page_size / 16) * 32
```
If each transaction performs an note 'INSERT': insert. and* an note 'UPDATE': update. or note 'DELETE': delete. operation on a temporary table, the number of concurrent read-write transactions that 'InnoDB' is capable of supporting is:
```
  (innodb_page_size / 16 / 2) * 32
```

File: manual.info.tmp, Node: innodb-locking-transaction-model, Next: innodb-configuration, Prev: innodb-on-disk-structures, Up: innodb-storage-engine