13.2 Data Manipulation Statements

Menu:
call:: CALL Statement
delete:: DELETE Statement
do:: DO Statement
handler:: HANDLER Statement
insert:: INSERT Statement
load-data:: LOAD DATA Statement
load-xml:: LOAD XML Statement
replace:: REPLACE Statement
select:: SELECT Statement
subqueries:: Subqueries
update:: UPDATE Statement

File: manual.info.tmp, Node: call, Next: delete, Prev: sql-data-manipulation-statements, Up: sql-data-manipulation-statements

13.2.1 CALL Statement

 CALL SP_NAME([PARAMETER[,...]])
 CALL SP_NAME[()]

The note 'CALL': call. statement invokes a stored procedure that was defined previously with note 'CREATE PROCEDURE': create-procedure.

Stored procedures that take no arguments can be invoked without parentheses. That is, 'CALL p()' and 'CALL p' are equivalent.

*note 'CALL': call. can pass back values to its caller using parameters that are declared as 'OUT' or 'INOUT' parameters. When the procedure returns, a client program can also obtain the number of rows affected for the final statement executed within the routine: At the SQL level, call the 'ROW_COUNT()' function; from the C API, call the 'mysql_affected_rows()' (https://dev.mysql.com/doc/c-api/5.7/en/mysql-affected-rows.html) function.

For information about the effect of unhandled conditions on procedure parameters, see *note conditions-and-parameters::.

To get back a value from a procedure using an 'OUT' or 'INOUT' parameter, pass the parameter by means of a user variable, and then check the value of the variable after the procedure returns. (If you are calling the procedure from within another stored procedure or function, you can also pass a routine parameter or local routine variable as an 'IN' or 'INOUT' parameter.) For an 'INOUT' parameter, initialize its value before passing it to the procedure. The following procedure has an 'OUT' parameter that the procedure sets to the current server version, and an 'INOUT' value that the procedure increments by one from its current value:

 DELIMITER //

 CREATE PROCEDURE p (OUT ver_param VARCHAR(25), INOUT incr_param INT)
 BEGIN
   # Set value of OUT parameter
   SELECT VERSION() INTO ver_param;
   # Increment value of INOUT parameter
   SET incr_param = incr_param + 1;
 END //

 DELIMITER ;

Before calling the procedure, initialize the variable to be passed as the 'INOUT' parameter. After calling the procedure, the values of the two variables have been set or modified:

 mysql> SET @increment = 10;
 mysql> CALL p(@version, @increment);
 mysql> SELECT @version, @increment;
 +----------+------------+
 | @version | @increment |
 +----------+------------+
 | 5.7.44   |         11 |
 +----------+------------+

In prepared note 'CALL': call. statements used with note 'PREPARE': prepare. and *note 'EXECUTE': execute, placeholders can be used for 'IN' parameters, 'OUT', and 'INOUT' parameters. These types of parameters can be used as follows:

 mysql> SET @increment = 10;
 mysql> PREPARE s FROM 'CALL p(?, ?)';
 mysql> EXECUTE s USING @version, @increment;
 mysql> SELECT @version, @increment;
 +----------+------------+
 | @version | @increment |
 +----------+------------+
 | 5.7.44   |         11 |
 +----------+------------+

To write C programs that use the note 'CALL': call. SQL statement to execute stored procedures that produce result sets, the 'CLIENT_MULTI_RESULTS' flag must be enabled. This is because each note 'CALL': call. returns a result to indicate the call status, in addition to any result sets that might be returned by statements executed within the procedure. 'CLIENT_MULTI_RESULTS' must also be enabled if *note 'CALL': call. is used to execute any stored procedure that contains prepared statements. It cannot be determined when such a procedure is loaded whether those statements produce result sets, so it is necessary to assume that they do so.

'CLIENT_MULTI_RESULTS' can be enabled when you call 'mysql_real_connect()' (https://dev.mysql.com/doc/c-api/5.7/en/mysql-real-connect.html), either explicitly by passing the 'CLIENT_MULTI_RESULTS' flag itself, or implicitly by passing 'CLIENT_MULTI_STATEMENTS' (which also enables 'CLIENT_MULTI_RESULTS'). 'CLIENT_MULTI_RESULTS' is enabled by default.

To process the result of a *note 'CALL': call. statement executed using 'mysql_query()' (https://dev.mysql.com/doc/c-api/5.7/en/mysql-query.html) or 'mysql_real_query()' (https://dev.mysql.com/doc/c-api/5.7/en/mysql-real-query.html), use a loop that calls 'mysql_next_result()' (https://dev.mysql.com/doc/c-api/5.7/en/mysql-next-result.html) to determine whether there are more results. For an example, see Multiple Statement Execution Support (https://dev.mysql.com/doc/c-api/5.7/en/c-api-multiple-queries.html).

C programs can use the prepared-statement interface to execute note 'CALL': call. statements and access 'OUT' and 'INOUT' parameters. This is done by processing the result of a note 'CALL': call. statement using a loop that calls 'mysql_stmt_next_result()' (https://dev.mysql.com/doc/c-api/5.7/en/mysql-stmt-next-result.html) to determine whether there are more results. For an example, see Prepared CALL Statement Support (https://dev.mysql.com/doc/c-api/5.7/en/c-api-prepared-call-statements.html). Languages that provide a MySQL interface can use prepared *note 'CALL': call. statements to directly retrieve 'OUT' and 'INOUT' procedure parameters.

Metadata changes to objects referred to by stored programs are detected and cause automatic reparsing of the affected statements when the program is next executed. For more information, see *note statement-caching::.

File: manual.info.tmp, Node: delete, Next: do, Prev: call, Up: sql-data-manipulation-statements

13.2.2 DELETE Statement

*note 'DELETE': delete. is a DML statement that removes rows from a table.

Single-Table Syntax

 DELETE [LOW_PRIORITY] [QUICK] [IGNORE] FROM TBL_NAME
     [PARTITION (PARTITION_NAME [, PARTITION_NAME] ...)]
     [WHERE WHERE_CONDITION]
     [ORDER BY ...]
     [LIMIT ROW_COUNT]

The 'DELETE' statement deletes rows from TBL_NAME and returns the number of deleted rows. To check the number of deleted rows, call the 'ROW_COUNT()' function described in *note information-functions::.

Main Clauses

The conditions in the optional 'WHERE' clause identify which rows to delete. With no 'WHERE' clause, all rows are deleted.

WHERE_CONDITION is an expression that evaluates to true for each row to be deleted. It is specified as described in *note select::.

If the 'ORDER BY' clause is specified, the rows are deleted in the order that is specified. The 'LIMIT' clause places a limit on the number of rows that can be deleted. These clauses apply to single-table deletes, but not multi-table deletes.

Multiple-Table Syntax

 DELETE [LOW_PRIORITY] [QUICK] [IGNORE]
     TBL_NAME[.*] [, TBL_NAME[.*]] ...
     FROM TABLE_REFERENCES
     [WHERE WHERE_CONDITION]

 DELETE [LOW_PRIORITY] [QUICK] [IGNORE]
     FROM TBL_NAME[.*] [, TBL_NAME[.*]] ...
     USING TABLE_REFERENCES
     [WHERE WHERE_CONDITION]

Privileges

You need the 'DELETE' privilege on a table to delete rows from it. You need only the 'SELECT' privilege for any columns that are only read, such as those named in the 'WHERE' clause.

Performance

When you do not need to know the number of deleted rows, the note 'TRUNCATE TABLE': truncate-table. statement is a faster way to empty a table than a note 'DELETE': delete. statement with no 'WHERE' clause. Unlike note 'DELETE': delete, note 'TRUNCATE TABLE': truncate-table. cannot be used within a transaction or if you have a lock on the table. See note truncate-table:: and note lock-tables::.

The speed of delete operations may also be affected by factors discussed in *note delete-optimization::.

To ensure that a given note 'DELETE': delete. statement does not take too much time, the MySQL-specific 'LIMIT ROW_COUNT' clause for note 'DELETE': delete. specifies the maximum number of rows to be deleted. If the number of rows to delete is larger than the limit, repeat the 'DELETE' statement until the number of affected rows is less than the 'LIMIT' value.

Subqueries

You cannot delete from a table and select from the same table in a subquery.

Partitioned Table Support

'DELETE' supports explicit partition selection using the 'PARTITION' clause, which takes a list of the comma-separated names of one or more partitions or subpartitions (or both) from which to select rows to be dropped. Partitions not included in the list are ignored. Given a partitioned table 't' with a partition named 'p0', executing the statement 'DELETE FROM t PARTITION (p0)' has the same effect on the table as executing *note 'ALTER TABLE t TRUNCATE PARTITION (p0)': alter-table.; in both cases, all rows in partition 'p0' are dropped.

'PARTITION' can be used along with a 'WHERE' condition, in which case the condition is tested only on rows in the listed partitions. For example, 'DELETE FROM t PARTITION (p0) WHERE c < 5' deletes rows only from partition 'p0' for which the condition 'c < 5' is true; rows in any other partitions are not checked and thus not affected by the 'DELETE'.

The 'PARTITION' clause can also be used in multiple-table 'DELETE' statements. You can use up to one such option per table named in the 'FROM' option.

For more information and examples, see *note partitioning-selection::.

Auto-Increment Columns

If you delete the row containing the maximum value for an 'AUTO_INCREMENT' column, the value is not reused for a 'MyISAM' or 'InnoDB' table. If you delete all rows in the table with 'DELETE FROM TBL_NAME' (without a 'WHERE' clause) in 'autocommit' mode, the sequence starts over for all storage engines except 'InnoDB' and 'MyISAM'. There are some exceptions to this behavior for 'InnoDB' tables, as discussed in *note innodb-auto-increment-handling::.

For 'MyISAM' tables, you can specify an 'AUTO_INCREMENT' secondary column in a multiple-column key. In this case, reuse of values deleted from the top of the sequence occurs even for 'MyISAM' tables. See *note example-auto-increment::.

Modifiers

The *note 'DELETE': delete. statement supports the following modifiers:

If you specify the 'LOW_PRIORITY' modifier, the server delays execution of the *note 'DELETE': delete. until no other clients are reading from the table. This affects only storage engines that use only table-level locking (such as 'MyISAM', 'MEMORY', and 'MERGE').
For 'MyISAM' tables, if you use the 'QUICK' modifier, the storage engine does not merge index leaves during delete, which may speed up some kinds of delete operations.
The 'IGNORE' modifier causes MySQL to ignore ignorable errors during the process of deleting rows. (Errors encountered during the parsing stage are processed in the usual manner.) Errors that are ignored due to the use of 'IGNORE' are returned as warnings. For more information, see *note ignore-effect-on-execution::.

Order of Deletion

If the *note 'DELETE': delete. statement includes an 'ORDER BY' clause, rows are deleted in the order specified by the clause. This is useful primarily in conjunction with 'LIMIT'. For example, the following statement finds rows matching the 'WHERE' clause, sorts them by 'timestamp_column', and deletes the first (oldest) one:

 DELETE FROM somelog WHERE user = 'jcole'
 ORDER BY timestamp_column LIMIT 1;

'ORDER BY' also helps to delete rows in an order required to avoid referential integrity violations.

InnoDB Tables

If you are deleting many rows from a large table, you may exceed the lock table size for an 'InnoDB' table. To avoid this problem, or simply to minimize the time that the table remains locked, the following strategy (which does not use *note 'DELETE': delete. at all) might be helpful:

Select the rows not to be deleted into an empty table that has the same structure as the original table:
```
  INSERT INTO t_copy SELECT * FROM t WHERE ... ;
```
Use *note 'RENAME TABLE': rename-table. to atomically move the original table out of the way and rename the copy to the original name:
```
  RENAME TABLE t TO t_old, t_copy TO t;
```
Drop the original table:
```
  DROP TABLE t_old;
```

No other sessions can access the tables involved while note 'RENAME TABLE': rename-table. executes, so the rename operation is not subject to concurrency problems. See note rename-table::.

MyISAM Tables

In 'MyISAM' tables, deleted rows are maintained in a linked list and subsequent note 'INSERT': insert. operations reuse old row positions. To reclaim unused space and reduce file sizes, use the note 'OPTIMIZE TABLE': optimize-table. statement or the note 'myisamchk': myisamchk. utility to reorganize tables. note 'OPTIMIZE TABLE': optimize-table. is easier to use, but note 'myisamchk': myisamchk. is faster. See note optimize-table::, and *note myisamchk::.

The 'QUICK' modifier affects whether index leaves are merged for delete operations. 'DELETE QUICK' is most useful for applications where index values for deleted rows are replaced by similar index values from rows inserted later. In this case, the holes left by deleted values are reused.

'DELETE QUICK' is not useful when deleted values lead to underfilled index blocks spanning a range of index values for which new inserts occur again. In this case, use of 'QUICK' can lead to wasted space in the index that remains unreclaimed. Here is an example of such a scenario:

Create a table that contains an indexed 'AUTO_INCREMENT' column.
Insert many rows into the table. Each insert results in an index value that is added to the high end of the index.
Delete a block of rows at the low end of the column range using 'DELETE QUICK'.

In this scenario, the index blocks associated with the deleted index values become underfilled but are not merged with other index blocks due to the use of 'QUICK'. They remain underfilled when new inserts occur, because new rows do not have index values in the deleted range. Furthermore, they remain underfilled even if you later use note 'DELETE': delete. without 'QUICK', unless some of the deleted index values happen to lie in index blocks within or adjacent to the underfilled blocks. To reclaim unused index space under these circumstances, use note 'OPTIMIZE TABLE': optimize-table.

If you are going to delete many rows from a table, it might be faster to use 'DELETE QUICK' followed by *note 'OPTIMIZE TABLE': optimize-table. This rebuilds the index rather than performing many index block merge operations.

Multi-Table Deletes

You can specify multiple tables in a note 'DELETE': delete. statement to delete rows from one or more tables depending on the condition in the 'WHERE' clause. You cannot use 'ORDER BY' or 'LIMIT' in a multiple-table 'DELETE'. The TABLE_REFERENCES clause lists the tables involved in the join, as described in note join::.

For the first multiple-table syntax, only matching rows from the tables listed before the 'FROM' clause are deleted. For the second multiple-table syntax, only matching rows from the tables listed in the 'FROM' clause (before the 'USING' clause) are deleted. The effect is that you can delete rows from many tables at the same time and have additional tables that are used only for searching:

 DELETE t1, t2 FROM t1 INNER JOIN t2 INNER JOIN t3
 WHERE t1.id=t2.id AND t2.id=t3.id;

Or:

 DELETE FROM t1, t2 USING t1 INNER JOIN t2 INNER JOIN t3
 WHERE t1.id=t2.id AND t2.id=t3.id;

These statements use all three tables when searching for rows to delete, but delete matching rows only from tables 't1' and 't2'.

The preceding examples use 'INNER JOIN', but multiple-table note 'DELETE': delete. statements can use other types of join permitted in note 'SELECT': select. statements, such as 'LEFT JOIN'. For example, to delete rows that exist in 't1' that have no match in 't2', use a 'LEFT JOIN':

 DELETE t1 FROM t1 LEFT JOIN t2 ON t1.id=t2.id WHERE t2.id IS NULL;

The syntax permits '.*' after each TBL_NAME for compatibility with 'Access'.

If you use a multiple-table *note 'DELETE': delete. statement involving 'InnoDB' tables for which there are foreign key constraints, the MySQL optimizer might process tables in an order that differs from that of their parent/child relationship. In this case, the statement fails and rolls back. Instead, you should delete from a single table and rely on the 'ON DELETE' capabilities that 'InnoDB' provides to cause the other tables to be modified accordingly.

Note:

If you declare an alias for a table, you must use the alias when referring to the table:

 DELETE t1 FROM test AS t1, test2 WHERE ...

Table aliases in a multiple-table *note 'DELETE': delete. should be declared only in the TABLE_REFERENCES part of the statement. Elsewhere, alias references are permitted but not alias declarations.

Correct:

 DELETE a1, a2 FROM t1 AS a1 INNER JOIN t2 AS a2
 WHERE a1.id=a2.id;

 DELETE FROM a1, a2 USING t1 AS a1 INNER JOIN t2 AS a2
 WHERE a1.id=a2.id;

Incorrect:

 DELETE t1 AS a1, t2 AS a2 FROM t1 INNER JOIN t2
 WHERE a1.id=a2.id;

 DELETE FROM t1 AS a1, t2 AS a2 USING t1 INNER JOIN t2
 WHERE a1.id=a2.id;

File: manual.info.tmp, Node: do, Next: handler, Prev: delete, Up: sql-data-manipulation-statements

13.2.3 DO Statement

 DO EXPR [, EXPR] ...

note 'DO': do. executes the expressions but does not return any results. In most respects, note 'DO': do. is shorthand for 'SELECT EXPR, ...', but has the advantage that it is slightly faster when you do not care about the result.

*note 'DO': do. is useful primarily with functions that have side effects, such as 'RELEASE_LOCK()'.

Example: This *note 'SELECT': select. statement pauses, but also produces a result set:

 mysql> SELECT SLEEP(5);
 +----------+
 | SLEEP(5) |
 +----------+
 |        0 |
 +----------+
 1 row in set (5.02 sec)

*note 'DO': do, on the other hand, pauses without producing a result set.:

 mysql> DO SLEEP(5);
 Query OK, 0 rows affected (4.99 sec)

This could be useful, for example in a stored function or trigger, which prohibit statements that produce result sets.

*note 'DO': do. only executes expressions. It cannot be used in all cases where 'SELECT' can be used. For example, 'DO id FROM t1' is invalid because it references a table.

File: manual.info.tmp, Node: handler, Next: insert, Prev: do, Up: sql-data-manipulation-statements

13.2.4 HANDLER Statement

 HANDLER TBL_NAME OPEN [ [AS] ALIAS]

 HANDLER TBL_NAME READ INDEX_NAME { = | <= | >= | < | > } (VALUE1,VALUE2,...)
     [ WHERE WHERE_CONDITION ] [LIMIT ... ]
 HANDLER TBL_NAME READ INDEX_NAME { FIRST | NEXT | PREV | LAST }
     [ WHERE WHERE_CONDITION ] [LIMIT ... ]
 HANDLER TBL_NAME READ { FIRST | NEXT }
     [ WHERE WHERE_CONDITION ] [LIMIT ... ]

 HANDLER TBL_NAME CLOSE

The 'HANDLER' statement provides direct access to table storage engine interfaces. It is available for 'InnoDB' and 'MyISAM' tables.

The 'HANDLER ... OPEN' statement opens a table, making it accessible using subsequent 'HANDLER ... READ' statements. This table object is not shared by other sessions and is not closed until the session calls 'HANDLER ... CLOSE' or the session terminates.

If you open the table using an alias, further references to the open table with other 'HANDLER' statements must use the alias rather than the table name. If you do not use an alias, but open the table using a table name qualified by the database name, further references must use the unqualified table name. For example, for a table opened using 'mydb.mytable', further references must use 'mytable'.

The first 'HANDLER ... READ' syntax fetches a row where the index specified satisfies the given values and the 'WHERE' condition is met. If you have a multiple-column index, specify the index column values as a comma-separated list. Either specify values for all the columns in the index, or specify values for a leftmost prefix of the index columns. Suppose that an index 'my_idx' includes three columns named 'col_a', 'col_b', and 'col_c', in that order. The 'HANDLER' statement can specify values for all three columns in the index, or for the columns in a leftmost prefix. For example:

 HANDLER ... READ my_idx = (col_a_val,col_b_val,col_c_val) ...
 HANDLER ... READ my_idx = (col_a_val,col_b_val) ...
 HANDLER ... READ my_idx = (col_a_val) ...

To employ the 'HANDLER' interface to refer to a table's 'PRIMARY KEY', use the quoted identifier 'PRIMARY':

 HANDLER TBL_NAME READ `PRIMARY` ...

The second 'HANDLER ... READ' syntax fetches a row from the table in index order that matches the 'WHERE' condition.

The third 'HANDLER ... READ' syntax fetches a row from the table in natural row order that matches the 'WHERE' condition. It is faster than 'HANDLER TBL_NAME READ INDEX_NAME' when a full table scan is desired. Natural row order is the order in which rows are stored in a 'MyISAM' table data file. This statement works for 'InnoDB' tables as well, but there is no such concept because there is no separate data file.

Without a 'LIMIT' clause, all forms of 'HANDLER ... READ' fetch a single row if one is available. To return a specific number of rows, include a 'LIMIT' clause. It has the same syntax as for the note 'SELECT': select. statement. See note select::.

'HANDLER ... CLOSE' closes a table that was opened with 'HANDLER ... OPEN'.

There are several reasons to use the 'HANDLER' interface instead of normal *note 'SELECT': select. statements:

'HANDLER' is faster than *note 'SELECT': select.:

* A designated storage engine handler object is allocated for
  the 'HANDLER ... OPEN'.  The object is reused for subsequent
  'HANDLER' statements for that table; it need not be
  reinitialized for each one.

* There is less parsing involved.

* There is no optimizer or query-checking overhead.

* The handler interface does not have to provide a consistent
  look of the data (for example, dirty reads are permitted), so
  the storage engine can use optimizations that *note 'SELECT':
  select. does not normally permit.

'HANDLER' makes it easier to port to MySQL applications that use a low-level 'ISAM'-like interface. (See *note innodb-memcached:: for an alternative way to adapt applications that use the key-value store paradigm.)
'HANDLER' enables you to traverse a database in a manner that is difficult (or even impossible) to accomplish with *note 'SELECT': select. The 'HANDLER' interface is a more natural way to look at data when working with applications that provide an interactive user interface to the database.

'HANDLER' is a somewhat low-level statement. For example, it does not provide consistency. That is, 'HANDLER ... OPEN' does not take a snapshot of the table, and does not lock the table. This means that after a 'HANDLER ... OPEN' statement is issued, table data can be modified (by the current session or other sessions) and these modifications might be only partially visible to 'HANDLER ... NEXT' or 'HANDLER ... PREV' scans.

An open handler can be closed and marked for reopen, in which case the handler loses its position in the table. This occurs when both of the following circumstances are true:

Any session executes 'FLUSH TABLES' or DDL statements on the handler's table.
The session in which the handler is open executes non-'HANDLER' statements that use tables.

note 'TRUNCATE TABLE': truncate-table. for a table closes all handlers for the table that were opened with note 'HANDLER OPEN': handler.

If a table is flushed with 'FLUSH TABLES TBL_NAME WITH READ LOCK' was opened with 'HANDLER', the handler is implicitly flushed and loses its position.

File: manual.info.tmp, Node: insert, Next: load-data, Prev: handler, Up: sql-data-manipulation-statements

13.2.5 INSERT Statement

Menu:
insert-select:: INSERT ... SELECT Statement
insert-on-duplicate:: INSERT ... ON DUPLICATE KEY UPDATE Statement
insert-delayed:: INSERT DELAYED Statement

INSERT [LOW_PRIORITY | DELAYED | HIGH_PRIORITY] [IGNORE] [INTO] TBL_NAME [PARTITION (PARTITION_NAME [, PARTITION_NAME] ...)][(COL_NAME [, COL_NAME] ...)] {VALUES | VALUE} (VALUE_LIST) [, (VALUE_LIST)] ... [ON DUPLICATE KEY UPDATE ASSIGNMENT_LIST]

INSERT [LOW_PRIORITY | DELAYED | HIGH_PRIORITY] [IGNORE] [INTO] TBL_NAME [PARTITION (PARTITION_NAME [, PARTITION_NAME] ...)] SET ASSIGNMENT_LIST [ON DUPLICATE KEY UPDATE ASSIGNMENT_LIST]

INSERT [LOW_PRIORITY | HIGH_PRIORITY] [IGNORE] [INTO] TBL_NAME [PARTITION (PARTITION_NAME [, PARTITION_NAME] ...)][(COL_NAME [, COL_NAME] ...)] SELECT ... [ON DUPLICATE KEY UPDATE ASSIGNMENT_LIST]

VALUE: {EXPR | DEFAULT}

VALUE_LIST: VALUE [, VALUE] ...

ASSIGNMENT: COL_NAME = VALUE

ASSIGNMENT_LIST: ASSIGNMENT [, ASSIGNMENT] ...

note 'INSERT': insert. inserts new rows into an existing table. The note 'INSERT ... VALUES': insert. and note 'INSERT ... SET': insert. forms of the statement insert rows based on explicitly specified values. The note 'INSERT ... SELECT': insert-select. form inserts rows selected from another table or tables. *note 'INSERT': insert. with an 'ON DUPLICATE KEY UPDATE' clause enables existing rows to be updated if a row to be inserted would cause a duplicate value in a 'UNIQUE' index or 'PRIMARY KEY'.

For additional information about note 'INSERT ... SELECT': insert-select. and note 'INSERT ... ON DUPLICATE KEY UPDATE': insert-on-duplicate, see note insert-select::, and note insert-on-duplicate::.

In MySQL 5.7, the 'DELAYED' keyword is accepted but ignored by the server. For the reasons for this, see *note insert-delayed::,

Inserting into a table requires the 'INSERT' privilege for the table. If the 'ON DUPLICATE KEY UPDATE' clause is used and a duplicate key causes an *note 'UPDATE': update. to be performed instead, the statement requires the 'UPDATE' privilege for the columns to be updated. For columns that are read but not modified you need only the 'SELECT' privilege (such as for a column referenced only on the right hand side of an COL_NAME=EXPR assignment in an 'ON DUPLICATE KEY UPDATE' clause).

When inserting into a partitioned table, you can control which partitions and subpartitions accept new rows. The 'PARTITION' clause takes a list of the comma-separated names of one or more partitions or subpartitions (or both) of the table. If any of the rows to be inserted by a given note 'INSERT': insert. statement do not match one of the partitions listed, the note 'INSERT': insert. statement fails with the error 'Found a row not matching the given partition set'. For more information and examples, see *note partitioning-selection::.

TBL_NAME is the table into which rows should be inserted. Specify the columns for which the statement provides values as follows:

Provide a parenthesized list of comma-separated column names following the table name. In this case, a value for each named column must be provided by the 'VALUES' list or the *note 'SELECT': select. statement.
If you do not specify a list of column names for note 'INSERT ... VALUES': insert. or note 'INSERT ... SELECT': insert-select, values for every column in the table must be provided by the 'VALUES' list or the *note 'SELECT': select. statement. If you do not know the order of the columns in the table, use 'DESCRIBE TBL_NAME' to find out.
A 'SET' clause indicates columns explicitly by name, together with the value to assign each one.

Column values can be given in several ways:

If strict SQL mode is not enabled, any column not explicitly given a value is set to its default (explicit or implicit) value. For example, if you specify a column list that does not name all the columns in the table, unnamed columns are set to their default values. Default value assignment is described in note data-type-defaults::. See also note constraint-invalid-data::.

If strict SQL mode is enabled, an note 'INSERT': insert. statement generates an error if it does not specify an explicit value for every column that has no default value. See note sql-mode::.
If both the column list and the 'VALUES' list are empty, *note 'INSERT': insert. creates a row with each column set to its default value:
```
  INSERT INTO TBL_NAME () VALUES();
```
If strict mode is not enabled, MySQL uses the implicit default value for any column that has no explicitly defined default. If strict mode is enabled, an error occurs if any column has no default value.
Use the keyword 'DEFAULT' to set a column explicitly to its default value. This makes it easier to write *note 'INSERT': insert. statements that assign values to all but a few columns, because it enables you to avoid writing an incomplete 'VALUES' list that does not include a value for each column in the table. Otherwise, you must provide the list of column names corresponding to each value in the 'VALUES' list.
If a generated column is inserted into explicitly, the only permitted value is 'DEFAULT'. For information about generated columns, see *note create-table-generated-columns::.
In expressions, you can use 'DEFAULT(COL_NAME)' to produce the default value for column COL_NAME.
Type conversion of an expression EXPR that provides a column value might occur if the expression data type does not match the column data type. Conversion of a given value can result in different inserted values depending on the column type. For example, inserting the string ''1999.0e-2'' into an note 'INT': integer-types, note 'FLOAT': floating-point-types, note 'DECIMAL(10,6)': fixed-point-types, or note 'YEAR': year. column inserts the value '1999', '19.9921', '19.992100', or '1999', respectively. The value stored in the note 'INT': integer-types. and note 'YEAR': year. columns is '1999' because the string-to-number conversion looks only at as much of the initial part of the string as may be considered a valid integer or year. For the note 'FLOAT': floating-point-types. and note 'DECIMAL': fixed-point-types. columns, the string-to-number conversion considers the entire string a valid numeric value.
An expression EXPR can refer to any column that was set earlier in a value list. For example, you can do this because the value for 'col2' refers to 'col1', which has previously been assigned:
```
  INSERT INTO TBL_NAME (col1,col2) VALUES(15,col1*2);
```
But the following is not legal, because the value for 'col1' refers to 'col2', which is assigned after 'col1':
```
  INSERT INTO TBL_NAME (col1,col2) VALUES(col2*2,15);
```
An exception occurs for columns that contain 'AUTO_INCREMENT' values. Because 'AUTO_INCREMENT' values are generated after other value assignments, any reference to an 'AUTO_INCREMENT' column in the assignment returns a '0'.

*note 'INSERT': insert. statements that use 'VALUES' syntax can insert multiple rows. To do this, include multiple lists of comma-separated column values, with lists enclosed within parentheses and separated by commas. Example:

 INSERT INTO TBL_NAME (a,b,c) VALUES(1,2,3),(4,5,6),(7,8,9);

Each values list must contain exactly as many values as are to be inserted per row. The following statement is invalid because it contains one list of nine values, rather than three lists of three values each:

 INSERT INTO TBL_NAME (a,b,c) VALUES(1,2,3,4,5,6,7,8,9);

'VALUE' is a synonym for 'VALUES' in this context. Neither implies anything about the number of values lists, nor about the number of values per list. Either may be used whether there is a single values list or multiple lists, and regardless of the number of values per list.

The affected-rows value for an note 'INSERT': insert. can be obtained using the 'ROW_COUNT()' SQL function or the 'mysql_affected_rows()' (https://dev.mysql.com/doc/c-api/5.7/en/mysql-affected-rows.html) C API function. See note information-functions::, and mysql_affected_rows() (https://dev.mysql.com/doc/c-api/5.7/en/mysql-affected-rows.html).

If you use an note 'INSERT ... VALUES': insert. statement with multiple value lists or note 'INSERT ... SELECT': insert-select, the statement returns an information string in this format:

 Records: N1 Duplicates: N2 Warnings: N3

If you are using the C API, the information string can be obtained by invoking the 'mysql_info()' (https://dev.mysql.com/doc/c-api/5.7/en/mysql-info.html) function. See mysql_info() (https://dev.mysql.com/doc/c-api/5.7/en/mysql-info.html).

'Records' indicates the number of rows processed by the statement. (This is not necessarily the number of rows actually inserted because 'Duplicates' can be nonzero.) 'Duplicates' indicates the number of rows that could not be inserted because they would duplicate some existing unique index value. 'Warnings' indicates the number of attempts to insert column values that were problematic in some way. Warnings can occur under any of the following conditions:

Inserting 'NULL' into a column that has been declared 'NOT NULL'. For multiple-row note 'INSERT': insert. statements or note 'INSERT INTO ... SELECT': insert-select. statements, the column is set to the implicit default value for the column data type. This is '0' for numeric types, the empty string ('''') for string types, and the 'zero' value for date and time types. note 'INSERT INTO ... SELECT': insert-select. statements are handled the same way as multiple-row inserts because the server does not examine the result set from the note 'SELECT': select. to see whether it returns a single row. (For a single-row *note 'INSERT': insert, no warning occurs when 'NULL' is inserted into a 'NOT NULL' column. Instead, the statement fails with an error.)
Setting a numeric column to a value that lies outside the column range. The value is clipped to the closest endpoint of the range.
Assigning a value such as ''10.34 a'' to a numeric column. The trailing nonnumeric text is stripped off and the remaining numeric part is inserted. If the string value has no leading numeric part, the column is set to '0'.
Inserting a string into a string column (note 'CHAR': char, note 'VARCHAR': char, note 'TEXT': blob, or note 'BLOB': blob.) that exceeds the column maximum length. The value is truncated to the column maximum length.
Inserting a value into a date or time column that is illegal for the data type. The column is set to the appropriate zero value for the type.
For note 'INSERT': insert. examples involving 'AUTO_INCREMENT' column values, see note example-auto-increment::.

If *note 'INSERT': insert. inserts a row into a table that has an 'AUTO_INCREMENT' column, you can find the value used for that column by using the 'LAST_INSERT_ID()' SQL function or the 'mysql_insert_id()' (https://dev.mysql.com/doc/c-api/5.7/en/mysql-insert-id.html) C API function.

Note:

These two functions do not always behave identically. The behavior of note 'INSERT': insert. statements with respect to 'AUTO_INCREMENT' columns is discussed further in note information-functions::, and mysql_insert_id() (https://dev.mysql.com/doc/c-api/5.7/en/mysql-insert-id.html).

The *note 'INSERT': insert. statement supports the following modifiers:

If you use the 'LOW_PRIORITY' modifier, execution of the *note 'INSERT': insert. is delayed until no other clients are reading from the table. This includes other clients that began reading while existing clients are reading, and while the 'INSERT LOW_PRIORITY' statement is waiting. It is possible, therefore, for a client that issues an 'INSERT LOW_PRIORITY' statement to wait for a very long time.

'LOW_PRIORITY' affects only storage engines that use only table-level locking (such as 'MyISAM', 'MEMORY', and 'MERGE').

Note:

'LOW_PRIORITY' should normally not be used with 'MyISAM' tables because doing so disables concurrent inserts. See *note concurrent-inserts::.
If you specify 'HIGH_PRIORITY', it overrides the effect of the '--low-priority-updates' option if the server was started with that option. It also causes concurrent inserts not to be used. See *note concurrent-inserts::.

'HIGH_PRIORITY' affects only storage engines that use only table-level locking (such as 'MyISAM', 'MEMORY', and 'MERGE').
If you use the 'IGNORE' modifier, ignorable errors that occur while executing the *note 'INSERT': insert. statement are ignored. For example, without 'IGNORE', a row that duplicates an existing 'UNIQUE' index or 'PRIMARY KEY' value in the table causes a duplicate-key error and the statement is aborted. With 'IGNORE', the row is discarded and no error occurs. Ignored errors generate warnings instead.

'IGNORE' has a similar effect on inserts into partitioned tables where no partition matching a given value is found. Without 'IGNORE', such note 'INSERT': insert. statements are aborted with an error. When note 'INSERT IGNORE': insert. is used, the insert operation fails silently for rows containing the unmatched value, but inserts rows that are matched. For an example, see *note partitioning-list::.

Data conversions that would trigger errors abort the statement if 'IGNORE' is not specified. With 'IGNORE', invalid values are adjusted to the closest values and inserted; warnings are produced but the statement does not abort. You can determine with the 'mysql_info()' (https://dev.mysql.com/doc/c-api/5.7/en/mysql-info.html) C API function how many rows were actually inserted into the table.

For more information, see *note ignore-effect-on-execution::.

You can use note 'REPLACE': replace. instead of note 'INSERT': insert. to overwrite old rows. note 'REPLACE': replace. is the counterpart to note 'INSERT IGNORE': insert. in the treatment of new rows that contain unique key values that duplicate old rows: The new rows replace the old rows rather than being discarded. See *note replace::.
If you specify 'ON DUPLICATE KEY UPDATE', and a row is inserted that would cause a duplicate value in a 'UNIQUE' index or 'PRIMARY KEY', an note 'UPDATE': update. of the old row occurs. The affected-rows value per row is 1 if the row is inserted as a new row, 2 if an existing row is updated, and 0 if an existing row is set to its current values. If you specify the 'CLIENT_FOUND_ROWS' flag to the 'mysql_real_connect()' (https://dev.mysql.com/doc/c-api/5.7/en/mysql-real-connect.html) C API function when connecting to note 'mysqld': mysqld, the affected-rows value is 1 (not 0) if an existing row is set to its current values. See *note insert-on-duplicate::.
note 'INSERT DELAYED': insert-delayed. was deprecated in MySQL 5.6, and is scheduled for eventual removal. In MySQL 5.7, the 'DELAYED' modifier is accepted but ignored. Use 'INSERT' (without 'DELAYED') instead. See note insert-delayed::.

An 'INSERT' statement affecting a partitioned table using a storage engine such as note 'MyISAM': myisam-storage-engine. that employs table-level locks locks only those partitions into which rows are actually inserted. (For storage engines such as note 'InnoDB': innodb-storage-engine. that employ row-level locking, no locking of partitions takes place.) For more information, see *note partitioning-limitations-locking::.

File: manual.info.tmp, Node: insert-select, Next: insert-on-duplicate, Prev: insert, Up: insert

13.2.5.1 INSERT ... SELECT Statement ....................................

 INSERT [LOW_PRIORITY | HIGH_PRIORITY] [IGNORE]
     [INTO] TBL_NAME
     [PARTITION (PARTITION_NAME [, PARTITION_NAME] ...)]
     [(COL_NAME [, COL_NAME] ...)]
     SELECT ...
     [ON DUPLICATE KEY UPDATE ASSIGNMENT_LIST]

 VALUE:
     {EXPR | DEFAULT}

 ASSIGNMENT:
     COL_NAME = VALUE

 ASSIGNMENT_LIST:
     ASSIGNMENT [, ASSIGNMENT] ...

With note 'INSERT ... SELECT': insert-select, you can quickly insert many rows into a table from the result of a note 'SELECT': select. statement, which can select from one or many tables. For example:

 INSERT INTO tbl_temp2 (fld_id)
   SELECT tbl_temp1.fld_order_id
   FROM tbl_temp1 WHERE tbl_temp1.fld_order_id > 100;

The following conditions hold for *note 'INSERT ... SELECT': insert-select. statements:

Specify 'IGNORE' to ignore rows that would cause duplicate-key violations.
The target table of the note 'INSERT': insert. statement may appear in the 'FROM' clause of the note 'SELECT': select. part of the query. However, you cannot insert into a table and select from the same table in a subquery.

When selecting from and inserting into the same table, MySQL creates an internal temporary table to hold the rows from the note 'SELECT': select. and then inserts those rows into the target table. However, you cannot use 'INSERT INTO t ... SELECT ... FROM t' when 't' is a 'TEMPORARY' table, because 'TEMPORARY' tables cannot be referred to twice in the same statement. See note internal-temporary-tables::, and *note temporary-table-problems::.
'AUTO_INCREMENT' columns work as usual.
To ensure that the binary log can be used to re-create the original tables, MySQL does not permit concurrent inserts for note 'INSERT ... SELECT': insert-select. statements (see note concurrent-inserts::).
To avoid ambiguous column reference problems when the note 'SELECT': select. and the note 'INSERT': insert. refer to the same table, provide a unique alias for each table used in the *note 'SELECT': select. part, and qualify column names in that part with the appropriate alias.

You can explicitly select which partitions or subpartitions (or both) of the source or target table (or both) are to be used with a 'PARTITION' clause following the name of the table. When 'PARTITION' is used with the name of the source table in the note 'SELECT': select. portion of the statement, rows are selected only from the partitions or subpartitions named in its partition list. When 'PARTITION' is used with the name of the target table for the note 'INSERT': insert. portion of the statement, it must be possible to insert all rows selected into the partitions or subpartitions named in the partition list following the option. Otherwise, the 'INSERT ... SELECT' statement fails. For more information and examples, see *note partitioning-selection::.

For note 'INSERT ... SELECT': insert-on-duplicate. statements, see note insert-on-duplicate:: for conditions under which the *note 'SELECT': select. columns can be referred to in an 'ON DUPLICATE KEY UPDATE' clause.

The order in which a note 'SELECT': select. statement with no 'ORDER BY' clause returns rows is nondeterministic. This means that, when using replication, there is no guarantee that such a note 'SELECT': select. returns rows in the same order on the source and the replica, which can lead to inconsistencies between them. To prevent this from occurring, always write 'INSERT ... SELECT' statements that are to be replicated using an 'ORDER BY' clause that produces the same row order on the source and the replica. See also *note replication-features-limit::.

Due to this issue, note 'INSERT ... SELECT ON DUPLICATE KEY UPDATE': insert-on-duplicate. and note 'INSERT IGNORE ... SELECT': insert-select. statements are flagged as unsafe for statement-based replication. Such statements produce a warning in the error log when using statement-based mode and are written to the binary log using the row-based format when using 'MIXED' mode. (Bug #11758262, Bug #50439)

13.2.6 LOAD DATA Statement

 LOAD DATA
     [LOW_PRIORITY | CONCURRENT] [LOCAL]
     INFILE 'FILE_NAME'
     [REPLACE | IGNORE]
     INTO TABLE TBL_NAME
     [PARTITION (PARTITION_NAME [, PARTITION_NAME] ...)]
     [CHARACTER SET CHARSET_NAME]
     [{FIELDS | COLUMNS}
         [TERMINATED BY 'STRING']
         [[OPTIONALLY] ENCLOSED BY 'CHAR']
         [ESCAPED BY 'CHAR']
     ]
     [LINES
         [STARTING BY 'STRING']
         [TERMINATED BY 'STRING']
     ]
     [IGNORE NUMBER {LINES | ROWS}]
     [(COL_NAME_OR_USER_VAR
         [, COL_NAME_OR_USER_VAR] ...)]
     [SET COL_NAME={EXPR | DEFAULT}
         [, COL_NAME={EXPR | DEFAULT}] ...]

The *note 'LOAD DATA': load-data. statement reads rows from a text file into a table at a very high speed. The file can be read from the server host or the client host, depending on whether the 'LOCAL' modifier is given. 'LOCAL' also affects data interpretation and error handling.

note 'LOAD DATA': load-data. is the complement of note 'SELECT ... INTO OUTFILE': select-into. (See note select-into::.) To write data from a table to a file, use note 'SELECT ... INTO OUTFILE': select-into. To read the file back into a table, use *note 'LOAD DATA': load-data. The syntax of the 'FIELDS' and 'LINES' clauses is the same for both statements.

The note 'mysqlimport': mysqlimport. utility provides another way to load data files; it operates by sending a note 'LOAD DATA': load-data. statement to the server. See *note mysqlimport::.

For information about the efficiency of note 'INSERT': insert. versus note 'LOAD DATA': load-data. and speeding up note 'LOAD DATA': load-data, see note insert-optimization::.

*note load-data-local::
*note load-data-character-set::
*note load-data-file-location::
*note load-data-security-requirements::
*note load-data-error-handling::
*note load-data-index-handling::
*note load-data-field-line-handling::
*note load-data-column-list::
*note load-data-input-preprocessing::
*note load-data-column-assignments::
*note load-data-partitioning-support::
*note load-data-concurrency::
*note load-data-statement-result-information::
*note load-data-replication::
*note load-data-miscellaneous::

Non-LOCAL Versus LOCAL Operation

The 'LOCAL' modifier affects these aspects of *note 'LOAD DATA': load-data, compared to non-'LOCAL' operation:

It changes the expected location of the input file; see *note load-data-file-location::.
It changes the statement security requirements; see *note load-data-security-requirements::.
Unless 'REPLACE' is also specified, 'LOCAL' has the same effect as the 'IGNORE' modifier on the interpretation of input file contents and error handling; see note load-data-error-handling::, and note load-data-column-assignments::.

'LOCAL' works only if the server and your client both have been configured to permit it. For example, if note 'mysqld': mysqld. was started with the 'local_infile' system variable disabled, 'LOCAL' produces an error. See note load-data-local-security::.

Input File Character Set

The file name must be given as a literal string. On Windows, specify backslashes in path names as forward slashes or doubled backslashes. The server interprets the file name using the character set indicated by the 'character_set_filesystem' system variable.

By default, the server interprets the file contents using the character set indicated by the 'character_set_database' system variable. If the file contents use a character set different from this default, it is a good idea to specify that character set by using the 'CHARACTER SET' clause. A character set of 'binary' specifies 'no conversion.'

*note 'SET NAMES': set-names. and the setting of 'character_set_client' do not affect interpretation of file contents.

note 'LOAD DATA': load-data. interprets all fields in the file as having the same character set, regardless of the data types of the columns into which field values are loaded. For proper interpretation of the file, you must ensure that it was written with the correct character set. For example, if you write a data file with note 'mysqldump -T': mysqldump. or by issuing a note 'SELECT ... INTO OUTFILE': select-into. statement in note 'mysql': mysql, be sure to use a '--default-character-set' option to write output in the character set to be used when the file is loaded with *note 'LOAD DATA': load-data.

Note:

It is not possible to load data files that use the 'ucs2', 'utf16', 'utf16le', or 'utf32' character set.

Input File Location

These rules determine the *note 'LOAD DATA': load-data. input file location:

If 'LOCAL' is not specified, the file must be located on the server host. The server reads the file directly, locating it as follows:

* If the file name is an absolute path name, the server uses it
  as given.

* If the file name is a relative path name with leading
  components, the server looks for the file relative to its data
  directory.

* If the file name has no leading components, the server looks
  for the file in the database directory of the default
  database.

If 'LOCAL' is specified, the file must be located on the client host. The client program reads the file, locating it as follows:
```
* If the file name is an absolute path name, the client program
  uses it as given.

* If the file name is a relative path name, the client program
  looks for the file relative to its invocation directory.
```
When 'LOCAL' is used, the client program reads the file and sends its contents to the server. The server creates a copy of the file in the directory where it stores temporary files. See note temporary-files::. Lack of sufficient space for the copy in this directory can cause the note 'LOAD DATA LOCAL': load-data. statement to fail.

The non-'LOCAL' rules mean that the server reads a file named as './myfile.txt' relative to its data directory, whereas it reads a file named as 'myfile.txt' from the database directory of the default database. For example, if the following *note 'LOAD DATA': load-data. statement is executed while 'db1' is the default database, the server reads the file 'data.txt' from the database directory for 'db1', even though the statement explicitly loads the file into a table in the 'db2' database:

 LOAD DATA INFILE 'data.txt' INTO TABLE db2.my_table;

Security Requirements

For a non-'LOCAL' load operation, the server reads a text file located on the server host, so these security requirements must be satisified:

You must have the 'FILE' privilege. See *note privileges-provided::.

The operation is subject to the 'secure_file_priv' system variable setting:

* If the variable value is a nonempty directory name, the file
  must be located in that directory.

* If the variable value is empty (which is insecure), the file
  need only be readable by the server.

For a 'LOCAL' load operation, the client program reads a text file located on the client host. Because the file contents are sent over the connection by the client to the server, using 'LOCAL' is a bit slower than when the server accesses the file directly. On the other hand, you do not need the 'FILE' privilege, and the file can be located in any directory the client program can access.

Duplicate-Key and Error Handling

The 'REPLACE' and 'IGNORE' modifiers control handling of new (input) rows that duplicate existing table rows on unique key values ('PRIMARY KEY' or 'UNIQUE' index values):

With 'REPLACE', new rows that have the same value as a unique key value in an existing row replace the existing row. See *note replace::.
With 'IGNORE', new rows that duplicate an existing row on a unique key value are discarded. For more information, see *note ignore-effect-on-execution::.

Unless 'REPLACE' is also specified, the 'LOCAL' modifier has the same effect as 'IGNORE'. This occurs because the server has no way to stop transmission of the file in the middle of the operation.

If none of 'REPLACE', 'IGNORE', or 'LOCAL' is specified, an error occurs when a duplicate key value is found, and the rest of the text file is ignored.

In addition to affecting duplicate-key handling as just described, 'IGNORE' and 'LOCAL' also affect error handling:

When neither 'IGNORE' nor 'LOCAL' is specified, data-interpretation errors terminate the operation.
When 'IGNORE'--or 'LOCAL' without 'REPLACE'--is specified, data interpretation errors become warnings and the load operation continues, even if the SQL mode is restrictive. For examples, see *note load-data-column-assignments::.

Index Handling

To ignore foreign key constraints during the load operation, execute a 'SET foreign_key_checks = 0' statement before executing *note 'LOAD DATA': load-data.

If you use note 'LOAD DATA': load-data. on an empty 'MyISAM' table, all nonunique indexes are created in a separate batch (as for note 'REPAIR TABLE': repair-table.). Normally, this makes note 'LOAD DATA': load-data. much faster when you have many indexes. In some extreme cases, you can create the indexes even faster by turning them off with note 'ALTER TABLE ... DISABLE KEYS': alter-table. before loading the file into the table and re-creating the indexes with note 'ALTER TABLE ... ENABLE KEYS': alter-table. after loading the file. See note insert-optimization::.

Field and Line Handling

For both the note 'LOAD DATA': load-data. and note 'SELECT ... INTO OUTFILE': select-into. statements, the syntax of the 'FIELDS' and 'LINES' clauses is the same. Both clauses are optional, but 'FIELDS' must precede 'LINES' if both are specified.

If you specify a 'FIELDS' clause, each of its subclauses ('TERMINATED BY', '[OPTIONALLY] ENCLOSED BY', and 'ESCAPED BY') is also optional, except that you must specify at least one of them. Arguments to these clauses are permitted to contain only ASCII characters.

If you specify no 'FIELDS' or 'LINES' clause, the defaults are the same as if you had written this:

 FIELDS TERMINATED BY '\t' ENCLOSED BY '' ESCAPED BY '\\'
 LINES TERMINATED BY '\n' STARTING BY ''

Backslash is the MySQL escape character within strings in SQL statements. Thus, to specify a literal backslash, you must specify two backslashes for the value to be interpreted as a single backslash. The escape sequences '''' and '''' specify tab and newline characters, respectively.

In other words, the defaults cause *note 'LOAD DATA': load-data. to act as follows when reading input:

Look for line boundaries at newlines.
Do not skip any line prefix.
Break lines into fields at tabs.
Do not expect fields to be enclosed within any quoting characters.
Interpret characters preceded by the escape character '' as escape sequences. For example, '', '', and '\' signify tab, newline, and backslash, respectively. See the discussion of 'FIELDS ESCAPED BY' later for the full list of escape sequences.

Conversely, the defaults cause *note 'SELECT ... INTO OUTFILE': select-into. to act as follows when writing output:

Write tabs between fields.
Do not enclose fields within any quoting characters.
Use '' to escape instances of tab, newline, or '' that occur within field values.
Write newlines at the ends of lines.

Note:

For a text file generated on a Windows system, proper file reading might require 'LINES TERMINATED BY ''' because Windows programs typically use two characters as a line terminator. Some programs, such as 'WordPad', might use '' as a line terminator when writing files. To read such files, use 'LINES TERMINATED BY '''.

If all the input lines have a common prefix that you want to ignore, you can use 'LINES STARTING BY 'PREFIX_STRING'' to skip the prefix and anything before it. If a line does not include the prefix, the entire line is skipped. Suppose that you issue the following statement:

 LOAD DATA INFILE '/tmp/test.txt' INTO TABLE test
   FIELDS TERMINATED BY ','  LINES STARTING BY 'xxx';

If the data file looks like this:

 xxx"abc",1
 something xxx"def",2
 "ghi",3

The resulting rows are '("abc",1)' and '("def",2)'. The third row in the file is skipped because it does not contain the prefix.

The 'IGNORE NUMBER LINES' clause can be used to ignore lines at the start of the file. For example, you can use 'IGNORE 1 LINES' to skip an initial header line containing column names:

 LOAD DATA INFILE '/tmp/test.txt' INTO TABLE test IGNORE 1 LINES;

When you use note 'SELECT ... INTO OUTFILE': select-into. in tandem with note 'LOAD DATA': load-data. to write data from a database into a file and then read the file back into the database later, the field- and line-handling options for both statements must match. Otherwise, note 'LOAD DATA': load-data. does not interpret the contents of the file properly. Suppose that you use note 'SELECT ... INTO OUTFILE': select-into. to write a file with fields delimited by commas:

 SELECT * INTO OUTFILE 'data.txt'
   FIELDS TERMINATED BY ','
   FROM table2;

To read the comma-delimited file, the correct statement is:

 LOAD DATA INFILE 'data.txt' INTO TABLE table2
   FIELDS TERMINATED BY ',';

If instead you tried to read the file with the statement shown following, it would not work because it instructs *note 'LOAD DATA': load-data. to look for tabs between fields:

 LOAD DATA INFILE 'data.txt' INTO TABLE table2
   FIELDS TERMINATED BY '\t';

The likely result is that each input line would be interpreted as a single field.

*note 'LOAD DATA': load-data. can be used to read files obtained from external sources. For example, many programs can export data in comma-separated values (CSV) format, such that lines have fields separated by commas and enclosed within double quotation marks, with an initial line of column names. If the lines in such a file are terminated by carriage return/newline pairs, the statement shown here illustrates the field- and line-handling options you would use to load the file:

 LOAD DATA INFILE 'data.txt' INTO TABLE TBL_NAME
   FIELDS TERMINATED BY ',' ENCLOSED BY '"'
   LINES TERMINATED BY '\r\n'
   IGNORE 1 LINES;

If the input values are not necessarily enclosed within quotation marks, use 'OPTIONALLY' before the 'ENCLOSED BY' option.

Any of the field- or line-handling options can specify an empty string (''''). If not empty, the 'FIELDS [OPTIONALLY] ENCLOSED BY' and 'FIELDS ESCAPED BY' values must be a single character. The 'FIELDS TERMINATED BY', 'LINES STARTING BY', and 'LINES TERMINATED BY' values can be more than one character. For example, to write lines that are terminated by carriage return/linefeed pairs, or to read a file containing such lines, specify a 'LINES TERMINATED BY ''' clause.

To read a file containing jokes that are separated by lines consisting of '%%', you can do this

 CREATE TABLE jokes
   (a INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
   joke TEXT NOT NULL);
 LOAD DATA INFILE '/tmp/jokes.txt' INTO TABLE jokes
   FIELDS TERMINATED BY ''
   LINES TERMINATED BY '\n%%\n' (joke);

'FIELDS [OPTIONALLY] ENCLOSED BY' controls quoting of fields. For output (*note 'SELECT ... INTO OUTFILE': select-into.), if you omit the word 'OPTIONALLY', all fields are enclosed by the 'ENCLOSED BY' character. An example of such output (using a comma as the field delimiter) is shown here:

 "1","a string","100.20"
 "2","a string containing a , comma","102.20"
 "3","a string containing a \" quote","102.20"
 "4","a string containing a \", quote and comma","102.20"

If you specify 'OPTIONALLY', the 'ENCLOSED BY' character is used only to enclose values from columns that have a string data type (such as note 'CHAR': char, note 'BINARY': binary-varbinary, note 'TEXT': blob, or note 'ENUM': enum.):

 1,"a string",100.20
 2,"a string containing a , comma",102.20
 3,"a string containing a \" quote",102.20
 4,"a string containing a \", quote and comma",102.20

Occurrences of the 'ENCLOSED BY' character within a field value are escaped by prefixing them with the 'ESCAPED BY' character. Also, if you specify an empty 'ESCAPED BY' value, it is possible to inadvertently generate output that cannot be read properly by *note 'LOAD DATA': load-data. For example, the preceding output just shown would appear as follows if the escape character is empty. Observe that the second field in the fourth line contains a comma following the quote, which (erroneously) appears to terminate the field:

 1,"a string",100.20
 2,"a string containing a , comma",102.20
 3,"a string containing a " quote",102.20
 4,"a string containing a ", quote and comma",102.20

For input, the 'ENCLOSED BY' character, if present, is stripped from the ends of field values. (This is true regardless of whether 'OPTIONALLY' is specified; 'OPTIONALLY' has no effect on input interpretation.) Occurrences of the 'ENCLOSED BY' character preceded by the 'ESCAPED BY' character are interpreted as part of the current field value.

If the field begins with the 'ENCLOSED BY' character, instances of that character are recognized as terminating a field value only if followed by the field or line 'TERMINATED BY' sequence. To avoid ambiguity, occurrences of the 'ENCLOSED BY' character within a field value can be doubled and are interpreted as a single instance of the character. For example, if 'ENCLOSED BY '"'' is specified, quotation marks are handled as shown here:

 "The ""BIG"" boss"  -> The "BIG" boss
 The "BIG" boss      -> The "BIG" boss
 The ""BIG"" boss    -> The ""BIG"" boss

'FIELDS ESCAPED BY' controls how to read or write special characters:

For input, if the 'FIELDS ESCAPED BY' character is not empty, occurrences of that character are stripped and the following character is taken literally as part of a field value. Some two-character sequences that are exceptions, where the first character is the escape character. These sequences are shown in the following table (using '' for the escape character). The rules for 'NULL' handling are described later in this section.

Character Escape Sequence

'' An ASCII NUL ('X'00'') character

'' A backspace character

'' A newline (linefeed) character

'' A carriage return character

'' A tab character.

'' ASCII 26 (Control+Z)

'' NULL

For more information about ''-escape syntax, see *note string-literals::.

If the 'FIELDS ESCAPED BY' character is empty, escape-sequence interpretation does not occur.
For output, if the 'FIELDS ESCAPED BY' character is not empty, it is used to prefix the following characters on output:
```
* The 'FIELDS ESCAPED BY' character.

* The 'FIELDS [OPTIONALLY] ENCLOSED BY' character.

* The first character of the 'FIELDS TERMINATED BY' and 'LINES
  TERMINATED BY' values, if the 'ENCLOSED BY' character is empty
  or unspecified.

* ASCII '0' (what is actually written following the escape
  character is ASCII '0', not a zero-valued byte).
```
If the 'FIELDS ESCAPED BY' character is empty, no characters are escaped and 'NULL' is output as 'NULL', not ''. It is probably not a good idea to specify an empty escape character, particularly if field values in your data contain any of the characters in the list just given.

In certain cases, field- and line-handling options interact:

If 'LINES TERMINATED BY' is an empty string and 'FIELDS TERMINATED BY' is nonempty, lines are also terminated with 'FIELDS TERMINATED BY'.
If the 'FIELDS TERMINATED BY' and 'FIELDS ENCLOSED BY' values are both empty (''''), a fixed-row (nondelimited) format is used. With fixed-row format, no delimiters are used between fields (but you can still have a line terminator). Instead, column values are read and written using a field width wide enough to hold all values in the field. For note 'TINYINT': integer-types, note 'SMALLINT': integer-types, note 'MEDIUMINT': integer-types, note 'INT': integer-types, and *note 'BIGINT': integer-types, the field widths are 4, 6, 8, 11, and 20, respectively, no matter what the declared display width is.

'LINES TERMINATED BY' is still used to separate lines. If a line does not contain all fields, the rest of the columns are set to their default values. If you do not have a line terminator, you should set this to ''''. In this case, the text file must contain all fields for each row.

Fixed-row format also affects handling of 'NULL' values, as described later.

Note:

Fixed-size format does not work if you are using a multibyte character set.

Handling of 'NULL' values varies according to the 'FIELDS' and 'LINES' options in use:

For the default 'FIELDS' and 'LINES' values, 'NULL' is written as a field value of '' for output, and a field value of '' is read as 'NULL' for input (assuming that the 'ESCAPED BY' character is '').
If 'FIELDS ENCLOSED BY' is not empty, a field containing the literal word 'NULL' as its value is read as a 'NULL' value. This differs from the word 'NULL' enclosed within 'FIELDS ENCLOSED BY' characters, which is read as the string ''NULL''.
If 'FIELDS ESCAPED BY' is empty, 'NULL' is written as the word 'NULL'.
With fixed-row format (which is used when 'FIELDS TERMINATED BY' and 'FIELDS ENCLOSED BY' are both empty), 'NULL' is written as an empty string. This causes both 'NULL' values and empty strings in the table to be indistinguishable when written to the file because both are written as empty strings. If you need to be able to tell the two apart when reading the file back in, you should not use fixed-row format.

An attempt to load 'NULL' into a 'NOT NULL' column produces either a warning or an error according to the rules described in *note load-data-column-assignments::.

Some cases are not supported by *note 'LOAD DATA': load-data.:

Fixed-size rows ('FIELDS TERMINATED BY' and 'FIELDS ENCLOSED BY' both empty) and note 'BLOB': blob. or note 'TEXT': blob. columns.
If you specify one separator that is the same as or a prefix of another, *note 'LOAD DATA': load-data. cannot interpret the input properly. For example, the following 'FIELDS' clause would cause problems:
```
  FIELDS TERMINATED BY '"' ENCLOSED BY '"'
```
If 'FIELDS ESCAPED BY' is empty, a field value that contains an occurrence of 'FIELDS ENCLOSED BY' or 'LINES TERMINATED BY' followed by the 'FIELDS TERMINATED BY' value causes note 'LOAD DATA': load-data. to stop reading a field or line too early. This happens because note 'LOAD DATA': load-data. cannot properly determine where the field or line value ends.

Column List Specification

The following example loads all columns of the 'persondata' table:

 LOAD DATA INFILE 'persondata.txt' INTO TABLE persondata;

By default, when no column list is provided at the end of the *note 'LOAD DATA': load-data. statement, input lines are expected to contain a field for each table column. If you want to load only some of a table's columns, specify a column list:

 LOAD DATA INFILE 'persondata.txt' INTO TABLE persondata
 (COL_NAME_OR_USER_VAR [, COL_NAME_OR_USER_VAR] ...);

You must also specify a column list if the order of the fields in the input file differs from the order of the columns in the table. Otherwise, MySQL cannot tell how to match input fields with table columns.

Input Preprocessing

Each instance of COL_NAME_OR_USER_VAR in *note 'LOAD DATA': load-data. syntax is either a column name or a user variable. With user variables, the 'SET' clause enables you to perform preprocessing transformations on their values before assigning the result to columns.

User variables in the 'SET' clause can be used in several ways. The following example uses the first input column directly for the value of 't1.column1', and assigns the second input column to a user variable that is subjected to a division operation before being used for the value of 't1.column2':

 LOAD DATA INFILE 'file.txt'
   INTO TABLE t1
   (column1, @var1)
   SET column2 = @var1/100;

The 'SET' clause can be used to supply values not derived from the input file. The following statement sets 'column3' to the current date and time:

 LOAD DATA INFILE 'file.txt'
   INTO TABLE t1
   (column1, column2)
   SET column3 = CURRENT_TIMESTAMP;

You can also discard an input value by assigning it to a user variable and not assigning the variable to any table column:

 LOAD DATA INFILE 'file.txt'
   INTO TABLE t1
   (column1, @dummy, column2, @dummy, column3);

Use of the column/variable list and 'SET' clause is subject to the following restrictions:

Assignments in the 'SET' clause should have only column names on the left hand side of assignment operators.
You can use subqueries in the right hand side of 'SET' assignments. A subquery that returns a value to be assigned to a column may be a scalar subquery only. Also, you cannot use a subquery to select from the table that is being loaded.
Lines ignored by an 'IGNORE NUMBER LINES' clause are not processed for the column/variable list or 'SET' clause.
User variables cannot be used when loading data with fixed-row format because user variables do not have a display width.

Column Value Assignment

To process an input line, *note 'LOAD DATA': load-data. splits it into fields and uses the values according to the column/variable list and the 'SET' clause, if they are present. Then the resulting row is inserted into the table. If there are 'BEFORE INSERT' or 'AFTER INSERT' triggers for the table, they are activated before or after inserting the row, respectively.

Interpretation of field values and assignment to table columns depends on these factors:

The SQL mode (the value of the 'sql_mode' system variable). The mode can be nonstrictive, or restrictive in various ways. For example, strict SQL mode can be enabled, or the mode can include values such as 'NO_ZERO_DATE' or 'NO_ZERO_IN_DATE'.
Presence or absence of the 'IGNORE' and 'LOCAL' modifiers.

Those factors combine to produce restrictive or nonrestrictive data interpretation by *note 'LOAD DATA': load-data.:

Data interpretation is restrictive if the SQL mode is restrictive and neither the 'IGNORE' nor the 'LOCAL' modifier is specified. Errors terminate the load operation.
Data interpretation is nonrestrictive if the SQL mode is nonrestrictive or the 'IGNORE' or 'LOCAL' modifier is specified. (In particular, either modifier if specified overrides a restrictive SQL mode.) Errors become warnings and the load operation continues.

Restrictive data interpretation uses these rules:

Too many or too few fields results an error.
Assigning 'NULL' (that is, '') to a non-'NULL' column results in an error.
A value that is out of range for the column data type results in an error.
Invalid values produce errors. For example, a value such as ''x'' for a numeric column results in an error, not conversion to 0.

By contrast, nonrestrictive data interpretation uses these rules:

If an input line has too many fields, the extra fields are ignored and the number of warnings is incremented.
If an input line has too few fields, the columns for which input fields are missing are assigned their default values. Default value assignment is described in *note data-type-defaults::.
Assigning 'NULL' (that is, '') to a non-'NULL' column results in assignment of the implicit default value for the column data type. Implicit default values are described in *note data-type-defaults::.

Invalid values produce warnings rather than errors, and are converted to the 'closest' valid value for the column data type. Examples:

* A value such as ''x'' for a numeric column results in
  conversion to 0.

* An out-of-range numeric or temporal value is clipped to the
  closest endpoint of the range for the column data type.

* An invalid value for a 'DATETIME', 'DATE', or 'TIME' column is
  inserted as the implicit default value, regardless of the SQL
  mode 'NO_ZERO_DATE' setting.  The implicit default is the
  appropriate 'zero' value for the type (''0000-00-00
  00:00:00'', ''0000-00-00'', or ''00:00:00'').  See *note
  date-and-time-types::.

*note 'LOAD DATA': load-data. interprets an empty field value differently from a missing field:
```
* For string types, the column is set to the empty string.

* For numeric types, the column is set to '0'.

* For date and time types, the column is set to the appropriate
  'zero' value for the type.  See *note date-and-time-types::.
```
These are the same values that result if you assign an empty string explicitly to a string, numeric, or date or time type explicitly in an note 'INSERT': insert. or note 'UPDATE': update. statement.

note 'TIMESTAMP': datetime. columns are set to the current date and time only if there is a 'NULL' value for the column (that is, '') and the column is not declared to permit 'NULL' values, or if the note 'TIMESTAMP': datetime. column default value is the current timestamp and it is omitted from the field list when a field list is specified.

note 'LOAD DATA': load-data. regards all input as strings, so you cannot use numeric values for note 'ENUM': enum. or note 'SET': set. columns the way you can with note 'INSERT': insert. statements. All note 'ENUM': enum. and note 'SET': set. values must be specified as strings.

note 'BIT': bit-type. values cannot be loaded directly using binary notation (for example, 'b'011010''). To work around this, use the 'SET' clause to strip off the leading 'b'' and trailing ''' and perform a base-2 to base-10 conversion so that MySQL loads the values into the note 'BIT': bit-type. column properly:

 $> cat /tmp/bit_test.txt
 b'10'
 b'1111111'
 $> mysql test
 mysql> LOAD DATA INFILE '/tmp/bit_test.txt'
        INTO TABLE bit_test (@var1)
        SET b = CAST(CONV(MID(@var1, 3, LENGTH(@var1)-3), 2, 10) AS UNSIGNED);
 Query OK, 2 rows affected (0.00 sec)
 Records: 2  Deleted: 0  Skipped: 0  Warnings: 0

 mysql> SELECT BIN(b+0) FROM bit_test;
 +----------+
 | BIN(b+0) |
 +----------+
 | 10       |
 | 1111111  |
 +----------+
 2 rows in set (0.00 sec)

For *note 'BIT': bit-type. values in '0b' binary notation (for example, '0b011010'), use this 'SET' clause instead to strip off the leading '0b':

 SET b = CAST(CONV(MID(@var1, 3, LENGTH(@var1)-2), 2, 10) AS UNSIGNED)

Partitioned Table Support

note 'LOAD DATA': load-data. supports explicit partition selection using the 'PARTITION' clause with a list of one or more comma-separated names of partitions, subpartitions, or both. When this clause is used, if any rows from the file cannot be inserted into any of the partitions or subpartitions named in the list, the statement fails with the error 'Found a row not matching the given partition set'. For more information and examples, see note partitioning-selection::.

For partitioned tables using storage engines that employ table locks, such as note 'MyISAM': myisam-storage-engine, note 'LOAD DATA': load-data. cannot prune any partition locks. This does not apply to tables using storage engines that employ row-level locking, such as note 'InnoDB': innodb-storage-engine. For more information, see note partitioning-limitations-locking::.

Concurrency Considerations

With the 'LOW_PRIORITY' modifier, execution of the *note 'LOAD DATA': load-data. statement is delayed until no other clients are reading from the table. This affects only storage engines that use only table-level locking (such as 'MyISAM', 'MEMORY', and 'MERGE').

With the 'CONCURRENT' modifier and a 'MyISAM' table that satisfies the condition for concurrent inserts (that is, it contains no free blocks in the middle), other threads can retrieve data from the table while note 'LOAD DATA': load-data. is executing. This modifier affects the performance of note 'LOAD DATA': load-data. a bit, even if no other thread is using the table at the same time.

Statement Result Information

When the *note 'LOAD DATA': load-data. statement finishes, it returns an information string in the following format:

 Records: 1  Deleted: 0  Skipped: 0  Warnings: 0

Warnings occur under the same circumstances as when values are inserted using the note 'INSERT': insert. statement (see note insert::), except that *note 'LOAD DATA': load-data. also generates warnings when there are too few or too many fields in the input row.

You can use note 'SHOW WARNINGS': show-warnings. to get a list of the first 'max_error_count' warnings as information about what went wrong. See note show-warnings::.

If you are using the C API, you can get information about the statement by calling the 'mysql_info()' (https://dev.mysql.com/doc/c-api/5.7/en/mysql-info.html) function. See mysql_info() (https://dev.mysql.com/doc/c-api/5.7/en/mysql-info.html).

Replication Considerations

For information about note 'LOAD DATA': load-data. in relation to replication, see note replication-features-load-data::.

Miscellaneous Topics

On Unix, if you need *note 'LOAD DATA': load-data. to read from a pipe, you can use the following technique (the example loads a listing of the '/' directory into the table 'db1.t1'):

 mkfifo /mysql/data/db1/ls.dat
 chmod 666 /mysql/data/db1/ls.dat
 find / -ls > /mysql/data/db1/ls.dat &
 mysql -e "LOAD DATA INFILE 'ls.dat' INTO TABLE t1" db1

Here you must run the command that generates the data to be loaded and the note 'mysql': mysql. commands either on separate terminals, or run the data generation process in the background (as shown in the preceding example). If you do not do this, the pipe blocks until data is read by the note 'mysql': mysql. process.

File: manual.info.tmp, Node: load-xml, Next: replace, Prev: load-data, Up: sql-data-manipulation-statements

13.2.7 LOAD XML Statement

 LOAD XML
     [LOW_PRIORITY | CONCURRENT] [LOCAL]
     INFILE 'FILE_NAME'
     [REPLACE | IGNORE]
     INTO TABLE [DB_NAME.]TBL_NAME
     [CHARACTER SET CHARSET_NAME]
     [ROWS IDENTIFIED BY '<TAGNAME>']
     [IGNORE NUMBER {LINES | ROWS}]
     [(FIELD_NAME_OR_USER_VAR
         [, FIELD_NAME_OR_USER_VAR] ...)]
     [SET COL_NAME={EXPR | DEFAULT}
         [, COL_NAME={EXPR | DEFAULT}] ...]

The *note 'LOAD XML': load-xml. statement reads data from an XML file into a table. The FILE_NAME must be given as a literal string. The TAGNAME in the optional 'ROWS IDENTIFIED BY' clause must also be given as a literal string, and must be surrounded by angle brackets ('<' and '>').

note 'LOAD XML': load-xml. acts as the complement of running the note 'mysql': mysql. client in XML output mode (that is, starting the client with the '--xml' option). To write data from a table to an XML file, you can invoke the *note 'mysql': mysql. client with the '--xml' and '-e' options from the system shell, as shown here:

 $> mysql --xml -e 'SELECT * FROM mydb.mytable' > file.xml

To read the file back into a table, use *note 'LOAD XML': load-xml. By default, the '' element is considered to be the equivalent of a database table row; this can be changed using the 'ROWS IDENTIFIED BY' clause.

This statement supports three different XML formats:

Column names as attributes and column values as attribute values:
```
  <ROW COLUMN1="VALUE1" COLUMN2="VALUE2" .../>
```

Column names as tags and column values as the content of these tags:

  <ROW>
    <COLUMN1>VALUE1</COLUMN1>
    <COLUMN2>VALUE2</COLUMN2>
  </ROW>

Column names are the 'name' attributes of '' tags, and values are the contents of these tags:
```
  <row>
    <field name='COLUMN1'>VALUE1</field>
    <field name='COLUMN2'>VALUE2</field>
  </row>
```
This is the format used by other MySQL tools, such as *note 'mysqldump': mysqldump.

All three formats can be used in the same XML file; the import routine automatically detects the format for each row and interprets it correctly. Tags are matched based on the tag or attribute name and the column name.

In MySQL 5.7, 'LOAD XML' does not support 'CDATA' sections in the source XML. This limitation is removed in MySQL 8.0. (Bug #30753708, Bug #98199)

The following clauses work essentially the same way for note 'LOAD XML': load-xml. as they do for note 'LOAD DATA': load-data.:

'LOW_PRIORITY' or 'CONCURRENT'
'LOCAL'
'REPLACE' or 'IGNORE'
'CHARACTER SET'
'SET'

See *note load-data::, for more information about these clauses.

'(FIELD_NAME_OR_USER_VAR, ...)' is a list of one or more comma-separated XML fields or user variables. The name of a user variable used for this purpose must match the name of a field from the XML file, prefixed with '@'. You can use field names to select only desired fields. User variables can be employed to store the corresponding field values for subsequent re-use.

The 'IGNORE NUMBER LINES' or 'IGNORE NUMBER ROWS' clause causes the first NUMBER rows in the XML file to be skipped. It is analogous to the *note 'LOAD DATA': load-data. statement's 'IGNORE ... LINES' clause.

Suppose that we have a table named 'person', created as shown here:

 USE test;

 CREATE TABLE person (
     person_id INT NOT NULL PRIMARY KEY,
     fname VARCHAR(40) NULL,
     lname VARCHAR(40) NULL,
     created TIMESTAMP
 );

Suppose further that this table is initially empty.

Now suppose that we have a simple XML file 'person.xml', whose contents are as shown here:

 <list>
   <person person_id="1" fname="Kapek" lname="Sainnouine"/>
   <person person_id="2" fname="Sajon" lname="Rondela"/>
   <person person_id="3"><fname>Likame</fname><lname>O"rrtmons</lname></person>
   <person person_id="4"><fname>Slar</fname><lname>Manlanth</lname></person>
   <person><field name="person_id">5</field><field name="fname">Stoma</field>
     <field name="lname">Milu</field></person>
   <person><field name="person_id">6</field><field name="fname">Nirtam</field>
     <field name="lname">Sklo"d</field></person>
   <person person_id="7"><fname>Sungam</fname><lname>Dulbaad</lname></person>
   <person person_id="8" fname="Sraref" lname="Encmelt"/>
 </list>

Each of the permissible XML formats discussed previously is represented in this example file.

To import the data in 'person.xml' into the 'person' table, you can use this statement:

 mysql> LOAD XML LOCAL INFILE 'person.xml'
     ->   INTO TABLE person
     ->   ROWS IDENTIFIED BY '<person>';

 Query OK, 8 rows affected (0.00 sec)
 Records: 8  Deleted: 0  Skipped: 0  Warnings: 0

Here, we assume that 'person.xml' is located in the MySQL data directory. If the file cannot be found, the following error results:

 ERROR 2 (HY000): File '/person.xml' not found (Errcode: 2)

The 'ROWS IDENTIFIED BY ''' clause means that each '' element in the XML file is considered equivalent to a row in the table into which the data is to be imported. In this case, this is the 'person' table in the 'test' database.

As can be seen by the response from the server, 8 rows were imported into the 'test.person' table. This can be verified by a simple *note 'SELECT': select. statement:

 mysql> SELECT * FROM person;
 +-----------+--------+------------+---------------------+
 | person_id | fname  | lname      | created             |
 +-----------+--------+------------+---------------------+
 |         1 | Kapek  | Sainnouine | 2007-07-13 16:18:47 |
 |         2 | Sajon  | Rondela    | 2007-07-13 16:18:47 |
 |         3 | Likame | O"rrtmons   | 2007-07-13 16:18:47 |
 |         4 | Slar   | Manlanth   | 2007-07-13 16:18:47 |
 |         5 | Stoma  | Nilu       | 2007-07-13 16:18:47 |
 |         6 | Nirtam | Sklo"d      | 2007-07-13 16:18:47 |
 |         7 | Sungam | Dulbaad     | 2007-07-13 16:18:47 |
 |         8 | Sreraf | Encmelt    | 2007-07-13 16:18:47 |
 +-----------+--------+------------+---------------------+
 8 rows in set (0.00 sec)

This shows, as stated earlier in this section, that any or all of the 3 permitted XML formats may appear in a single file and be read using *note 'LOAD XML': load-xml.

The inverse of the import operation just shown--that is, dumping MySQL table data into an XML file--can be accomplished using the *note 'mysql': mysql. client from the system shell, as shown here:

 $> mysql --xml -e "SELECT * FROM test.person" > person-dump.xml
 $> cat person-dump.xml
 <?xml version="1.0"?>

 <resultset statement="SELECT * FROM test.person" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
   <row>
    <field name="person_id">1</field>
    <field name="fname">Kapek</field>
    <field name="lname">Sainnouine</field>
   </row>

   <row>
    <field name="person_id">2</field>
    <field name="fname">Sajon</field>
    <field name="lname">Rondela</field>
   </row>

   <row>
    <field name="person_id">3</field>
    <field name="fname">Likema</field>
    <field name="lname">O"rrtmons</field>
   </row>

   <row>
    <field name="person_id">4</field>
    <field name="fname">Slar</field>
    <field name="lname">Manlanth</field>
   </row>

   <row>
    <field name="person_id">5</field>
    <field name="fname">Stoma</field>
    <field name="lname">Nilu</field>
   </row>

   <row>
    <field name="person_id">6</field>
    <field name="fname">Nirtam</field>
    <field name="lname">Sklo"d</field>
   </row>

   <row>
    <field name="person_id">7</field>
    <field name="fname">Sungam</field>
    <field name="lname">Dulbaad</field>
   </row>

   <row>
    <field name="person_id">8</field>
    <field name="fname">Sreraf</field>
    <field name="lname">Encmelt</field>
   </row>
 </resultset>

Note:

The '--xml' option causes the note 'mysql': mysql. client to use XML formatting for its output; the '-e' option causes the client to execute the SQL statement immediately following the option. See note mysql::.

You can verify that the dump is valid by creating a copy of the 'person' table and importing the dump file into the new table, like this:

 mysql> USE test;
 mysql> CREATE TABLE person2 LIKE person;
 Query OK, 0 rows affected (0.00 sec)

 mysql> LOAD XML LOCAL INFILE 'person-dump.xml'
     ->   INTO TABLE person2;
 Query OK, 8 rows affected (0.01 sec)
 Records: 8  Deleted: 0  Skipped: 0  Warnings: 0

 mysql> SELECT * FROM person2;
 +-----------+--------+------------+---------------------+
 | person_id | fname  | lname      | created             |
 +-----------+--------+------------+---------------------+
 |         1 | Kapek  | Sainnouine | 2007-07-13 16:18:47 |
 |         2 | Sajon  | Rondela    | 2007-07-13 16:18:47 |
 |         3 | Likema | O"rrtmons   | 2007-07-13 16:18:47 |
 |         4 | Slar   | Manlanth   | 2007-07-13 16:18:47 |
 |         5 | Stoma  | Nilu       | 2007-07-13 16:18:47 |
 |         6 | Nirtam | Sklo"d      | 2007-07-13 16:18:47 |
 |         7 | Sungam | Dulbaad     | 2007-07-13 16:18:47 |
 |         8 | Sreraf | Encmelt    | 2007-07-13 16:18:47 |
 +-----------+--------+------------+---------------------+
 8 rows in set (0.00 sec)

There is no requirement that every field in the XML file be matched with a column in the corresponding table. Fields which have no corresponding columns are skipped. You can see this by first emptying the 'person2' table and dropping the 'created' column, then using the same *note 'LOAD XML': load-xml. statement we just employed previously, like this:

 mysql> TRUNCATE person2;
 Query OK, 8 rows affected (0.26 sec)

 mysql> ALTER TABLE person2 DROP COLUMN created;
 Query OK, 0 rows affected (0.52 sec)
 Records: 0  Duplicates: 0  Warnings: 0

 mysql> SHOW CREATE TABLE person2\G
 *************************** 1. row ***************************
        Table: person2
 Create Table: CREATE TABLE `person2` (
   `person_id` int(11) NOT NULL,
   `fname` varchar(40) DEFAULT NULL,
   `lname` varchar(40) DEFAULT NULL,
   PRIMARY KEY (`person_id`)
 ) ENGINE=InnoDB DEFAULT CHARSET=utf8
 1 row in set (0.00 sec)

 mysql> LOAD XML LOCAL INFILE 'person-dump.xml'
     ->   INTO TABLE person2;
 Query OK, 8 rows affected (0.01 sec)
 Records: 8  Deleted: 0  Skipped: 0  Warnings: 0

 mysql> SELECT * FROM person2;
 +-----------+--------+------------+
 | person_id | fname  | lname      |
 +-----------+--------+------------+
 |         1 | Kapek  | Sainnouine |
 |         2 | Sajon  | Rondela    |
 |         3 | Likema | O"rrtmons   |
 |         4 | Slar   | Manlanth   |
 |         5 | Stoma  | Nilu       |
 |         6 | Nirtam | Sklo"d      |
 |         7 | Sungam | Dulbaad     |
 |         8 | Sreraf | Encmelt    |
 +-----------+--------+------------+
 8 rows in set (0.00 sec)

The order in which the fields are given within each row of the XML file does not affect the operation of *note 'LOAD XML': load-xml.; the field order can vary from row to row, and is not required to be in the same order as the corresponding columns in the table.

As mentioned previously, you can use a '(FIELD_NAME_OR_USER_VAR, ...)' list of one or more XML fields (to select desired fields only) or user variables (to store the corresponding field values for later use). User variables can be especially useful when you want to insert data from an XML file into table columns whose names do not match those of the XML fields. To see how this works, we first create a table named 'individual' whose structure matches that of the 'person' table, but whose columns are named differently:

 mysql> CREATE TABLE individual (
     ->     individual_id INT NOT NULL PRIMARY KEY,
     ->     name1 VARCHAR(40) NULL,
     ->     name2 VARCHAR(40) NULL,
     ->     made TIMESTAMP
     -> );
 Query OK, 0 rows affected (0.42 sec)

In this case, you cannot simply load the XML file directly into the table, because the field and column names do not match:

 mysql> LOAD XML INFILE '../bin/person-dump.xml' INTO TABLE test.individual;
 ERROR 1263 (22004): Column set to default value; NULL supplied to NOT NULL column 'individual_id' at row 1

This happens because the MySQL server looks for field names matching the column names of the target table. You can work around this problem by selecting the field values into user variables, then setting the target table's columns equal to the values of those variables using 'SET'. You can perform both of these operations in a single statement, as shown here:

 mysql> LOAD XML INFILE '../bin/person-dump.xml'
     ->     INTO TABLE test.individual (@person_id, @fname, @lname, @created)
     ->     SET individual_id=@person_id, name1=@fname, name2=@lname, made=@created;
 Query OK, 8 rows affected (0.05 sec)
 Records: 8  Deleted: 0  Skipped: 0  Warnings: 0

 mysql> SELECT * FROM individual;
 +---------------+--------+------------+---------------------+
 | individual_id | name1  | name2      | made                |
 +---------------+--------+------------+---------------------+
 |             1 | Kapek  | Sainnouine | 2007-07-13 16:18:47 |
 |             2 | Sajon  | Rondela    | 2007-07-13 16:18:47 |
 |             3 | Likema | O"rrtmons   | 2007-07-13 16:18:47 |
 |             4 | Slar   | Manlanth   | 2007-07-13 16:18:47 |
 |             5 | Stoma  | Nilu       | 2007-07-13 16:18:47 |
 |             6 | Nirtam | Sklo"d      | 2007-07-13 16:18:47 |
 |             7 | Sungam | Dulbaad     | 2007-07-13 16:18:47 |
 |             8 | Srraf  | Encmelt    | 2007-07-13 16:18:47 |
 +---------------+--------+------------+---------------------+
 8 rows in set (0.00 sec)

The names of the user variables must match those of the corresponding fields from the XML file, with the addition of the required '@' prefix to indicate that they are variables. The user variables need not be listed or assigned in the same order as the corresponding fields.

Using a 'ROWS IDENTIFIED BY ''' clause, it is possible to import data from the same XML file into database tables with different definitions. For this example, suppose that you have a file named 'address.xml' which contains the following XML:

 <?xml version="1.0"?>

 <list>
   <person person_id="1">
     <fname>Robert</fname>
     <lname>Jones</lname>
     <address address_id="1" street="Mill Creek Road" zip="45365" city="Sidney"/>
     <address address_id="2" street="Main Street" zip="28681" city="Taylorsville"/>
   </person>

   <person person_id="2">
     <fname>Mary</fname>
     <lname>Smith</lname>
     <address address_id="3" street="River Road" zip="80239" city="Denver"/>
     <!-- <address address_id="4" street="North Street" zip="37920" city="Knoxville"/> -->
   </person>

 </list>

You can again use the 'test.person' table as defined previously in this section, after clearing all the existing records from the table and then showing its structure as shown here:

 mysql< TRUNCATE person;
 Query OK, 0 rows affected (0.04 sec)

 mysql< SHOW CREATE TABLE person\G
 *************************** 1. row ***************************
        Table: person
 Create Table: CREATE TABLE `person` (
   `person_id` int(11) NOT NULL,
   `fname` varchar(40) DEFAULT NULL,
   `lname` varchar(40) DEFAULT NULL,
   `created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
   PRIMARY KEY (`person_id`)
 ) ENGINE=MyISAM DEFAULT CHARSET=latin1
 1 row in set (0.00 sec)

Now create an 'address' table in the 'test' database using the following *note 'CREATE TABLE': create-table. statement:

 CREATE TABLE address (
     address_id INT NOT NULL PRIMARY KEY,
     person_id INT NULL,
     street VARCHAR(40) NULL,
     zip INT NULL,
     city VARCHAR(40) NULL,
     created TIMESTAMP
 );

To import the data from the XML file into the 'person' table, execute the following *note 'LOAD XML': load-xml. statement, which specifies that rows are to be specified by the '' element, as shown here;

 mysql> LOAD XML LOCAL INFILE 'address.xml'
     ->   INTO TABLE person
     ->   ROWS IDENTIFIED BY '<person>';
 Query OK, 2 rows affected (0.00 sec)
 Records: 2  Deleted: 0  Skipped: 0  Warnings: 0

You can verify that the records were imported using a *note 'SELECT': select. statement:

 mysql> SELECT * FROM person;
 +-----------+--------+-------+---------------------+
 | person_id | fname  | lname | created             |
 +-----------+--------+-------+---------------------+
 |         1 | Robert | Jones | 2007-07-24 17:37:06 |
 |         2 | Mary   | Smith | 2007-07-24 17:37:06 |
 +-----------+--------+-------+---------------------+
 2 rows in set (0.00 sec)

Since the '

' elements in the XML file have no corresponding columns in the 'person' table, they are skipped.

To import the data from the '

' elements into the 'address' table, use the *note 'LOAD XML': load-xml. statement shown here:

 mysql> LOAD XML LOCAL INFILE 'address.xml'
     ->   INTO TABLE address
     ->   ROWS IDENTIFIED BY '<address>';
 Query OK, 3 rows affected (0.00 sec)
 Records: 3  Deleted: 0  Skipped: 0  Warnings: 0

You can see that the data was imported using a *note 'SELECT': select. statement such as this one:

 mysql> SELECT * FROM address;
 +------------+-----------+-----------------+-------+--------------+---------------------+
 | address_id | person_id | street          | zip   | city         | created             |
 +------------+-----------+-----------------+-------+--------------+---------------------+
 |          1 |         1 | Mill Creek Road | 45365 | Sidney       | 2007-07-24 17:37:37 |
 |          2 |         1 | Main Street     | 28681 | Taylorsville | 2007-07-24 17:37:37 |
 |          3 |         2 | River Road      | 80239 | Denver       | 2007-07-24 17:37:37 |
 +------------+-----------+-----------------+-------+--------------+---------------------+
 3 rows in set (0.00 sec)

The data from the '

' element that is enclosed in XML comments is not imported. However, since there is a 'person_id' column in the 'address' table, the value of the 'person_id' attribute from the parent '' element for each '

' is imported into the 'address' table.

Security Considerations

As with the note 'LOAD DATA': load-data. statement, the transfer of the XML file from the client host to the server host is initiated by the MySQL server. In theory, a patched server could be built that would tell the client program to transfer a file of the server's choosing rather than the file named by the client in the note 'LOAD XML': load-xml. statement. Such a server could access any file on the client host to which the client user has read access.

In a Web environment, clients usually connect to MySQL from a Web server. A user that can run any command against the MySQL server can use *note 'LOAD XML LOCAL': load-xml. to read any files to which the Web server process has read access. In this environment, the client with respect to the MySQL server is actually the Web server, not the remote program being run by the user who connects to the Web server.

You can disable loading of XML files from clients by starting the server with '--local-infile=0' or '--local-infile=OFF'. This option can also be used when starting the note 'mysql': mysql. client to disable note 'LOAD XML': load-xml. for the duration of the client session.

To prevent a client from loading XML files from the server, do not grant the 'FILE' privilege to the corresponding MySQL user account, or revoke this privilege if the client user account already has it.

Important:

Revoking the 'FILE' privilege (or not granting it in the first place) keeps the user only from executing the note 'LOAD XML': load-xml. statement (as well as the 'LOAD_FILE()' function; it does not prevent the user from executing note 'LOAD XML LOCAL': load-xml. To disallow this statement, you must start the server or the client with '--local-infile=OFF'.

In other words, the 'FILE' privilege affects only whether the client can read files on the server; it has no bearing on whether the client can read files on the local file system.

For partitioned tables using storage engines that employ table locks, such as note 'MyISAM': myisam-storage-engine, any locks caused by note 'LOAD XML': load-xml. perform locks on all partitions of the table. This does not apply to tables using storage engines which employ row-level locking, such as note 'InnoDB': innodb-storage-engine. For more information, see note partitioning-limitations-locking::.

File: manual.info.tmp, Node: replace, Next: select, Prev: load-xml, Up: sql-data-manipulation-statements

13.2.8 REPLACE Statement

 REPLACE [LOW_PRIORITY | DELAYED]
     [INTO] TBL_NAME
     [PARTITION (PARTITION_NAME [, PARTITION_NAME] ...)]
     [(COL_NAME [, COL_NAME] ...)]
     {VALUES | VALUE} (VALUE_LIST) [, (VALUE_LIST)] ...

 REPLACE [LOW_PRIORITY | DELAYED]
     [INTO] TBL_NAME
     [PARTITION (PARTITION_NAME [, PARTITION_NAME] ...)]
     SET ASSIGNMENT_LIST

 REPLACE [LOW_PRIORITY | DELAYED]
     [INTO] TBL_NAME
     [PARTITION (PARTITION_NAME [, PARTITION_NAME] ...)]
     [(COL_NAME [, COL_NAME] ...)]
     SELECT ...

 VALUE:
     {EXPR | DEFAULT}

 VALUE_LIST:
     VALUE [, VALUE] ...

 ASSIGNMENT:
     COL_NAME = VALUE

 ASSIGNMENT_LIST:
     ASSIGNMENT [, ASSIGNMENT] ...

note 'REPLACE': replace. works exactly like note 'INSERT': insert, except that if an old row in the table has the same value as a new row for a 'PRIMARY KEY' or a 'UNIQUE' index, the old row is deleted before the new row is inserted. See *note insert::.

note 'REPLACE': replace. is a MySQL extension to the SQL standard. It either inserts, or deletes and inserts. For another MySQL extension to standard SQL--that either inserts or updates--see note insert-on-duplicate::.

'DELAYED' inserts and replaces were deprecated in MySQL 5.6. In MySQL 5.7, 'DELAYED' is not supported. The server recognizes but ignores the 'DELAYED' keyword, handles the replace as a nondelayed replace, and generates an 'ER_WARN_LEGACY_SYNTAX_CONVERTED' warning: 'REPLACE DELAYED is no longer supported. The statement was converted to REPLACE'. The 'DELAYED' keyword is scheduled for removal in a future release. release.

Note:

note 'REPLACE': replace. makes sense only if a table has a 'PRIMARY KEY' or 'UNIQUE' index. Otherwise, it becomes equivalent to note 'INSERT': insert, because there is no index to be used to determine whether a new row duplicates another.

Values for all columns are taken from the values specified in the note 'REPLACE': replace. statement. Any missing columns are set to their default values, just as happens for note 'INSERT': insert. You cannot refer to values from the current row and use them in the new row. If you use an assignment such as 'SET COL_NAME = COL_NAME + 1', the reference to the column name on the right hand side is treated as 'DEFAULT(COL_NAME)', so the assignment is equivalent to 'SET COL_NAME = DEFAULT(COL_NAME) + 1'.

To use *note 'REPLACE': replace, you must have both the 'INSERT' and 'DELETE' privileges for the table.

If a generated column is replaced explicitly, the only permitted value is 'DEFAULT'. For information about generated columns, see *note create-table-generated-columns::.

'REPLACE' supports explicit partition selection using the 'PARTITION' clause with a list of comma-separated names of partitions, subpartitions, or both. As with note 'INSERT': insert, if it is not possible to insert the new row into any of these partitions or subpartitions, the 'REPLACE' statement fails with the error 'Found a row not matching the given partition set'. For more information and examples, see note partitioning-selection::.

The note 'REPLACE': replace. statement returns a count to indicate the number of rows affected. This is the sum of the rows deleted and inserted. If the count is 1 for a single-row note 'REPLACE': replace, a row was inserted and no rows were deleted. If the count is greater than 1, one or more old rows were deleted before the new row was inserted. It is possible for a single row to replace more than one old row if the table contains multiple unique indexes and the new row duplicates values for different old rows in different unique indexes.

The affected-rows count makes it easy to determine whether *note 'REPLACE': replace. only added a row or whether it also replaced any rows: Check whether the count is 1 (added) or greater (replaced).

If you are using the C API, the affected-rows count can be obtained using the 'mysql_affected_rows()' (https://dev.mysql.com/doc/c-api/5.7/en/mysql-affected-rows.html) function.

You cannot replace into a table and select from the same table in a subquery.

MySQL uses the following algorithm for note 'REPLACE': replace. (and note 'LOAD DATA ... REPLACE': load-data.):

Try to insert the new row into the table
While the insertion fails because a duplicate-key error occurs for a primary key or unique index:
1. Delete from the table the conflicting row that has the duplicate key value
2. Try again to insert the new row into the table

It is possible that in the case of a duplicate-key error, a storage engine may perform the 'REPLACE' as an update rather than a delete plus insert, but the semantics are the same. There are no user-visible effects other than a possible difference in how the storage engine increments 'Handler_XXX' status variables.

Because the results of 'REPLACE ... SELECT' statements depend on the ordering of rows from the note 'SELECT': select. and this order cannot always be guaranteed, it is possible when logging these statements for the source and the replica to diverge. For this reason, 'REPLACE ... SELECT' statements are flagged as unsafe for statement-based replication. such statements produce a warning in the error log when using statement-based mode and are written to the binary log using the row-based format when using 'MIXED' mode. See also note replication-sbr-rbr::.

When modifying an existing table that is not partitioned to accommodate partitioning, or, when modifying the partitioning of an already partitioned table, you may consider altering the table's primary key (see note partitioning-limitations-partitioning-keys-unique-keys::). You should be aware that, if you do this, the results of 'REPLACE' statements may be affected, just as they would be if you modified the primary key of a nonpartitioned table. Consider the table created by the following note 'CREATE TABLE': create-table. statement:

 CREATE TABLE test (
   id INT UNSIGNED NOT NULL AUTO_INCREMENT,
   data VARCHAR(64) DEFAULT NULL,
   ts TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
   PRIMARY KEY (id)
 );

When we create this table and run the statements shown in the mysql client, the result is as follows:

 mysql> REPLACE INTO test VALUES (1, 'Old', '2014-08-20 18:47:00');
 Query OK, 1 row affected (0.04 sec)

 mysql> REPLACE INTO test VALUES (1, 'New', '2014-08-20 18:47:42');
 Query OK, 2 rows affected (0.04 sec)

 mysql> SELECT * FROM test;
 +----+------+---------------------+
 | id | data | ts                  |
 +----+------+---------------------+
 |  1 | New  | 2014-08-20 18:47:42 |
 +----+------+---------------------+
 1 row in set (0.00 sec)

Now we create a second table almost identical to the first, except that the primary key now covers 2 columns, as shown here (emphasized text):

 CREATE TABLE test2 (
   id INT UNSIGNED NOT NULL AUTO_INCREMENT,
   data VARCHAR(64) DEFAULT NULL,
   ts TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
   _PRIMARY KEY (id, ts)_
 );

When we run on 'test2' the same two 'REPLACE' statements as we did on the original 'test' table, we obtain a different result:

 mysql> REPLACE INTO test2 VALUES (1, 'Old', '2014-08-20 18:47:00');
 Query OK, 1 row affected (0.05 sec)

 mysql> REPLACE INTO test2 VALUES (1, 'New', '2014-08-20 18:47:42');
 Query OK, 1 row affected (0.06 sec)

 mysql> SELECT * FROM test2;
 +----+------+---------------------+
 | id | data | ts                  |
 +----+------+---------------------+
 |  1 | Old  | 2014-08-20 18:47:00 |
 |  1 | New  | 2014-08-20 18:47:42 |
 +----+------+---------------------+
 2 rows in set (0.00 sec)

This is due to the fact that, when run on 'test2', both the 'id' and 'ts' column values must match those of an existing row for the row to be replaced; otherwise, a row is inserted.

A 'REPLACE' statement affecting a partitioned table using a storage engine such as note 'MyISAM': myisam-storage-engine. that employs table-level locks locks only those partitions containing rows that match the 'REPLACE' statement 'WHERE' clause, as long as none of the table partitioning columns are updated; otherwise the entire table is locked. (For storage engines such as note 'InnoDB': innodb-storage-engine. that employ row-level locking, no locking of partitions takes place.) For more information, see *note partitioning-limitations-locking::.

File: manual.info.tmp, Node: select, Next: subqueries, Prev: replace, Up: sql-data-manipulation-statements

13.2.9 SELECT Statement

Menu:
select-into:: SELECT ... INTO Statement
join:: JOIN Clause
union:: UNION Clause

SELECT [ALL | DISTINCT | DISTINCTROW ][HIGH_PRIORITY] [STRAIGHT_JOIN][SQL_SMALL_RESULT] [SQL_BIG_RESULT] [SQL_BUFFER_RESULT] [SQL_CACHE | SQL_NO_CACHE] [SQL_CALC_FOUND_ROWS] SELECT_EXPR [, SELECT_EXPR] ... [INTO_OPTION][FROM TABLE_REFERENCES [PARTITION PARTITION_LIST]] [WHERE WHERE_CONDITION][GROUP BY {COL_NAME | EXPR | POSITION} [ASC | DESC], ... [WITH ROLLUP]] [HAVING WHERE_CONDITION][ORDER BY {COL_NAME | EXPR | POSITION} [ASC | DESC], ...] [LIMIT {[OFFSET,] ROW_COUNT | ROW_COUNT OFFSET OFFSET}][PROCEDURE PROCEDURE_NAME(ARGUMENT_LIST)] [INTO_OPTION][FOR UPDATE | LOCK IN SHARE MODE]

INTO_OPTION: { INTO OUTFILE 'FILE_NAME' [CHARACTER SET CHARSET_NAME] EXPORT_OPTIONS | INTO DUMPFILE 'FILE_NAME' | INTO VAR_NAME [, VAR_NAME] ... }

EXPORT_OPTIONS: [{FIELDS | COLUMNS} [TERMINATED BY 'string'][[OPTIONALLY] ENCLOSED BY 'char'] [ESCAPED BY 'char'] ][LINES [STARTING BY 'string'][TERMINATED BY 'string'] ]

note 'SELECT': select. is used to retrieve rows selected from one or more tables, and can include note 'UNION': union. statements and subqueries. See note union::, and note subqueries::.

The most commonly used clauses of *note 'SELECT': select. statements are these:

Each SELECT_EXPR indicates a column that you want to retrieve. There must be at least one SELECT_EXPR.
TABLE_REFERENCES indicates the table or tables from which to retrieve rows. Its syntax is described in *note join::.
'SELECT' supports explicit partition selection using the 'PARTITION' clause with a list of partitions or subpartitions (or both) following the name of the table in a TABLE_REFERENCE (see note join::). In this case, rows are selected only from the partitions listed, and any other partitions of the table are ignored. For more information and examples, see note partitioning-selection::.

'SELECT ... PARTITION' from tables using storage engines such as *note 'MyISAM': myisam-storage-engine. that perform table-level locks (and thus partition locks) lock only the partitions or subpartitions named by the 'PARTITION' option.

For more information, see *note partitioning-limitations-locking::.
The 'WHERE' clause, if given, indicates the condition or conditions that rows must satisfy to be selected. WHERE_CONDITION is an expression that evaluates to true for each row to be selected. The statement selects all rows if there is no 'WHERE' clause.

In the 'WHERE' expression, you can use any of the functions and operators that MySQL supports, except for aggregate (group) functions. See note expressions::, and note functions::.

*note 'SELECT': select. can also be used to retrieve rows computed without reference to any table.

For example:

 mysql> SELECT 1 + 1;
         -> 2

You are permitted to specify 'DUAL' as a dummy table name in situations where no tables are referenced:

 mysql> SELECT 1 + 1 FROM DUAL;
         -> 2

'DUAL' is purely for the convenience of people who require that all *note 'SELECT': select. statements should have 'FROM' and possibly other clauses. MySQL may ignore the clauses. MySQL does not require 'FROM DUAL' if no tables are referenced.

In general, clauses used must be given in exactly the order shown in the syntax description. For example, a 'HAVING' clause must come after any 'GROUP BY' clause and before any 'ORDER BY' clause. The 'INTO' clause, if present, can appear in any position indicated by the syntax description, but within a given statement can appear only once, not in multiple positions. For more information about 'INTO', see *note select-into::.

The list of SELECT_EXPR terms comprises the select list that indicates which columns to retrieve. Terms specify a column or expression or can use '*'-shorthand:

A select list consisting only of a single unqualified '*' can be used as shorthand to select all columns from all tables:
```
  SELECT * FROM t1 INNER JOIN t2 ...
```
'TBL_NAME.*' can be used as a qualified shorthand to select all columns from the named table:
```
  SELECT t1.*, t2.* FROM t1 INNER JOIN t2 ...
```
Use of an unqualified '*' with other items in the select list may produce a parse error. For example:
```
  SELECT id, * FROM t1
```
To avoid this problem, use a qualified 'TBL_NAME.*' reference:
```
  SELECT id, t1.* FROM t1
```
Use qualified 'TBL_NAME.*' references for each table in the select list:
```
  SELECT AVG(score), t1.* FROM t1 ...
```

The following list provides additional information about other 'SELECT' clauses:

A SELECT_EXPR can be given an alias using 'AS ALIAS_NAME'. The alias is used as the expression's column name and can be used in 'GROUP BY', 'ORDER BY', or 'HAVING' clauses. For example:
```
  SELECT CONCAT(last_name,', ',first_name) AS full_name
    FROM mytable ORDER BY full_name;
```
The 'AS' keyword is optional when aliasing a SELECT_EXPR with an identifier. The preceding example could have been written like this:
```
  SELECT CONCAT(last_name,', ',first_name) full_name
    FROM mytable ORDER BY full_name;
```
However, because the 'AS' is optional, a subtle problem can occur if you forget the comma between two SELECT_EXPR expressions: MySQL interprets the second as an alias name. For example, in the following statement, 'columnb' is treated as an alias name:
```
  SELECT columna columnb FROM mytable;
```
For this reason, it is good practice to be in the habit of using 'AS' explicitly when specifying column aliases.

It is not permissible to refer to a column alias in a 'WHERE' clause, because the column value might not yet be determined when the 'WHERE' clause is executed. See *note problems-with-alias::.
The 'FROM TABLE_REFERENCES' clause indicates the table or tables from which to retrieve rows. If you name more than one table, you are performing a join. For information on join syntax, see *note join::. For each table specified, you can optionally specify an alias.
```
  TBL_NAME [[AS] ALIAS] [INDEX_HINT]
```
The use of index hints provides the optimizer with information about how to choose indexes during query processing. For a description of the syntax for specifying these hints, see *note index-hints::.

You can use 'SET max_seeks_for_key=VALUE' as an alternative way to force MySQL to prefer key scans instead of table scans. See *note server-system-variables::.
You can refer to a table within the default database as TBL_NAME, or as DB_NAME.TBL_NAME to specify a database explicitly. You can refer to a column as COL_NAME, TBL_NAME.COL_NAME, or DB_NAME.TBL_NAME.COL_NAME. You need not specify a TBL_NAME or DB_NAME.TBL_NAME prefix for a column reference unless the reference would be ambiguous. See *note identifier-qualifiers::, for examples of ambiguity that require the more explicit column reference forms.

A table reference can be aliased using 'TBL_NAME AS ALIAS_NAME' or TBL_NAME ALIAS_NAME. These statements are equivalent:

  SELECT t1.name, t2.salary FROM employee AS t1, info AS t2
    WHERE t1.name = t2.name;

  SELECT t1.name, t2.salary FROM employee t1, info t2
    WHERE t1.name = t2.name;

Columns selected for output can be referred to in 'ORDER BY' and 'GROUP BY' clauses using column names, column aliases, or column positions. Column positions are integers and begin with 1:
```
  SELECT college, region, seed FROM tournament
    ORDER BY region, seed;

  SELECT college, region AS r, seed AS s FROM tournament
    ORDER BY r, s;

  SELECT college, region, seed FROM tournament
    ORDER BY 2, 3;
```
To sort in reverse order, add the 'DESC' (descending) keyword to the name of the column in the 'ORDER BY' clause that you are sorting by. The default is ascending order; this can be specified explicitly using the 'ASC' keyword.

If 'ORDER BY' occurs within a parenthesized query expression and also is applied in the outer query, the results are undefined and may change in a future MySQL version.

Use of column positions is deprecated because the syntax has been removed from the SQL standard.
MySQL extends the 'GROUP BY' clause so that you can also specify 'ASC' and 'DESC' after columns named in the clause. However, this syntax is deprecated. To produce a given sort order, provide an 'ORDER BY' clause.
If you use 'GROUP BY', output rows are sorted according to the 'GROUP BY' columns as if you had an 'ORDER BY' for the same columns. To avoid the overhead of sorting that 'GROUP BY' produces, add 'ORDER BY NULL':
```
  SELECT a, COUNT(b) FROM test_table GROUP BY a ORDER BY NULL;
```
Relying on implicit 'GROUP BY' sorting (that is, sorting in the absence of 'ASC' or 'DESC' designators) or explicit sorting for 'GROUP BY' (that is, by using explicit 'ASC' or 'DESC' designators for 'GROUP BY' columns) is deprecated. To produce a given sort order, provide an 'ORDER BY' clause.
When you use 'ORDER BY' or 'GROUP BY' to sort a column in a *note 'SELECT': select, the server sorts values using only the initial number of bytes indicated by the 'max_sort_length' system variable.
MySQL extends the use of 'GROUP BY' to permit selecting fields that are not mentioned in the 'GROUP BY' clause. If you are not getting the results that you expect from your query, please read the description of 'GROUP BY' found in *note aggregate-functions-and-modifiers::.
'GROUP BY' permits a 'WITH ROLLUP' modifier. See *note group-by-modifiers::.
The 'HAVING' clause, like the 'WHERE' clause, specifies selection conditions. The 'WHERE' clause specifies conditions on columns in the select list, but cannot refer to aggregate functions. The 'HAVING' clause specifies conditions on groups, typically formed by the 'GROUP BY' clause. The query result includes only groups satisfying the 'HAVING' conditions. (If no 'GROUP BY' is present, all rows implicitly form a single aggregate group.)

The 'HAVING' clause is applied nearly last, just before items are sent to the client, with no optimization. ('LIMIT' is applied after 'HAVING'.)

The SQL standard requires that 'HAVING' must reference only columns in the 'GROUP BY' clause or columns used in aggregate functions. However, MySQL supports an extension to this behavior, and permits 'HAVING' to refer to columns in the *note 'SELECT': select. list and columns in outer subqueries as well.

If the 'HAVING' clause refers to a column that is ambiguous, a warning occurs. In the following statement, 'col2' is ambiguous because it is used as both an alias and a column name:
```
  SELECT COUNT(col1) AS col2 FROM t GROUP BY col2 HAVING col2 = 2;
```
Preference is given to standard SQL behavior, so if a 'HAVING' column name is used both in 'GROUP BY' and as an aliased column in the select column list, preference is given to the column in the 'GROUP BY' column.
Do not use 'HAVING' for items that should be in the 'WHERE' clause. For example, do not write the following:
```
  SELECT COL_NAME FROM TBL_NAME HAVING COL_NAME > 0;
```
Write this instead:
```
  SELECT COL_NAME FROM TBL_NAME WHERE COL_NAME > 0;
```
The 'HAVING' clause can refer to aggregate functions, which the 'WHERE' clause cannot:
```
  SELECT user, MAX(salary) FROM users
    GROUP BY user HAVING MAX(salary) > 10;
```
(This did not work in some older versions of MySQL.)
MySQL permits duplicate column names. That is, there can be more than one SELECT_EXPR with the same name. This is an extension to standard SQL. Because MySQL also permits 'GROUP BY' and 'HAVING' to refer to SELECT_EXPR values, this can result in an ambiguity:
```
  SELECT 12 AS a, a FROM t GROUP BY a;
```
In that statement, both columns have the name 'a'. To ensure that the correct column is used for grouping, use different names for each SELECT_EXPR.
MySQL resolves unqualified column or alias references in 'ORDER BY' clauses by searching in the SELECT_EXPR values, then in the columns of the tables in the 'FROM' clause. For 'GROUP BY' or 'HAVING' clauses, it searches the 'FROM' clause before searching in the SELECT_EXPR values. (For 'GROUP BY' and 'HAVING', this differs from the pre-MySQL 5.0 behavior that used the same rules as for 'ORDER BY'.)
The 'LIMIT' clause can be used to constrain the number of rows returned by the *note 'SELECT': select. statement. 'LIMIT' takes one or two numeric arguments, which must both be nonnegative integer constants, with these exceptions:
```
* Within prepared statements, 'LIMIT' parameters can be
  specified using '?' placeholder markers.

* Within stored programs, 'LIMIT' parameters can be specified
  using integer-valued routine parameters or local variables.
```
With two arguments, the first argument specifies the offset of the first row to return, and the second specifies the maximum number of rows to return. The offset of the initial row is 0 (not 1):
```
  SELECT * FROM tbl LIMIT 5,10;  # Retrieve rows 6-15
```
To retrieve all rows from a certain offset up to the end of the result set, you can use some large number for the second parameter. This statement retrieves all rows from the 96th row to the last:
```
  SELECT * FROM tbl LIMIT 95,18446744073709551615;
```
With one argument, the value specifies the number of rows to return from the beginning of the result set:
```
  SELECT * FROM tbl LIMIT 5;     # Retrieve first 5 rows
```
In other words, 'LIMIT ROW_COUNT' is equivalent to 'LIMIT 0, ROW_COUNT'.

For prepared statements, you can use placeholders. The following statements return one row from the 'tbl' table:
```
  SET @a=1;
  PREPARE STMT FROM 'SELECT * FROM tbl LIMIT ?';
  EXECUTE STMT USING @a;
```
The following statements return the second to sixth row from the 'tbl' table:
```
  SET @skip=1; SET @numrows=5;
  PREPARE STMT FROM 'SELECT * FROM tbl LIMIT ?, ?';
  EXECUTE STMT USING @skip, @numrows;
```
For compatibility with PostgreSQL, MySQL also supports the 'LIMIT ROW_COUNT OFFSET OFFSET' syntax.

If 'LIMIT' occurs within a parenthesized query expression and also is applied in the outer query, the results are undefined and may change in a future MySQL version.
A 'PROCEDURE' clause names a procedure that should process the data in the result set. For an example, see *note procedure-analyse::, which describes 'ANALYSE', a procedure that can be used to obtain suggestions for optimal column data types that may help reduce table sizes.

A 'PROCEDURE' clause is not permitted in a *note 'UNION': union. statement.

Note:

'PROCEDURE' syntax is deprecated as of MySQL 5.7.18, and is removed in MySQL 8.0.
The note 'SELECT ... INTO': select-into. form of note 'SELECT': select. enables the query result to be written to a file or stored in variables. For more information, see *note select-into::.
If you use 'FOR UPDATE' with a storage engine that uses page or row locks, rows examined by the query are write-locked until the end of the current transaction. Using 'LOCK IN SHARE MODE' sets a shared lock that permits other transactions to read the examined rows but not to update or delete them. See *note innodb-locking-reads::.

In addition, you cannot use 'FOR UPDATE' as part of the note 'SELECT': select. in a statement such as note 'CREATE TABLE NEW_TABLE SELECT ... FROM OLD_TABLE ...': create-table-select. (If you attempt to do so, the statement is rejected with the error 'Can't update table 'OLD_TABLE' while 'NEW_TABLE' is being created'.) This is a change in behavior from MySQL 5.5 and earlier, which permitted *note 'CREATE TABLE ... SELECT': create-table-select. statements to make changes in tables other than the table being created.

Following the *note 'SELECT': select. keyword, you can use a number of modifiers that affect the operation of the statement. 'HIGH_PRIORITY', 'STRAIGHT_JOIN', and modifiers beginning with 'SQL_' are MySQL extensions to standard SQL.

The 'ALL' and 'DISTINCT' modifiers specify whether duplicate rows should be returned. 'ALL' (the default) specifies that all matching rows should be returned, including duplicates. 'DISTINCT' specifies removal of duplicate rows from the result set. It is an error to specify both modifiers. 'DISTINCTROW' is a synonym for 'DISTINCT'.
'HIGH_PRIORITY' gives the *note 'SELECT': select. higher priority than a statement that updates a table. You should use this only for queries that are very fast and must be done at once. A 'SELECT HIGH_PRIORITY' query that is issued while the table is locked for reading runs even if there is an update statement waiting for the table to be free. This affects only storage engines that use only table-level locking (such as 'MyISAM', 'MEMORY', and 'MERGE').

'HIGH_PRIORITY' cannot be used with note 'SELECT': select. statements that are part of a note 'UNION': union.
'STRAIGHT_JOIN' forces the optimizer to join the tables in the order in which they are listed in the 'FROM' clause. You can use this to speed up a query if the optimizer joins the tables in nonoptimal order. 'STRAIGHT_JOIN' also can be used in the TABLE_REFERENCES list. See *note join::.

'STRAIGHT_JOIN' does not apply to any table that the optimizer treats as a 'const' or 'system' table. Such a table produces a single row, is read during the optimization phase of query execution, and references to its columns are replaced with the appropriate column values before query execution proceeds. These tables appear first in the query plan displayed by note 'EXPLAIN': explain. See note using-explain::. This exception may not apply to 'const' or 'system' tables that are used on the 'NULL'-complemented side of an outer join (that is, the right-side table of a 'LEFT JOIN' or the left-side table of a 'RIGHT JOIN'.
'SQL_BIG_RESULT' or 'SQL_SMALL_RESULT' can be used with 'GROUP BY' or 'DISTINCT' to tell the optimizer that the result set has many rows or is small, respectively. For 'SQL_BIG_RESULT', MySQL directly uses disk-based temporary tables if they are created, and prefers sorting to using a temporary table with a key on the 'GROUP BY' elements. For 'SQL_SMALL_RESULT', MySQL uses in-memory temporary tables to store the resulting table instead of using sorting. This should not normally be needed.
'SQL_BUFFER_RESULT' forces the result to be put into a temporary table. This helps MySQL free the table locks early and helps in cases where it takes a long time to send the result set to the client. This modifier can be used only for top-level note 'SELECT': select. statements, not for subqueries or following note 'UNION': union.
'SQL_CALC_FOUND_ROWS' tells MySQL to calculate how many rows there would be in the result set, disregarding any 'LIMIT' clause. The number of rows can then be retrieved with 'SELECT FOUND_ROWS()'. See *note information-functions::.
The 'SQL_CACHE' and 'SQL_NO_CACHE' modifiers affect caching of query results in the query cache (see *note query-cache::). 'SQL_CACHE' tells MySQL to store the result in the query cache if it is cacheable and the value of the 'query_cache_type' system variable is '2' or 'DEMAND'. With 'SQL_NO_CACHE', the server does not use the query cache. It neither checks the query cache to see whether the result is already cached, nor does it cache the query result.

These two modifiers are mutually exclusive and an error occurs if they are both specified. Also, these modifiers are not permitted in subqueries (including subqueries in the 'FROM' clause), and note 'SELECT': select. statements in unions other than the first note 'SELECT': select.

For views, 'SQL_NO_CACHE' applies if it appears in any note 'SELECT': select. in the query. For a cacheable query, 'SQL_CACHE' applies if it appears in the first note 'SELECT': select. of a view referred to by the query.

Note:

The query cache is deprecated as of MySQL 5.7.20, and is removed in MySQL 8.0. Deprecation includes 'SQL_CACHE' and 'SQL_NO_CACHE'.

A 'SELECT' from a partitioned table using a storage engine such as note 'MyISAM': myisam-storage-engine. that employs table-level locks locks only those partitions containing rows that match the 'SELECT' statement 'WHERE' clause. (This does not occur with storage engines such as note 'InnoDB': innodb-storage-engine. that employ row-level locking.) For more information, see *note partitioning-limitations-locking::.

File: manual.info.tmp, Node: select-into, Next: join, Prev: select, Up: select

13.2.9.1 SELECT ... INTO Statement ..................................

The note 'SELECT ... INTO': select-into. form of note 'SELECT': select. enables a query result to be stored in variables or written to a file:

'SELECT ... INTO VAR_LIST' selects column values and stores them into variables.
'SELECT ... INTO OUTFILE' writes the selected rows to a file. Column and line terminators can be specified to produce a specific output format.
'SELECT ... INTO DUMPFILE' writes a single row to a file without any formatting.

A given note 'SELECT': select. statement can contain at most one 'INTO' clause, although as shown by the note 'SELECT': select. syntax description (see *note select::), the 'INTO' can appear in different positions:

Before 'FROM'. Example:
```
  SELECT * INTO @myvar FROM t1;
```

Before a trailing locking clause. Example:

  SELECT * FROM t1 INTO @myvar FOR UPDATE;

An 'INTO' clause should not be used in a nested note 'SELECT': select. because such a note 'SELECT': select. must return its result to the outer context. There are also constraints on the use of 'INTO' within note 'UNION': union. statements; see note union::.

For the 'INTO VAR_LIST' variant:

VAR_LIST names a list of one or more variables, each of which can be a user-defined variable, stored procedure or function parameter, or stored program local variable. (Within a prepared 'SELECT ... INTO VAR_LIST' statement, only user-defined variables are permitted; see *note local-variable-scope::.)
The selected values are assigned to the variables. The number of variables must match the number of columns. The query should return a single row. If the query returns no rows, a warning with error code 1329 occurs ('No data'), and the variable values remain unchanged. If the query returns multiple rows, error 1172 occurs ('Result consisted of more than one row'). If it is possible that the statement may retrieve multiple rows, you can use 'LIMIT 1' to limit the result set to a single row.
```
  SELECT id, data INTO @x, @y FROM test.t1 LIMIT 1;
```

User variable names are not case-sensitive. See *note user-variables::.

The note 'SELECT ... INTO OUTFILE 'FILE_NAME'': select-into. form of note 'SELECT': select. writes the selected rows to a file. The file is created on the server host, so you must have the 'FILE' privilege to use this syntax. FILE_NAME cannot be an existing file, which among other things prevents files such as '/etc/passwd' and database tables from being modified. The 'character_set_filesystem' system variable controls the interpretation of the file name.

The note 'SELECT ... INTO OUTFILE': select-into. statement is intended to enable dumping a table to a text file on the server host. To create the resulting file on some other host, note 'SELECT ... INTO OUTFILE': select-into. normally is unsuitable because there is no way to write a path to the file relative to the server host file system, unless the location of the file on the remote host can be accessed using a network-mapped path on the server host file system.

Alternatively, if the MySQL client software is installed on the remote host, you can use a client command such as 'mysql -e "SELECT ..." > FILE_NAME' to generate the file on that host.

note 'SELECT ... INTO OUTFILE': select-into. is the complement of note 'LOAD DATA': load-data. Column values are written converted to the character set specified in the 'CHARACTER SET' clause. If no such clause is present, values are dumped using the 'binary' character set. In effect, there is no character set conversion. If a result set contains columns in several character sets, so does the output data file and it may not be possible to reload the file correctly.

The syntax for the EXPORT_OPTIONS part of the statement consists of the same 'FIELDS' and 'LINES' clauses that are used with the note 'LOAD DATA': load-data. statement. For more detailed information about the 'FIELDS' and 'LINES' clauses, including their default values and permissible values, see note load-data::.

'FIELDS ESCAPED BY' controls how to write special characters. If the 'FIELDS ESCAPED BY' character is not empty, it is used when necessary to avoid ambiguity as a prefix that precedes following characters on output:

The 'FIELDS ESCAPED BY' character
The 'FIELDS [OPTIONALLY] ENCLOSED BY' character
The first character of the 'FIELDS TERMINATED BY' and 'LINES TERMINATED BY' values
ASCII 'NUL' (the zero-valued byte; what is actually written following the escape character is ASCII '0', not a zero-valued byte)

The 'FIELDS TERMINATED BY', 'ENCLOSED BY', 'ESCAPED BY', or 'LINES TERMINATED BY' characters must be escaped so that you can read the file back in reliably. ASCII 'NUL' is escaped to make it easier to view with some pagers.

The resulting file need not conform to SQL syntax, so nothing else need be escaped.

If the 'FIELDS ESCAPED BY' character is empty, no characters are escaped and 'NULL' is output as 'NULL', not ''. It is probably not a good idea to specify an empty escape character, particularly if field values in your data contain any of the characters in the list just given.

Here is an example that produces a file in the comma-separated values (CSV) format used by many programs:

 SELECT a,b,a+b INTO OUTFILE '/tmp/result.txt'
   FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
   LINES TERMINATED BY '\n'
   FROM test_table;

If you use 'INTO DUMPFILE' instead of 'INTO OUTFILE', MySQL writes only one row into the file, without any column or line termination and without performing any escape processing. This is useful for selecting a *note 'BLOB': blob. value and storing it in a file.

Note:

Any file created by 'INTO OUTFILE' or 'INTO DUMPFILE' is writable by all users on the server host. The reason for this is that the MySQL server cannot create a file that is owned by anyone other than the user under whose account it is running. (You should never run *note 'mysqld': mysqld. as 'root' for this and other reasons.) The file thus must be world-writable so that you can manipulate its contents.

If the 'secure_file_priv' system variable is set to a nonempty directory name, the file to be written must be located in that directory.

In the context of note 'SELECT ... INTO': select-into. statements that occur as part of events executed by the Event Scheduler, diagnostics messages (not only errors, but also warnings) are written to the error log, and, on Windows, to the application event log. For additional information, see note events-status-info::.

File: manual.info.tmp, Node: join, Next: union, Prev: select-into, Up: select

13.2.9.2 JOIN Clause ....................

MySQL supports the following 'JOIN' syntax for the TABLE_REFERENCES part of note 'SELECT': select. statements and multiple-table note 'DELETE': delete. and *note 'UPDATE': update. statements:

 TABLE_REFERENCES:
     ESCAPED_TABLE_REFERENCE [, ESCAPED_TABLE_REFERENCE] ...

 ESCAPED_TABLE_REFERENCE: {
     TABLE_REFERENCE
   | { OJ TABLE_REFERENCE }
 }

 TABLE_REFERENCE: {
     TABLE_FACTOR
   | JOINED_TABLE
 }

 TABLE_FACTOR: {
     TBL_NAME [PARTITION (PARTITION_NAMES)]
         [[AS] ALIAS] [INDEX_HINT_LIST]
   | TABLE_SUBQUERY [AS] ALIAS
   | ( TABLE_REFERENCES )
 }

 JOINED_TABLE: {
     TABLE_REFERENCE [INNER | CROSS] JOIN TABLE_FACTOR [JOIN_SPECIFICATION]
   | TABLE_REFERENCE STRAIGHT_JOIN TABLE_FACTOR
   | TABLE_REFERENCE STRAIGHT_JOIN TABLE_FACTOR ON SEARCH_CONDITION
   | TABLE_REFERENCE {LEFT|RIGHT} [OUTER] JOIN TABLE_REFERENCE JOIN_SPECIFICATION
   | TABLE_REFERENCE NATURAL [{LEFT|RIGHT} [OUTER]] JOIN TABLE_FACTOR
 }

 JOIN_SPECIFICATION: {
     ON SEARCH_CONDITION
   | USING (JOIN_COLUMN_LIST)
 }

 JOIN_COLUMN_LIST:
     COLUMN_NAME[, COLUMN_NAME] ...

 INDEX_HINT_LIST:
     INDEX_HINT[ INDEX_HINT] ...

 INDEX_HINT: {
     USE {INDEX|KEY}
       [FOR {JOIN|ORDER BY|GROUP BY}] ([INDEX_LIST])
   | {IGNORE|FORCE} {INDEX|KEY}
       [FOR {JOIN|ORDER BY|GROUP BY}] (INDEX_LIST)
 }

 INDEX_LIST:
     INDEX_NAME [, INDEX_NAME] ...

A table reference is also known as a join expression.

A table reference (when it refers to a partitioned table) may contain a 'PARTITION' clause, including a list of comma-separated partitions, subpartitions, or both. This option follows the name of the table and precedes any alias declaration. The effect of this option is that rows are selected only from the listed partitions or subpartitions. Any partitions or subpartitions not named in the list are ignored. For more information and examples, see *note partitioning-selection::.

The syntax of TABLE_FACTOR is extended in MySQL in comparison with standard SQL. The standard accepts only TABLE_REFERENCE, not a list of them inside a pair of parentheses.

This is a conservative extension if each comma in a list of TABLE_REFERENCE items is considered as equivalent to an inner join. For example:

 SELECT * FROM t1 LEFT JOIN (t2, t3, t4)
                  ON (t2.a = t1.a AND t3.b = t1.b AND t4.c = t1.c)

is equivalent to:

 SELECT * FROM t1 LEFT JOIN (t2 CROSS JOIN t3 CROSS JOIN t4)
                  ON (t2.a = t1.a AND t3.b = t1.b AND t4.c = t1.c)

In MySQL, 'JOIN', 'CROSS JOIN', and 'INNER JOIN' are syntactic equivalents (they can replace each other). In standard SQL, they are not equivalent. 'INNER JOIN' is used with an 'ON' clause, 'CROSS JOIN' is used otherwise.

In general, parentheses can be ignored in join expressions containing only inner join operations. MySQL also supports nested joins. See *note nested-join-optimization::.

Index hints can be specified to affect how the MySQL optimizer makes use of indexes. For more information, see note index-hints::. Optimizer hints and the 'optimizer_switch' system variable are other ways to influence optimizer use of indexes. See note optimizer-hints::, and *note switchable-optimizations::.

The following list describes general factors to take into account when writing joins:

A table reference can be aliased using 'TBL_NAME AS ALIAS_NAME' or TBL_NAME ALIAS_NAME:

  SELECT t1.name, t2.salary
    FROM employee AS t1 INNER JOIN info AS t2 ON t1.name = t2.name;

  SELECT t1.name, t2.salary
    FROM employee t1 INNER JOIN info t2 ON t1.name = t2.name;

A TABLE_SUBQUERY is also known as a derived table or subquery in the 'FROM' clause. See *note derived-tables::. Such subqueries must include an alias to give the subquery result a table name. A trivial example follows:
```
  SELECT * FROM (SELECT 1, 2, 3) AS t1;
```
The maximum number of tables that can be referenced in a single join is 61. This includes a join handled by merging derived tables and views in the 'FROM' clause into the outer query block (see *note derived-table-optimization::).
'INNER JOIN' and ',' (comma) are semantically equivalent in the absence of a join condition: both produce a Cartesian product between the specified tables (that is, each and every row in the first table is joined to each and every row in the second table).

However, the precedence of the comma operator is less than that of 'INNER JOIN', 'CROSS JOIN', 'LEFT JOIN', and so on. If you mix comma joins with the other join types when there is a join condition, an error of the form 'Unknown column 'COL_NAME' in 'on clause'' may occur. Information about dealing with this problem is given later in this section.
The SEARCH_CONDITION used with 'ON' is any conditional expression of the form that can be used in a 'WHERE' clause. Generally, the 'ON' clause serves for conditions that specify how to join tables, and the 'WHERE' clause restricts which rows to include in the result set.
If there is no matching row for the right table in the 'ON' or 'USING' part in a 'LEFT JOIN', a row with all columns set to 'NULL' is used for the right table. You can use this fact to find rows in a table that have no counterpart in another table:
```
  SELECT left_tbl.*
    FROM left_tbl LEFT JOIN right_tbl ON left_tbl.id = right_tbl.id
    WHERE right_tbl.id IS NULL;
```
This example finds all rows in 'left_tbl' with an 'id' value that is not present in 'right_tbl' (that is, all rows in 'left_tbl' with no corresponding row in 'right_tbl'). See *note outer-join-optimization::.
The 'USING(JOIN_COLUMN_LIST)' clause names a list of columns that must exist in both tables. If tables 'a' and 'b' both contain columns 'c1', 'c2', and 'c3', the following join compares corresponding columns from the two tables:
```
  a LEFT JOIN b USING (c1, c2, c3)
```
The 'NATURAL [LEFT] JOIN' of two tables is defined to be semantically equivalent to an 'INNER JOIN' or a 'LEFT JOIN' with a 'USING' clause that names all columns that exist in both tables.
'RIGHT JOIN' works analogously to 'LEFT JOIN'. To keep code portable across databases, it is recommended that you use 'LEFT JOIN' instead of 'RIGHT JOIN'.
The '{ OJ ... }' syntax shown in the join syntax description exists only for compatibility with ODBC. The curly braces in the syntax should be written literally; they are not metasyntax as used elsewhere in syntax descriptions.
```
  SELECT left_tbl.*
      FROM { OJ left_tbl LEFT OUTER JOIN right_tbl
             ON left_tbl.id = right_tbl.id }
      WHERE right_tbl.id IS NULL;
```
You can use other types of joins within '{ OJ ... }', such as 'INNER JOIN' or 'RIGHT OUTER JOIN'. This helps with compatibility with some third-party applications, but is not official ODBC syntax.
'STRAIGHT_JOIN' is similar to 'JOIN', except that the left table is always read before the right table. This can be used for those (few) cases for which the join optimizer processes the tables in a suboptimal order.

Some join examples:

 SELECT * FROM table1, table2;

 SELECT * FROM table1 INNER JOIN table2 ON table1.id = table2.id;

 SELECT * FROM table1 LEFT JOIN table2 ON table1.id = table2.id;

 SELECT * FROM table1 LEFT JOIN table2 USING (id);

 SELECT * FROM table1 LEFT JOIN table2 ON table1.id = table2.id
   LEFT JOIN table3 ON table2.id = table3.id;

Natural joins and joins with 'USING', including outer join variants, are processed according to the SQL:2003 standard:

Redundant columns of a 'NATURAL' join do not appear. Consider this set of statements:

  CREATE TABLE t1 (i INT, j INT);
  CREATE TABLE t2 (k INT, j INT);
  INSERT INTO t1 VALUES(1, 1);
  INSERT INTO t2 VALUES(1, 1);
  SELECT * FROM t1 NATURAL JOIN t2;
  SELECT * FROM t1 JOIN t2 USING (j);

In the first *note 'SELECT': select. statement, column 'j' appears in both tables and thus becomes a join column, so, according to standard SQL, it should appear only once in the output, not twice. Similarly, in the second SELECT statement, column 'j' is named in the 'USING' clause and should appear only once in the output, not twice.

Thus, the statements produce this output:

  +------+------+------+
  | j    | i    | k    |
  +------+------+------+
  |    1 |    1 |    1 |
  +------+------+------+
  +------+------+------+
  | j    | i    | k    |
  +------+------+------+
  |    1 |    1 |    1 |
  +------+------+------+

Redundant column elimination and column ordering occurs according to standard SQL, producing this display order:

* First, coalesced common columns of the two joined tables, in
  the order in which they occur in the first table

* Second, columns unique to the first table, in order in which
  they occur in that table

* Third, columns unique to the second table, in order in which
  they occur in that table

The single result column that replaces two common columns is defined using the coalesce operation. That is, for two 't1.a' and 't2.a' the resulting single join column 'a' is defined as 'a = COALESCE(t1.a, t2.a)', where:

  COALESCE(x, y) = (CASE WHEN x IS NOT NULL THEN x ELSE y END)

If the join operation is any other join, the result columns of the join consist of the concatenation of all columns of the joined tables.

A consequence of the definition of coalesced columns is that, for outer joins, the coalesced column contains the value of the non-'NULL' column if one of the two columns is always 'NULL'. If neither or both columns are 'NULL', both common columns have the same value, so it does not matter which one is chosen as the value of the coalesced column. A simple way to interpret this is to consider that a coalesced column of an outer join is represented by the common column of the inner table of a 'JOIN'. Suppose that the tables 't1(a, b)' and 't2(a, c)' have the following contents:

  t1    t2
  ----  ----
  1 x   2 z
  2 y   3 w

Then, for this join, column 'a' contains the values of 't1.a':

  mysql> SELECT * FROM t1 NATURAL LEFT JOIN t2;
  +------+------+------+
  | a    | b    | c    |
  +------+------+------+
  |    1 | x    | NULL |
  |    2 | y    | z    |
  +------+------+------+

By contrast, for this join, column 'a' contains the values of 't2.a'.

  mysql> SELECT * FROM t1 NATURAL RIGHT JOIN t2;
  +------+------+------+
  | a    | c    | b    |
  +------+------+------+
  |    2 | z    | y    |
  |    3 | w    | NULL |
  +------+------+------+

Compare those results to the otherwise equivalent queries with 'JOIN ... ON':

  mysql> SELECT * FROM t1 LEFT JOIN t2 ON (t1.a = t2.a);
  +------+------+------+------+
  | a    | b    | a    | c    |
  +------+------+------+------+
  |    1 | x    | NULL | NULL |
  |    2 | y    |    2 | z    |
  +------+------+------+------+

  mysql> SELECT * FROM t1 RIGHT JOIN t2 ON (t1.a = t2.a);
  +------+------+------+------+
  | a    | b    | a    | c    |
  +------+------+------+------+
  |    2 | y    |    2 | z    |
  | NULL | NULL |    3 | w    |
  +------+------+------+------+

A 'USING' clause can be rewritten as an 'ON' clause that compares corresponding columns. However, although 'USING' and 'ON' are similar, they are not quite the same. Consider the following two queries:
```
  a LEFT JOIN b USING (c1, c2, c3)
  a LEFT JOIN b ON a.c1 = b.c1 AND a.c2 = b.c2 AND a.c3 = b.c3
```
With respect to determining which rows satisfy the join condition, both joins are semantically identical.

With respect to determining which columns to display for 'SELECT ' expansion, the two joins are not semantically identical. The 'USING' join selects the coalesced value of corresponding columns, whereas the 'ON' join selects all columns from all tables. For the 'USING' join, 'SELECT ' selects these values:
```
  COALESCE(a.c1, b.c1), COALESCE(a.c2, b.c2), COALESCE(a.c3, b.c3)
```
For the 'ON' join, 'SELECT *' selects these values:
```
  a.c1, a.c2, a.c3, b.c1, b.c2, b.c3
```
With an inner join, 'COALESCE(a.c1, b.c1)' is the same as either 'a.c1' or 'b.c1' because both columns have the same value. With an outer join (such as 'LEFT JOIN'), one of the two columns can be 'NULL'. That column is omitted from the result.
An 'ON' clause can refer only to its operands.

Example:
```
  CREATE TABLE t1 (i1 INT);
  CREATE TABLE t2 (i2 INT);
  CREATE TABLE t3 (i3 INT);
  SELECT * FROM t1 JOIN t2 ON (i1 = i3) JOIN t3;
```
The statement fails with an 'Unknown column 'i3' in 'on clause'' error because 'i3' is a column in 't3', which is not an operand of the 'ON' clause. To enable the join to be processed, rewrite the statement as follows:
```
  SELECT * FROM t1 JOIN t2 JOIN t3 ON (i1 = i3);
```
'JOIN' has higher precedence than the comma operator (','), so the join expression 't1, t2 JOIN t3' is interpreted as '(t1, (t2 JOIN t3))', not as '((t1, t2) JOIN t3)'. This affects statements that use an 'ON' clause because that clause can refer only to columns in the operands of the join, and the precedence affects interpretation of what those operands are.

Example:
```
  CREATE TABLE t1 (i1 INT, j1 INT);
  CREATE TABLE t2 (i2 INT, j2 INT);
  CREATE TABLE t3 (i3 INT, j3 INT);
  INSERT INTO t1 VALUES(1, 1);
  INSERT INTO t2 VALUES(1, 1);
  INSERT INTO t3 VALUES(1, 1);
  SELECT * FROM t1, t2 JOIN t3 ON (t1.i1 = t3.i3);
```
The 'JOIN' takes precedence over the comma operator, so the operands for the 'ON' clause are 't2' and 't3'. Because 't1.i1' is not a column in either of the operands, the result is an 'Unknown column 't1.i1' in 'on clause'' error.

To enable the join to be processed, use either of these strategies:
```
* Group the first two tables explicitly with parentheses so that
  the operands for the 'ON' clause are '(t1, t2)' and 't3':

       SELECT * FROM (t1, t2) JOIN t3 ON (t1.i1 = t3.i3);

* Avoid the use of the comma operator and use 'JOIN' instead:

       SELECT * FROM t1 JOIN t2 JOIN t3 ON (t1.i1 = t3.i3);
```
The same precedence interpretation also applies to statements that mix the comma operator with 'INNER JOIN', 'CROSS JOIN', 'LEFT JOIN', and 'RIGHT JOIN', all of which have higher precedence than the comma operator.
A MySQL extension compared to the SQL:2003 standard is that MySQL permits you to qualify the common (coalesced) columns of 'NATURAL' or 'USING' joins, whereas the standard disallows that.

File: manual.info.tmp, Node: union, Prev: join, Up: select

13.2.9.3 UNION Clause .....................

 SELECT ...
 UNION [ALL | DISTINCT] SELECT ...
 [UNION [ALL | DISTINCT] SELECT ...]

note 'UNION': union. combines the result from multiple note 'SELECT': select. statements into a single result set. Example:

 mysql> SELECT 1, 2;
 +---+---+
 | 1 | 2 |
 +---+---+
 | 1 | 2 |
 +---+---+
 mysql> SELECT 'a', 'b';
 +---+---+
 | a | b |
 +---+---+
 | a | b |
 +---+---+
 mysql> SELECT 1, 2 UNION SELECT 'a', 'b';
 +---+---+
 | 1 | 2 |
 +---+---+
 | 1 | 2 |
 | a | b |
 +---+---+

*note union-result-set::
*note union-distinct-all::
*note union-order-by-limit::
*note union-restrictions::

Result Set Column Names and Data Types

The column names for a note 'UNION': union. result set are taken from the column names of the first note 'SELECT': select. statement.

Selected columns listed in corresponding positions of each note 'SELECT': select. statement should have the same data type. For example, the first column selected by the first statement should have the same type as the first column selected by the other statements. If the data types of corresponding note 'SELECT': select. columns do not match, the types and lengths of the columns in the note 'UNION': union. result take into account the values retrieved by all the note 'SELECT': select. statements. For example, consider the following, where the column length is not constrained to the length of the value from the first *note 'SELECT': select.:

 mysql> SELECT REPEAT('a',1) UNION SELECT REPEAT('b',20);
 +----------------------+
 | REPEAT('a',1)        |
 +----------------------+
 | a                    |
 | bbbbbbbbbbbbbbbbbbbb |
 +----------------------+

UNION DISTINCT and UNION ALL

By default, duplicate rows are removed from note 'UNION': union. results. The optional 'DISTINCT' keyword has the same effect but makes it explicit. With the optional 'ALL' keyword, duplicate-row removal does not occur and the result includes all matching rows from all the note 'SELECT': select. statements.

You can mix note 'UNION ALL': union. and note 'UNION DISTINCT': union. in the same query. Mixed note 'UNION': union. types are treated such that a 'DISTINCT' union overrides any 'ALL' union to its left. A 'DISTINCT' union can be produced explicitly by using note 'UNION DISTINCT': union. or implicitly by using *note 'UNION': union. with no following 'DISTINCT' or 'ALL' keyword.

ORDER BY and LIMIT in Unions

To apply an 'ORDER BY' or 'LIMIT' clause to an individual note 'SELECT': select, parenthesize the note 'SELECT': select. and place the clause inside the parentheses:

 (SELECT a FROM t1 WHERE a=10 AND B=1 ORDER BY a LIMIT 10)
 UNION
 (SELECT a FROM t2 WHERE a=11 AND B=2 ORDER BY a LIMIT 10);

Note:

Previous versions of MySQL may permit such statements without parentheses. In MySQL 5.7, the requirement for parentheses is enforced.

Use of 'ORDER BY' for individual note 'SELECT': select. statements implies nothing about the order in which the rows appear in the final result because note 'UNION': union. by default produces an unordered set of rows. Therefore, 'ORDER BY' in this context typically is used in conjunction with 'LIMIT', to determine the subset of the selected rows to retrieve for the note 'SELECT': select, even though it does not necessarily affect the order of those rows in the final note 'UNION': union. result. If 'ORDER BY' appears without 'LIMIT' in a *note 'SELECT': select, it is optimized away because it has no effect.

To use an 'ORDER BY' or 'LIMIT' clause to sort or limit the entire note 'UNION': union. result, parenthesize the individual note 'SELECT': select. statements and place the 'ORDER BY' or 'LIMIT' after the last one:

 (SELECT a FROM t1 WHERE a=10 AND B=1)
 UNION
 (SELECT a FROM t2 WHERE a=11 AND B=2)
 ORDER BY a LIMIT 10;

A statement without parentheses is equivalent to one parenthesized as just shown.

This kind of 'ORDER BY' cannot use column references that include a table name (that is, names in TBL_NAME.COL_NAME format). Instead, provide a column alias in the first *note 'SELECT': select. statement and refer to the alias in the 'ORDER BY'. (Alternatively, refer to the column in the 'ORDER BY' using its column position. However, use of column positions is deprecated.)

Also, if a column to be sorted is aliased, the 'ORDER BY' clause must refer to the alias, not the column name. The first of the following statements is permitted, but the second fails with an 'Unknown column 'a' in 'order clause'' error:

 (SELECT a AS b FROM t) UNION (SELECT ...) ORDER BY b;
 (SELECT a AS b FROM t) UNION (SELECT ...) ORDER BY a;

To cause rows in a note 'UNION': union. result to consist of the sets of rows retrieved by each note 'SELECT': select. one after the other, select an additional column in each note 'SELECT': select. to use as a sort column and add an 'ORDER BY' that sorts on that column following the last note 'SELECT': select.:

 (SELECT 1 AS sort_col, col1a, col1b, ... FROM t1)
 UNION
 (SELECT 2, col2a, col2b, ... FROM t2) ORDER BY sort_col;

To additionally maintain sort order within individual *note 'SELECT': select. results, add a secondary column to the 'ORDER BY' clause:

 (SELECT 1 AS sort_col, col1a, col1b, ... FROM t1)
 UNION
 (SELECT 2, col2a, col2b, ... FROM t2) ORDER BY sort_col, col1a;

Use of an additional column also enables you to determine which *note 'SELECT': select. each row comes from. Extra columns can provide other identifying information as well, such as a string that indicates a table name.

*note 'UNION': union. queries with an aggregate function in an 'ORDER BY' clause are rejected with an 'ER_AGGREGATE_ORDER_FOR_UNION' (https://dev.mysql.com/doc/mysql-errors/5.7/en/server-error-reference.html#error_er_aggregate_order_for_union) error. Example:

 SELECT 1 AS foo UNION SELECT 2 ORDER BY MAX(1);

UNION Restrictions

In a note 'UNION': union, the note 'SELECT': select. statements are normal select statements, but with the following restrictions:

'HIGH_PRIORITY' in the first note 'SELECT': select. has no effect. 'HIGH_PRIORITY' in any subsequent note 'SELECT': select. produces a syntax error.
Only the last note 'SELECT': select. statement can use an 'INTO' clause. However, the entire note 'UNION': union. result is written to the 'INTO' output destination.

File: manual.info.tmp, Node: subqueries, Next: update, Prev: select, Up: sql-data-manipulation-statements

13.2.10 Subqueries

Menu:
scalar-subqueries:: The Subquery as Scalar Operand
comparisons-using-subqueries:: Comparisons Using Subqueries
any-in-some-subqueries:: Subqueries with ANY, IN, or SOME
all-subqueries:: Subqueries with ALL
row-subqueries:: Row Subqueries
exists-and-not-exists-subqueries:: Subqueries with EXISTS or NOT EXISTS
correlated-subqueries:: Correlated Subqueries
derived-tables:: Derived Tables
subquery-errors:: Subquery Errors
optimizing-subqueries:: Optimizing Subqueries
rewriting-subqueries:: Rewriting Subqueries as Joins
subquery-restrictions:: Restrictions on Subqueries

A subquery is a *note 'SELECT': select. statement within another statement.

All subquery forms and operations that the SQL standard requires are supported, as well as a few features that are MySQL-specific.

Here is an example of a subquery:

 SELECT * FROM t1 WHERE column1 = (SELECT column1 FROM t2);

In this example, 'SELECT * FROM t1 ...' is the outer query (or outer statement), and '(SELECT column1 FROM t2)' is the subquery. We say that the subquery is nested within the outer query, and in fact it is possible to nest subqueries within other subqueries, to a considerable depth. A subquery must always appear within parentheses.

The main advantages of subqueries are:

They allow queries that are structured so that it is possible to isolate each part of a statement.
They provide alternative ways to perform operations that would otherwise require complex joins and unions.
Many people find subqueries more readable than complex joins or unions. Indeed, it was the innovation of subqueries that gave people the original idea of calling the early SQL 'Structured Query Language.'

Here is an example statement that shows the major points about subquery syntax as specified by the SQL standard and supported in MySQL:

 DELETE FROM t1
 WHERE s11 > ANY
  (SELECT COUNT(*) /* no hint */ FROM t2
   WHERE NOT EXISTS
    (SELECT * FROM t3
     WHERE ROW(5*t2.s1,77)=
      (SELECT 50,11*s1 FROM t4 UNION SELECT 50,77 FROM
       (SELECT * FROM t5) AS t5)));

A subquery can return a scalar (a single value), a single row, a single column, or a table (one or more rows of one or more columns). These are called scalar, column, row, and table subqueries. Subqueries that return a particular kind of result often can be used only in certain contexts, as described in the following sections.

There are few restrictions on the type of statements in which subqueries can be used. A subquery can contain many of the keywords or clauses that an ordinary note 'SELECT': select. can contain: 'DISTINCT', 'GROUP BY', 'ORDER BY', 'LIMIT', joins, index hints, note 'UNION': union. constructs, comments, functions, and so on.

A subquery's outer statement can be any one of: note 'SELECT': select, note 'INSERT': insert, note 'UPDATE': update, note 'DELETE': delete, note 'SET': set-variable, or note 'DO': do.

In MySQL, you cannot modify a table and select from the same table in a subquery. This applies to statements such as note 'DELETE': delete, note 'INSERT': insert, note 'REPLACE': replace, note 'UPDATE': update, and (because subqueries can be used in the 'SET' clause) *note 'LOAD DATA': load-data.

For information about how the optimizer handles subqueries, see note subquery-optimization::. For a discussion of restrictions on subquery use, including performance issues for certain forms of subquery syntax, see note subquery-restrictions::.

File: manual.info.tmp, Node: scalar-subqueries, Next: comparisons-using-subqueries, Prev: subqueries, Up: subqueries

13.2.10.1 The Subquery as Scalar Operand ........................................

In its simplest form, a subquery is a scalar subquery that returns a single value. A scalar subquery is a simple operand, and you can use it almost anywhere a single column value or literal is legal, and you can expect it to have those characteristics that all operands have: a data type, a length, an indication that it can be 'NULL', and so on. For example:

 CREATE TABLE t1 (s1 INT, s2 CHAR(5) NOT NULL);
 INSERT INTO t1 VALUES(100, 'abcde');
 SELECT (SELECT s2 FROM t1);

The subquery in this note 'SELECT': select. returns a single value (''abcde'') that has a data type of note 'CHAR': char, a length of 5, a character set and collation equal to the defaults in effect at *note 'CREATE TABLE': create-table. time, and an indication that the value in the column can be 'NULL'. Nullability of the value selected by a scalar subquery is not copied because if the subquery result is empty, the result is 'NULL'. For the subquery just shown, if 't1' were empty, the result would be 'NULL' even though 's2' is 'NOT NULL'.

There are a few contexts in which a scalar subquery cannot be used. If a statement permits only a literal value, you cannot use a subquery. For example, 'LIMIT' requires literal integer arguments, and *note 'LOAD DATA': load-data. requires a literal string file name. You cannot use subqueries to supply these values.

When you see examples in the following sections that contain the rather spartan construct '(SELECT column1 FROM t1)', imagine that your own code contains much more diverse and complex constructions.

Suppose that we make two tables:

 CREATE TABLE t1 (s1 INT);
 INSERT INTO t1 VALUES (1);
 CREATE TABLE t2 (s1 INT);
 INSERT INTO t2 VALUES (2);

Then perform a *note 'SELECT': select.:

 SELECT (SELECT s1 FROM t2) FROM t1;

The result is '2' because there is a row in 't2' containing a column 's1' that has a value of '2'.

A scalar subquery can be part of an expression, but remember the parentheses, even if the subquery is an operand that provides an argument for a function. For example:

 SELECT UPPER((SELECT s1 FROM t1)) FROM t2;

File: manual.info.tmp, Node: comparisons-using-subqueries, Next: any-in-some-subqueries, Prev: scalar-subqueries, Up: subqueries

13.2.10.2 Comparisons Using Subqueries ......................................

The most common use of a subquery is in the form:

 NON_SUBQUERY_OPERAND COMPARISON_OPERATOR (SUBQUERY)

Where COMPARISON_OPERATOR is one of these operators:

 =  >  <  >=  <=  <>  !=  <=>

For example:

 ... WHERE 'a' = (SELECT column1 FROM t1)

MySQL also permits this construct:

 NON_SUBQUERY_OPERAND LIKE (SUBQUERY)

At one time the only legal place for a subquery was on the right side of a comparison, and you might still find some old DBMSs that insist on this.

Here is an example of a common-form subquery comparison that you cannot do with a join. It finds all the rows in table 't1' for which the 'column1' value is equal to a maximum value in table 't2':

 SELECT * FROM t1
   WHERE column1 = (SELECT MAX(column2) FROM t2);

Here is another example, which again is impossible with a join because it involves aggregating for one of the tables. It finds all rows in table 't1' containing a value that occurs twice in a given column:

 SELECT * FROM t1 AS t
   WHERE 2 = (SELECT COUNT(*) FROM t1 WHERE t1.id = t.id);

For a comparison of the subquery to a scalar, the subquery must return a scalar. For a comparison of the subquery to a row constructor, the subquery must be a row subquery that returns a row with the same number of values as the row constructor. See *note row-subqueries::.

File: manual.info.tmp, Node: any-in-some-subqueries, Next: all-subqueries, Prev: comparisons-using-subqueries, Up: subqueries

13.2.10.3 Subqueries with ANY, IN, or SOME ..........................................

Syntax:

 OPERAND COMPARISON_OPERATOR ANY (SUBQUERY)
 OPERAND IN (SUBQUERY)
 OPERAND COMPARISON_OPERATOR SOME (SUBQUERY)

Where COMPARISON_OPERATOR is one of these operators:

 =  >  <  >=  <=  <>  !=

The 'ANY' keyword, which must follow a comparison operator, means 'return 'TRUE' if the comparison is 'TRUE' for 'ANY' of the values in the column that the subquery returns.' For example:

 SELECT s1 FROM t1 WHERE s1 > ANY (SELECT s1 FROM t2);

Suppose that there is a row in table 't1' containing '(10)'. The expression is 'TRUE' if table 't2' contains '(21,14,7)' because there is a value '7' in 't2' that is less than '10'. The expression is 'FALSE' if table 't2' contains '(20,10)', or if table 't2' is empty. The expression is unknown (that is, 'NULL') if table 't2' contains '(NULL,NULL,NULL)'.

When used with a subquery, the word 'IN' is an alias for '= ANY'. Thus, these two statements are the same:

 SELECT s1 FROM t1 WHERE s1 = ANY (SELECT s1 FROM t2);
 SELECT s1 FROM t1 WHERE s1 IN    (SELECT s1 FROM t2);

'IN' and '= ANY' are not synonyms when used with an expression list. 'IN' can take an expression list, but '= ANY' cannot. See *note comparison-operators::.

'NOT IN' is not an alias for '<> ANY', but for '<> ALL'. See *note all-subqueries::.

The word 'SOME' is an alias for 'ANY'. Thus, these two statements are the same:

 SELECT s1 FROM t1 WHERE s1 <> ANY  (SELECT s1 FROM t2);
 SELECT s1 FROM t1 WHERE s1 <> SOME (SELECT s1 FROM t2);

Use of the word 'SOME' is rare, but this example shows why it might be useful. To most people, the English phrase 'a is not equal to any b' means 'there is no b which is equal to a,' but that is not what is meant by the SQL syntax. The syntax means 'there is some b to which a is not equal.' Using '<> SOME' instead helps ensure that everyone understands the true meaning of the query.

File: manual.info.tmp, Node: all-subqueries, Next: row-subqueries, Prev: any-in-some-subqueries, Up: subqueries

13.2.10.4 Subqueries with ALL .............................

Syntax:

 OPERAND COMPARISON_OPERATOR ALL (SUBQUERY)

The word 'ALL', which must follow a comparison operator, means 'return 'TRUE' if the comparison is 'TRUE' for 'ALL' of the values in the column that the subquery returns.' For example:

 SELECT s1 FROM t1 WHERE s1 > ALL (SELECT s1 FROM t2);

Suppose that there is a row in table 't1' containing '(10)'. The expression is 'TRUE' if table 't2' contains '(-5,0,+5)' because '10' is greater than all three values in 't2'. The expression is 'FALSE' if table 't2' contains '(12,6,NULL,-100)' because there is a single value '12' in table 't2' that is greater than '10'. The expression is unknown (that is, 'NULL') if table 't2' contains '(0,NULL,1)'.

Finally, the expression is 'TRUE' if table 't2' is empty. So, the following expression is 'TRUE' when table 't2' is empty:

 SELECT * FROM t1 WHERE 1 > ALL (SELECT s1 FROM t2);

But this expression is 'NULL' when table 't2' is empty:

 SELECT * FROM t1 WHERE 1 > (SELECT s1 FROM t2);

In addition, the following expression is 'NULL' when table 't2' is empty:

 SELECT * FROM t1 WHERE 1 > ALL (SELECT MAX(s1) FROM t2);

In general, tables containing 'NULL' values and empty tables are 'edge cases.' When writing subqueries, always consider whether you have taken those two possibilities into account.

'NOT IN' is an alias for '<> ALL'. Thus, these two statements are the same:

 SELECT s1 FROM t1 WHERE s1 <> ALL (SELECT s1 FROM t2);
 SELECT s1 FROM t1 WHERE s1 NOT IN (SELECT s1 FROM t2);

File: manual.info.tmp, Node: row-subqueries, Next: exists-and-not-exists-subqueries, Prev: all-subqueries, Up: subqueries

13.2.10.5 Row Subqueries ........................

Scalar or column subqueries return a single value or a column of values. A row subquery is a subquery variant that returns a single row and can thus return more than one column value. Legal operators for row subquery comparisons are:

 =  >  <  >=  <=  <>  !=  <=>

Here are two examples:

 SELECT * FROM t1
   WHERE (col1,col2) = (SELECT col3, col4 FROM t2 WHERE id = 10);
 SELECT * FROM t1
   WHERE ROW(col1,col2) = (SELECT col3, col4 FROM t2 WHERE id = 10);

For both queries, if the table 't2' contains a single row with 'id = 10', the subquery returns a single row. If this row has 'col3' and 'col4' values equal to the 'col1' and 'col2' values of any rows in 't1', the 'WHERE' expression is 'TRUE' and each query returns those 't1' rows. If the 't2' row 'col3' and 'col4' values are not equal the 'col1' and 'col2' values of any 't1' row, the expression is 'FALSE' and the query returns an empty result set. The expression is unknown (that is, 'NULL') if the subquery produces no rows. An error occurs if the subquery produces multiple rows because a row subquery can return at most one row.

For information about how each operator works for row comparisons, see *note comparison-operators::.

The expressions '(1,2)' and 'ROW(1,2)' are sometimes called row constructors. The two are equivalent. The row constructor and the row returned by the subquery must contain the same number of values.

A row constructor is used for comparisons with subqueries that return two or more columns. When a subquery returns a single column, this is regarded as a scalar value and not as a row, so a row constructor cannot be used with a subquery that does not return at least two columns. Thus, the following query fails with a syntax error:

 SELECT * FROM t1 WHERE ROW(1) = (SELECT column1 FROM t2)

Row constructors are legal in other contexts. For example, the following two statements are semantically equivalent (and are handled in the same way by the optimizer):

 SELECT * FROM t1 WHERE (column1,column2) = (1,1);
 SELECT * FROM t1 WHERE column1 = 1 AND column2 = 1;

The following query answers the request, 'find all rows in table 't1' that also exist in table 't2'':

 SELECT column1,column2,column3
   FROM t1
   WHERE (column1,column2,column3) IN
          (SELECT column1,column2,column3 FROM t2);

For more information about the optimizer and row constructors, see *note row-constructor-optimization::

File: manual.info.tmp, Node: exists-and-not-exists-subqueries, Next: correlated-subqueries, Prev: row-subqueries, Up: subqueries

13.2.10.6 Subqueries with EXISTS or NOT EXISTS ..............................................

If a subquery returns any rows at all, 'EXISTS SUBQUERY' is 'TRUE', and 'NOT EXISTS SUBQUERY' is 'FALSE'. For example:

 SELECT column1 FROM t1 WHERE EXISTS (SELECT * FROM t2);

Traditionally, an 'EXISTS' subquery starts with 'SELECT ', but it could begin with 'SELECT 5' or 'SELECT column1' or anything at all. MySQL ignores the note 'SELECT': select. list in such a subquery, so it makes no difference.

For the preceding example, if 't2' contains any rows, even rows with nothing but 'NULL' values, the 'EXISTS' condition is 'TRUE'. This is actually an unlikely example because a '[NOT] EXISTS' subquery almost always contains correlations. Here are some more realistic examples:

What kind of store is present in one or more cities?

  SELECT DISTINCT store_type FROM stores
    WHERE EXISTS (SELECT * FROM cities_stores
                  WHERE cities_stores.store_type = stores.store_type);

What kind of store is present in no cities?

  SELECT DISTINCT store_type FROM stores
    WHERE NOT EXISTS (SELECT * FROM cities_stores
                      WHERE cities_stores.store_type = stores.store_type);

What kind of store is present in all cities?

  SELECT DISTINCT store_type FROM stores
    WHERE NOT EXISTS (
      SELECT * FROM cities WHERE NOT EXISTS (
        SELECT * FROM cities_stores
         WHERE cities_stores.city = cities.city
         AND cities_stores.store_type = stores.store_type));

The last example is a double-nested 'NOT EXISTS' query. That is, it has a 'NOT EXISTS' clause within a 'NOT EXISTS' clause. Formally, it answers the question 'does a city exist with a store that is not in 'Stores''? But it is easier to say that a nested 'NOT EXISTS' answers the question 'is X 'TRUE' for all Y?'

File: manual.info.tmp, Node: correlated-subqueries, Next: derived-tables, Prev: exists-and-not-exists-subqueries, Up: subqueries

13.2.10.7 Correlated Subqueries ...............................

A correlated subquery is a subquery that contains a reference to a table that also appears in the outer query. For example:

 SELECT * FROM t1
   WHERE column1 = ANY (SELECT column1 FROM t2
                        WHERE t2.column2 = t1.column2);

Notice that the subquery contains a reference to a column of 't1', even though the subquery's 'FROM' clause does not mention a table 't1'. So, MySQL looks outside the subquery, and finds 't1' in the outer query.

Suppose that table 't1' contains a row where 'column1 = 5' and 'column2 = 6'; meanwhile, table 't2' contains a row where 'column1 = 5' and 'column2 = 7'. The simple expression '... WHERE column1 = ANY (SELECT column1 FROM t2)' would be 'TRUE', but in this example, the 'WHERE' clause within the subquery is 'FALSE' (because '(5,6)' is not equal to '(5,7)'), so the expression as a whole is 'FALSE'.

Scoping rule: MySQL evaluates from inside to outside. For example:

 SELECT column1 FROM t1 AS x
   WHERE x.column1 = (SELECT column1 FROM t2 AS x
     WHERE x.column1 = (SELECT column1 FROM t3
       WHERE x.column2 = t3.column1));

In this statement, 'x.column2' must be a column in table 't2' because 'SELECT column1 FROM t2 AS x ...' renames 't2'. It is not a column in table 't1' because 'SELECT column1 FROM t1 ...' is an outer query that is farther out.

For subqueries in 'HAVING' or 'ORDER BY' clauses, MySQL also looks for column names in the outer select list.

For certain cases, a correlated subquery is optimized. For example:

 VAL IN (SELECT KEY_VAL FROM TBL_NAME WHERE CORRELATED_CONDITION)

Otherwise, they are inefficient and likely to be slow. Rewriting the query as a join might improve performance.

Aggregate functions in correlated subqueries may contain outer references, provided the function contains nothing but outer references, and provided the function is not contained in another function or expression.

File: manual.info.tmp, Node: derived-tables, Next: subquery-errors, Prev: correlated-subqueries, Up: subqueries

13.2.10.8 Derived Tables ........................

A derived table is an expression that generates a table within the scope of a query 'FROM' clause. For example, a subquery in a *note 'SELECT': select. statement 'FROM' clause is a derived table:

 SELECT ... FROM (SUBQUERY) [AS] TBL_NAME ...

The '[AS] TBL_NAME' clause is mandatory because every table in a 'FROM' clause must have a name. Any columns in the derived table must have unique names.

For the sake of illustration, assume that you have this table:

 CREATE TABLE t1 (s1 INT, s2 CHAR(5), s3 FLOAT);

Here is how to use a subquery in the 'FROM' clause, using the example table:

 INSERT INTO t1 VALUES (1,'1',1.0);
 INSERT INTO t1 VALUES (2,'2',2.0);
 SELECT sb1,sb2,sb3
   FROM (SELECT s1 AS sb1, s2 AS sb2, s3*2 AS sb3 FROM t1) AS sb
   WHERE sb1 > 1;

Result:

 +------+------+------+
 | sb1  | sb2  | sb3  |
 +------+------+------+
 |    2 | 2    |    4 |
 +------+------+------+

Here is another example: Suppose that you want to know the average of a set of sums for a grouped table. This does not work:

 SELECT AVG(SUM(column1)) FROM t1 GROUP BY column1;

However, this query provides the desired information:

 SELECT AVG(sum_column1)
   FROM (SELECT SUM(column1) AS sum_column1
         FROM t1 GROUP BY column1) AS t1;

Notice that the column name used within the subquery ('sum_column1') is recognized in the outer query.

A derived table can return a scalar, column, row, or table.

Derived tables are subject to these restrictions:

A derived table cannot be a correlated subquery.
A derived table cannot contain references to other tables of the same *note 'SELECT': select.
A derived table cannot contain outer references. This is a MySQL restriction, not a restriction of the SQL standard.

The optimizer determines information about derived tables in such a way that note 'EXPLAIN': explain. does not need to materialize them. See note derived-table-optimization::.

It is possible under certain circumstances that using *note 'EXPLAIN SELECT': explain. modifies table data. This can occur if the outer query accesses any tables and an inner query invokes a stored function that changes one or more rows of a table. Suppose that there are two tables 't1' and 't2' in database 'd1', and a stored function 'f1' that modifies 't2', created as shown here:

 CREATE DATABASE d1;
 USE d1;
 CREATE TABLE t1 (c1 INT);
 CREATE TABLE t2 (c1 INT);
 CREATE FUNCTION f1(p1 INT) RETURNS INT
   BEGIN
     INSERT INTO t2 VALUES (p1);
     RETURN p1;
   END;

Referencing the function directly in an *note 'EXPLAIN SELECT': explain. has no effect on 't2', as shown here:

 mysql> SELECT * FROM t2;
 Empty set (0.02 sec)

 mysql> EXPLAIN SELECT f1(5)\G
 *************************** 1. row ***************************
            id: 1
   select_type: SIMPLE
         table: NULL
    partitions: NULL
          type: NULL
 possible_keys: NULL
           key: NULL
       key_len: NULL
           ref: NULL
          rows: NULL
      filtered: NULL
         Extra: No tables used
 1 row in set (0.01 sec)

 mysql> SELECT * FROM t2;
 Empty set (0.01 sec)

This is because the note 'SELECT': select. statement did not reference any tables, as can be seen in the 'table' and 'Extra' columns of the output. This is also true of the following nested note 'SELECT': select.:

 mysql> EXPLAIN SELECT NOW() AS a1, (SELECT f1(5)) AS a2\G
 *************************** 1. row ***************************
            id: 1
   select_type: SIMPLE
         table: NULL
          type: NULL
 possible_keys: NULL
           key: NULL
       key_len: NULL
           ref: NULL
          rows: NULL
      filtered: NULL
         Extra: No tables used
 1 row in set, 1 warning (0.00 sec)

 mysql> SHOW WARNINGS;
 +-------+------+------------------------------------------+
 | Level | Code | Message                                  |
 +-------+------+------------------------------------------+
 | Note  | 1249 | Select 2 was reduced during optimization |
 +-------+------+------------------------------------------+
 1 row in set (0.00 sec)

 mysql> SELECT * FROM t2;
 Empty set (0.00 sec)

However, if the outer *note 'SELECT': select. references any tables, the optimizer executes the statement in the subquery as well, with the result that 't2' is modified:

 mysql> EXPLAIN SELECT * FROM t1 AS a1, (SELECT f1(5)) AS a2\G
 *************************** 1. row ***************************
            id: 1
   select_type: PRIMARY
         table: <derived2>
    partitions: NULL
          type: system
 possible_keys: NULL
           key: NULL
       key_len: NULL
           ref: NULL
          rows: 1
      filtered: 100.00
         Extra: NULL
 *************************** 2. row ***************************
            id: 1
   select_type: PRIMARY
         table: a1
    partitions: NULL
          type: ALL
 possible_keys: NULL
           key: NULL
       key_len: NULL
           ref: NULL
          rows: 1
      filtered: 100.00
         Extra: NULL
 *************************** 3. row ***************************
            id: 2
   select_type: DERIVED
         table: NULL
    partitions: NULL
          type: NULL
 possible_keys: NULL
           key: NULL
       key_len: NULL
           ref: NULL
          rows: NULL
      filtered: NULL
         Extra: No tables used
 3 rows in set (0.00 sec)

 mysql> SELECT * FROM t2;
 +------+
 | c1   |
 +------+
 |    5 |
 +------+
 1 row in set (0.00 sec)

File: manual.info.tmp, Node: subquery-errors, Next: optimizing-subqueries, Prev: derived-tables, Up: subqueries

13.2.10.9 Subquery Errors .........................

There are some errors that apply only to subqueries. This section describes them.

Unsupported subquery syntax:

  ERROR 1235 (ER_NOT_SUPPORTED_YET)
  SQLSTATE = 42000
  Message = "This version of MySQL does not yet support
  'LIMIT & IN/ALL/ANY/SOME subquery'"

This means that MySQL does not support statements of the following form:

  SELECT * FROM t1 WHERE s1 IN (SELECT s2 FROM t2 ORDER BY s1 LIMIT 1)

Incorrect number of columns from subquery:
```
  ERROR 1241 (ER_OPERAND_COL)
  SQLSTATE = 21000
  Message = "Operand should contain 1 column(s)"
```
This error occurs in cases like this:
```
  SELECT (SELECT column1, column2 FROM t2) FROM t1;
```
You may use a subquery that returns multiple columns, if the purpose is row comparison. In other contexts, the subquery must be a scalar operand. See *note row-subqueries::.
Incorrect number of rows from subquery:
```
  ERROR 1242 (ER_SUBSELECT_NO_1_ROW)
  SQLSTATE = 21000
  Message = "Subquery returns more than 1 row"
```
This error occurs for statements where the subquery must return at most one row but returns multiple rows. Consider the following example:
```
  SELECT * FROM t1 WHERE column1 = (SELECT column1 FROM t2);
```
If 'SELECT column1 FROM t2' returns just one row, the previous query works. If the subquery returns more than one row, error 1242 occurs. In that case, the query should be rewritten as:
```
  SELECT * FROM t1 WHERE column1 = ANY (SELECT column1 FROM t2);
```
Incorrectly used table in subquery:
```
  Error 1093 (ER_UPDATE_TABLE_USED)
  SQLSTATE = HY000
  Message = "You can't specify target table 'x'
  for update in FROM clause"
```
This error occurs in cases such as the following, which attempts to modify a table and select from the same table in the subquery:
```
  UPDATE t1 SET column2 = (SELECT MAX(column1) FROM t1);
```
You can use a subquery for assignment within an note 'UPDATE': update. statement because subqueries are legal in note 'UPDATE': update. and note 'DELETE': delete. statements as well as in note 'SELECT': select. statements. However, you cannot use the same table (in this case, table 't1') for both the subquery 'FROM' clause and the update target.

For transactional storage engines, the failure of a subquery causes the entire statement to fail. For nontransactional storage engines, data modifications made before the error was encountered are preserved.

File: manual.info.tmp, Node: optimizing-subqueries, Next: rewriting-subqueries, Prev: subquery-errors, Up: subqueries

13.2.10.10 Optimizing Subqueries ................................

Development is ongoing, so no optimization tip is reliable for the long term. The following list provides some interesting tricks that you might want to play with. See also *note subquery-optimization::.

Use subquery clauses that affect the number or order of the rows in the subquery. For example:

  SELECT * FROM t1 WHERE t1.column1 IN
    (SELECT column1 FROM t2 ORDER BY column1);
  SELECT * FROM t1 WHERE t1.column1 IN
    (SELECT DISTINCT column1 FROM t2);
  SELECT * FROM t1 WHERE EXISTS
    (SELECT * FROM t2 LIMIT 1);

Replace a join with a subquery. For example, try this:

  SELECT DISTINCT column1 FROM t1 WHERE t1.column1 IN (
    SELECT column1 FROM t2);

Instead of this:

  SELECT DISTINCT t1.column1 FROM t1, t2
    WHERE t1.column1 = t2.column1;

Some subqueries can be transformed to joins for compatibility with older versions of MySQL that do not support subqueries. However, in some cases, converting a subquery to a join may improve performance. See *note rewriting-subqueries::.

Move clauses from outside to inside the subquery. For example, use this query:

  SELECT * FROM t1
    WHERE s1 IN (SELECT s1 FROM t1 UNION ALL SELECT s1 FROM t2);

Instead of this query:

  SELECT * FROM t1
    WHERE s1 IN (SELECT s1 FROM t1) OR s1 IN (SELECT s1 FROM t2);

For another example, use this query:

  SELECT (SELECT column1 + 5 FROM t1) FROM t2;

Instead of this query:

  SELECT (SELECT column1 FROM t1) + 5 FROM t2;

Use a row subquery instead of a correlated subquery. For example, use this query:

  SELECT * FROM t1
    WHERE (column1,column2) IN (SELECT column1,column2 FROM t2);

Instead of this query:

  SELECT * FROM t1
    WHERE EXISTS (SELECT * FROM t2 WHERE t2.column1=t1.column1
                  AND t2.column2=t1.column2);

Use 'NOT (a = ANY (...))' rather than 'a <> ALL (...)'.
Use 'x = ANY (TABLE CONTAINING (1,2))' rather than 'x=1 OR x=2'.
Use '= ANY' rather than 'EXISTS'.

For uncorrelated subqueries that always return one row, 'IN' is always slower than '='. For example, use this query:

  SELECT * FROM t1
    WHERE t1.COL_NAME = (SELECT a FROM t2 WHERE b = SOME_CONST);

Instead of this query:

  SELECT * FROM t1
    WHERE t1.COL_NAME IN (SELECT a FROM t2 WHERE b = SOME_CONST);

These tricks might cause programs to go faster or slower. Using MySQL facilities like the 'BENCHMARK()' function, you can get an idea about what helps in your own situation. See *note information-functions::.

Some optimizations that MySQL itself makes are:

MySQL executes uncorrelated subqueries only once. Use *note 'EXPLAIN': explain. to make sure that a given subquery really is uncorrelated.
MySQL rewrites 'IN', 'ALL', 'ANY', and 'SOME' subqueries in an attempt to take advantage of the possibility that the select-list columns in the subquery are indexed.
MySQL replaces subqueries of the following form with an index-lookup function, which *note 'EXPLAIN': explain. describes as a special join type ('unique_subquery' or 'index_subquery'):
```
  ... IN (SELECT INDEXED_COLUMN FROM SINGLE_TABLE ...)
```
MySQL enhances expressions of the following form with an expression involving 'MIN()' or 'MAX()', unless 'NULL' values or empty sets are involved:
```
  VALUE {ALL|ANY|SOME} {> | < | >= | <=} (UNCORRELATED SUBQUERY)
```
For example, this 'WHERE' clause:
```
  WHERE 5 > ALL (SELECT x FROM t)
```
might be treated by the optimizer like this:
```
  WHERE 5 > (SELECT MAX(x) FROM t)
```

See also MySQL Internals: How MySQL Transforms Subqueries (https://dev.mysql.com/doc/internals/en/transformations.html).

File: manual.info.tmp, Node: rewriting-subqueries, Next: subquery-restrictions, Prev: optimizing-subqueries, Up: subqueries

13.2.10.11 Rewriting Subqueries as Joins ........................................

Sometimes there are other ways to test membership in a set of values than by using a subquery. Also, on some occasions, it is not only possible to rewrite a query without a subquery, but it can be more efficient to make use of some of these techniques rather than to use subqueries. One of these is the 'IN()' construct:

For example, this query:

 SELECT * FROM t1 WHERE id IN (SELECT id FROM t2);

Can be rewritten as:

 SELECT DISTINCT t1.* FROM t1, t2 WHERE t1.id=t2.id;

The queries:

 SELECT * FROM t1 WHERE id NOT IN (SELECT id FROM t2);
 SELECT * FROM t1 WHERE NOT EXISTS (SELECT id FROM t2 WHERE t1.id=t2.id);

Can be rewritten as:

 SELECT table1.*
   FROM table1 LEFT JOIN table2 ON table1.id=table2.id
   WHERE table2.id IS NULL;

A 'LEFT [OUTER] JOIN' can be faster than an equivalent subquery because the server might be able to optimize it better--a fact that is not specific to MySQL Server alone. Prior to SQL-92, outer joins did not exist, so subqueries were the only way to do certain things. Today, MySQL Server and many other modern database systems offer a wide range of outer join types.

MySQL Server supports multiple-table note 'DELETE': delete. statements that can be used to efficiently delete rows based on information from one table or even from many tables at the same time. Multiple-table note 'UPDATE': update. statements are also supported. See note delete::, and note update::.

File: manual.info.tmp, Node: subquery-restrictions, Prev: rewriting-subqueries, Up: subqueries

13.2.10.12 Restrictions on Subqueries .....................................

In general, you cannot modify a table and select from the same table in a subquery. For example, this limitation applies to statements of the following forms:
```
  DELETE FROM t WHERE ... (SELECT ... FROM t ...);
  UPDATE t ... WHERE col = (SELECT ... FROM t ...);
  {INSERT|REPLACE} INTO t (SELECT ... FROM t ...);
```
Exception: The preceding prohibition does not apply if for the modified table you are using a derived table and that derived table is materialized rather than merged into the outer query. (See *note derived-table-optimization::.) Example:
```
  UPDATE t ... WHERE col = (SELECT * FROM (SELECT ... FROM t...) AS dt ...);
```
Here the result from the derived table is materialized as a temporary table, so the relevant rows in 't' have already been selected by the time the update to 't' takes place.

Row comparison operations are only partially supported:

* For 'EXPR [NOT] IN SUBQUERY', EXPR can be an N-tuple
  (specified using row constructor syntax) and the subquery can
  return rows of N-tuples.  The permitted syntax is therefore
  more specifically expressed as 'ROW_CONSTRUCTOR [NOT] IN
  TABLE_SUBQUERY'

* For 'EXPR OP {ALL|ANY|SOME} SUBQUERY', EXPR must be a scalar
  value and the subquery must be a column subquery; it cannot
  return multiple-column rows.

In other words, for a subquery that returns rows of N-tuples, this is supported:

  (EXPR_1, ..., EXPR_N) [NOT] IN TABLE_SUBQUERY

But this is not supported:

  (EXPR_1, ..., EXPR_N) OP {ALL|ANY|SOME} SUBQUERY

The reason for supporting row comparisons for 'IN' but not for the others is that 'IN' is implemented by rewriting it as a sequence of '=' comparisons and 'AND' operations. This approach cannot be used for 'ALL', 'ANY', or 'SOME'.

Subqueries in the 'FROM' clause cannot be correlated subqueries. They are materialized in whole (evaluated to produce a result set) during query execution, so they cannot be evaluated per row of the outer query. The optimizer delays materialization until the result is needed, which may permit materialization to be avoided. See *note derived-table-optimization::.

MySQL does not support 'LIMIT' in subqueries for certain subquery operators:

  mysql> SELECT * FROM t1
         WHERE s1 IN (SELECT s2 FROM t2 ORDER BY s1 LIMIT 1);
  ERROR 1235 (42000): This version of MySQL does not yet support
   'LIMIT & IN/ALL/ANY/SOME subquery'

MySQL permits a subquery to refer to a stored function that has data-modifying side effects such as inserting rows into a table. For example, if 'f()' inserts rows, the following query can modify data:
```
  SELECT ... WHERE x IN (SELECT f() ...);
```
This behavior is an extension to the SQL standard. In MySQL, it can produce nondeterministic results because 'f()' might be executed a different number of times for different executions of a given query depending on how the optimizer chooses to handle it.

For statement-based or mixed-format replication, one implication of this indeterminism is that such a query can produce different results on the source and its replicas.

File: manual.info.tmp, Node: update, Prev: subqueries, Up: sql-data-manipulation-statements

13.2.11 UPDATE Statement

*note 'UPDATE': update. is a DML statement that modifies rows in a table.

Single-table syntax:

 UPDATE [LOW_PRIORITY] [IGNORE] TABLE_REFERENCE
     SET ASSIGNMENT_LIST
     [WHERE WHERE_CONDITION]
     [ORDER BY ...]
     [LIMIT ROW_COUNT]

 VALUE:
     {EXPR | DEFAULT}

 ASSIGNMENT:
     COL_NAME = VALUE

 ASSIGNMENT_LIST:
     ASSIGNMENT [, ASSIGNMENT] ...

Multiple-table syntax:

 UPDATE [LOW_PRIORITY] [IGNORE] TABLE_REFERENCES
     SET ASSIGNMENT_LIST
     [WHERE WHERE_CONDITION]

For the single-table syntax, the *note 'UPDATE': update. statement updates columns of existing rows in the named table with new values. The 'SET' clause indicates which columns to modify and the values they should be given. Each value can be given as an expression, or the keyword 'DEFAULT' to set a column explicitly to its default value. The 'WHERE' clause, if given, specifies the conditions that identify which rows to update. With no 'WHERE' clause, all rows are updated. If the 'ORDER BY' clause is specified, the rows are updated in the order that is specified. The 'LIMIT' clause places a limit on the number of rows that can be updated.

For the multiple-table syntax, *note 'UPDATE': update. updates rows in each table named in TABLE_REFERENCES that satisfy the conditions. Each matching row is updated once, even if it matches the conditions multiple times. For multiple-table syntax, 'ORDER BY' and 'LIMIT' cannot be used.

For partitioned tables, both the single-single and multiple-table forms of this statement support the use of a 'PARTITION' clause as part of a table reference. This option takes a list of one or more partitions or subpartitions (or both). Only the partitions (or subpartitions) listed are checked for matches, and a row that is not in any of these partitions or subpartitions is not updated, whether it satisfies the WHERE_CONDITION or not.

Note:

Unlike the case when using 'PARTITION' with an note 'INSERT': insert. or note 'REPLACE': replace. statement, an otherwise valid 'UPDATE ... PARTITION' statement is considered successful even if no rows in the listed partitions (or subpartitions) match the WHERE_CONDITION.

For more information and examples, see *note partitioning-selection::.

WHERE_CONDITION is an expression that evaluates to true for each row to be updated. For expression syntax, see *note expressions::.

TABLE_REFERENCES and WHERE_CONDITION are specified as described in *note select::.

You need the 'UPDATE' privilege only for columns referenced in an *note 'UPDATE': update. that are actually updated. You need only the 'SELECT' privilege for any columns that are read but not modified.

The *note 'UPDATE': update. statement supports the following modifiers:

With the 'LOW_PRIORITY' modifier, execution of the *note 'UPDATE': update. is delayed until no other clients are reading from the table. This affects only storage engines that use only table-level locking (such as 'MyISAM', 'MEMORY', and 'MERGE').
With the 'IGNORE' modifier, the update statement does not abort even if errors occur during the update. Rows for which duplicate-key conflicts occur on a unique key value are not updated. Rows updated to values that would cause data conversion errors are updated to the closest valid values instead. For more information, see *note ignore-effect-on-execution::.

note 'UPDATE IGNORE': update. statements, including those having an 'ORDER BY' clause, are flagged as unsafe for statement-based replication. (This is because the order in which the rows are updated determines which rows are ignored.) Such statements produce a warning in the error log when using statement-based mode and are written to the binary log using the row-based format when using 'MIXED' mode. (Bug #11758262, Bug #50439) See note replication-rbr-safe-unsafe::, for more information.

If you access a column from the table to be updated in an expression, *note 'UPDATE': update. uses the current value of the column. For example, the following statement sets 'col1' to one more than its current value:

 UPDATE t1 SET col1 = col1 + 1;

The second assignment in the following statement sets 'col2' to the current (updated) 'col1' value, not the original 'col1' value. The result is that 'col1' and 'col2' have the same value. This behavior differs from standard SQL.

 UPDATE t1 SET col1 = col1 + 1, col2 = col1;

Single-table *note 'UPDATE': update. assignments are generally evaluated from left to right. For multiple-table updates, there is no guarantee that assignments are carried out in any particular order.

If you set a column to the value it currently has, MySQL notices this and does not update it.

If you update a column that has been declared 'NOT NULL' by setting to 'NULL', an error occurs if strict SQL mode is enabled; otherwise, the column is set to the implicit default value for the column data type and the warning count is incremented. The implicit default value is '0' for numeric types, the empty string ('''') for string types, and the 'zero' value for date and time types. See *note data-type-defaults::.

If a generated column is updated explicitly, the only permitted value is 'DEFAULT'. For information about generated columns, see *note create-table-generated-columns::.

note 'UPDATE': update. returns the number of rows that were actually changed. The 'mysql_info()' (https://dev.mysql.com/doc/c-api/5.7/en/mysql-info.html) C API function returns the number of rows that were matched and updated and the number of warnings that occurred during the note 'UPDATE': update.

You can use 'LIMIT ROW_COUNT' to restrict the scope of the *note 'UPDATE': update. A 'LIMIT' clause is a rows-matched restriction. The statement stops as soon as it has found ROW_COUNT rows that satisfy the 'WHERE' clause, whether or not they actually were changed.

If an *note 'UPDATE': update. statement includes an 'ORDER BY' clause, the rows are updated in the order specified by the clause. This can be useful in certain situations that might otherwise result in an error. Suppose that a table 't' contains a column 'id' that has a unique index. The following statement could fail with a duplicate-key error, depending on the order in which rows are updated:

 UPDATE t SET id = id + 1;

For example, if the table contains 1 and 2 in the 'id' column and 1 is updated to 2 before 2 is updated to 3, an error occurs. To avoid this problem, add an 'ORDER BY' clause to cause the rows with larger 'id' values to be updated before those with smaller values:

 UPDATE t SET id = id + 1 ORDER BY id DESC;

You can also perform note 'UPDATE': update. operations covering multiple tables. However, you cannot use 'ORDER BY' or 'LIMIT' with a multiple-table note 'UPDATE': update. The TABLE_REFERENCES clause lists the tables involved in the join. Its syntax is described in *note join::. Here is an example:

 UPDATE items,month SET items.price=month.price
 WHERE items.id=month.id;

The preceding example shows an inner join that uses the comma operator, but multiple-table note 'UPDATE': update. statements can use any type of join permitted in note 'SELECT': select. statements, such as 'LEFT JOIN'.

If you use a multiple-table note 'UPDATE': update. statement involving 'InnoDB' tables for which there are foreign key constraints, the MySQL optimizer might process tables in an order that differs from that of their parent/child relationship. In this case, the statement fails and rolls back. Instead, update a single table and rely on the 'ON UPDATE' capabilities that 'InnoDB' provides to cause the other tables to be modified accordingly. See note create-table-foreign-keys::.

You cannot update a table and select directly from the same table in a subquery. You can work around this by using a multi-table update in which one of the tables is derived from the table that you actually wish to update, and referring to the derived table using an alias. Suppose you wish to update a table named 'items' which is defined using the statement shown here:

 CREATE TABLE items (
     id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
     wholesale DECIMAL(6,2) NOT NULL DEFAULT 0.00,
     retail DECIMAL(6,2) NOT NULL DEFAULT 0.00,
     quantity BIGINT NOT NULL DEFAULT 0
 );

To reduce the retail price of any items for which the markup is 30% or greater and of which you have fewer than one hundred in stock, you might try to use an 'UPDATE' statement such as the one following, which uses a subquery in the 'WHERE' clause. As shown here, this statement does not work:

 mysql> UPDATE items
      > SET retail = retail * 0.9
      > WHERE id IN
      >     (SELECT id FROM items
      >         WHERE retail / wholesale >= 1.3 AND quantity > 100);
 ERROR 1093 (HY000): You can't specify target table 'items' for update in FROM clause

Instead, you can employ a multi-table update in which the subquery is moved into the list of tables to be updated, using an alias to reference it in the outermost 'WHERE' clause, like this:

 UPDATE items,
        (SELECT id FROM items
         WHERE id IN
             (SELECT id FROM items
              WHERE retail / wholesale >= 1.3 AND quantity < 100))
         AS discounted
 SET items.retail = items.retail * 0.9
 WHERE items.id = discounted.id;

Because the optimizer tries by default to merge the derived table 'discounted' into the outermost query block, this works only if you force materialization of the derived table. You can do this by setting the 'derived_merge' flag of the 'optimizer_switch' system variable to 'off' before running the update, or by using the 'NO_MERGE' optimizer hint, as shown here:

 UPDATE /*+ NO_MERGE(discounted) */ items,
        (SELECT id FROM items
         WHERE retail / wholesale >= 1.3 AND quantity < 100)
         AS discounted
     SET items.retail = items.retail * 0.9
     WHERE items.id = discounted.id;

The advantage of using the optimizer hint in such a case is that it applies only within the query block where it is used, so that it is not necessary to change the value of 'optimizer_switch' again after executing the 'UPDATE'.

Another possibility is to rewrite the subquery so that it does not use 'IN' or 'EXISTS', like this:

 UPDATE items,
        (SELECT id, retail / wholesale AS markup, quantity FROM items)
        AS discounted
     SET items.retail = items.retail * 0.9
     WHERE discounted.markup >= 1.3
     AND discounted.quantity < 100
     AND items.id = discounted.id;

In this case, the subquery is materialized by default rather than merged, so it is not necessary to disable merging of the derived table.

File: manual.info.tmp, Node: sql-transactional-statements, Next: sql-replication-statements, Prev: sql-data-manipulation-statements, Up: sql-statements