Menu:
string-functions-charset:: Character Set and Collation of Function Results
String Functions and Operators
Name Description
'ASCII()' Return numeric value of left-most character
'BIN()' Return a string containing binary representation of a number
'BIT_LENGTH()' Return length of argument in bits
'CHAR()' Return the character for each integer passed
'CHAR_LENGTH()' Return number of characters in argument
'CHARACTER_LENGTH()' Synonym for CHAR_LENGTH()
'CONCAT()' Return concatenated string
'CONCAT_WS()' Return concatenate with separator
'ELT()' Return string at index number
'EXPORT_SET()' Return a string such that for every bit set in the value bits, you get an on string and for every unset bit, you get an off string
'FIELD()' Index (position) of first argument in subsequent arguments
'FIND_IN_SET()' Index (position) of first argument within second argument
'FORMAT()' Return a number formatted to specified number of decimal places
'FROM_BASE64()' Decode base64 encoded string and return result
'HEX()' Hexadecimal representation of decimal or string value
'INSERT()' Insert substring at specified position up to specified number of characters
'INSTR()' Return the index of the first occurrence of substring
'LCASE()' Synonym for LOWER()
'LEFT()' Return the leftmost number of characters as specified
'LENGTH()' Return the length of a string in bytes
'LIKE' Simple pattern matching
'LOAD_FILE()' Load the named file
'LOCATE()' Return the position of the first occurrence of substring
'LOWER()' Return the argument in lowercase
'LPAD()' Return the string argument, left-padded with the specified string
'LTRIM()' Remove leading spaces
'MAKE_SET()' Return a set of comma-separated strings that have the corresponding bit in bits set
'MATCH()' Perform full-text search
'MID()' Return a substring starting from the specified position
'NOT LIKE' Negation of simple pattern matching
'NOT REGEXP' Negation of REGEXP
'OCT()' Return a string containing octal representation of a number
'OCTET_LENGTH()' Synonym for LENGTH()
'ORD()' Return character code for leftmost character of the argument
'POSITION()' Synonym for LOCATE()
'QUOTE()' Escape the argument for use in an SQL statement
'REGEXP' Whether string matches regular expression
'REPEAT()' Repeat a string the specified number of times
'REPLACE()' Replace occurrences of a specified string
'REVERSE()' Reverse the characters in a string
'RIGHT()' Return the specified rightmost number of characters
'RLIKE' Whether string matches regular expression
'RPAD()' Append string the specified number of times
'RTRIM()' Remove trailing spaces
'SOUNDEX()' Return a soundex string
'SOUNDS LIKE' Compare sounds
'SPACE()' Return a string of the specified number of spaces
'STRCMP()' Compare two strings
'SUBSTR()' Return the substring as specified
'SUBSTRING()' Return the substring as specified
'SUBSTRING_INDEX()' Return a substring from a string before the specified number of occurrences of the delimiter
'TO_BASE64()' Return the argument converted to a base-64 string
'TRIM()' Remove leading and trailing spaces
'UCASE()' Synonym for UPPER()
'UNHEX()' Return a string containing hex representation of a number
'UPPER()' Convert to uppercase
'WEIGHT_STRING()' Return the weight string for a string
String-valued functions return 'NULL' if the length of the result would be greater than the value of the 'max_allowed_packet' system variable. See *note server-configuration::.
For functions that operate on string positions, the first position is numbered 1.
For functions that take length arguments, noninteger arguments are rounded to the nearest integer.
'ASCII(STR)'
Returns the numeric value of the leftmost character of the string STR. Returns '0' if STR is the empty string. Returns 'NULL' if STR is 'NULL'. 'ASCII()' works for 8-bit characters.
mysql> SELECT ASCII('2');
-> 50
mysql> SELECT ASCII(2);
-> 50
mysql> SELECT ASCII('dx');
-> 100
See also the 'ORD()' function.
'BIN(N)'
Returns a string representation of the binary value of N, where N is a longlong (*note 'BIGINT': integer-types.) number. This is equivalent to 'CONV(N,10,2)'. Returns 'NULL' if N is 'NULL'.
mysql> SELECT BIN(12);
-> '1100'
'BIT_LENGTH(STR)'
Returns the length of the string STR in bits.
mysql> SELECT BIT_LENGTH('text');
-> 32
'CHAR(N,... [USING CHARSET_NAME])'
'CHAR()' interprets each argument N as an integer and returns a string consisting of the characters given by the code values of those integers. 'NULL' values are skipped.
mysql> SELECT CHAR(77,121,83,81,'76');
-> 'MySQL'
mysql> SELECT CHAR(77,77.3,'77.3');
-> 'MMM'
'CHAR()' arguments larger than 255 are converted into multiple result bytes. For example, 'CHAR(256)' is equivalent to 'CHAR(1,0)', and 'CHAR(256*256)' is equivalent to 'CHAR(1,0,0)':
mysql> SELECT HEX(CHAR(1,0)), HEX(CHAR(256));
+----------------+----------------+
| HEX(CHAR(1,0)) | HEX(CHAR(256)) |
+----------------+----------------+
| 0100 | 0100 |
+----------------+----------------+
mysql> SELECT HEX(CHAR(1,0,0)), HEX(CHAR(256*256));
+------------------+--------------------+
| HEX(CHAR(1,0,0)) | HEX(CHAR(256*256)) |
+------------------+--------------------+
| 010000 | 010000 |
+------------------+--------------------+
By default, 'CHAR()' returns a binary string. To produce a string in a given character set, use the optional 'USING' clause:
mysql> SELECT CHARSET(CHAR(X'65')), CHARSET(CHAR(X'65' USING utf8));
+----------------------+---------------------------------+
| CHARSET(CHAR(X'65')) | CHARSET(CHAR(X'65' USING utf8)) |
+----------------------+---------------------------------+
| binary | utf8 |
+----------------------+---------------------------------+
If 'USING' is given and the result string is illegal for the given character set, a warning is issued. Also, if strict SQL mode is enabled, the result from 'CHAR()' becomes 'NULL'.
If 'CHAR()' is invoked from within the note 'mysql': mysql. client, binary strings display using hexadecimal notation, depending on the value of the '--binary-as-hex'. For more information about that option, see note mysql::.
'CHAR_LENGTH(STR)'
Returns the length of the string STR, measured in code points. A multibyte character counts as a single code point. This means that, for a string containing two 3-byte characters, 'LENGTH()' returns '6', whereas 'CHAR_LENGTH()' returns '2', as shown here:
mysql> SET @dolphin:='海豚';
Query OK, 0 rows affected (0.01 sec)
mysql> SELECT LENGTH(@dolphin), CHAR_LENGTH(@dolphin);
+------------------+-----------------------+
| LENGTH(@dolphin) | CHAR_LENGTH(@dolphin) |
+------------------+-----------------------+
| 6 | 2 |
+------------------+-----------------------+
1 row in set (0.00 sec)
'CHARACTER_LENGTH(STR)'
'CHARACTER_LENGTH()' is a synonym for 'CHAR_LENGTH()'.
'CONCAT(STR1,STR2,...)'
Returns the string that results from concatenating the arguments. May have one or more arguments. If all arguments are nonbinary strings, the result is a nonbinary string. If the arguments include any binary strings, the result is a binary string. A numeric argument is converted to its equivalent nonbinary string form.
'CONCAT()' returns 'NULL' if any argument is 'NULL'.
mysql> SELECT CONCAT('My', 'S', 'QL');
-> 'MySQL'
mysql> SELECT CONCAT('My', NULL, 'QL');
-> NULL
mysql> SELECT CONCAT(14.3);
-> '14.3'
For quoted strings, concatenation can be performed by placing the strings next to each other:
mysql> SELECT 'My' 'S' 'QL';
-> 'MySQL'
If 'CONCAT()' is invoked from within the note 'mysql': mysql. client, binary string results display using hexadecimal notation, depending on the value of the '--binary-as-hex'. For more information about that option, see note mysql::.
'CONCAT_WS(SEPARATOR,STR1,STR2,...)'
'CONCAT_WS()' stands for Concatenate With Separator and is a special form of 'CONCAT()'. The first argument is the separator for the rest of the arguments. The separator is added between the strings to be concatenated. The separator can be a string, as can the rest of the arguments. If the separator is 'NULL', the result is 'NULL'.
mysql> SELECT CONCAT_WS(',', 'First name', 'Second name', 'Last Name');
-> 'First name,Second name,Last Name'
mysql> SELECT CONCAT_WS(',', 'First name', NULL, 'Last Name');
-> 'First name,Last Name'
'CONCAT_WS()' does not skip empty strings. However, it does skip any 'NULL' values after the separator argument.
'ELT(N,STR1,STR2,STR3,...)'
'ELT()' returns the Nth element of the list of strings: STR1 if N = '1', STR2 if N = '2', and so on. Returns 'NULL' if N is less than '1' or greater than the number of arguments. 'ELT()' is the complement of 'FIELD()'.
mysql> SELECT ELT(1, 'Aa', 'Bb', 'Cc', 'Dd');
-> 'Aa'
mysql> SELECT ELT(4, 'Aa', 'Bb', 'Cc', 'Dd');
-> 'Dd'
'EXPORT_SET(BITS,ON,OFF[,SEPARATOR[,NUMBER_OF_BITS]])'
Returns a string such that for every bit set in the value BITS, you get an ON string and for every bit not set in the value, you get an OFF string. Bits in BITS are examined from right to left (from low-order to high-order bits). Strings are added to the result from left to right, separated by the SEPARATOR string (the default being the comma character ','). The number of bits examined is given by NUMBER_OF_BITS, which has a default of 64 if not specified. NUMBER_OF_BITS is silently clipped to 64 if larger thanIt is treated as an unsigned integer, so a value of −1 is effectively the same as 64.
mysql> SELECT EXPORT_SET(5,'Y','N',',',4); -> 'Y,N,Y,N' mysql> SELECT EXPORT_SET(6,'1','0',',',10); -> '0,1,1,0,0,0,0,0,0,0'
'FIELD(STR,STR1,STR2,STR3,...)'
Returns the index (position) of STR in the STR1, STR2, STR3, '...' list. Returns '0' if STR is not found.
If all arguments to 'FIELD()' are strings, all arguments are compared as strings. If all arguments are numbers, they are compared as numbers. Otherwise, the arguments are compared as double.
If STR is 'NULL', the return value is '0' because 'NULL' fails equality comparison with any value. 'FIELD()' is the complement of 'ELT()'.
mysql> SELECT FIELD('Bb', 'Aa', 'Bb', 'Cc', 'Dd', 'Ff');
-> 2
mysql> SELECT FIELD('Gg', 'Aa', 'Bb', 'Cc', 'Dd', 'Ff');
-> 0
'FIND_IN_SET(STR,STRLIST)'
Returns a value in the range of 1 to N if the string STR is in the string list STRLIST consisting of N substrings. A string list is a string composed of substrings separated by ',' characters. If the first argument is a constant string and the second is a column of type *note 'SET': set, the 'FIND_IN_SET()' function is optimized to use bit arithmetic. Returns '0' if STR is not in STRLIST or if STRLIST is the empty string. Returns 'NULL' if either argument is 'NULL'. This function does not work properly if the first argument contains a comma (',') character.
mysql> SELECT FIND_IN_SET('b','a,b,c,d');
-> 2
'FORMAT(X,D[,LOCALE])'
Formats the number X to a format like ''#,###,###.##'', rounded to D decimal places, and returns the result as a string. If D is '0', the result has no decimal point or fractional part.
The optional third parameter enables a locale to be specified to be used for the result number's decimal point, thousands separator, and grouping between separators. Permissible locale values are the same as the legal values for the 'lc_time_names' system variable (see *note locale-support::). If no locale is specified, the default is ''en_US''.
mysql> SELECT FORMAT(12332.123456, 4);
-> '12,332.1235'
mysql> SELECT FORMAT(12332.1,4);
-> '12,332.1000'
mysql> SELECT FORMAT(12332.2,0);
-> '12,332'
mysql> SELECT FORMAT(12332.2,2,'de_DE');
-> '12.332,20'
'FROM_BASE64(STR)'
Takes a string encoded with the base-64 encoded rules used by 'TO_BASE64()' and returns the decoded result as a binary string. The result is 'NULL' if the argument is 'NULL' or not a valid base-64 string. See the description of 'TO_BASE64()' for details about the encoding and decoding rules.
mysql> SELECT TO_BASE64('abc'), FROM_BASE64(TO_BASE64('abc'));
-> 'JWJj', 'abc'
If 'FROM_BASE64()' is invoked from within the note 'mysql': mysql. client, binary strings display using hexadecimal notation, depending on the value of the '--binary-as-hex'. For more information about that option, see note mysql::.
'HEX(STR)', 'HEX(N)'
For a string argument STR, 'HEX()' returns a hexadecimal string representation of STR where each byte of each character in STR is converted to two hexadecimal digits. (Multibyte characters therefore become more than two digits.) The inverse of this operation is performed by the 'UNHEX()' function.
For a numeric argument N, 'HEX()' returns a hexadecimal string representation of the value of N treated as a longlong (*note 'BIGINT': integer-types.) number. This is equivalent to 'CONV(N,10,16)'. The inverse of this operation is performed by 'CONV(HEX(N),16,10)'.
mysql> SELECT X'616263', HEX('abc'), UNHEX(HEX('abc'));
-> 'abc', 616263, 'abc'
mysql> SELECT HEX(255), CONV(HEX(255),16,10);
-> 'FF', 255
'INSERT(STR,POS,LEN,NEWSTR)'
Returns the string STR, with the substring beginning at position POS and LEN characters long replaced by the string NEWSTR. Returns the original string if POS is not within the length of the string. Replaces the rest of the string from position POS if LEN is not within the length of the rest of the string. Returns 'NULL' if any argument is 'NULL'.
mysql> SELECT INSERT('Quadratic', 3, 4, 'What');
-> 'QuWhattic'
mysql> SELECT INSERT('Quadratic', -1, 4, 'What');
-> 'Quadratic'
mysql> SELECT INSERT('Quadratic', 3, 100, 'What');
-> 'QuWhat'
This function is multibyte safe.
'INSTR(STR,SUBSTR)'
Returns the position of the first occurrence of substring SUBSTR in string STR. This is the same as the two-argument form of 'LOCATE()', except that the order of the arguments is reversed.
mysql> SELECT INSTR('foobarbar', 'bar');
-> 4
mysql> SELECT INSTR('xbar', 'foobar');
-> 0
This function is multibyte safe, and is case-sensitive only if at least one argument is a binary string.
'LCASE(STR)'
'LCASE()' is a synonym for 'LOWER()'.
'LCASE()' used in a view is rewritten as 'LOWER()' when storing the view's definition. (Bug #12844279)
'LEFT(STR,LEN)'
Returns the leftmost LEN characters from the string STR, or 'NULL' if any argument is 'NULL'.
mysql> SELECT LEFT('foobarbar', 5);
-> 'fooba'
This function is multibyte safe.
'LENGTH(STR)'
Returns the length of the string STR, measured in bytes. A multibyte character counts as multiple bytes. This means that for a string containing five 2-byte characters, 'LENGTH()' returns '10', whereas 'CHAR_LENGTH()' returns '5'.
mysql> SELECT LENGTH('text');
-> 4
Note:
The 'Length()' OpenGIS spatial function is named 'ST_Length()' in MySQL.
'LOAD_FILE(FILE_NAME)'
Reads the file and returns the file contents as a string. To use this function, the file must be located on the server host, you must specify the full path name to the file, and you must have the 'FILE' privilege. The file must be readable by all and its size less than 'max_allowed_packet' bytes. If the 'secure_file_priv' system variable is set to a nonempty directory name, the file to be loaded must be located in that directory.
If the file does not exist or cannot be read because one of the preceding conditions is not satisfied, the function returns 'NULL'.
The 'character_set_filesystem' system variable controls interpretation of file names that are given as literal strings.
mysql> UPDATE t
SET blob_col=LOAD_FILE('/tmp/picture')
WHERE id=1;
'LOCATE(SUBSTR,STR)', 'LOCATE(SUBSTR,STR,POS)'
The first syntax returns the position of the first occurrence of substring SUBSTR in string STR. The second syntax returns the position of the first occurrence of substring SUBSTR in string STR, starting at position POS. Returns '0' if SUBSTR is not in STR. Returns 'NULL' if SUBSTR or STR is 'NULL'.
mysql> SELECT LOCATE('bar', 'foobarbar');
-> 4
mysql> SELECT LOCATE('xbar', 'foobar');
-> 0
mysql> SELECT LOCATE('bar', 'foobarbar', 5);
-> 7
This function is multibyte safe, and is case-sensitive only if at least one argument is a binary string.
'LOWER(STR)'
Returns the string STR with all characters changed to lowercase according to the current character set mapping. The default is 'latin1' (cp1252 West European).
mysql> SELECT LOWER('QUADRATICALLY');
-> 'quadratically'
'LOWER()' (and 'UPPER()') are ineffective when applied to binary strings (note 'BINARY': binary-varbinary, note 'VARBINARY': binary-varbinary, *note 'BLOB': blob.). To perform lettercase conversion of a binary string, first convert it to a nonbinary string using a character set appropriate for the data stored in the string:
mysql> SET @str = BINARY 'New York';
mysql> SELECT LOWER(@str), LOWER(CONVERT(@str USING latin1));
+-------------+-----------------------------------+
| LOWER(@str) | LOWER(CONVERT(@str USING latin1)) |
+-------------+-----------------------------------+
| New York | new york |
+-------------+-----------------------------------+
For collations of Unicode character sets, 'LOWER()' and 'UPPER()' work according to the Unicode Collation Algorithm (UCA) version in the collation name, if there is one, and UCA 4.0.0 if no version is specified. For example, 'utf8_unicode_520_ci' works according to UCA 5.2.0, whereas 'utf8_unicode_ci' works according to UCA 4.0.0. See *note charset-unicode-sets::.
This function is multibyte safe.
In previous versions of MySQL, 'LOWER()' used within a view was rewritten as 'LCASE()' when storing the view's definition. In MySQL 5.7, 'LOWER()' is never rewritten in such cases, but 'LCASE()' used within views is instead rewritten as 'LOWER()'. (Bug #12844279)
'LPAD(STR,LEN,PADSTR)'
Returns the string STR, left-padded with the string PADSTR to a length of LEN characters. If STR is longer than LEN, the return value is shortened to LEN characters.
mysql> SELECT LPAD('hi',4,'??');
-> '??hi'
mysql> SELECT LPAD('hi',1,'??');
-> 'h'
'LTRIM(STR)'
Returns the string STR with leading space characters removed.
mysql> SELECT LTRIM(' barbar');
-> 'barbar'
This function is multibyte safe.
'MAKE_SET(BITS,STR1,STR2,...)'
Returns a set value (a string containing substrings separated by ',' characters) consisting of the strings that have the corresponding bit in BITS set. STR1 corresponds to bit 0, STR2 to bit 1, and so on. 'NULL' values in STR1, STR2, '...' are not appended to the result.
mysql> SELECT MAKE_SET(1,'a','b','c');
-> 'a'
mysql> SELECT MAKE_SET(1 | 4,'hello','nice','world');
-> 'hello,world'
mysql> SELECT MAKE_SET(1 | 4,'hello','nice',NULL,'world');
-> 'hello'
mysql> SELECT MAKE_SET(0,'a','b','c');
-> ''
'MID(STR,POS)', 'MID(STR FROM POS)', 'MID(STR,POS,LEN)', 'MID(STR FROM POS FOR LEN)'
'MID(STR,POS,LEN)' is a synonym for 'SUBSTRING(STR,POS,LEN)'.
'OCT(N)'
Returns a string representation of the octal value of N, where N is a longlong (*note 'BIGINT': integer-types.) number. This is equivalent to 'CONV(N,10,8)'. Returns 'NULL' if N is 'NULL'.
mysql> SELECT OCT(12);
-> '14'
'OCTET_LENGTH(STR)'
'OCTET_LENGTH()' is a synonym for 'LENGTH()'.
'ORD(STR)'
If the leftmost character of the string STR is a multibyte character, returns the code for that character, calculated from the numeric values of its constituent bytes using this formula:
(1st byte code)
+ (2nd byte code * 256)
+ (3rd byte code * 256^2) ...
If the leftmost character is not a multibyte character, 'ORD()' returns the same value as the 'ASCII()' function.
mysql> SELECT ORD('2');
-> 50
'POSITION(SUBSTR IN STR)'
'POSITION(SUBSTR IN STR)' is a synonym for 'LOCATE(SUBSTR,STR)'.
'QUOTE(STR)'
Quotes a string to produce a result that can be used as a properly escaped data value in an SQL statement. The string is returned enclosed by single quotation marks and with each instance of backslash (''), single quote ('''), ASCII 'NUL', and Control+Z preceded by a backslash. If the argument is 'NULL', the return value is the word 'NULL' without enclosing single quotation marks.
mysql> SELECT QUOTE('Don\'t!');
-> 'Don\'t!'
mysql> SELECT QUOTE(NULL);
-> NULL
For comparison, see the quoting rules for literal strings and within the C API in *note string-literals::, and mysql_real_escape_string_quote() (https://dev.mysql.com/doc/c-api/5.7/en/mysql-real-escape-string-quote.html).
'REPEAT(STR,COUNT)'
Returns a string consisting of the string STR repeated COUNT times. If COUNT is less than 1, returns an empty string. Returns 'NULL' if STR or COUNT are 'NULL'.
mysql> SELECT REPEAT('MySQL', 3);
-> 'MySQLMySQLMySQL'
'REPLACE(STR,FROM_STR,TO_STR)'
Returns the string STR with all occurrences of the string FROM_STR replaced by the string TO_STR. 'REPLACE()' performs a case-sensitive match when searching for FROM_STR.
mysql> SELECT REPLACE('www.mysql.com', 'w', 'Ww');
-> 'WwWwWw.mysql.com'
This function is multibyte safe.
'REVERSE(STR)'
Returns the string STR with the order of the characters reversed.
mysql> SELECT REVERSE('abc');
-> 'cba'
This function is multibyte safe.
'RIGHT(STR,LEN)'
Returns the rightmost LEN characters from the string STR, or 'NULL' if any argument is 'NULL'.
mysql> SELECT RIGHT('foobarbar', 4);
-> 'rbar'
This function is multibyte safe.
'RPAD(STR,LEN,PADSTR)'
Returns the string STR, right-padded with the string PADSTR to a length of LEN characters. If STR is longer than LEN, the return value is shortened to LEN characters.
mysql> SELECT RPAD('hi',5,'?');
-> 'hi???'
mysql> SELECT RPAD('hi',1,'?');
-> 'h'
This function is multibyte safe.
'RTRIM(STR)'
Returns the string STR with trailing space characters removed.
mysql> SELECT RTRIM('barbar ');
-> 'barbar'
This function is multibyte safe.
'SOUNDEX(STR)'
Returns a soundex string from STR. Two strings that sound almost the same should have identical soundex strings. A standard soundex string is four characters long, but the 'SOUNDEX()' function returns an arbitrarily long string. You can use 'SUBSTRING()' on the result to get a standard soundex string. All nonalphabetic characters in STR are ignored. All international alphabetic characters outside the A-Z range are treated as vowels.
Important:
When using 'SOUNDEX()', you should be aware of the following limitations:
* This function, as currently implemented, is intended to work
well with strings that are in the English language only.
Strings in other languages may not produce reliable results.
* This function is not guaranteed to provide consistent results
with strings that use multibyte character sets, including
'utf-8'. See Bug #22638 for more information.
mysql> SELECT SOUNDEX('Hello');
-> 'H400'
mysql> SELECT SOUNDEX('Quadratically');
-> 'Q36324'
Note:
This function implements the original Soundex algorithm, not the more popular enhanced version (also described by D. Knuth). The difference is that original version discards vowels first and duplicates second, whereas the enhanced version discards duplicates first and vowels second.
'EXPR1 SOUNDS LIKE EXPR2'
This is the same as 'SOUNDEX(EXPR1) = SOUNDEX(EXPR2)'.
'SPACE(N)'
Returns a string consisting of N space characters.
mysql> SELECT SPACE(6);
-> ' '
'SUBSTR(STR,POS)', 'SUBSTR(STR FROM POS)', 'SUBSTR(STR,POS,LEN)', 'SUBSTR(STR FROM POS FOR LEN)'
'SUBSTR()' is a synonym for 'SUBSTRING()'.
'SUBSTRING(STR,POS)', 'SUBSTRING(STR FROM POS)', 'SUBSTRING(STR,POS,LEN)', 'SUBSTRING(STR FROM POS FOR LEN)'
The forms without a LEN argument return a substring from string STR starting at position POS. The forms with a LEN argument return a substring LEN characters long from string STR, starting at position POS. The forms that use 'FROM' are standard SQL syntax. It is also possible to use a negative value for POS. In this case, the beginning of the substring is POS characters from the end of the string, rather than the beginning. A negative value may be used for POS in any of the forms of this function. A value of 0 for POS returns an empty string.
For all forms of 'SUBSTRING()', the position of the first character in the string from which the substring is to be extracted is reckoned as '1'.
mysql> SELECT SUBSTRING('Quadratically',5);
-> 'ratically'
mysql> SELECT SUBSTRING('foobarbar' FROM 4);
-> 'barbar'
mysql> SELECT SUBSTRING('Quadratically',5,6);
-> 'ratica'
mysql> SELECT SUBSTRING('Sakila', -3);
-> 'ila'
mysql> SELECT SUBSTRING('Sakila', -5, 3);
-> 'aki'
mysql> SELECT SUBSTRING('Sakila' FROM -4 FOR 2);
-> 'ki'
This function is multibyte safe.
If LEN is less than 1, the result is the empty string.
'SUBSTRING_INDEX(STR,DELIM,COUNT)'
Returns the substring from string STR before COUNT occurrences of the delimiter DELIM. If COUNT is positive, everything to the left of the final delimiter (counting from the left) is returned. If COUNT is negative, everything to the right of the final delimiter (counting from the right) is returned. 'SUBSTRING_INDEX()' performs a case-sensitive match when searching for DELIM.
mysql> SELECT SUBSTRING_INDEX('www.mysql.com', '.', 2);
-> 'www.mysql'
mysql> SELECT SUBSTRING_INDEX('www.mysql.com', '.', -2);
-> 'mysql.com'
This function is multibyte safe.
'TO_BASE64(STR)'
Converts the string argument to base-64 encoded form and returns the result as a character string with the connection character set and collation. If the argument is not a string, it is converted to a string before conversion takes place. The result is 'NULL' if the argument is 'NULL'. Base-64 encoded strings can be decoded using the 'FROM_BASE64()' function.
mysql> SELECT TO_BASE64('abc'), FROM_BASE64(TO_BASE64('abc'));
-> 'JWJj', 'abc'
Different base-64 encoding schemes exist. These are the encoding and decoding rules used by 'TO_BASE64()' and 'FROM_BASE64()':
* The encoding for alphabet value 62 is ''+''.
* The encoding for alphabet value 63 is ''/''.
* Encoded output consists of groups of 4 printable characters.
Each 3 bytes of the input data are encoded using 4 characters.
If the last group is incomplete, it is padded with ''=''
characters to a length of 4.
* A newline is added after each 76 characters of encoded output
to divide long output into multiple lines.
* Decoding recognizes and ignores newline, carriage return, tab,
and space.
'TRIM([{BOTH | LEADING | TRAILING} [REMSTR] FROM] STR)', 'TRIM([REMSTR FROM] STR)'
Returns the string STR with all REMSTR prefixes or suffixes removed. If none of the specifiers 'BOTH', 'LEADING', or 'TRAILING' is given, 'BOTH' is assumed. REMSTR is optional and, if not specified, spaces are removed.
mysql> SELECT TRIM(' bar ');
-> 'bar'
mysql> SELECT TRIM(LEADING 'x' FROM 'xxxbarxxx');
-> 'barxxx'
mysql> SELECT TRIM(BOTH 'x' FROM 'xxxbarxxx');
-> 'bar'
mysql> SELECT TRIM(TRAILING 'xyz' FROM 'barxxyz');
-> 'barx'
This function is multibyte safe.
'UCASE(STR)'
'UCASE()' is a synonym for 'UPPER()'.
In MySQL 5.7, 'UCASE()' used in a view is rewritten as 'UPPER()' when storing the view's definition. (Bug #12844279)
'UNHEX(STR)'
For a string argument STR, 'UNHEX(STR)' interprets each pair of characters in the argument as a hexadecimal number and converts it to the byte represented by the number. The return value is a binary string.
mysql> SELECT UNHEX('4D7953514C');
-> 'MySQL'
mysql> SELECT X'4D7953514C';
-> 'MySQL'
mysql> SELECT UNHEX(HEX('string'));
-> 'string'
mysql> SELECT HEX(UNHEX('1267'));
-> '1267'
The characters in the argument string must be legal hexadecimal digits: ''0'' .. ''9'', ''A'' .. ''F'', ''a'' .. ''f''. If the argument contains any nonhexadecimal digits, the result is 'NULL':
mysql> SELECT UNHEX('GG');
+-------------+
| UNHEX('GG') |
+-------------+
| NULL |
+-------------+
A 'NULL' result can occur if the argument to 'UNHEX()' is a *note 'BINARY': binary-varbinary. column, because values are padded with '0x00' bytes when stored but those bytes are not stripped on retrieval. For example, ''41'' is stored into a 'CHAR(3)' column as ''41 '' and retrieved as ''41'' (with the trailing pad space stripped), so 'UNHEX()' for the column value returns 'X'41''. By contrast, ''41'' is stored into a 'BINARY(3)' column as ''41'' and retrieved as ''41'' (with the trailing pad '0x00' byte not stripped). '''' is not a legal hexadecimal digit, so 'UNHEX()' for the column value returns 'NULL'.
For a numeric argument N, the inverse of 'HEX(N)' is not performed by 'UNHEX()'. Use 'CONV(HEX(N),16,10)' instead. See the description of 'HEX()'.
If 'UNHEX()' is invoked from within the note 'mysql': mysql. client, binary strings display using hexadecimal notation, depending on the value of the '--binary-as-hex'. For more information about that option, see note mysql::.
'UPPER(STR)'
Returns the string STR with all characters changed to uppercase according to the current character set mapping. The default is 'latin1' (cp1252 West European).
mysql> SELECT UPPER('Hej');
-> 'HEJ'
See the description of 'LOWER()' for information that also applies to 'UPPER()'. This included information about how to perform lettercase conversion of binary strings (note 'BINARY': binary-varbinary, note 'VARBINARY': binary-varbinary, *note 'BLOB': blob.) for which these functions are ineffective, and information about case folding for Unicode character sets.
This function is multibyte safe.
In previous versions of MySQL, 'UPPER()' used within a view was rewritten as 'UCASE()' when storing the view's definition. In MySQL 5.7, 'UPPER()' is never rewritten in such cases, but 'UCASE()' used within views is instead rewritten as 'UPPER()'. (Bug #12844279)
'WEIGHT_STRING(STR [AS {CHAR|BINARY}(N)] [LEVEL LEVELS] [FLAGS])'
'LEVELS: N [ASC|DESC|REVERSE] [, N [ASC|DESC|REVERSE]] ...'
This function returns the weight string for the input string. The return value is a binary string that represents the comparison and sorting value of the string. It has these properties:
* If 'WEIGHT_STRING(STR1)' = 'WEIGHT_STRING(STR2)', then 'STR1 =
STR2' (STR1 and STR2 are considered equal)
* If 'WEIGHT_STRING(STR1)' < 'WEIGHT_STRING(STR2)', then 'STR1 <
STR2' (STR1 sorts before STR2)
'WEIGHT_STRING()' is a debugging function intended for internal use. Its behavior can change without notice between MySQL versions. It can be used for testing and debugging of collations, especially if you are adding a new collation. See *note adding-collation::.
This list briefly summarizes the arguments. More details are given in the discussion following the list.
* STR: The input string expression.
* 'AS' clause: Optional; cast the input string to a given type
and length.
* 'LEVEL' clause: Optional; specify weight levels for the return
value.
* FLAGS: Optional; unused.
The input string, STR, is a string expression. If the input is a nonbinary (character) string such as a note 'CHAR': char, note 'VARCHAR': char, or note 'TEXT': blob. value, the return value contains the collation weights for the string. If the input is a binary (byte) string such as a note 'BINARY': binary-varbinary, note 'VARBINARY': binary-varbinary, or note 'BLOB': blob. value, the return value is the same as the input (the weight for each byte in a binary string is the byte value). If the input is 'NULL', 'WEIGHT_STRING()' returns 'NULL'.
Examples:
mysql> SET @s = _latin1 'AB' COLLATE latin1_swedish_ci;
mysql> SELECT @s, HEX(@s), HEX(WEIGHT_STRING(@s));
+------+---------+------------------------+
| @s | HEX(@s) | HEX(WEIGHT_STRING(@s)) |
+------+---------+------------------------+
| AB | 4142 | 4142 |
+------+---------+------------------------+
mysql> SET @s = _latin1 'ab' COLLATE latin1_swedish_ci;
mysql> SELECT @s, HEX(@s), HEX(WEIGHT_STRING(@s));
+------+---------+------------------------+
| @s | HEX(@s) | HEX(WEIGHT_STRING(@s)) |
+------+---------+------------------------+
| ab | 6162 | 4142 |
+------+---------+------------------------+
mysql> SET @s = CAST('AB' AS BINARY);
mysql> SELECT @s, HEX(@s), HEX(WEIGHT_STRING(@s));
+------+---------+------------------------+
| @s | HEX(@s) | HEX(WEIGHT_STRING(@s)) |
+------+---------+------------------------+
| AB | 4142 | 4142 |
+------+---------+------------------------+
mysql> SET @s = CAST('ab' AS BINARY);
mysql> SELECT @s, HEX(@s), HEX(WEIGHT_STRING(@s));
+------+---------+------------------------+
| @s | HEX(@s) | HEX(WEIGHT_STRING(@s)) |
+------+---------+------------------------+
| ab | 6162 | 6162 |
+------+---------+------------------------+
The preceding examples use 'HEX()' to display the 'WEIGHT_STRING()' result. Because the result is a binary value, 'HEX()' can be especially useful when the result contains nonprinting values, to display it in printable form:
mysql> SET @s = CONVERT(X'C39F' USING utf8) COLLATE utf8_czech_ci;
mysql> SELECT HEX(WEIGHT_STRING(@s));
+------------------------+
| HEX(WEIGHT_STRING(@s)) |
+------------------------+
| 0FEA0FEA |
+------------------------+
For non-'NULL' return values, the data type of the value is note 'VARBINARY': binary-varbinary. if its length is within the maximum length for note 'VARBINARY': binary-varbinary, otherwise the data type is *note 'BLOB': blob.
The 'AS' clause may be given to cast the input string to a nonbinary or binary string and to force it to a given length:
* 'AS CHAR(N)' casts the string to a nonbinary string and pads
it on the right with spaces to a length of N characters. N
must be at least 1. If N is less than the length of the input
string, the string is truncated to N characters. No warning
occurs for truncation.
* 'AS BINARY(N)' is similar but casts the string to a binary
string, N is measured in bytes (not characters), and padding
uses '0x00' bytes (not spaces).
mysql> SET NAMES 'latin1';
mysql> SELECT HEX(WEIGHT_STRING('ab' AS CHAR(4)));
+-------------------------------------+
| HEX(WEIGHT_STRING('ab' AS CHAR(4))) |
+-------------------------------------+
| 41422020 |
+-------------------------------------+
mysql> SET NAMES 'utf8';
mysql> SELECT HEX(WEIGHT_STRING('ab' AS CHAR(4)));
+-------------------------------------+
| HEX(WEIGHT_STRING('ab' AS CHAR(4))) |
+-------------------------------------+
| 0041004200200020 |
+-------------------------------------+
mysql> SELECT HEX(WEIGHT_STRING('ab' AS BINARY(4)));
+---------------------------------------+
| HEX(WEIGHT_STRING('ab' AS BINARY(4))) |
+---------------------------------------+
| 61620000 |
+---------------------------------------+
The 'LEVEL' clause may be given to specify that the return value should contain weights for specific collation levels.
The LEVELS specifier following the 'LEVEL' keyword may be given either as a list of one or more integers separated by commas, or as a range of two integers separated by a dash. Whitespace around the punctuation characters does not matter.
Examples:
LEVEL 1
LEVEL 2, 3, 5
LEVEL 1-3
Any level less than 1 is treated as 1. Any level greater than the maximum for the input string collation is treated as maximum for the collation. The maximum varies per collation, but is never greater than 6.
In a list of levels, levels must be given in increasing order. In a range of levels, if the second number is less than the first, it is treated as the first number (for example, 4-2 is the same as 4-4).
If the 'LEVEL' clause is omitted, MySQL assumes 'LEVEL 1 - MAX', where MAX is the maximum level for the collation.
If 'LEVEL' is specified using list syntax (not range syntax), any level number can be followed by these modifiers:
* 'ASC': Return the weights without modification. This is the
default.
* 'DESC': Return bitwise-inverted weights (for example, '0x78f0
DESC' = '0x870f').
* 'REVERSE': Return the weights in reverse order (that is,the
weights for the reversed string, with the first character last
and the last first).
Examples:
mysql> SELECT HEX(WEIGHT_STRING(0x007fff LEVEL 1));
+--------------------------------------+
| HEX(WEIGHT_STRING(0x007fff LEVEL 1)) |
+--------------------------------------+
| 007FFF |
+--------------------------------------+
mysql> SELECT HEX(WEIGHT_STRING(0x007fff LEVEL 1 DESC));
+-------------------------------------------+
| HEX(WEIGHT_STRING(0x007fff LEVEL 1 DESC)) |
+-------------------------------------------+
| FF8000 |
+-------------------------------------------+
mysql> SELECT HEX(WEIGHT_STRING(0x007fff LEVEL 1 REVERSE));
+----------------------------------------------+
| HEX(WEIGHT_STRING(0x007fff LEVEL 1 REVERSE)) |
+----------------------------------------------+
| FF7F00 |
+----------------------------------------------+
mysql> SELECT HEX(WEIGHT_STRING(0x007fff LEVEL 1 DESC REVERSE));
+---------------------------------------------------+
| HEX(WEIGHT_STRING(0x007fff LEVEL 1 DESC REVERSE)) |
+---------------------------------------------------+
| 0080FF |
+---------------------------------------------------+
The FLAGS clause currently is unused.
If 'WEIGHT_STRING()' is invoked from within the note 'mysql': mysql. client, binary strings display using hexadecimal notation, depending on the value of the '--binary-as-hex'. For more information about that option, see note mysql::.
File: manual.info.tmp, Node: string-comparison-functions, Next: regexp, Prev: string-functions, Up: string-functions
String Comparison Functions and Operators
Name Description
'LIKE' Simple pattern matching
'NOT LIKE' Negation of simple pattern matching
'STRCMP()' Compare two strings
If a string function is given a binary string as an argument, the resulting string is also a binary string. A number converted to a string is treated as a binary string. This affects only comparisons.
Normally, if any expression in a string comparison is case-sensitive, the comparison is performed in case-sensitive fashion.
If a string function is invoked from within the note 'mysql': mysql. client, binary strings display using hexadecimal notation, depending on the value of the '--binary-as-hex'. For more information about that option, see note mysql::.
'EXPR LIKE PAT [ESCAPE 'ESCAPE_CHAR']'
Pattern matching using an SQL pattern. Returns '1' ('TRUE') or '0' ('FALSE'). If either EXPR or PAT is 'NULL', the result is 'NULL'.
The pattern need not be a literal string. For example, it can be specified as a string expression or table column. In the latter case, the column must be defined as one of the MySQL string types (see *note string-types::).
Per the SQL standard, 'LIKE' performs matching on a per-character basis, thus it can produce results different from the '=' comparison operator:
mysql> SELECT 'a"' LIKE 'ae' COLLATE latin1_german2_ci;
+-----------------------------------------+
| 'a"' LIKE 'ae' COLLATE latin1_german2_ci |
+-----------------------------------------+
| 0 |
+-----------------------------------------+
mysql> SELECT 'a"' = 'ae' COLLATE latin1_german2_ci;
+--------------------------------------+
| 'a"' = 'ae' COLLATE latin1_german2_ci |
+--------------------------------------+
| 1 |
+--------------------------------------+
In particular, trailing spaces are significant, which is not true for comparisons of nonbinary strings ('CHAR', 'VARCHAR', and 'TEXT' values) performed with the '=' operator:
mysql> SELECT 'a' = 'a ', 'a' LIKE 'a ';
+------------+---------------+
| 'a' = 'a ' | 'a' LIKE 'a ' |
+------------+---------------+
| 1 | 0 |
+------------+---------------+
1 row in set (0.00 sec)
With 'LIKE' you can use the following two wildcard characters in the pattern:
* '%' matches any number of characters, even zero characters.
* '_' matches exactly one character.
mysql> SELECT 'David!' LIKE 'David_';
-> 1
mysql> SELECT 'David!' LIKE '%D%v%';
-> 1
To test for literal instances of a wildcard character, precede it by the escape character. If you do not specify the 'ESCAPE' character, '' is assumed, unless the 'NO_BACKSLASH_ESCAPES' SQL mode is enabled. In that case, no escape character is used.
* '\%' matches one '%' character.
* '\_' matches one '_' character.
mysql> SELECT 'David!' LIKE 'David\_';
-> 0
mysql> SELECT 'David_' LIKE 'David\_';
-> 1
To specify a different escape character, use the 'ESCAPE' clause:
mysql> SELECT 'David_' LIKE 'David|_' ESCAPE '|';
-> 1
The escape sequence should be one character long to specify the escape character, or empty to specify that no escape character is used. The expression must evaluate as a constant at execution time. If the 'NO_BACKSLASH_ESCAPES' SQL mode is enabled, the sequence cannot be empty.
The following statements illustrate that string comparisons are not case-sensitive unless one of the operands is case-sensitive (uses a case-sensitive collation or is a binary string):
mysql> SELECT 'abc' LIKE 'ABC';
-> 1
mysql> SELECT 'abc' LIKE _latin1 'ABC' COLLATE latin1_general_cs;
-> 0
mysql> SELECT 'abc' LIKE _latin1 'ABC' COLLATE latin1_bin;
-> 0
mysql> SELECT 'abc' LIKE BINARY 'ABC';
-> 0
As an extension to standard SQL, MySQL permits 'LIKE' on numeric expressions.
mysql> SELECT 10 LIKE '1%';
-> 1
MySQL attempts in such cases to perform implicit conversion of the expression to a string. See *note type-conversion::.
Note:
MySQL uses C escape syntax in strings (for example, '' to represent the newline character). If you want a 'LIKE' string to contain a literal '', you must double it. (Unless the 'NO_BACKSLASH_ESCAPES' SQL mode is enabled, in which case no escape character is used.) For example, to search for '', specify it as '\n'. To search for '', specify it as '\\'; this is because the backslashes are stripped once by the parser and again when the pattern match is made, leaving a single backslash to be matched against.
Exception: At the end of the pattern string, backslash can be specified as '\'. At the end of the string, backslash stands for itself because there is nothing following to escape. Suppose that a table contains the following values:
mysql> SELECT filename FROM t1;
+--------------+
| filename |
+--------------+
| C: |
| C:\ |
| C:\Programs |
| C:\Programs\ |
+--------------+
To test for values that end with backslash, you can match the values using either of the following patterns:
mysql> SELECT filename, filename LIKE '%\\' FROM t1;
+--------------+---------------------+
| filename | filename LIKE '%\\' |
+--------------+---------------------+
| C: | 0 |
| C:\ | 1 |
| C:\Programs | 0 |
| C:\Programs\ | 1 |
+--------------+---------------------+
mysql> SELECT filename, filename LIKE '%\\\\' FROM t1;
+--------------+-----------------------+
| filename | filename LIKE '%\\\\' |
+--------------+-----------------------+
| C: | 0 |
| C:\ | 1 |
| C:\Programs | 0 |
| C:\Programs\ | 1 |
+--------------+-----------------------+
'EXPR NOT LIKE PAT [ESCAPE 'ESCAPE_CHAR']'
This is the same as 'NOT (EXPR LIKE PAT [ESCAPE 'ESCAPE_CHAR'])'.
Note:
Aggregate queries involving 'NOT LIKE' comparisons with columns containing 'NULL' may yield unexpected results. For example, consider the following table and data:
CREATE TABLE foo (bar VARCHAR(10));
INSERT INTO foo VALUES (NULL), (NULL);
The query 'SELECT COUNT() FROM foo WHERE bar LIKE '%baz%';' returns '0'. You might assume that 'SELECT COUNT() FROM foo WHERE bar NOT LIKE '%baz%';' would return '2'. However, this is not the case: The second query returns '0'. This is because 'NULL NOT LIKE EXPR' always returns 'NULL', regardless of the value of EXPR. The same is true for aggregate queries involving 'NULL' and comparisons using 'NOT RLIKE' or 'NOT REGEXP'. In such cases, you must test explicitly for 'NOT NULL' using 'OR' (and not 'AND'), as shown here:
SELECT COUNT(*) FROM foo WHERE bar NOT LIKE '%baz%' OR bar IS NULL;
'STRCMP(EXPR1,EXPR2)'
'STRCMP()' returns '0' if the strings are the same, '-1' if the first argument is smaller than the second according to the current sort order, and '1' otherwise.
mysql> SELECT STRCMP('text', 'text2');
-> -1
mysql> SELECT STRCMP('text2', 'text');
-> 1
mysql> SELECT STRCMP('text', 'text');
-> 0
'STRCMP()' performs the comparison using the collation of the arguments.
mysql> SET @s1 = _latin1 'x' COLLATE latin1_general_ci;
mysql> SET @s2 = _latin1 'X' COLLATE latin1_general_ci;
mysql> SET @s3 = _latin1 'x' COLLATE latin1_general_cs;
mysql> SET @s4 = _latin1 'X' COLLATE latin1_general_cs;
mysql> SELECT STRCMP(@s1, @s2), STRCMP(@s3, @s4);
+------------------+------------------+
| STRCMP(@s1, @s2) | STRCMP(@s3, @s4) |
+------------------+------------------+
| 0 | 1 |
+------------------+------------------+
If the collations are incompatible, one of the arguments must be converted to be compatible with the other. See *note charset-collation-coercibility::.
mysql> SELECT STRCMP(@s1, @s3);
ERROR 1267 (HY000): Illegal mix of collations (latin1_general_ci,IMPLICIT)
and (latin1_general_cs,IMPLICIT) for operation 'strcmp'
mysql> SELECT STRCMP(@s1, @s3 COLLATE latin1_general_ci);
+--------------------------------------------+
| STRCMP(@s1, @s3 COLLATE latin1_general_ci) |
+--------------------------------------------+
| 0 |
+--------------------------------------------+
File: manual.info.tmp, Node: regexp, Next: string-functions-charset, Prev: string-comparison-functions, Up: string-functions
Regular Expression Functions and Operators
Name Description
'NOT REGEXP' Negation of REGEXP
'REGEXP' Whether string matches regular expression
'RLIKE' Whether string matches regular expression
A regular expression is a powerful way of specifying a pattern for a complex search. This section discusses the operators available for regular expression matching and illustrates, with examples, some of the special characters and constructs that can be used for regular expression operations. See also *note pattern-matching::.
MySQL uses Henry Spencer's implementation of regular expressions, which is aimed at conformance with POSIX 1003.2. MySQL uses the extended version to support regular expression pattern-matching operations in SQL statements. This section does not contain all the details that can be found in Henry Spencer's 'regex(7)' manual page. That manual page is included in MySQL source distributions, in the 'regex.7' file under the 'regex' directory.
*note regexp-operator-descriptions::
*note regexp-syntax::
Regular Expression Function and Operator Descriptions
'EXPR NOT REGEXP PAT', 'EXPR NOT RLIKE PAT'
This is the same as 'NOT (EXPR REGEXP PAT)'.
'EXPR REGEXP PAT', 'EXPR RLIKE PAT'
Returns 1 if the string EXPR matches the regular expression specified by the pattern PAT, 0 otherwise. If either EXPR or PAT is 'NULL', the return value is 'NULL'.
'RLIKE' is a synonym for 'REGEXP'.
The pattern can be an extended regular expression, the syntax for which is discussed in *note regexp-syntax::. The pattern need not be a literal string. For example, it can be specified as a string expression or table column.
Note:
MySQL uses C escape syntax in strings (for example, '' to represent the newline character). If you want your EXPR or PAT argument to contain a literal '', you must double it. (Unless the 'NO_BACKSLASH_ESCAPES' SQL mode is enabled, in which case no escape character is used.)
Regular expression operations use the character set and collation of the string expression and pattern arguments when deciding the type of a character and performing the comparison. If the arguments have different character sets or collations, coercibility rules apply as described in *note charset-collation-coercibility::. If either argument is a binary string, the arguments are handled in case-sensitive fashion as binary strings.
mysql> SELECT 'Michael!' REGEXP '.*';
+------------------------+
| 'Michael!' REGEXP '.*' |
+------------------------+
| 1 |
+------------------------+
mysql> SELECT 'new*\n*line' REGEXP 'new\\*.\\*line';
+---------------------------------------+
| 'new*\n*line' REGEXP 'new\\*.\\*line' |
+---------------------------------------+
| 0 |
+---------------------------------------+
mysql> SELECT 'a' REGEXP '^[a-d]';
+---------------------+
| 'a' REGEXP '^[a-d]' |
+---------------------+
| 1 |
+---------------------+
Warning:
The 'REGEXP' and 'RLIKE' operators work in byte-wise fashion, so they are not multibyte safe and may produce unexpected results with multibyte character sets. In addition, these operators compare characters by their byte values and accented characters may not compare as equal even if a given collation treats them as equal.
Regular Expression Syntax
A regular expression describes a set of strings. The simplest regular expression is one that has no special characters in it. For example, the regular expression 'hello' matches 'hello' and nothing else.
Nontrivial regular expressions use certain special constructs so that they can match more than one string. For example, the regular expression 'hello|world' contains the '|' alternation operator and matches either the 'hello' or 'world'.
As a more complex example, the regular expression 'B[an]*s' matches any of the strings 'Bananas', 'Baaaaas', 'Bs', and any other string starting with a 'B', ending with an 's', and containing any number of 'a' or 'n' characters in between.
A regular expression for the 'REGEXP' operator may use any of the following special characters and constructs:
'^'
Match the beginning of a string.
mysql> SELECT 'fo\nfo' REGEXP '^fo$'; -> 0
mysql> SELECT 'fofo' REGEXP '^fo'; -> 1
'$'
Match the end of a string.
mysql> SELECT 'fo\no' REGEXP '^fo\no$'; -> 1
mysql> SELECT 'fo\no' REGEXP '^fo$'; -> 0
'.'
Match any character (including carriage return and newline).
mysql> SELECT 'fofo' REGEXP '^f.*$'; -> 1
mysql> SELECT 'fo\r\nfo' REGEXP '^f.*$'; -> 1
'a*'
Match any sequence of zero or more 'a' characters.
mysql> SELECT 'Ban' REGEXP '^Ba*n'; -> 1
mysql> SELECT 'Baaan' REGEXP '^Ba*n'; -> 1
mysql> SELECT 'Bn' REGEXP '^Ba*n'; -> 1
'a+'
Match any sequence of one or more 'a' characters.
mysql> SELECT 'Ban' REGEXP '^Ba+n'; -> 1
mysql> SELECT 'Bn' REGEXP '^Ba+n'; -> 0
'a?'
Match either zero or one 'a' character.
mysql> SELECT 'Bn' REGEXP '^Ba?n'; -> 1
mysql> SELECT 'Ban' REGEXP '^Ba?n'; -> 1
mysql> SELECT 'Baan' REGEXP '^Ba?n'; -> 0
'de|abc'
Alternation; match either of the sequences 'de' or 'abc'.
mysql> SELECT 'pi' REGEXP 'pi|apa'; -> 1
mysql> SELECT 'axe' REGEXP 'pi|apa'; -> 0
mysql> SELECT 'apa' REGEXP 'pi|apa'; -> 1
mysql> SELECT 'apa' REGEXP '^(pi|apa)$'; -> 1
mysql> SELECT 'pi' REGEXP '^(pi|apa)$'; -> 1
mysql> SELECT 'pix' REGEXP '^(pi|apa)$'; -> 0
'(abc)*'
Match zero or more instances of the sequence 'abc'.
mysql> SELECT 'pi' REGEXP '^(pi)*$'; -> 1
mysql> SELECT 'pip' REGEXP '^(pi)*$'; -> 0
mysql> SELECT 'pipi' REGEXP '^(pi)*$'; -> 1
'{1}', '{2,3}'
Repetition; '{N}' and '{M,N}' notation provide a more general way of writing regular expressions that match many occurrences of the previous atom (or 'piece') of the pattern. M and N are integers.
* 'a*'
Can be written as 'a{0,}'.
* 'a+'
Can be written as 'a{1,}'.
* 'a?'
Can be written as 'a{0,1}'.
To be more precise, 'a{N}' matches exactly N instances of 'a'. 'a{N,}' matches N or more instances of 'a'. 'a{M,N}' matches M through N instances of 'a', inclusive. If both M and N are given, M must be less than or equal to N.
M and N must be in the range from '0' to 'RE_DUP_MAX' (default 255), inclusive.
mysql> SELECT 'abcde' REGEXP 'a[bcd]{2}e'; -> 0
mysql> SELECT 'abcde' REGEXP 'a[bcd]{3}e'; -> 1
mysql> SELECT 'abcde' REGEXP 'a[bcd]{1,10}e'; -> 1
'[a-dX]', '[^a-dX]'
Matches any character that is (or is not, if '^' is used) either 'a', 'b', 'c', 'd' or 'X'. A '-' character between two other characters forms a range that matches all characters from the first character to the second. For example, '[0-9]' matches any decimal digit. To include a literal ']' character, it must immediately follow the opening bracket '['. To include a literal '-' character, it must be written first or last. Any character that does not have a defined special meaning inside a '[]' pair matches only itself.
mysql> SELECT 'aXbc' REGEXP '[a-dXYZ]'; -> 1
mysql> SELECT 'aXbc' REGEXP '^[a-dXYZ]$'; -> 0
mysql> SELECT 'aXbc' REGEXP '^[a-dXYZ]+$'; -> 1
mysql> SELECT 'aXbc' REGEXP '^[^a-dXYZ]+$'; -> 0
mysql> SELECT 'gheis' REGEXP '^[^a-dXYZ]+$'; -> 1
mysql> SELECT 'gheisa' REGEXP '^[^a-dXYZ]+$'; -> 0
'[.characters.]'
Within a bracket expression (written using '[' and ']'), matches the sequence of characters of that collating element. 'characters' is either a single character or a character name like 'newline'. The following table lists the permissible character names.
The following table shows the permissible character names and the characters that they match. For characters given as numeric values, the values are represented in octal.
Name Character Name Character
'NUL'
'0' 'SOH' '001'
'STX'
'002' 'ETX' '003'
'EOT'
'004' 'ENQ' '005'
'ACK'
'006' 'BEL' '007'
'alert'
'007' 'BS' '010'
'backspace'
'''' 'HT' '011'
'tab'
'''' 'LF' '012'
'newline'
'''' 'VT' '013'
'vertical-tab'
'''' 'FF' '014'
'form-feed'
'''' 'CR' '015'
'carriage-return'
'''' 'SO' '016'
'SI'
'017' 'DLE' '020'
'DC1'
'021' 'DC2' '022'
'DC3'
'023' 'DC4' '024'
'NAK'
'025' 'SYN' '026'
'ETB'
'027' 'CAN' '030'
'EM'
'031' 'SUB' '032'
'ESC'
'033' 'IS4' '034'
'FS'
'034' 'IS3' '035'
'GS'
'035' 'IS2' '036'
'RS'
'036' 'IS1' '037'
'US'
'037' 'space' '' ''
'exclamation-mark'
''!'' 'quotation-mark' ''"''
'number-sign'
''#'' 'dollar-sign' ''$''
'percent-sign'
''%'' 'ampersand' ''&''
'apostrophe'
''''' 'left-parenthesis' ''(''
'right-parenthesis'
'')'' 'asterisk' ''*''
'plus-sign'
''+'' 'comma' '',''
'hyphen'
''-'' 'hyphen-minus' ''-''
'period'
''.'' 'full-stop' ''.''
'slash'
''/'' 'solidus' ''/''
'zero'
''0'' 'one' ''1''
'two'
''2'' 'three' ''3''
'four'
''4'' 'five' ''5''
'six'
''6'' 'seven' ''7''
'eight'
''8'' 'nine' ''9''
'colon'
'':'' 'semicolon' '';''
'less-than-sign'
''<'' 'equals-sign' ''=''
'greater-than-sign'
''>'' 'question-mark' ''?''
'commercial-at'
''@'' 'left-square-bracket'''[''
'backslash'
''\'' 'reverse-solidus' ''\''
'right-square-bracket'
'']'' 'circumflex' ''^''
'circumflex-accent'
''^'' 'underscore' ''_''
'low-line'
''_'' 'grave-accent' ''`''
'left-brace'
''{'' 'left-curly-bracket'''{''
'vertical-line'
''|'' 'right-brace' ''}''
'right-curly-bracket'
''}'' 'tilde' ''~''
'DEL'
'177'
mysql> SELECT '~' REGEXP '[[.~.]]'; -> 1
mysql> SELECT '~' REGEXP '[[.tilde.]]'; -> 1
'[=character_class=]'
Within a bracket expression (written using '[' and ']'), '[=character_class=]' represents an equivalence class. It matches all characters with the same collation value, including itself. For example, if 'o' and '(+)' are the members of an equivalence class, '[[=o=]]', '[[=(+)=]]', and '[o(+)]' are all synonymous. An equivalence class may not be used as an endpoint of a range.
'[:character_class:]'
Within a bracket expression (written using '[' and ']'), '[:character_class:]' represents a character class that matches all characters belonging to that class. The following table lists the standard class names. These names stand for the character classes defined in the 'ctype(3)' manual page. A particular locale may provide other class names. A character class may not be used as an endpoint of a range.
Character Meaning Class Name
'alnum' Alphanumeric characters
'alpha' Alphabetic characters
'blank' Whitespace characters
'cntrl' Control characters
'digit' Digit characters
'graph' Graphic characters
'lower' Lowercase alphabetic characters
'print' Graphic or space characters
'punct' Punctuation characters
'space' Space, tab, newline, and carriage return
'upper' Uppercase alphabetic characters
'xdigit' Hexadecimal digit characters
mysql> SELECT 'justalnums' REGEXP '[[:alnum:]]+'; -> 1
mysql> SELECT '!!' REGEXP '[[:alnum:]]+'; -> 0
'[[:<:]]', '[[:>:]]'
These markers stand for word boundaries. They match the beginning and end of words, respectively. A word is a sequence of word characters that is not preceded by or followed by word characters. A word character is an alphanumeric character in the 'alnum' class or an underscore ('_').
mysql> SELECT 'a word a' REGEXP '[[:<:]]word[[:>:]]'; -> 1
mysql> SELECT 'a xword a' REGEXP '[[:<:]]word[[:>:]]'; -> 0
To use a literal instance of a special character in a regular expression, precede it by two backslash () characters. The MySQL parser interprets one of the backslashes, and the regular expression library interprets the other. For example, to match the string '1+2' that contains the special '+' character, only the last of the following regular expressions is the correct one:
mysql> SELECT '1+2' REGEXP '1+2'; -> 0
mysql> SELECT '1+2' REGEXP '1\+2'; -> 0
mysql> SELECT '1+2' REGEXP '1\\+2'; -> 1
File: manual.info.tmp, Node: string-functions-charset, Prev: regexp, Up: string-functions
MySQL has many operators and functions that return a string. This section answers the question: What is the character set and collation of such a string?
For simple functions that take string input and return a string result as output, the output's character set and collation are the same as those of the principal input value. For example, 'UPPER(X)' returns a string with the same character string and collation as X. The same applies for 'INSTR()', 'LCASE()', 'LOWER()', 'LTRIM()', 'MID()', 'REPEAT()', 'REPLACE()', 'REVERSE()', 'RIGHT()', 'RPAD()', 'RTRIM()', 'SOUNDEX()', 'SUBSTRING()', 'TRIM()', 'UCASE()', and 'UPPER()'.
Note:
The 'REPLACE()' function, unlike all other functions, always ignores the collation of the string input and performs a case-sensitive comparison.
If a string input or function result is a binary string, the string has the 'binary' character set and collation. This can be checked by using the 'CHARSET()' and 'COLLATION()' functions, both of which return 'binary' for a binary string argument:
mysql> SELECT CHARSET(BINARY 'a'), COLLATION(BINARY 'a');
+---------------------+-----------------------+
| CHARSET(BINARY 'a') | COLLATION(BINARY 'a') |
+---------------------+-----------------------+
| binary | binary |
+---------------------+-----------------------+
For operations that combine multiple string inputs and return a single string output, the 'aggregation rules' of standard SQL apply for determining the collation of the result:
If an explicit 'COLLATE Y' occurs, use Y.
If explicit 'COLLATE Y' and 'COLLATE Z' occur, raise an error.
Otherwise, if all collations are Y, use Y.
Otherwise, the result has no collation.
For example, with 'CASE ... WHEN a THEN b WHEN b THEN c COLLATE X END', the resulting collation is X. The same applies for *note 'UNION': union, '||', 'CONCAT()', 'ELT()', 'GREATEST()', 'IF()', and 'LEAST()'.
For operations that convert to character data, the character set and collation of the strings that result from the operations are defined by the 'character_set_connection' and 'collation_connection' system variables that determine the default connection character set and collation (see *note charset-connection::). This applies only to 'CAST()', 'CONV()', 'FORMAT()', 'HEX()', and 'SPACE()'.
As of MySQL 5.7.19, an exception to the preceding principle occurs for expressions for virtual generated columns. In such expressions, the table character set is used for 'CONV()' or 'HEX()' results, regardless of connection character set.
If there is any question about the character set or collation of the result returned by a string function, use the 'CHARSET()' or 'COLLATION()' function to find out:
mysql> SELECT USER(), CHARSET(USER()), COLLATION(USER());
+----------------+-----------------+-------------------+
| USER() | CHARSET(USER()) | COLLATION(USER()) |
+----------------+-----------------+-------------------+
| test@localhost | utf8 | utf8_general_ci |
+----------------+-----------------+-------------------+
mysql> SELECT CHARSET(COMPRESS('abc')), COLLATION(COMPRESS('abc'));
+--------------------------+----------------------------+
| CHARSET(COMPRESS('abc')) | COLLATION(COMPRESS('abc')) |
+--------------------------+----------------------------+
| binary | binary |
+--------------------------+----------------------------+
File: manual.info.tmp, Node: fulltext-search, Next: cast-functions, Prev: string-functions, Up: functions