12.11 XML Functions

XML Functions

Name Description

'ExtractValue()' Extract a value from an XML string using XPath notation

'UpdateXML()' Return replaced XML fragment

This section discusses XML and related functionality in MySQL.

Note:

It is possible to obtain XML-formatted output from MySQL in the note 'mysql': mysql. and note 'mysqldump': mysqldump. clients by invoking them with the '--xml' option. See note mysql::, and note mysqldump::.

Two functions providing basic XPath 1.0 (XML Path Language, version 1.0) capabilities are available. Some basic information about XPath syntax and usage is provided later in this section; however, an in-depth discussion of these topics is beyond the scope of this manual, and you should refer to the XML Path Language (XPath) 1.0 standard (http://www.w3.org/TR/xpath) for definitive information. A useful resource for those new to XPath or who desire a refresher in the basics is the Zvon.org XPath Tutorial (http://www.zvon.org/xxl/XPathTutorial/), which is available in several languages.

Note:

These functions remain under development. We continue to improve these and other aspects of XML and XPath functionality in MySQL 5.7 and onwards. You may discuss these, ask questions about them, and obtain help from other users with them in the MySQL XML User Forum (https://forums.mysql.com/list.php?44).

XPath expressions used with these functions support user variables and local stored program variables. User variables are weakly checked; variables local to stored programs are strongly checked (see also Bug #26518):

Expressions containing user variables or variables local to stored programs must otherwise (except for notation) conform to the rules for XPath expressions containing variables as given in the XPath 1.0 specification.

Note:

A user variable used to store an XPath expression is treated as an empty string. Because of this, it is not possible to store an XPath expression as a user variable. (Bug #32911)

Note:

A discussion in depth of XPath syntax and usage are beyond the scope of this manual. Please see the XML Path Language (XPath) 1.0 specification (http://www.w3.org/TR/xpath) for definitive information. A useful resource for those new to XPath or who are wishing a refresher in the basics is the Zvon.org XPath Tutorial (http://www.zvon.org/xxl/XPathTutorial/), which is available in several languages.

Descriptions and examples of some basic XPath expressions follow:

XPath Limitations

The XPath syntax supported by these functions is currently subject to the following limitations:

XPath expressions passed as arguments to 'ExtractValue()' and 'UpdateXML()' may contain the colon character (':') in element selectors, which enables their use with markup employing XML namespaces notation. For example:

 mysql> SET @xml = '<a>111<b:c>222<d>333</d><e:f>444</e:f></b:c></a>';
 Query OK, 0 rows affected (0.00 sec)

 mysql> SELECT ExtractValue(@xml, '//e:f');
 +-----------------------------+
 | ExtractValue(@xml, '//e:f') |
 +-----------------------------+
 | 444                         |
 +-----------------------------+
 1 row in set (0.00 sec)

 mysql> SELECT UpdateXML(@xml, '//b:c', '<g:h>555</g:h>');
 +--------------------------------------------+
 | UpdateXML(@xml, '//b:c', '<g:h>555</g:h>') |
 +--------------------------------------------+
 | <a>111<g:h>555</g:h></a>                   |
 +--------------------------------------------+
 1 row in set (0.00 sec)

This is similar in some respects to what is permitted by Apache Xalan (http://xalan.apache.org/) and some other parsers, and is much simpler than requiring namespace declarations or the use of the 'namespace-uri()' and 'local-name()' functions.

Error handling

For both 'ExtractValue()' and 'UpdateXML()', the XPath locator used must be valid and the XML to be searched must consist of elements which are properly nested and closed. If the locator is invalid, an error is generated:

 mysql> SELECT ExtractValue('<a>c</a><b/>', '/&a');
 ERROR 1105 (HY000): XPATH syntax error: '&a'

If XML_FRAG does not consist of elements which are properly nested and closed, 'NULL' is returned and a warning is generated, as shown in this example:

 mysql> SELECT ExtractValue('<a>c</a><b', '//a');
 +-----------------------------------+
 | ExtractValue('<a>c</a><b', '//a') |
 +-----------------------------------+
 | NULL                              |
 +-----------------------------------+
 1 row in set, 1 warning (0.00 sec)

 mysql> SHOW WARNINGS\G
 *************************** 1. row ***************************
   Level: Warning
    Code: 1525
 Message: Incorrect XML value: 'parse error at line 1 pos 11:
          END-OF-INPUT unexpected ('>' wanted)'
 1 row in set (0.00 sec)

 mysql> SELECT ExtractValue('<a>c</a><b/>', '//a');
 +-------------------------------------+
 | ExtractValue('<a>c</a><b/>', '//a') |
 +-------------------------------------+
 | c                                   |
 +-------------------------------------+
 1 row in set (0.00 sec)

Important:

The replacement XML used as the third argument to 'UpdateXML()' is not checked to determine whether it consists solely of elements which are properly nested and closed.

XPath Injection

code injection occurs when malicious code is introduced into the system to gain unauthorized access to privileges and data. It is based on exploiting assumptions made by developers about the type and content of data input from users. XPath is no exception in this regard.

A common scenario in which this can happen is the case of application which handles authorization by matching the combination of a login name and password with those found in an XML file, using an XPath expression like this one:

 //user[login/text()='neapolitan' and password/text()='1c3cr34m']/attribute::id

This is the XPath equivalent of an SQL statement like this one:

 SELECT id FROM users WHERE login='neapolitan' AND password='1c3cr34m';

A PHP application employing XPath might handle the login process like this:

 <?php

   $file     =   "users.xml";

   $login    =   $POST["login"];
   $password =   $POST["password"];

   $xpath = "//user[login/text()=$login and password/text()=$password]/attribute::id";

   if( file_exists($file) )
   {
     $xml = simplexml_load_file($file);

     if($result = $xml->xpath($xpath))
       echo "You are now logged in as user $result[0].";
     else
       echo "Invalid login name or password.";
   }
   else
     exit("Failed to open $file.");

 ?>

No checks are performed on the input. This means that a malevolent user can 'short-circuit' the test by entering '' or 1=1' for both the login name and password, resulting in $xpath being evaluated as shown here:

 //user[login/text()='' or 1=1 and password/text()='' or 1=1]/attribute::id

Since the expression inside the square brackets always evaluates as 'true', it is effectively the same as this one, which matches the 'id' attribute of every 'user' element in the XML document:

 //user/attribute::id

One way in which this particular attack can be circumvented is simply by quoting the variable names to be interpolated in the definition of '$xpath', forcing the values passed from a Web form to be converted to strings:

 $xpath = "//user[login/text()='$login' and password/text()='$password']/attribute::id";

This is the same strategy that is often recommended for preventing SQL injection attacks. In general, the practices you should follow for preventing XPath injection attacks are the same as for preventing SQL injection:

Just as SQL injection attacks can be used to obtain information about database schemas, so can XPath injection be used to traverse XML files to uncover their structure, as discussed in Amit Klein's paper Blind XPath Injection (http://www.packetstormsecurity.org/papers/bypass/Blind_XPath_Injection_20040518.pdf) (PDF file, 46KB).

It is also important to check the output being sent back to the client. Consider what can happen when we use the MySQL 'ExtractValue()' function:

 mysql> SELECT ExtractValue(
     ->     LOAD_FILE('users.xml'),
     ->     '//user[login/text()="" or 1=1 and password/text()="" or 1=1]/attribute::id'
     -> ) AS id;
 +-------------------------------+
 | id                            |
 +-------------------------------+
 | 00327 13579 02403 42354 28570 |
 +-------------------------------+
 1 row in set (0.01 sec)

Because 'ExtractValue()' returns multiple matches as a single space-delimited string, this injection attack provides every valid ID contained within 'users.xml' to the user as a single row of output. As an extra safeguard, you should also test output before returning it to the user. Here is a simple example:

 mysql> SELECT @id = ExtractValue(
     ->     LOAD_FILE('users.xml'),
     ->     '//user[login/text()="" or 1=1 and password/text()="" or 1=1]/attribute::id'
     -> );
 Query OK, 0 rows affected (0.00 sec)

 mysql> SELECT IF(
     ->     INSTR(@id, ' ') = 0,
     ->     @id,
     ->     'Unable to retrieve user ID')
     -> AS singleID;
 +----------------------------+
 | singleID                   |
 +----------------------------+
 | Unable to retrieve user ID |
 +----------------------------+
 1 row in set (0.00 sec)

In general, the guidelines for returning data to users securely are the same as for accepting user input. These can be summed up as:

 File: manual.info.tmp, Node: bit-functions, Next: encryption-functions, Prev: xml-functions, Up: functions