12.19 Aggregate Functions

Aggregate functions operate on sets of values. They are often used with a 'GROUP BY' clause to group values into subsets.

 File: manual.info.tmp, Node: aggregate-functions, Next: group-by-modifiers, Prev: aggregate-functions-and-modifiers, Up: aggregate-functions-and-modifiers

12.19.1 Aggregate Function Descriptions

This section describes aggregate functions that operate on sets of values. They are often used with a 'GROUP BY' clause to group values into subsets.

Aggregate Functions

Name Description Introduced

'AVG()'

Return the average value of the argument

'BIT_AND()'

Return bitwise AND

'BIT_OR()'

Return bitwise OR

'BIT_XOR()'

Return bitwise XOR

'COUNT()'

Return a count of the number of rows returned

'COUNT(DISTINCT)'

Return the count of a number of different values

'GROUP_CONCAT()'

Return a concatenated string

'JSON_ARRAYAGG()'

Return result 5.7.22 set as a single
JSON array

'JSON_OBJECTAGG()'

Return result 5.7.22 set as a single
JSON object

'MAX()'

Return the maximum value

'MIN()'

Return the minimum value

'STD()'

Return the population standard deviation

'STDDEV()'

Return the population standard deviation

'STDDEV_POP()'

Return the population standard deviation

'STDDEV_SAMP()'

Return the sample standard deviation

'SUM()'

Return the sum

'VAR_POP()'

Return the population standard variance

'VAR_SAMP()'

Return the sample variance

'VARIANCE()'

Return the population standard variance

Unless otherwise stated, aggregate functions ignore 'NULL' values.

If you use an aggregate function in a statement containing no 'GROUP BY' clause, it is equivalent to grouping on all rows. For more information, see *note group-by-handling::.

For numeric arguments, the variance and standard deviation functions return a note 'DOUBLE': floating-point-types. value. The 'SUM()' and 'AVG()' functions return a note 'DECIMAL': fixed-point-types. value for exact-value arguments (integer or note 'DECIMAL': fixed-point-types.), and a note 'DOUBLE': floating-point-types. value for approximate-value arguments (note 'FLOAT': floating-point-types. or note 'DOUBLE': floating-point-types.).

The 'SUM()' and 'AVG()' aggregate functions do not work with temporal values. (They convert the values to numbers, losing everything after the first nonnumeric character.) To work around this problem, convert to numeric units, perform the aggregate operation, and convert back to a temporal value. Examples:

 SELECT SEC_TO_TIME(SUM(TIME_TO_SEC(TIME_COL))) FROM TBL_NAME;
 SELECT FROM_DAYS(SUM(TO_DAYS(DATE_COL))) FROM TBL_NAME;

Functions such as 'SUM()' or 'AVG()' that expect a numeric argument cast the argument to a number if necessary. For note 'SET': set. or note 'ENUM': enum. values, the cast operation causes the underlying numeric value to be used.

The 'BIT_AND()', 'BIT_OR()', and 'BIT_XOR()' aggregate functions perform bit operations. They require note 'BIGINT': integer-types. (64-bit integer) arguments and return note 'BIGINT': integer-types. values. Arguments of other types are converted to note 'BIGINT': integer-types. and truncation might occur. For information about a change in MySQL 8.0 that permits bit operations to take binary string type arguments (note 'BINARY': binary-varbinary, note 'VARBINARY': binary-varbinary, and the note 'BLOB': blob. types), see *note bit-functions::.

 File: manual.info.tmp, Node: group-by-modifiers, Next: group-by-handling, Prev: aggregate-functions, Up: aggregate-functions-and-modifiers

12.19.2 GROUP BY Modifiers

The 'GROUP BY' clause permits a 'WITH ROLLUP' modifier that causes summary output to include extra rows that represent higher-level (that is, super-aggregate) summary operations. 'ROLLUP' thus enables you to answer questions at multiple levels of analysis with a single query. For example, 'ROLLUP' can be used to provide support for OLAP (Online Analytical Processing) operations.

Suppose that a 'sales' table has 'year', 'country', 'product', and 'profit' columns for recording sales profitability:

 CREATE TABLE sales
 (
     year    INT,
     country VARCHAR(20),
     product VARCHAR(32),
     profit  INT
 );

To summarize table contents per year, use a simple 'GROUP BY' like this:

 mysql> SELECT year, SUM(profit) AS profit
        FROM sales
        GROUP BY year;
 +------+--------+
 | year | profit |
 +------+--------+
 | 2000 |   4525 |
 | 2001 |   3010 |
 +------+--------+

The output shows the total (aggregate) profit for each year. To also determine the total profit summed over all years, you must add up the individual values yourself or run an additional query. Or you can use 'ROLLUP', which provides both levels of analysis with a single query. Adding a 'WITH ROLLUP' modifier to the 'GROUP BY' clause causes the query to produce another (super-aggregate) row that shows the grand total over all year values:

 mysql> SELECT year, SUM(profit) AS profit
        FROM sales
        GROUP BY year WITH ROLLUP;
 +------+--------+
 | year | profit |
 +------+--------+
 | 2000 |   4525 |
 | 2001 |   3010 |
 | NULL |   7535 |
 +------+--------+

The 'NULL' value in the 'year' column identifies the grand total super-aggregate line.

'ROLLUP' has a more complex effect when there are multiple 'GROUP BY' columns. In this case, each time there is a change in value in any but the last grouping column, the query produces an extra super-aggregate summary row.

For example, without 'ROLLUP', a summary of the 'sales' table based on 'year', 'country', and 'product' might look like this, where the output indicates summary values only at the year/country/product level of analysis:

 mysql> SELECT year, country, product, SUM(profit) AS profit
        FROM sales
        GROUP BY year, country, product;
 +------+---------+------------+--------+
 | year | country | product    | profit |
 +------+---------+------------+--------+
 | 2000 | Finland | Computer   |   1500 |
 | 2000 | Finland | Phone      |    100 |
 | 2000 | India   | Calculator |    150 |
 | 2000 | India   | Computer   |   1200 |
 | 2000 | USA     | Calculator |     75 |
 | 2000 | USA     | Computer   |   1500 |
 | 2001 | Finland | Phone      |     10 |
 | 2001 | USA     | Calculator |     50 |
 | 2001 | USA     | Computer   |   2700 |
 | 2001 | USA     | TV         |    250 |
 +------+---------+------------+--------+

With 'ROLLUP' added, the query produces several extra rows:

 mysql> SELECT year, country, product, SUM(profit) AS profit
        FROM sales
        GROUP BY year, country, product WITH ROLLUP;
 +------+---------+------------+--------+
 | year | country | product    | profit |
 +------+---------+------------+--------+
 | 2000 | Finland | Computer   |   1500 |
 | 2000 | Finland | Phone      |    100 |
 | 2000 | Finland | NULL       |   1600 |
 | 2000 | India   | Calculator |    150 |
 | 2000 | India   | Computer   |   1200 |
 | 2000 | India   | NULL       |   1350 |
 | 2000 | USA     | Calculator |     75 |
 | 2000 | USA     | Computer   |   1500 |
 | 2000 | USA     | NULL       |   1575 |
 | 2000 | NULL    | NULL       |   4525 |
 | 2001 | Finland | Phone      |     10 |
 | 2001 | Finland | NULL       |     10 |
 | 2001 | USA     | Calculator |     50 |
 | 2001 | USA     | Computer   |   2700 |
 | 2001 | USA     | TV         |    250 |
 | 2001 | USA     | NULL       |   3000 |
 | 2001 | NULL    | NULL       |   3010 |
 | NULL | NULL    | NULL       |   7535 |
 +------+---------+------------+--------+

Now the output includes summary information at four levels of analysis, not just one:

The 'NULL' indicators in each super-aggregate row are produced when the row is sent to the client. The server looks at the columns named in the 'GROUP BY' clause following the leftmost one that has changed value. For any column in the result set with a name that matches any of those names, its value is set to 'NULL'. (If you specify grouping columns by column position, the server identifies which columns to set to 'NULL' by position.)

Because the 'NULL' values in the super-aggregate rows are placed into the result set at such a late stage in query processing, you can test them as 'NULL' values only in the select list or 'HAVING' clause. You cannot test them as 'NULL' values in join conditions or the 'WHERE' clause to determine which rows to select. For example, you cannot add 'WHERE product IS NULL' to the query to eliminate from the output all but the super-aggregate rows.

The 'NULL' values do appear as 'NULL' on the client side and can be tested as such using any MySQL client programming interface. However, at this point, you cannot distinguish whether a 'NULL' represents a regular grouped value or a super-aggregate value. In MySQL 8.0, you can use the 'GROUPING()' (https://dev.mysql.com/doc/refman/8.0/en/miscellaneous-functions.html#function_grouping) function to test the distinction.

Other Considerations When using ROLLUP

The following discussion lists some behaviors specific to the MySQL implementation of 'ROLLUP'.

When you use 'ROLLUP', you cannot also use an 'ORDER BY' clause to sort the results. In other words, 'ROLLUP' and 'ORDER BY' are mutually exclusive in MySQL. However, you still have some control over sort order. To work around the restriction that prevents using 'ROLLUP' with 'ORDER BY' and achieve a specific sort order of grouped results, generate the grouped result set as a derived table and apply 'ORDER BY' to it. For example:

 mysql> SELECT * FROM
          (SELECT year, SUM(profit) AS profit
          FROM sales GROUP BY year WITH ROLLUP) AS dt
        ORDER BY year DESC;
 +------+--------+
 | year | profit |
 +------+--------+
 | 2001 |   3010 |
 | 2000 |   4525 |
 | NULL |   7535 |
 +------+--------+

In this case, the super-aggregate summary rows sort with the rows from which they are calculated, and their placement depends on sort order (at the beginning for ascending sort, at the end for descending sort).

'LIMIT' can be used to restrict the number of rows returned to the client. 'LIMIT' is applied after 'ROLLUP', so the limit applies against the extra rows added by 'ROLLUP'. For example:

 mysql> SELECT year, country, product, SUM(profit) AS profit
        FROM sales
        GROUP BY year, country, product WITH ROLLUP
        LIMIT 5;
 +------+---------+------------+--------+
 | year | country | product    | profit |
 +------+---------+------------+--------+
 | 2000 | Finland | Computer   |   1500 |
 | 2000 | Finland | Phone      |    100 |
 | 2000 | Finland | NULL       |   1600 |
 | 2000 | India   | Calculator |    150 |
 | 2000 | India   | Computer   |   1200 |
 +------+---------+------------+--------+

Using 'LIMIT' with 'ROLLUP' may produce results that are more difficult to interpret, because there is less context for understanding the super-aggregate rows.

A MySQL extension permits a column that does not appear in the 'GROUP BY' list to be named in the select list. (For information about nonaggregated columns and 'GROUP BY', see *note group-by-handling::.) In this case, the server is free to choose any value from this nonaggregated column in summary rows, and this includes the extra rows added by 'WITH ROLLUP'. For example, in the following query, 'country' is a nonaggregated column that does not appear in the 'GROUP BY' list and values chosen for this column are nondeterministic:

 mysql> SELECT year, country, SUM(profit) AS profit
        FROM sales
        GROUP BY year WITH ROLLUP;
 +------+---------+--------+
 | year | country | profit |
 +------+---------+--------+
 | 2000 | India   |   4525 |
 | 2001 | USA     |   3010 |
 | NULL | USA     |   7535 |
 +------+---------+--------+

This behavior is permitted when the 'ONLY_FULL_GROUP_BY' SQL mode is not enabled. If that mode is enabled, the server rejects the query as illegal because 'country' is not listed in the 'GROUP BY' clause. With 'ONLY_FULL_GROUP_BY' enabled, you can still execute the query by using the 'ANY_VALUE()' function for nondeterministic-value columns:

 mysql> SELECT year, ANY_VALUE(country) AS country, SUM(profit) AS profit
        FROM sales
        GROUP BY year WITH ROLLUP;
 +------+---------+--------+
 | year | country | profit |
 +------+---------+--------+
 | 2000 | India   |   4525 |
 | 2001 | USA     |   3010 |
 | NULL | USA     |   7535 |
 +------+---------+--------+

 File: manual.info.tmp, Node: group-by-handling, Next: group-by-functional-dependence, Prev: group-by-modifiers, Up: aggregate-functions-and-modifiers

12.19.3 MySQL Handling of GROUP BY

SQL-92 and earlier does not permit queries for which the select list, 'HAVING' condition, or 'ORDER BY' list refer to nonaggregated columns that are not named in the 'GROUP BY' clause. For example, this query is illegal in standard SQL-92 because the nonaggregated 'name' column in the select list does not appear in the 'GROUP BY':

 SELECT o.custid, c.name, MAX(o.payment)
   FROM orders AS o, customers AS c
   WHERE o.custid = c.custid
   GROUP BY o.custid;

For the query to be legal in SQL-92, the 'name' column must be omitted from the select list or named in the 'GROUP BY' clause.

SQL:1999 and later permits such nonaggregates per optional feature T301 if they are functionally dependent on 'GROUP BY' columns: If such a relationship exists between 'name' and 'custid', the query is legal. This would be the case, for example, were 'custid' a primary key of 'customers'.

MySQL 5.7.5 and later implements detection of functional dependence. If the 'ONLY_FULL_GROUP_BY' SQL mode is enabled (which it is by default), MySQL rejects queries for which the select list, 'HAVING' condition, or 'ORDER BY' list refer to nonaggregated columns that are neither named in the 'GROUP BY' clause nor are functionally dependent on them. (Before 5.7.5, MySQL does not detect functional dependency and 'ONLY_FULL_GROUP_BY' is not enabled by default.)

MySQL 5.7.5 and later also permits a nonaggregate column not named in a 'GROUP BY' clause when 'ONLY_FULL_GROUP_BY' SQL mode is enabled, provided that this column is limited to a single value, as shown in the following example:

 mysql> CREATE TABLE mytable (
     ->    id INT UNSIGNED NOT NULL PRIMARY KEY,
     ->    a VARCHAR(10),
     ->    b INT
     -> );

 mysql> INSERT INTO mytable
     -> VALUES (1, 'abc', 1000),
     ->        (2, 'abc', 2000),
     ->        (3, 'def', 4000);

 mysql> SET SESSION sql_mode = sys.list_add(@@session.sql_mode, 'ONLY_FULL_GROUP_BY');

 mysql> SELECT a, SUM(b) FROM mytable WHERE a = 'abc';
 +------+--------+
 | a    | SUM(b) |
 +------+--------+
 | abc  |   3000 |
 +------+--------+

It is also possible to have more than one nonaggregate column in the *note 'SELECT': select. list when employing 'ONLY_FULL_GROUP_BY'. In this case, every such column must be limited to a single value, and all such limiting conditions must be joined by logical 'AND', as shown here:

 mysql> DROP TABLE IF EXISTS mytable;

 mysql> CREATE TABLE mytable (
     ->    id INT UNSIGNED NOT NULL PRIMARY KEY,
     ->    a VARCHAR(10),
     ->    b VARCHAR(10),
     ->    c INT
     -> );

 mysql> INSERT INTO mytable
     -> VALUES (1, 'abc', 'qrs', 1000),
     ->        (2, 'abc', 'tuv', 2000),
     ->        (3, 'def', 'qrs', 4000),
     ->        (4, 'def', 'tuv', 8000),
     ->        (5, 'abc', 'qrs', 16000),
     ->        (6, 'def', 'tuv', 32000);

 mysql> SELECT @@session.sql_mode;
 +---------------------------------------------------------------+
 | @@session.sql_mode                                            |
 +---------------------------------------------------------------+
 | ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION |
 +---------------------------------------------------------------+

 mysql> SELECT a, b, SUM(c) FROM mytable
     ->     WHERE a = 'abc' AND b = 'qrs';
 +------+------+--------+
 | a    | b    | SUM(c) |
 +------+------+--------+
 | abc  | qrs  |  17000 |
 +------+------+--------+

If 'ONLY_FULL_GROUP_BY' is disabled, a MySQL extension to the standard SQL use of 'GROUP BY' permits the select list, 'HAVING' condition, or 'ORDER BY' list to refer to nonaggregated columns even if the columns are not functionally dependent on 'GROUP BY' columns. This causes MySQL to accept the preceding query. In this case, the server is free to choose any value from each group, so unless they are the same, the values chosen are nondeterministic, which is probably not what you want. Furthermore, the selection of values from each group cannot be influenced by adding an 'ORDER BY' clause. Result set sorting occurs after values have been chosen, and 'ORDER BY' does not affect which value within each group the server chooses. Disabling 'ONLY_FULL_GROUP_BY' is useful primarily when you know that, due to some property of the data, all values in each nonaggregated column not named in the 'GROUP BY' are the same for each group.

You can achieve the same effect without disabling 'ONLY_FULL_GROUP_BY' by using 'ANY_VALUE()' to refer to the nonaggregated column.

The following discussion demonstrates functional dependence, the error message MySQL produces when functional dependence is absent, and ways of causing MySQL to accept a query in the absence of functional dependence.

This query might be invalid with 'ONLY_FULL_GROUP_BY' enabled because the nonaggregated 'address' column in the select list is not named in the 'GROUP BY' clause:

 SELECT name, address, MAX(age) FROM t GROUP BY name;

The query is valid if 'name' is a primary key of 't' or is a unique 'NOT NULL' column. In such cases, MySQL recognizes that the selected column is functionally dependent on a grouping column. For example, if 'name' is a primary key, its value determines the value of 'address' because each group has only one value of the primary key and thus only one row. As a result, there is no randomness in the choice of 'address' value in a group and no need to reject the query.

The query is invalid if 'name' is not a primary key of 't' or a unique 'NOT NULL' column. In this case, no functional dependency can be inferred and an error occurs:

 mysql> SELECT name, address, MAX(age) FROM t GROUP BY name;
 ERROR 1055 (42000): Expression #2 of SELECT list is not in GROUP
 BY clause and contains nonaggregated column 'mydb.t.address' which
 is not functionally dependent on columns in GROUP BY clause; this
 is incompatible with sql_mode=only_full_group_by

If you know that, for a given data set, each 'name' value in fact uniquely determines the 'address' value, 'address' is effectively functionally dependent on 'name'. To tell MySQL to accept the query, you can use the 'ANY_VALUE()' function:

 SELECT name, ANY_VALUE(address), MAX(age) FROM t GROUP BY name;

Alternatively, disable 'ONLY_FULL_GROUP_BY'.

The preceding example is quite simple, however. In particular, it is unlikely you would group on a single primary key column because every group would contain only one row. For addtional examples demonstrating functional dependence in more complex queries, see *note group-by-functional-dependence::.

If a query has aggregate functions and no 'GROUP BY' clause, it cannot have nonaggregated columns in the select list, 'HAVING' condition, or 'ORDER BY' list with 'ONLY_FULL_GROUP_BY' enabled:

 mysql> SELECT name, MAX(age) FROM t;
 ERROR 1140 (42000): In aggregated query without GROUP BY, expression
 #1 of SELECT list contains nonaggregated column 'mydb.t.name'; this
 is incompatible with sql_mode=only_full_group_by

Without 'GROUP BY', there is a single group and it is nondeterministic which 'name' value to choose for the group. Here, too, 'ANY_VALUE()' can be used, if it is immaterial which 'name' value MySQL chooses:

 SELECT ANY_VALUE(name), MAX(age) FROM t;

In MySQL 5.7.5 and later, 'ONLY_FULL_GROUP_BY' also affects handling of queries that use 'DISTINCT' and 'ORDER BY'. Consider the case of a table 't' with three columns 'c1', 'c2', and 'c3' that contains these rows:

 c1 c2 c3
 1  2  A
 3  4  B
 1  2  C

Suppose that we execute the following query, expecting the results to be ordered by 'c3':

 SELECT DISTINCT c1, c2 FROM t ORDER BY c3;

To order the result, duplicates must be eliminated first. But to do so, should we keep the first row or the third? This arbitrary choice influences the retained value of 'c3', which in turn influences ordering and makes it arbitrary as well. To prevent this problem, a query that has 'DISTINCT' and 'ORDER BY' is rejected as invalid if any 'ORDER BY' expression does not satisfy at least one of these conditions:

Another MySQL extension to standard SQL permits references in the 'HAVING' clause to aliased expressions in the select list. For example, the following query returns 'name' values that occur only once in table 'orders':

 SELECT name, COUNT(name) FROM orders
   GROUP BY name
   HAVING COUNT(name) = 1;

The MySQL extension permits the use of an alias in the 'HAVING' clause for the aggregated column:

 SELECT name, COUNT(name) AS c FROM orders
   GROUP BY name
   HAVING c = 1;

Note:

Before MySQL 5.7.5, enabling 'ONLY_FULL_GROUP_BY' disables this extension, thus requiring the 'HAVING' clause to be written using unaliased expressions.

Standard SQL permits only column expressions in 'GROUP BY' clauses, so a statement such as this is invalid because 'FLOOR(value/100)' is a noncolumn expression:

 SELECT id, FLOOR(value/100)
   FROM TBL_NAME
   GROUP BY id, FLOOR(value/100);

MySQL extends standard SQL to permit noncolumn expressions in 'GROUP BY' clauses and considers the preceding statement valid.

Standard SQL also does not permit aliases in 'GROUP BY' clauses. MySQL extends standard SQL to permit aliases, so another way to write the query is as follows:

 SELECT id, FLOOR(value/100) AS val
   FROM TBL_NAME
   GROUP BY id, val;

The alias 'val' is considered a column expression in the 'GROUP BY' clause.

In the presence of a noncolumn expression in the 'GROUP BY' clause, MySQL recognizes equality between that expression and expressions in the select list. This means that with 'ONLY_FULL_GROUP_BY' SQL mode enabled, the query containing 'GROUP BY id, FLOOR(value/100)' is valid because that same 'FLOOR()' expression occurs in the select list. However, MySQL does not try to recognize functional dependence on 'GROUP BY' noncolumn expressions, so the following query is invalid with 'ONLY_FULL_GROUP_BY' enabled, even though the third selected expression is a simple formula of the 'id' column and the 'FLOOR()' expression in the 'GROUP BY' clause:

 SELECT id, FLOOR(value/100), id+FLOOR(value/100)
   FROM TBL_NAME
   GROUP BY id, FLOOR(value/100);

A workaround is to use a derived table:

 SELECT id, F, id+F
   FROM
     (SELECT id, FLOOR(value/100) AS F
      FROM TBL_NAME
      GROUP BY id, FLOOR(value/100)) AS dt;

 File: manual.info.tmp, Node: group-by-functional-dependence, Prev: group-by-handling, Up: aggregate-functions-and-modifiers

12.19.4 Detection of Functional Dependence

The following discussion provides several examples of the ways in which MySQL detects functional dependencies. The examples use this notation:

 {X} -> {Y}

Understand this as 'X uniquely determines Y,' which also means that Y is functionally dependent on X.

The examples use the 'world' database, which can be downloaded from https://dev.mysql.com/doc/index-other.html. You can find details on how to install the database on the same page.

Functional Dependencies Derived from Keys

The following query selects, for each country, a count of spoken languages:

 SELECT co.Name, COUNT(*)
 FROM countrylanguage cl, country co
 WHERE cl.CountryCode = co.Code
 GROUP BY co.Code;

'co.Code' is a primary key of 'co', so all columns of 'co' are functionally dependent on it, as expressed using this notation:

 {co.Code} -> {co.*}

Thus, 'co.name' is functionally dependent on 'GROUP BY' columns and the query is valid.

A 'UNIQUE' index over a 'NOT NULL' column could be used instead of a primary key and the same functional dependence would apply. (This is not true for a 'UNIQUE' index that permits 'NULL' values because it permits multiple 'NULL' values and in that case uniqueness is lost.)

Functional Dependencies Derived from Multiple-Column Keys and from Equalities

This query selects, for each country, a list of all spoken languages and how many people speak them:

 SELECT co.Name, cl.Language,
 cl.Percentage * co.Population / 100.0 AS SpokenBy
 FROM countryLanguage cl, country co
 WHERE cl.CountryCode = co.Code
 GROUP BY cl.CountryCode, cl.Language;

The pair ('cl.CountryCode', 'cl.Language') is a two-column composite primary key of 'cl', so that column pair uniquely determines all columns of 'cl':

 {cl.CountryCode, cl.Language} -> {cl.*}

Moreover, because of the equality in the 'WHERE' clause:

 {cl.CountryCode} -> {co.Code}

And, because 'co.Code' is primary key of 'co':

 {co.Code} -> {co.*}

'Uniquely determines' relationships are transitive, therefore:

 {cl.CountryCode, cl.Language} -> {cl.*,co.*}

As a result, the query is valid.

As with the previous example, a 'UNIQUE' key over 'NOT NULL' columns could be used instead of a primary key.

An 'INNER JOIN' condition can be used instead of 'WHERE'. The same functional dependencies apply:

 SELECT co.Name, cl.Language,
 cl.Percentage * co.Population/100.0 AS SpokenBy
 FROM countrylanguage cl INNER JOIN country co
 ON cl.CountryCode = co.Code
 GROUP BY cl.CountryCode, cl.Language;

Functional Dependency Special Cases

Whereas an equality test in a 'WHERE' condition or 'INNER JOIN' condition is symmetric, an equality test in an outer join condition is not, because tables play different roles.

Assume that referential integrity has been accidentally broken and there exists a row of 'countrylanguage' without a corresponding row in 'country'. Consider the same query as in the previous example, but with a 'LEFT JOIN':

 SELECT co.Name, cl.Language,
 cl.Percentage * co.Population/100.0 AS SpokenBy
 FROM countrylanguage cl LEFT JOIN country co
 ON cl.CountryCode = co.Code
 GROUP BY cl.CountryCode, cl.Language;

For a given value of 'cl.CountryCode', the value of 'co.Code' in the join result is either found in a matching row (determined by 'cl.CountryCode') or is 'NULL'-complemented if there is no match (also determined by 'cl.CountryCode'). In each case, this relationship applies:

 {cl.CountryCode} -> {co.Code}

'cl.CountryCode' is itself functionally dependent on {'cl.CountryCode', 'cl.Language'} which is a primary key.

If in the join result 'co.Code' is 'NULL'-complemented, 'co.Name' is as well. If 'co.Code' is not 'NULL'-complemented, then because 'co.Code' is a primary key, it determines 'co.Name'. Therefore, in all cases:

 {co.Code} -> {co.Name}

Which yields:

 {cl.CountryCode, cl.Language} -> {cl.*,co.*}

As a result, the query is valid.

However, suppose that the tables are swapped, as in this query:

 SELECT co.Name, cl.Language,
 cl.Percentage * co.Population/100.0 AS SpokenBy
 FROM country co LEFT JOIN countrylanguage cl
 ON cl.CountryCode = co.Code
 GROUP BY cl.CountryCode, cl.Language;

Now this relationship does not apply:

 {cl.CountryCode, cl.Language} -> {cl.*,co.*}

All 'NULL'-complemented rows made for 'cl' are put into a single group (they have both 'GROUP BY' columns equal to 'NULL'), and inside this group the value of 'co.Name' can vary. The query is invalid and MySQL rejects it.

Functional dependence in outer joins is thus linked to whether determinant columns belong to the left or right side of the 'LEFT JOIN'. Determination of functional dependence becomes more complex if there are nested outer joins or the join condition does not consist entirely of equality comparisons.

Functional Dependencies and Views

Suppose that a view on countries produces their code, their name in uppercase, and how many different official languages they have:

 CREATE VIEW country2 AS
 SELECT co.Code, UPPER(co.Name) AS UpperName,
 COUNT(cl.Language) AS OfficialLanguages
 FROM country AS co JOIN countrylanguage AS cl
 ON cl.CountryCode = co.Code
 WHERE cl.isOfficial = 'T'
 GROUP BY co.Code;

This definition is valid because:

 {co.Code} -> {co.*}

In the view result, the first selected column is 'co.Code', which is also the group column and thus determines all other selected expressions:

 {country2.Code} -> {country2.*}

MySQL understands this and uses this information, as described following.

This query displays countries, how many different official languages they have, and how many cities they have, by joining the view with the 'city' table:

 SELECT co2.Code, co2.UpperName, co2.OfficialLanguages,
 COUNT(*) AS Cities
 FROM country2 AS co2 JOIN city ci
 ON ci.CountryCode = co2.Code
 GROUP BY co2.Code;

This query is valid because, as seen previously:

 {co2.Code} -> {co2.*}

MySQL is able to discover a functional dependency in the result of a view and use that to validate a query which uses the view. The same would be true if 'country2' were a derived table, as in:

 SELECT co2.Code, co2.UpperName, co2.OfficialLanguages,
 COUNT(*) AS Cities
 FROM
 (
  SELECT co.Code, UPPER(co.Name) AS UpperName,
  COUNT(cl.Language) AS OfficialLanguages
  FROM country AS co JOIN countrylanguage AS cl
  ON cl.CountryCode=co.Code
  WHERE cl.isOfficial='T'
  GROUP BY co.Code
 ) AS co2
 JOIN city ci ON ci.CountryCode = co2.Code
 GROUP BY co2.Code;

Combinations of Functional Dependencies

MySQL is able to combine all of the preceding types of functional dependencies (key based, equality based, view based) to validate more complex queries.

 File: manual.info.tmp, Node: miscellaneous-functions, Next: precision-math, Prev: aggregate-functions-and-modifiers, Up: functions