VARIANCE Function
The VARIANCE function returns an estimate of the population variance, as the standard deviation squared.
VARIANCE calculates the following value:
(SUM(Xi2) - (SUM(Xi)2)/N)/(N - 1)
In this formula,
X
i is each value in the column,- and
N
is the total number of non-NULL values in the column (unless all values are NULL, in which case the variance is logically undefined, and the VARIANCE function returns NULL).
You can apply the VARIANCE function only to numeric columns.
The following query estimates the variance of age values
for a population:
SELECT VARIANCE(age) FROM u_pop WHERE u_pop.age > 0;
As with the other aggregates, the VARIANCE function
applies to the rows of a group when the query includes a GROUP BY
clause, as in this example:
SELECT VARIANCE(age) FROM u_pop GROUP BY birth
WHERE VARIANCE(age) > 0;
As previously noted, VARIANCE ignores NULL values unless every qualified row is NULL for a specified column. If every value is NULL, then VARIANCE returns a NULL result for that column. (This typically indicates missing data, and is not necessarily a good estimate of underlying population variance.)
If
N
, the total number of qualifying
non-NULL column values, equals 1
, then the VARIANCE function
returns zero (another implausible estimate of the true population
variance). To omit this special case, you can modify the query. For
example, you might include a HAVING COUNT(*) > 1
clause.
Important: All calculations for the VARIANCE function
are performed in 32-digit precision, which should be sufficient for
many sets of input data. The calculation, however, loses precision
or returns incorrect results when all of the input data values have
16 or more digits of precision.
Although DATE values are stored internally as an integer, you cannot use the VARIANCE function on columns of data type DATE.