AggregateBy function
The AggregateBy function aggregates the values in a time series using a new time interval that you specify by providing a calendar.
This function can be used to convert a time series with a small interval to a time series with a larger interval: for instance, to produce a weekly time series from a daily time series.
If you supply the optional start and end DATETIME parameters, just that part of the time series is aggregated to the new time interval.
Syntax
AggregateBy(agg_express lvarchar,
cal_name lvarchar,
ts TimeSeries
flags integer default 0
start datetime year to fraction(5) default NULL,
end datetime year to fraction(5) default NULL
)
returns TimeSeries;
- agg_express
- A comma-separated list of these SQL aggregate operators: MIN, MAX, MEDIAN, SUM, AVG, FIRST, LAST, or Nth. The MIN, MAX, MEDIAN, SUM, and AVG expressions can operate only on numeric columns.
- cal_name
- The name of a calendar that defines the aggregation period.
- ts
- The time series to be aggregated.
- flags (optional)
- Determines how data points in off periods of calendars are handled during aggregation. See The flags argument values.
- start (optional)
- The date and time at which to start aggregation.
- end (optional)
- The date and time at which to end aggregation.
Description
The AggregateBy function converts the input time series to a regular time series with a calendar given by the cal_name argument.
The agg_express expressions operate on a column of the input time series, which is specified by one of the following column identifiers:
- $colname
- The colname is the name of the column to aggregate
in the TimeSeries data type. For example, if the column name
is high, the column identifier is
$high
. - $colnumber
- The colnumber is the position of the column
to aggregate in the TimeSeries data type. For example if the
column number is 1, the column identifier is
$1
. - $bson_field_name
- The bson_field_name is the name of a field
in at least one BSON document in the BSON column in the TimeSeries data
type. For example, if the field name is v1, the column identifier
is
$v1
. If the BSON field name is the same as another column in the TimeSeries data type, you must qualify the field name in one of the following ways:- $colname.bson_field_name
For example, if the BSON column name is b_data and the field name is v1, the column identifier is
$b_data.v1
. - $colnumber.bson_field_name
For example, if the BSON column number is 1 and the field name is v1, the column identifier is
$1.v1
.
You must cast the results of the AggregateBy function on a BSON field to a TimeSeries data type that has the appropriate type of columns for the result of the expression.
- $colname.bson_field_name
Nth($col, n)
- $col
- The column identifier.
- n
- A positive or negative number that indicates the position of the TimeSeries row
within the aggregation period. Positive values of n begin
at the first row in the aggregation period; therefore, Nth($col, 1)
is equivalent to FIRST($col). Negative values of n begin
with the last row in the aggregation period; therefore, Nth($col,
-1) is equivalent to LAST($col).
If an aggregation period does not have a value for the nth row, then the Nth function returns a null value for that period. The Nth function is more efficient for positive values of the n argument than for negative values.
An aggregation time period is denoted by the start date and time of the period.
The origin of the aggregated output time series is the first period on or before the origin of the input time series. Each output period is the aggregation of all input periods from the start of the output period up to, but not including, the start of the next output period.
For instance, suppose you want to aggregate a daily time series that starts on Tuesday, Jan. 4, 2011, to a weekly time series. The input calendar, named "days", starts at 12:00 a.m., and the output calendar, named "weeks", starts at 12:00 a.m., on Monday.
The first output time is 00:00 Jan. 3, 2011; it is the aggregation of all input values from the input origin, Jan. 4, 2011, to 23:59:59.99999 Jan. 9, 2011. The second output time is 00:00 Jan. 10, 2011; it is the aggregation of all input values from 00:00 Jan 10, 2011 to 23:59:59.99999 Jan. 16, 2011.
- Converting from a regular time series to a time series with a calendar of the same granularity. In this case, AggregateBy shifts the times back to accommodate differences in the calendar start times: for example, 00:00 from 8:00. Elements can be removed or null elements added to accommodate differences in the on/off pattern.
- Converting from a regular time series to one with a calendar of finer granularity. In this case, AggregateBy replicates values.
- The input time series is irregular. Because the granularity of an irregular time series does not depend on the granularity of the calendar, this case is treated like aggregation from a fine-grained time series to a coarser-grained one. This type of aggregation always produces a regular time series.
The flags argument values
The flags argument determines how data points in the off periods of calendars are handled during aggregation and how hidden elements are managed. It can have the following values.
- 0
- (Default) Data in off periods is aggregated with the next output period.
- 1
- Data in off periods is aggregated with the previous output period.
- 2
- Indicates that the scan runs with the TS_SCAN_HIDDEN flag set (hidden elements are returned).
- 4
- Indicates that the scan runs with the TS_SCAN_SKIP_HIDDEN flag set (hidden elements are not returned).
For example, consider an input time series that has a daily calendar with no off days: it has data from weekdays and weekends. If you aggregate this data by a business-day calendar (5 days on, 2 days off, starting on a Monday), a flags argument of 0 causes weekend data to be aggregated with the next Monday's data, and a flags argument of 1 causes weekend data to be aggregated with the previous Friday's data.
'startdate(2010-1-1 00:00:00.00000), pattstart(2010-1-1 00:00:00.00000),
pattern({1 on, 2 off}, month'
If you aggregate this calendar with either a flags argument of 0 or no flags argument, all input points up to, but not including, 2010-2-1 00:00:00.00000 are aggregated into the first output element. All points from 2010-2-1 00:00:00.00000 up to, but not including, 2010-5-1 00:00:00.00000 are aggregated into the second output element, and so on.
1
, all input points up to but not including 2010-
4-1 00:00:00.00000 are aggregated into the first output element. All
points from 2010-4-1 00:00:00.00000 up to, but not including, 2010-7-1
00:00:00.00000 are aggregated into the second output element, and
so on. The AggregateBy clause might look like this: AggregateBy('max($high)', 'quarterlycal', ts, 1);
Returns
The aggregated time series, which is always regular, if you are aggregating to a new time interval. The resulting time series has a time stamp column plus one column for each expression in the list.
Examples: Stock data
insert into daily_stocks( stock_id, stock_name, stock_data)
select stock_id, stock_name,
AggregateBy( 'max($high), min($low),last($final),sum($vol)',
'weekcal', stock_data)::TimeSeries(stock_bar)
from daily_stocks;
AggregateBy( 'Nth($price, 2)', 'weekly', ts)
AggregateBy( 'Nth($price, -2)', 'weekly', ts)
Examples: BSON data
This example is based on the following row type and time series definition. The TimeSeries row type contains an INTEGER column that is named v1 and the BSON column contains a field that is also named v1.
CREATE ROW TYPE rb(timestamp datetime year to fraction(5), data bson, v1 int);
INSERT INTO tj VALUES(1,'origin(2011-01-01 00:00:00.00000), calendar(ts_15min),
container(kontainer),threshold(0), regular,[({"v1":99},20)]');
The following statement creates a TimeSeries data type to hold the results of the aggregation on the BSON field in an INTEGER column:
CREATE ROW TYPE outrow(timestamp datetime year to fraction(5), x int);
If a column and a BSON field have the same name, the column takes precedence. The following statement returns the maximum value from the v1 INTEGER column:
SELECT AggregateBy('max($v1)','ts_1year',tsdata,0
"2011-01-01 00:00:00.00000"::datetime year to fraction(5),
"2012-01-01 00:00:00.00000"::datetime year to fraction(5))
FROM tj;
The following two equivalent statements return the maximum value from the v1 field in the data BSON column, which is column 1 in the TimeSeries row type:
SELECT AggregateBy('max($data.v1)','ts_1year',tsdata,0
"2011-01-01 00:00:00.00000"::datetime year to fraction(5),
"2012-01-01 00:00:00.00000"::datetime year to fraction(5))
::timeseries(outrow)
FROM tj;
SELECT AggregateBy('max($1.v1)','ts_1year',tsdata,0
"2011-01-01 00:00:00.00000"::datetime year to fraction(5),
"2012-01-01 00:00:00.00000"::datetime year to fraction(5))
::timeseries(outrow)
FROM tj;
The aggregated time series that is returned has the TimeSeries data type outrow. If you do not cast the result to a row type that has appropriate columns for the results, the statement fails.