Aggregation & Window Functions¶
An aggregation function can be used to combine data across many
rows to form a single answer. Aggregations can be done with a
GROUP BY
clause, in which case the combined value is
calculated once per unique combination of groupbing keys. Aggregations
can also be done without the GROUP BY
clause, in which case
a single value is outputted by calculating the aggregation across
all rows.
For example:
Window functions can be used to compute an aggregation across a row and its surrounding rows. Most window functions have the following syntax:
SELECT WINDOW_FN(ARG1, ..., ARGN) OVER (PARTITION BY PARTITION_COLUMN_1, ..., PARTITION_COLUMN_N ORDER BY SORT_COLUMN_1, ..., SORT_COLUMN_N ROWS BETWEEN <LOWER_BOUND> AND <UPPER_BOUND>) FROM table_name
ROWS BETWEEN ROWS BETWEEN <LOWER_BOUND> AND <UPPER_BOUND>
section is used to specify the window over which to compute the
function. A bound can can come before the current row, using PRECEDING
or after the current row, using
FOLLOWING
. The bounds can be relative (i.e.
N PRECEDING
or N FOLLOWING
), where N
is a positive integer,
or they can be absolute (i.e. UNBOUNDED PRECEDING
or
UNBOUNDED FOLLOWING
).
For example:
This query computes the sum of every 3 rows, i.e. the sum of a row of interest, its preceding row, and its following row.In contrast:
SELECT SUM(A) OVER (PARTITION BY B ORDER BY C ROWS BETWEEN UNBOUNDED PRECEDING AND 0 FOLLOWING) FROM table1
Note
For most window functions, BodoSQL returns NULL
if the specified window frame
is empty or all NULL
. Exceptions to this behavior are noted.
Window functions perform a series of steps as followed:
- Partition the data by
PARTITION_COLUMN
. This is effectively a groupby operation onPARTITION_COLUMN
. - Sort each group as specified by the
ORDER BY
clause. - Perform the calculation over the specified window, i.e. the newly ordered subset of data.
- Shuffle the data back to the original ordering.
For BodoSQL, PARTITION BY
is required, but
ORDER BY
is optional for most functions and
ROWS BETWEEN
is optional for all of them. If
ROWS BETWEEN
is not specified then it defaults to either
computing the result over the entire window (if no ORDER BY
clause is specified) or to using the window UNBOUNDED PRECEDING TO CURRENT ROW
(if there is an ORDER BY
clause).
Note
RANGE BETWEEN
is not currently supported.
Note
If a window frame contains NaN
values, the output may diverge from Snowflake's
behavior. When a NaN
value enters a window, any window function that combines
the results with arithmetic (e.g. SUM
, AVG
, VARIANCE
, etc.) will output
NaN
until the NaN
value has exited the window.
BodoSQL Currently supports the following Aggregation & Window functions:
Function | Supported with GROUP BY? | Supported without GROUP BY? | Supported as window function? | (WINDOW) Allows ORDER BY? | (WINDOW) Requires ORDER BY? | (WINDOW) Allows frame? |
---|---|---|---|---|---|---|
ANY_VALUE |
Y | Y | Y | Y | N | Y |
APPROX_PERCENTILE |
N | Y | Y | N | N | N |
ARRAY_AGG |
Y | N | N | N/A | N/A | N/A |
ARRAY_UNIQUE_AGG |
Y | N | N | N/A | N/A | N/A |
AVG |
Y | Y | Y | Y | N | Y |
BITAND_AGG |
Y | Y | Y | N | N | N |
BITOR_AGG |
Y | Y | Y | N | N | N |
BITXOR_AGG |
Y | Y | Y | N | N | N |
BOOLAND_AGG |
Y | Y | Y | N | N | N |
BOOLOR_AGG |
Y | Y | Y | N | N | N |
BOOLXOR_AGG |
Y | Y | Y | N | N | N |
CONDITIONAL_CHANGE_EVENT |
N | N | Y | Y | Y | N |
CONDITIONAL_TRUE_EVENT |
N | N | Y | Y | Y | N |
CORR |
N | N | Y | N | N | N |
COUNT |
Y | Y | Y | Y | N | Y |
COUNT(*) |
Y | Y | Y | Y | N | Y |
COUNT_IF |
Y | Y | Y | Y | N | Y |
COVAR_POP |
N | N | Y | N | N | N |
COVAR_SAMP |
N | N | Y | N | N | N |
CUME_DIST |
N | N | Y | Y | Y | N |
DENSE_RANK |
N | N | Y | Y | Y | N |
FIRST_VALUE |
N | N | Y | Y | N | Y |
KURTOSIS |
Y | Y | Y | N | N | N |
LEAD |
N | N | Y | Y | Y | N |
LAST_VALUE |
N | N | Y | Y | N | Y |
LAG |
N | N | Y | Y | Y | N |
LISTAGG |
Y | Y | N | N/A | N/A | N/A |
MAX |
Y | Y | Y | Y | N | Y |
MEDIAN |
Y | Y | Y | N | N | N |
MIN |
Y | Y | Y | Y | N | Y |
MODE |
Y | N | Y | Y | N | N |
NTH_VALUE |
N | N | Y | Y | N | Y |
NTILE |
N | N | Y | Y | Y | N |
OBJECT_AGG |
Y | N | Y | N | N | N |
PERCENTILE_CONT |
Y | Y | N | N/A | N/A | N/A |
PERCENTILE_DISC |
Y | Y | N | N/A | N/A | N/A |
PERCENT_RANK |
N | N | Y | Y | Y | N |
RANK |
N | N | Y | Y | Y | N |
RATIO_TO_REPORT |
N | N | Y | Y | N | N |
ROW_NUMBER |
N | N | Y | Y | Y | N |
SKEW |
Y | Y | Y | Y | N | N |
STDDEV |
Y | Y | Y | Y | N | Y |
STDDEV_POP |
Y | Y | Y | Y | N | Y |
STDDEV_SAMP |
Y | Y | Y | Y | N | Y |
SUM |
Y | Y | Y | Y | N | Y |
VARIANCE |
Y | Y | Y | Y | N | Y |
VARIANCE_POP |
Y | Y | Y | Y | N | Y |
VARIANCE_SAMP |
Y | Y | Y | Y | N | Y |
VAR_POP |
Y | Y | Y | Y | N | Y |
VAR_SAMP |
Y | Y | Y | Y | N | Y |