Aggregation & Window Functions¶
An aggregation function can be used to combine data across many
rows to form a single answer. Aggregations can be done with a
GROUP BY clause, in which case the combined value is
calculated once per unique combination of groupbing keys. Aggregations
can also be done without the GROUP BY clause, in which case
a single value is outputted by calculating the aggregation across
all rows.
For example:
Window functions can be used to compute an aggregation across a row and its surrounding rows. Most window functions have the following syntax:
SELECT WINDOW_FN(ARG1, ..., ARGN) OVER (PARTITION BY PARTITION_COLUMN_1, ..., PARTITION_COLUMN_N ORDER BY SORT_COLUMN_1, ..., SORT_COLUMN_N ROWS BETWEEN <LOWER_BOUND> AND <UPPER_BOUND>) FROM table_name
ROWS BETWEEN ROWS BETWEEN <LOWER_BOUND> AND <UPPER_BOUND>
section is used to specify the window over which to compute the
function. A bound can can come before the current row, using PRECEDING or after the current row, using
FOLLOWING. The bounds can be relative (i.e.
N PRECEDING or N FOLLOWING), where N is a positive integer,
or they can be absolute (i.e. UNBOUNDED PRECEDING or
UNBOUNDED FOLLOWING).
For example:
This query computes the sum of every 3 rows, i.e. the sum of a row of interest, its preceding row, and its following row.In contrast:
SELECT SUM(A) OVER (PARTITION BY B ORDER BY C ROWS BETWEEN UNBOUNDED PRECEDING AND 0 FOLLOWING) FROM table1
Note
For most window functions, BodoSQL returns NULL if the specified window frame
is empty or all NULL. Exceptions to this behavior are noted.
Window functions perform a series of steps as followed:
- Partition the data by
PARTITION_COLUMN. This is effectively a groupby operation onPARTITION_COLUMN. - Sort each group as specified by the
ORDER BYclause. - Perform the calculation over the specified window, i.e. the newly ordered subset of data.
- Shuffle the data back to the original ordering.
For BodoSQL, PARTITION BY is required, but
ORDER BY is optional for most functions and
ROWS BETWEEN is optional for all of them. If
ROWS BETWEEN is not specified then it defaults to either
computing the result over the entire window (if no ORDER BY
clause is specified) or to using the window UNBOUNDED PRECEDING TO CURRENT ROW
(if there is an ORDER BY clause).
Note
RANGE BETWEEN is not currently supported.
Note
If a window frame contains NaN values, the output may diverge from Snowflake's
behavior. When a NaN value enters a window, any window function that combines
the results with arithmetic (e.g. SUM, AVG, VARIANCE, etc.) will output
NaN until the NaN value has exited the window.
BodoSQL Currently supports the following Aggregation & Window functions:
| Function | Supported with GROUP BY? | Supported without GROUP BY? | Supported as window function? | (WINDOW) Allows ORDER BY? | (WINDOW) Requires ORDER BY? | (WINDOW) Allows frame? |
|---|---|---|---|---|---|---|
ANY_VALUE |
Y | Y | Y | Y | N | Y |
APPROX_PERCENTILE |
N | Y | Y | N | N | N |
ARRAY_AGG |
Y | N | N | N/A | N/A | N/A |
ARRAY_UNIQUE_AGG |
Y | N | N | N/A | N/A | N/A |
AVG |
Y | Y | Y | Y | N | Y |
BITAND_AGG |
Y | Y | Y | N | N | N |
BITOR_AGG |
Y | Y | Y | N | N | N |
BITXOR_AGG |
Y | Y | Y | N | N | N |
BOOLAND_AGG |
Y | Y | Y | N | N | N |
BOOLOR_AGG |
Y | Y | Y | N | N | N |
BOOLXOR_AGG |
Y | Y | Y | N | N | N |
CONDITIONAL_CHANGE_EVENT |
N | N | Y | Y | Y | N |
CONDITIONAL_TRUE_EVENT |
N | N | Y | Y | Y | N |
CORR |
N | N | Y | N | N | N |
COUNT |
Y | Y | Y | Y | N | Y |
COUNT(*) |
Y | Y | Y | Y | N | Y |
COUNT_IF |
Y | Y | Y | Y | N | Y |
COVAR_POP |
N | N | Y | N | N | N |
COVAR_SAMP |
N | N | Y | N | N | N |
CUME_DIST |
N | N | Y | Y | Y | N |
DENSE_RANK |
N | N | Y | Y | Y | N |
FIRST_VALUE |
N | N | Y | Y | N | Y |
KURTOSIS |
Y | Y | Y | N | N | N |
LEAD |
N | N | Y | Y | Y | N |
LAST_VALUE |
N | N | Y | Y | N | Y |
LAG |
N | N | Y | Y | Y | N |
LISTAGG |
Y | Y | N | N/A | N/A | N/A |
MAX |
Y | Y | Y | Y | N | Y |
MEDIAN |
Y | Y | Y | N | N | N |
MIN |
Y | Y | Y | Y | N | Y |
MODE |
Y | N | Y | Y | N | N |
NTH_VALUE |
N | N | Y | Y | N | Y |
NTILE |
N | N | Y | Y | Y | N |
OBJECT_AGG |
Y | N | Y | N | N | N |
PERCENTILE_CONT |
Y | Y | N | N/A | N/A | N/A |
PERCENTILE_DISC |
Y | Y | N | N/A | N/A | N/A |
PERCENT_RANK |
N | N | Y | Y | Y | N |
RANK |
N | N | Y | Y | Y | N |
RATIO_TO_REPORT |
N | N | Y | Y | N | N |
ROW_NUMBER |
N | N | Y | Y | Y | N |
SKEW |
Y | Y | Y | Y | N | N |
STDDEV |
Y | Y | Y | Y | N | Y |
STDDEV_POP |
Y | Y | Y | Y | N | Y |
STDDEV_SAMP |
Y | Y | Y | Y | N | Y |
SUM |
Y | Y | Y | Y | N | Y |
VARIANCE |
Y | Y | Y | Y | N | Y |
VARIANCE_POP |
Y | Y | Y | Y | N | Y |
VARIANCE_SAMP |
Y | Y | Y | Y | N | Y |
VAR_POP |
Y | Y | Y | Y | N | Y |
VAR_SAMP |
Y | Y | Y | Y | N | Y |