Aggregation & Window Functions¶
An aggregation function can be used to combine data across many
rows to form a single answer. Aggregations can be done with a
GROUP BY
clause, in which case the combined value is
calculated once per unique combination of groupbing keys. Aggregations
can also be done without the GROUP BY
clause, in which case
a single value is outputted by calculating the aggregation across
all rows.
For example:
Window functions can be used to compute an aggregation across a row and its surrounding rows. Most window functions have the following syntax:
SELECT WINDOW_FN(ARG1, ..., ARGN) OVER (PARTITION BY PARTITION_COLUMN_1, ..., PARTITION_COLUMN_N ORDER BY SORT_COLUMN_1, ..., SORT_COLUMN_N ROWS BETWEEN <LOWER_BOUND> AND <UPPER_BOUND>) FROM table_name
ROWS BETWEEN ROWS BETWEEN <LOWER_BOUND> AND <UPPER_BOUND>
section is the window's "frame" used to specify the window over which to compute the
function. A bound can can come before the current row, using PRECEDING
or after the current row, using
FOLLOWING
. The bounds can be relative (i.e.
N PRECEDING
or N FOLLOWING
), where N
is a positive integer,
or they can be absolute (i.e. UNBOUNDED PRECEDING
or
UNBOUNDED FOLLOWING
).
For example, consider the following window function calls:
SELECT
SUM(A) OVER () as S1,
SUM(A) OVER (PARTITION BY B) as S2,
SUM(A) OVER (PARTITION BY B ORDER BY C ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as S3,
SUM(A) OVER (ORDER BY C ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) as S4,
FROM table1
S1
: The sum of each row ofA
across the entire table.S2
: The sum of all values ofA
within each partition ofB
.S3
: The cumulative sum of each row and all rows before it when the rows are partitioned by B the sorted by C.S4
: The the sum each row with the row before & after it in the entire table when ordered by C.
Note
For most window functions, BodoSQL returns NULL
if the specified window frame
is empty or all NULL
. Exceptions to this behavior are noted.
All window functions optionally allow PARTITION BY
. Some window functions optionally allow ORDER BY
, and some may actually require it. Some window functions optionally allow window frames, and others ban it. If a function supports window frames but one is not provided, the default frame behavior depends on the function.
Note
RANGE BETWEEN
is not currently supported.
Note
If a window frame contains NaN
values, the output may diverge from Snowflake's
behavior. When a NaN
value enters a window, any window function that combines
the results with arithmetic (e.g. SUM
, AVG
, VARIANCE
, etc.) will output
NaN
until the NaN
value has exited the window.
BodoSQL Currently supports the following Aggregation & Window functions:
Function | Supported with GROUP BY? | Supported without GROUP BY? | Supported as window function? | (WINDOW) Requires ORDER BY? | (WINDOW) Allows frame? |
---|---|---|---|---|---|
ANY_VALUE |
Y | Y | Y | N | N |
APPROX_PERCENTILE |
N | Y | Y | N | N |
ARRAY_AGG |
Y | N | N | N/A | N/A |
ARRAY_UNIQUE_AGG |
Y | N | N | N/A | N/A |
AVG |
Y | Y | Y | N | Y |
BITAND_AGG |
Y | Y | Y | N | N |
BITOR_AGG |
Y | Y | Y | N | N |
BITXOR_AGG |
Y | Y | Y | N | N |
BOOLAND_AGG |
Y | Y | Y | N | N |
BOOLOR_AGG |
Y | Y | Y | N | N |
BOOLXOR_AGG |
Y | Y | Y | N | N |
CONDITIONAL_CHANGE_EVENT |
N | N | Y | Y | N |
CONDITIONAL_TRUE_EVENT |
N | N | Y | Y | N |
CORR |
N | N | Y | N | N |
COUNT |
Y | Y | Y | N | Y |
COUNT(*) |
Y | Y | Y | N | Y |
COUNT_IF |
Y | Y | Y | N | Y |
COVAR_POP |
N | N | Y | N | N |
COVAR_SAMP |
N | N | Y | N | N |
CUME_DIST |
N | N | Y | Y | N |
DENSE_RANK |
N | N | Y | Y | N |
FIRST_VALUE |
N | N | Y | N | Y |
KURTOSIS |
Y | Y | Y | N | N |
LEAD |
N | N | Y | Y | N |
LAST_VALUE |
N | N | Y | N | Y |
LAG |
N | N | Y | Y | N |
LISTAGG |
Y | Y | N | N/A | N/A |
MAX |
Y | Y | Y | N | Y |
MEDIAN |
Y | Y | Y | N | N |
MIN |
Y | Y | Y | N | Y |
MODE |
Y | N | Y | N | N |
NTH_VALUE |
N | N | Y | N | Y |
NTILE |
N | N | Y | Y | N |
OBJECT_AGG |
Y | N | Y | N | N |
PERCENTILE_CONT |
Y | Y | N | N/A | N/A |
PERCENTILE_DISC |
Y | Y | N | N/A | N/A |
PERCENT_RANK |
N | N | Y | Y | N |
RANK |
N | N | Y | Y | N |
RATIO_TO_REPORT |
N | N | Y | N | N |
ROW_NUMBER |
N | N | Y | Y | N |
SKEW |
Y | Y | Y | N | N |
STDDEV |
Y | Y | Y | N | Y |
STDDEV_POP |
Y | Y | Y | N | Y |
STDDEV_SAMP |
Y | Y | Y | N | Y |
SUM |
Y | Y | Y | N | Y |
VARIANCE |
Y | Y | Y | N | Y |
VARIANCE_POP |
Y | Y | Y | N | Y |
VARIANCE_SAMP |
Y | Y | Y | N | Y |
VAR_POP |
Y | Y | Y | N | Y |
VAR_SAMP |
Y | Y | Y | N | Y |