Skip to content

Aggregation & Window Functions

An aggregation function can be used to combine data across many rows to form a single answer. Aggregations can be done with a GROUP BY clause, in which case the combined value is calculated once per unique combination of groupbing keys. Aggregations can also be done without the GROUP BY clause, in which case a single value is outputted by calculating the aggregation across all rows.

For example:

SELECT AVG(A) FROM table1 GROUP BY B

SELECT COUNT(Distinct A) FROM table1

Window functions can be used to compute an aggregation across a row and its surrounding rows. Most window functions have the following syntax:

SELECT WINDOW_FN(ARG1, ..., ARGN) OVER (PARTITION BY PARTITION_COLUMN_1, ..., PARTITION_COLUMN_N ORDER BY SORT_COLUMN_1, ..., SORT_COLUMN_N ROWS BETWEEN <LOWER_BOUND> AND <UPPER_BOUND>) FROM table_name
The ROWS BETWEEN ROWS BETWEEN <LOWER_BOUND> AND <UPPER_BOUND> section is used to specify the window over which to compute the function. A bound can can come before the current row, using PRECEDING or after the current row, using FOLLOWING. The bounds can be relative (i.e. N PRECEDING or N FOLLOWING), where N is a positive integer, or they can be absolute (i.e. UNBOUNDED PRECEDING or UNBOUNDED FOLLOWING).

For example:

SELECT SUM(A) OVER (PARTITION BY B ORDER BY C ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) FROM table1
This query computes the sum of every 3 rows, i.e. the sum of a row of interest, its preceding row, and its following row.

In contrast:

SELECT SUM(A) OVER (PARTITION BY B ORDER BY C ROWS BETWEEN UNBOUNDED PRECEDING AND 0 FOLLOWING) FROM table1
This query computes the cumulative sum over a row and all of its preceding rows.

Note

For most window functions, BodoSQL returns NULL if the specified window frame is empty or all NULL. Exceptions to this behavior are noted.

Window functions perform a series of steps as followed:

  1. Partition the data by PARTITION_COLUMN. This is effectively a groupby operation on PARTITION_COLUMN.
  2. Sort each group as specified by the ORDER BY clause.
  3. Perform the calculation over the specified window, i.e. the newly ordered subset of data.
  4. Shuffle the data back to the original ordering.

For BodoSQL, PARTITION BY is required, but ORDER BY is optional for most functions and ROWS BETWEEN is optional for all of them. If ROWS BETWEEN is not specified then it defaults to either computing the result over the entire window (if no ORDER BY clause is specified) or to using the window UNBOUNDED PRECEDING TO CURRENT ROW (if there is an ORDER BY clause).

Note

RANGE BETWEEN is not currently supported.

Note

If a window frame contains NaN values, the output may diverge from Snowflake's behavior. When a NaN value enters a window, any window function that combines the results with arithmetic (e.g. SUM, AVG, VARIANCE, etc.) will output NaN until the NaN value has exited the window.

BodoSQL Currently supports the following Aggregation & Window functions:

Function Supported with GROUP BY? Supported without GROUP BY? Supported as window function? (WINDOW) Allows ORDER BY? (WINDOW) Requires ORDER BY? (WINDOW) Allows frame?
ANY_VALUE Y Y Y Y N Y
APPROX_PERCENTILE N Y Y N N N
ARRAY_AGG Y N N N/A N/A N/A
ARRAY_UNIQUE_AGG Y N N N/A N/A N/A
AVG Y Y Y Y N Y
BITAND_AGG Y Y Y N N N
BITOR_AGG Y Y Y N N N
BITXOR_AGG Y Y Y N N N
BOOLAND_AGG Y Y Y N N N
BOOLOR_AGG Y Y Y N N N
BOOLXOR_AGG Y Y Y N N N
CONDITIONAL_CHANGE_EVENT N N Y Y Y N
CONDITIONAL_TRUE_EVENT N N Y Y Y N
CORR N N Y N N N
COUNT Y Y Y Y N Y
COUNT(*) Y Y Y Y N Y
COUNT_IF Y Y Y Y N Y
COVAR_POP N N Y N N N
COVAR_SAMP N N Y N N N
CUME_DIST N N Y Y Y N
DENSE_RANK N N Y Y Y N
FIRST_VALUE N N Y Y N Y
KURTOSIS Y Y Y N N N
LEAD N N Y Y Y N
LAST_VALUE N N Y Y N Y
LAG N N Y Y Y N
LISTAGG Y Y N N/A N/A N/A
MAX Y Y Y Y N Y
MEDIAN Y Y Y N N N
MIN Y Y Y Y N Y
MODE Y N Y Y N N
NTH_VALUE N N Y Y N Y
NTILE N N Y Y Y N
OBJECT_AGG Y N Y N N N
PERCENTILE_CONT Y Y N N/A N/A N/A
PERCENTILE_DISC Y Y N N/A N/A N/A
PERCENT_RANK N N Y Y Y N
RANK N N Y Y Y N
RATIO_TO_REPORT N N Y Y N N
ROW_NUMBER N N Y Y Y N
SKEW Y Y Y Y N N
STDDEV Y Y Y Y N Y
STDDEV_POP Y Y Y Y N Y
STDDEV_SAMP Y Y Y Y N Y
SUM Y Y Y Y N Y
VARIANCE Y Y Y Y N Y
VARIANCE_POP Y Y Y Y N Y
VARIANCE_SAMP Y Y Y Y N Y
VAR_POP Y Y Y Y N Y
VAR_SAMP Y Y Y Y N Y