Aggregations

General

The aggregation operations are available at the level of each stage of the DataBlocks (module GenericsData) and at the level of the sources of the presentations (module HandleData) .

DataBlocks documentation - see Chapter Creating Steps

An aggregation operation calculates a single value from a collection of values. For example, an aggregation operation can be the calculation of the average daily temperature from the daily temperature values taken over a month.

Principles

An aggregation consists of one (or more) values defining the grouping of the aggregation.

As an option, the aggregation makes it possible to specify one or more calculation functions which will be applied at each level of grouping.

Among the functions are the following:

Sum
Mean
Minimum
Maximum
Median
First value
Last value
…

Place where aggregation operations are performed

Aggregate operations are available

in the GenericsData module at the level of the DataBlock function when setting up the steps (cf DataBlocks Documentation)
in the HandleData module at the level of the sources of the presentations

Access to aggregation functions

Types of aggregation

4 types of aggregation are available

Simple Aggregations
Multidimensional Aggregations
Column operations
Pivot Operations

List of available aggregations

Simple Aggregations

The configuration screen for a simple aggregation is divided into 4 zones.

Simple Aggregation

Choice of aggregation type (here, Simple) Panel containing the list of available columns for aggregation settings Information If the number of columns is large, and in order to refine the search, an input box allows to filter the available columns. Panel containing the columns defining the grouping criteria

The number of columns is not limited.
Column types (AlphaNumeric, Numeric and Date) can be mixed.

Panel containing the functions to apply to the grouping

Note that it is possible not to specify a function in the panel. In this case, only the grouping on the criteria specified in the panel will be executed without any function calculation.
The number of function is not limited.
The columns embedded in the functions cannot be the same as the columns used in the panel defining the grouping criteria .

The proposed functions are dependent on the type of column on which the function is applied (AlphaNumeric, Numeric and Date)

Some examples

Total Number: Total number of values for the column. (Please note "null" values are not counted)
Number: Number of distinct values in the column. (Please note "null" values are not counted)
Minimum: Minimum value of the column. (Excluding "null" values)
Maximum: Maximum value of the column. (Excluding "null" values)
First: First value of the grouping.
Last: Last value of the grouping.
Collect list: List of all the values contained in the collection. The result is provided in a list of values of the type ["VALUE 1","VALUE 1","VALUE 2","VALUE 1","VALUE 2","VALUE 1","VALUE 2"," VALUE 1"]
Collect Set: List of distinct values contained in the collection. the result is provided in a list of values of the type ["VALUE 1","VALUE 2"]

Group function panel

The functions present in the list are general functions applied outside any column. At the simple aggregation level, only the Total Count group function is available. The function returns the total number of rows (including null values) for each grouping.
Example of settings for a simple aggregation and visualization of the results.

Example of a simple aggregation

Multidimensional Aggregations

The configuration of the multidimensional aggregation is identical to that of the simple aggregation. See Simple Aggregations

The difference lies only in the result which provides all possible subtotals of each level.

Example of configuration of a multi-dimensional aggregation

Multidimensional Aggregations

gives the following result

aMultidimensional Aggregations result

Column Operations (Column and Multi-Row Operation)

Vertical operations are used to perform operations between rows and according to grouping and sorting criteria.

These operations are varied and can be multiple and simultaneous.

Definition of settings areas

Column operations

Area that contains groups of column operations parameterized for a step.

The user can define as many groups as he wishes. Groups are named automatically from group 1 to group n. The Add Group button allows you to add a group. To access the settings, click on the label of the group.

Zone that contains the columns available to perform the configuration.

information Note that to assign the columns in the different settings areas, you must use the move to the desired areas by drag and drop.

Area that contains the columns used as grouping criteria. The number of columns in this area is not limited. All types (Numeric, Character string and Date) are accepted.

Zone allowing to define the functions which will be applied on the columns and for this grouping. The functions offered may vary depending on the type of column.

For Numeric type columns:

Total Number: Total number of values for the column (excluding null value)
Previous: The function is used to 'retrieve' values from the previous lines of the current line and according to the sort criteria specified in the area.

Attention Note that this function can be associated with a setting at the level of the zone defining the interval of application of function.

Next: The function is used to 'retrieve' values from the rows following the current row according to the sort criteria specified in the area. Note that this function can be associated with a setting at the level of the zone defining the interval of application of function.
Sum: Performs the sum of the value inside a grouping and according to the sorting criteria specified in the area.
Average: Performs the Average of the value inside a grouping and according to the sorting criteria specified in the area.
Minimum: Maximum value of the column inside the grouping (excluding null values).
Maximum: Maximum value of the column inside the grouping (excluding null values).
Median: Median value of the column inside the grouping (excluding null values).
First: First value of the grouping for this column (excluding null values).
Last: Last value of the grouping for this column (excluding values at null).
Collect list: List of all the values contained in the collection. The result is provided in a list of values of the type ["VALUE 1","VALUE 1","VALUE 2","VALUE 1","VALUE 2","VALUE 1","VALUE 2"," VALUER 1"].
Collect Set: List of distinct values contained in the collection. The result is provided in a list of values of the type ["VALUE 1","VALUE 2"].

For Date type columns:

Total Number: Total number of values for the column (excluding null value).
Previous: The function is used to 'retrieve' values from the previous line of the current line and according to the sort criteria specified in the area.

Attention Note that this function can be associated with a setting at the level of the zone defining the interval of application of function.

Next: The function is used to 'retrieve' values on the line following the current line according to the sort criteria specified in the area. .

Attention Note that this function can be associated with a setting at the level of the zone defining the interval of application of function.

Minimum: Minimum date of the column inside the grouping (excluding values at null).
Maximum: Maximum date of the column inside the grouping (excluding values at null).
First: First Date of the grouping for this column (excluding null values) and according to the sorting criteria specified in the area.
Last: Last Date of the grouping for this column (excluding values at null) and according to the sorting criteria specified in the area.
Collect list: List of all the values contained in the collection. The result is provided in a list of values of the type ["VALUE 1","VALUE 1","VALUE 2","VALUE 1","VALUE 2","VALUE 1","VALUE 2"," VALUE 1"].
Collect Set: List of distinct values contained in the collection. The result is provided in a list of values of the type ["VALUE 1","VALUE 2"].

For Character String type columns:

Total number: Total number of value for the column (excluding null value).
Previous: The function is used to 'retrieve' values from the previous line of the current line and according to the sort criteria specified in the area.

Attention Note that this function can be associated with a setting at the level of the zone defining the interval of application of function.

Next: The function is used to 'retrieve' values on the line following the current line according to the sort criteria specified in the area. .

Attention Note that this function can be associated with a setting at the level of the zone defining the interval of application of function.

First: First value of the grouping for this column (excluding null values) and according to the sorting criteria specified in the area.
Last: Last value of the grouping for this column (excluding null values) and according to the sorting criteria specified in the area.
Collect list: List of all the values contained in the collection. The result is provided in a list of values of the type ["VALUE 1","VALUE 1","VALUE 2","VALUE 1","VALUE 2","VALUE 1","VALUE 2"," VALUE 1"].
Collect Set: List of distinct values contained in the collection. The result is provided in a list of values of the type ["VALUE 1","VALUE 2"].

Generic Function Group.

Area that contains generic functions that do not depend on column content. The group functions apply on grouping.

Total number of lines Returns the total number of lines in the group.
CUME_DIST Calculates the cumulative distribution of a value in a group of values. That is, CUME_DIST calculates the relative position of a specified value within a group of values. Assuming ascending order, the CUME_DIST of a value at row r is the number of rows with values less than or equal to the value at row r, divided by the number of rows evaluated in the partition or result set of the request.

CUME_DISTreturns a range of values greater than 0, and less than or equal to 1. Equal values always evaluate to the same cumulative distribution value. CUME_DIST includes NULL values by default and treats them as the lowest possible values.

ROW_NUMBER Assigns a sequence number within the grouping respecting the sort order specified in the zone.
RANK Affects a sequence number but will take into account the notion of duplicate. The Rank function will therefore 'skip' the numbers if there are duplicates assigned to the same rank.
DENSE_RANK Affects a sequence number and will take into account the notion of duplicate. Unlike the RANK function the function. DENSE_RANK will assign the next number in a sequence

Example

Column 1	Column 2	RANK	DENSE-RANK	ROW-NUMBER
HAS	10	1	1	1
HAS	10	1	1	2
HAS	20	1	2	3

PERCENT_RANK Calculates the relative rank of a row within a group of rows. Used to evaluate the relative position of a value in a partition or query result set. The range of values returned by PERCENT_RANK is greater than 0 and less than or equal to 1. The first row in a set has a PERCENT_RANK value of 0. NULL values are included by default and are treated as the lowest possible values.
NTILE Distributes the rows of a sorted partition into a specified number of groups. Groups are numbered from one. For each line, NTILE returns the number of the group to which the line belongs. If the number of rows in a partition is not divisible by an integer, you will get groups of two different sizes that differ by one member. Larger groups come before smaller groups in the order specified by the OVER clause. For example, if the total number of rows is 53 and there are 5 groups, the first three groups will contain 11 rows and the last two will contain 10 rows. On the other hand, if the total number of rows is divisible by the number of groups, the rows will be distributed evenly among the different groups. For example, if the total number of rows is 50 and there are 5 groups, each group will contain 10 rows.

Definition of sort axes Zone that allows you to define the sort criteria that are used within the grouping. For each column, it is possible to specify the direction of the sort Ascending or Descending.

Function calculation configuration zone The configuration of this zone is incompatible with the Previous and Next functions of the .

Special case 1: Date type column at grouping criteria level.

If Date type columns are positioned in the Grouping columns panel, additional settings areas are available.

These zones will make it possible to define grouping intervals as well as a sliding parameter in this interval.

Example of a setting with an interval of 3 days:

aggregation example

This setting generates a grouping on consecutive intervals of 3 days

Aggregation illustration

If we add a sliding period, the aggregation generates more intervals:

example of aggregation with sliding

Attention, the slip period must be less than or equal to the size of the interval

Aggregation slip illustration

Special case 2: Column of Date and Numeric type in functions applied to columns.

At the level of the Previous and Next functions applied to the columns and for the Date and Numeric type columns, two parameters can be assigned.

Offset: This parameter is required. It is used to define the step for retrieving the next or previous value retrieved.

Default value: In case of "Null" value, the default value will be assigned.

Special case 3: forward filling and backward filling operations (data filling)Column operations can meet the needs of data filling operations to automatically fill in unwanted null values.

Table view with Forward and Backward filling

Default value Forward fill: Null values are automatically filled identically to the value of the field preceding the null value Backward filling: Null values are automatically filled with the value of the field following the null value

Setting

From a DataBlock step, open the menu and click on the "Partition/Agréation/Pivot/" icon

Selection of partition/aggregation/pivot type operation

From the "Aggregations and Pivot" window, select "Operation on column" in the left menu then click on the "Add a Group" button.

Forward filling column operation

Forward filling settings

Add the column containing the missing values, option "First value" and "Ignore null values" YES Add the column containing the values that define the reading order Add "From last non-zero value" and "To current line" parameters

Backward filling column operation

Backward filling settings

Add the column containing the missing values, option "First value" and "Ignore null values" YES Add the column containing the values that define the order Add parameters "From current line" "To next non-zero value"

Example

Grouping of Column 1 and Column 2 Sorting on the Date column Function Next on Column Date with Offset: 1

Column 1	Column 2	Date	Next
HAS	10	01/12/2019	13/01/2019
HAS	10	01/13/2019	01/24/2019
HAS	10	01/24/2019	01/27/2019
HAS	10	01/27/2019	null
HAS	11	01/13/2019	20/01/2019
HAS	11	01/20/2019	01/22/2019
HAS	11	01/22/2019	null

Column 1

Column 2

Date

HAS

01/12/2019

13/01/2019

HAS

01/13/2019

01/24/2019

HAS

01/24/2019

01/27/2019

HAS

01/27/2019

null

HAS

01/13/2019

20/01/2019

HAS

01/20/2019

01/22/2019

HAS

01/22/2019

null

Example : Grouping of Column 1 and Column 2 Sort on Column Date Next function applied to the Date column - Offset: 2

Column 1	Column 2	Date	Next
HAS	10	01/12/2019	01/24/2019
HAS	10	01/13/2019	01/27/2019
HAS	10	01/24/2019	null
HAS	10	01/27/2019	null
HAS	11	01/13/2019	01/22/2019
HAS	11	01/20/2019	null
HAS	11	01/22/2019	null

Column 1

Column 2

Date

HAS

01/12/2019

01/24/2019

HAS

01/13/2019

01/27/2019

HAS

01/24/2019

null

HAS

01/27/2019

null

HAS

01/13/2019

01/22/2019

HAS

01/20/2019

null

HAS

01/22/2019

null