Aggregations
General
The aggregation operations are available at the level of each stage of the DataBlocks (module GenericsData) and at the level of the sources of the presentations (module HandleData)
.
An aggregation operation calculates a single value from a collection of values. For example, an aggregation operation can be the calculation of the average daily temperature from the daily temperature values taken over a month.
Principles
An aggregation consists of one (or more) values defining the grouping of the aggregation.
As an option, the aggregation makes it possible to specify one or more calculation functions which will be applied at each level of grouping.
Among the functions are the following:
-
Sum
-
Mean
-
Minimum
-
Maximum
-
Median
-
First value
-
Last value
-
…
Place where aggregation operations are performed
Aggregate operations are available
-
in the GenericsData module at the level of the DataBlock function when setting up the steps (cf DataBlocks Documentation)
-
in the HandleData module at the level of the sources of the presentations
Types of aggregation
4 types of aggregation are available
-
Simple Aggregations
-
Multidimensional Aggregations
-
Column operations
-
Pivot Operations
Simple Aggregations
The configuration screen for a simple aggregation is divided into 4 zones.
Choice of aggregation type (here, Simple)
Panel containing the list of available columns for aggregation settings
If the number of columns is large, and in order to refine the search, an
input box allows to filter the available columns.
Panel containing the columns defining the grouping criteria
-
The number of columns is not limited.
-
Column types (AlphaNumeric, Numeric and Date) can be mixed.
Panel containing the functions to apply to the grouping
-
Note that it is possible not to specify a function in the
panel. In this case, only the grouping on the criteria specified in the
panel will be executed without any function calculation.
-
The number of function is not limited.
-
The columns embedded in the functions cannot be the same as the columns used in the panel defining the grouping criteria
.
The proposed functions are dependent on the type of column on which the function is applied (AlphaNumeric, Numeric and Date)
Some examples
-
Total Number: Total number of values for the column. (Please note "null" values are not counted)
-
Number: Number of distinct values in the column. (Please note "null" values are not counted)
-
Minimum: Minimum value of the column. (Excluding "null" values)
-
Maximum: Maximum value of the column. (Excluding "null" values)
-
First: First value of the grouping.
-
Last: Last value of the grouping.
-
Collect list: List of all the values contained in the collection. The result is provided in a list of values of the type ["VALUE 1","VALUE 1","VALUE 2","VALUE 1","VALUE 2","VALUE 1","VALUE 2"," VALUE 1"]
-
Collect Set: List of distinct values contained in the collection. the result is provided in a list of values of the type ["VALUE 1","VALUE 2"]
Group function panel
-
The functions present in the list are general functions applied outside any column. At the simple aggregation level, only the Total Count group function is available. The function returns the total number of rows (including null values) for each grouping.
-
Example of settings for a simple aggregation and visualization of the results.
Multidimensional Aggregations
The configuration of the multidimensional aggregation is identical to that of the simple aggregation. See Simple Aggregations
The difference lies only in the result which provides all possible subtotals of each level.
Example of configuration of a multi-dimensional aggregation
gives the following result
Column Operations (Column and Multi-Row Operation)
Vertical operations are used to perform operations between rows and according to grouping and sorting criteria.
These operations are varied and can be multiple and simultaneous.
Definition of settings areas
Area that contains groups of column operations parameterized for a step.
The user can define as many groups as he wishes. Groups are named automatically from group 1 to group n. The Add Group button allows you to add a group. To access the settings, click on the label of the group.
Zone that contains the columns available to perform the configuration.
Note that to assign the columns in the different settings areas, you must use the move to the desired areas by drag and drop.
Area that contains the columns used as grouping criteria.
The number of columns in this area is not limited.
All types (Numeric, Character string and Date) are accepted.
Zone allowing to define the functions which will be applied on the columns and for this grouping.
The functions offered may vary depending on the type of column.
For Numeric type columns:
-
Total Number: Total number of values for the column (excluding null value)
-
Previous: The function is used to 'retrieve' values from the previous lines of the current line and according to the sort criteria specified in the
area.
Note that this function can be associated with a setting at the level of the
zone defining the interval of application of function.
-
Next: The function is used to 'retrieve' values from the rows following the current row according to the sort criteria specified in the area.
Note that this function can be associated with a setting at the level of the
zone defining the interval of application of function.
-
Sum: Performs the sum of the value inside a grouping and according to the sorting criteria specified in the
area.
-
Average: Performs the Average of the value inside a grouping and according to the sorting criteria specified in the
area.
-
Minimum: Maximum value of the column inside the grouping (excluding null values).
-
Maximum: Maximum value of the column inside the grouping (excluding null values).
-
Median: Median value of the column inside the grouping (excluding null values).
-
First: First value of the grouping for this column (excluding null values).
-
Last: Last value of the grouping for this column (excluding values at null).
-
Collect list: List of all the values contained in the collection. The result is provided in a list of values of the type ["VALUE 1","VALUE 1","VALUE 2","VALUE 1","VALUE 2","VALUE 1","VALUE 2"," VALUER 1"].
-
Collect Set: List of distinct values contained in the collection. The result is provided in a list of values of the type ["VALUE 1","VALUE 2"].
For Date type columns:
-
Total Number: Total number of values for the column (excluding null value).
-
Previous: The function is used to 'retrieve' values from the previous line of the current line and according to the sort criteria specified in the
area.
Note that this function can be associated with a setting at the level of the
zone defining the interval of application of function.
-
Next: The function is used to 'retrieve' values on the line following the current line according to the sort criteria specified in the area.
.
Note that this function can be associated with a setting at the level of the
zone defining the interval of application of function.
-
Minimum: Minimum date of the column inside the grouping (excluding values at null).
-
Maximum: Maximum date of the column inside the grouping (excluding values at null).
-
First: First Date of the grouping for this column (excluding null values) and according to the sorting criteria specified in the
area.
-
Last: Last Date of the grouping for this column (excluding values at null) and according to the sorting criteria specified in the
area.
-
Collect list: List of all the values contained in the collection. The result is provided in a list of values of the type ["VALUE 1","VALUE 1","VALUE 2","VALUE 1","VALUE 2","VALUE 1","VALUE 2"," VALUE 1"].
-
Collect Set: List of distinct values contained in the collection. The result is provided in a list of values of the type ["VALUE 1","VALUE 2"].
For Character String type columns:
-
Total number: Total number of value for the column (excluding null value).
-
Previous: The function is used to 'retrieve' values from the previous line of the current line and according to the sort criteria specified in the
area.
Note that this function can be associated with a setting at the level of the
zone defining the interval of application of function.
-
Next: The function is used to 'retrieve' values on the line following the current line according to the sort criteria specified in the area.
.
Note that this function can be associated with a setting at the level of the
zone defining the interval of application of function.
-
First: First value of the grouping for this column (excluding null values) and according to the sorting criteria specified in the
area.
-
Last: Last value of the grouping for this column (excluding null values) and according to the sorting criteria specified in the
area.
-
Collect list: List of all the values contained in the collection. The result is provided in a list of values of the type ["VALUE 1","VALUE 1","VALUE 2","VALUE 1","VALUE 2","VALUE 1","VALUE 2"," VALUE 1"].
-
Collect Set: List of distinct values contained in the collection. The result is provided in a list of values of the type ["VALUE 1","VALUE 2"].
Generic Function Group.
Area that contains generic functions that do not depend on column content.
The group functions apply on grouping.
-
Total number of lines Returns the total number of lines in the group.
-
CUME_DIST Calculates the cumulative distribution of a value in a group of values. That is, CUME_DIST calculates the relative position of a specified value within a group of values. Assuming ascending order, the CUME_DIST of a value at row r is the number of rows with values less than or equal to the value at row r, divided by the number of rows evaluated in the partition or result set of the request.
CUME_DISTreturns a range of values greater than 0, and less than or equal to 1. Equal values always evaluate to the same cumulative distribution value. CUME_DIST includes NULL values by default and treats them as the lowest possible values.
-
ROW_NUMBER Assigns a sequence number within the grouping respecting the sort order specified in the zone.
-
RANK Affects a sequence number but will take into account the notion of duplicate. The Rank function will therefore 'skip' the numbers if there are duplicates assigned to the same rank.
-
DENSE_RANK Affects a sequence number and will take into account the notion of duplicate. Unlike the RANK function the function. DENSE_RANK will assign the next number in a sequence
Example
Column 1 | Column 2 | RANK | DENSE-RANK | ROW-NUMBER |
---|---|---|---|---|
HAS |
10 |
1 |
1 |
1 |
HAS |
10 |
1 |
1 |
2 |
HAS |
20 |
1 |
2 |
3 |
-
PERCENT_RANK Calculates the relative rank of a row within a group of rows. Used to evaluate the relative position of a value in a partition or query result set. The range of values returned by PERCENT_RANK is greater than 0 and less than or equal to 1. The first row in a set has a PERCENT_RANK value of 0. NULL values are included by default and are treated as the lowest possible values.
-
NTILE Distributes the rows of a sorted partition into a specified number of groups. Groups are numbered from one. For each line, NTILE returns the number of the group to which the line belongs. If the number of rows in a partition is not divisible by an integer, you will get groups of two different sizes that differ by one member. Larger groups come before smaller groups in the order specified by the OVER clause. For example, if the total number of rows is 53 and there are 5 groups, the first three groups will contain 11 rows and the last two will contain 10 rows. On the other hand, if the total number of rows is divisible by the number of groups, the rows will be distributed evenly among the different groups. For example, if the total number of rows is 50 and there are 5 groups, each group will contain 10 rows.
Definition of sort axes Zone that allows you to define the sort criteria that are used within the grouping.
For each column, it is possible to specify the direction of the sort Ascending or Descending.
Function calculation configuration zone The configuration of this zone is incompatible with the Previous and Next functions of the
.
Special case 1: Date type column at grouping criteria level.
If Date type columns are positioned in the Grouping columns panel, additional settings areas are available.
These zones will make it possible to define grouping intervals as well as a sliding parameter in this interval.
Example of a setting with an interval of 3 days:
This setting generates a grouping on consecutive intervals of 3 days
If we add a sliding period, the aggregation generates more intervals:
Attention, the slip period must be less than or equal to the size of the interval
Special case 2: Column of Date and Numeric type in functions applied to columns.
At the level of the Previous and Next functions applied to the columns and for the Date and Numeric type columns, two parameters can be assigned.
Offset: This parameter is required.
It is used to define the step for retrieving the next or previous value retrieved.
Default value: In case of "Null" value, the default value will be assigned.
Special case 3: forward filling and backward filling operations (data filling)Column operations can meet the needs of data filling operations to automatically fill in unwanted null values.
Default value
Forward fill: Null values are automatically filled identically to the value of the field preceding the null value
Backward filling: Null values are automatically filled with the value of the field following the null value
Setting
From a DataBlock step, open the menu and click on the "Partition/Agréation/Pivot/" icon
From the "Aggregations and Pivot" window, select "Operation on column" in the left menu then click on the "Add a Group" button.
Forward filling column operation
Add the column containing the missing values, option "First value" and "Ignore null values" YES
Add the column containing the values that define the reading order
Add "From last non-zero value" and "To current line" parameters
Backward filling column operation
Add the column containing the missing values, option "First value" and "Ignore null values" YES
Add the column containing the values that define the order
Add parameters "From current line" "To next non-zero value"
Example
Grouping of Column 1 and Column 2 Sorting on the Date column Function Next on Column Date with Offset: 1
Column 1 | Column 2 | Date | Next |
---|---|---|---|
HAS |
10 |
01/12/2019 |
13/01/2019 |
HAS |
10 |
01/13/2019 |
01/24/2019 |
HAS |
10 |
01/24/2019 |
01/27/2019 |
HAS |
10 |
01/27/2019 |
null |
HAS |
11 |
01/13/2019 |
20/01/2019 |
HAS |
11 |
01/20/2019 |
01/22/2019 |
HAS |
11 |
01/22/2019 |
null |
Example : Grouping of Column 1 and Column 2 Sort on Column Date Next function applied to the Date column - Offset: 2
Column 1 | Column 2 | Date | Next |
---|---|---|---|
HAS |
10 |
01/12/2019 |
01/24/2019 |
HAS |
10 |
01/13/2019 |
01/27/2019 |
HAS |
10 |
01/24/2019 |
null |
HAS |
10 |
01/27/2019 |
null |
HAS |
11 |
01/13/2019 |
01/22/2019 |
HAS |
11 |
01/20/2019 |
null |
HAS |
11 |
01/22/2019 |
null |