DataBlocks


General - DataBlock

The DataBlock represents Level 4 of the DataChain value chain (see General Principles).


The DataBlock function is a strategic function of the DataChain offer.

The DataBlock module is the DataFabric function of the DataChain offer: it allows operations to be carried out on data through a complete process.

DataBlocks are blocks of data, in other words, an assortment of characteristics from one or more entities. They are thus the central place for working with cross-referenced data. A DataBlock is initialized with input data.

Inside a DataBlock, you can create several successive steps. Each step has sub-steps to perform multiple operations. Columns created during a step are made available to subsequent steps. The number of steps in a DataBlock is not limited.


The operations available at each step are as follows:

Link DataBlock / Business entity
Union between DataBlock / Business entity
Formula Builder
Operations (Explode, Stack, Unpivot…)
Filters
Aggregations/Pivot/Partitions
Sorts
Organization of step output columns

Access to DataBlocks

Access to the list of DataBlocks

Creating a new DataBlock

To create a DataBlock, click on the icon.

Overview of the DataBlock screen


DataBlock initialization block: clicking on the Table icon opens a dialog box for choosing an existing DataBlock or a Business entity.

Steps - Operations tab: step management area (DataFabric) and data display.

Linked Entities and DataBlocks tab: presents all the DataBlocks and Business entities that feed the current DataBlock (through initialization or through the links defined within the different steps).

Persistence Settings tab: specifies the persistence settings. Note that the persistence parameters are taken into account only after clicking the Save button.

Output Data tab: provides the result of the DataBlock (structure and data) as it would be consumed externally, that is, as of the last save. There may therefore be a discrepancy between the results displayed in this tab and those displayed in the Steps - Operations tab.

Example: the user generates two steps in the "Steps - Operations" tab then saves. The result obtained in the "Output data" tab will be the same as in the "Steps - Operations" tab. On the other hand, if the user adds a third step in the "Steps - Operations" tab but does not save, there will be a difference in result. Step 3 will only be taken into account in the result of the "Steps - Operations" tab (step 3 not yet saved).

Linked HandleData elements tab: presents, as a table, all the elements of the HandleData module that consume the current DataBlock.

Icon: adds a data processing step. A step can be added either after the last step or between two steps.

Icon: displays a summary of all the steps. The information presented is entered by the user at each step via the corresponding icon.

Click on the Play icon to run the steps and view the data.

Check this box to make the search case-insensitive.

Area for searching values.

General actions on the DataBlock.

Delete: Allows you to delete the DataBlock.

Note that an impact analysis is carried out to check whether the DataBlock can be removed. If the deletion could cause a malfunction in the value chain, it is blocked.
Save: Allows you to save all the parameters of the DataBlock.
Back: Returns to the list of DataBlocks.

Persistence icons of a DataBlock. Data persistence is available for Business entities and DataBlocks. This option is used to freeze the output data of a DataBlock at a given time. Persistence has a twofold objective: to freeze the data and to greatly improve performance when the data is made available.

This persistence can be achieved in 3 different ways:

Disk Persistence
Memory Persistence
Disk and Memory Persistence

Warning: depending on the volume of data, persisting a DataBlock may take a variable amount of time.

Creating a DataBlock

There are several methods to create a DataBlock:

Direct creation
Creation from a Business entity
Creation by duplicating an existing DataBlock

Direct creation of a DataBlock

Access the GenericsData module
Use the GenericsData left menu to access the DataBlock function
Click on the icon
The list of existing DataBlocks is displayed
Click on the button

As with all DataChain elements, you must specify the metadata linked to the DataBlock before saving it.

Entering a label is required.

Optional input fields allow you to refine the information related to a DataBlock (a description, information on the user license and Tags).

Once this information has been entered, save the metadata using the button located at the top left of the screen.

Warning: a DataBlock must have input data. In direct creation mode, an additional action is required to link an existing DataBlock or a Business entity to the new DataBlock.

Init DataBlock Columns

Click on the icon

The dialog box for choosing a Business entity or a DataBlock is displayed on the screen.

DataBlock initialization

Lets you choose how the DataBlock is initialized, either:

From an existing Business entity
From an existing DataBlock

Input box for refining the search (here, for example, a search on the Tag "DOC").

The Filter drop-down list allows you to choose which elements are selectable in the Characteristics and entities panel (Only characteristics, only entities or both).

The characteristics are recognizable by the mention of their original Business entity (in parentheses). To add a column to the DataBlock, click on the icon corresponding to it.

The Business entities are presented as labels without parentheses. To integrate one or more of their characteristics into the DataBlock, click on the icon corresponding to the entity. The following dialog box opens:

Popup choice of columns of a DataBlock

It lets you select the columns of the Business entity or DataBlock to add to the new DataBlock. Once the selection is validated, save the DataBlock.

DataBlock initialization is complete. It is then possible to display the values.

Creation from a Business entity

Creating a DataBlock from a Business entity is fast. This creation mode automatically initializes the DataBlock with the columns of the source Business entity.

Access the GenericsData module
Use the GenericsData left menu to access the Business entity function
Click on the icon
The list of existing Business entities is displayed
Find the desired Business entity

information The list function offers filter and sort functions by column and a global search on all the columns.

Once the Business entity is found in the table, click on the icon in the Actions column
Modify the label in the metadata panel if necessary
Save new DataBlock

Creation by duplicating an existing DataBlock

Creating a DataBlock from a duplicate of a DataBlock is fast. This creation mode is used to automatically initialize the DataBlock with the columns AND the steps defined in the source DataBlock.

Access the GenericsData module
Use the GenericsData left menu to access the DataBlock function
Click on the icon
The list of existing DataBlocks is displayed
Search for the desired DataBlock

information The list function offers filter and sort functions by column and a global search on all columns

Once the DataBlock is found in the table, click on the icon in the Actions column
Modify the label in the metadata panel if necessary

Data processing: DataFabric

Data Visualization

Data visualization is done either

from the Steps - Operations tab
from the Output Data tab

information Note that from each step, it is possible to visualize the data. In this case, all the steps located upstream will be taken into account.

Execution of processes

Runs step 1 (initialization)
Runs steps 1 and 2
Runs steps 1, 2 and 3
Runs all the steps: steps 1, 2 and 3

Data processing: The steps

The data processing of a DataBlock is performed in the Steps - Operations tab by creating processing sets named "Steps".

A step can consist of several types of operations on the columns. The number of steps that can be generated is not limited.

The steps of a DataBlock are created and executed consecutively: each step runs on the result returned by the previous one.
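The consecutive execution described above can be sketched in Python. This is a minimal illustration, not DataChain's implementation: it assumes a DataBlock's data is a list of row dictionaries and that each step is a function taking the previous result. The names run_pipeline and run_up_to are hypothetical; run_up_to mirrors the per-step Play icon, where running step k also runs every upstream step.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Step = Callable[[List[Row]], List[Row]]  # hypothetical: a step maps rows to rows


def run_pipeline(init_rows: List[Row], steps: List[Step]) -> List[Row]:
    """Run the steps consecutively; each step receives the previous step's result."""
    rows = init_rows
    for step in steps:
        rows = step(rows)
    return rows


def run_up_to(init_rows: List[Row], steps: List[Step], k: int) -> List[Row]:
    """Run step k and every step upstream of it.

    Step 1 is the initialization (init_rows), so steps[0] is step 2.
    """
    return run_pipeline(init_rows, steps[: k - 1])


data = [{"city": "Lyon", "zip": "69001"}, {"city": "Paris", "zip": "75002"}]
steps: List[Step] = [
    # Step 2: create a new column from an existing one.
    lambda rows: [{**r, "dept": str(r["zip"])[:2]} for r in rows],
    # Step 3: uses the column created in step 2.
    lambda rows: [r for r in rows if r["dept"] == "69"],
]
```

Step 3 can filter on "dept" only because step 2 created it, which matches the rule that columns created in a step are available to the following steps.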

Creating a step

Adding a step

From the Steps - Operations tab of the DataBlock
Click on icon

A step can be added either:
- at the end of the pipeline
- between two steps

In the latter case, position yourself on a step and use the icon. The new step is added to the right of the selected step.

Changing the label of a step

Each step has a label and a description. To edit them, click on the corresponding icon of the step.

Adding operations in a step

Each step includes a set of functions for carrying out operations on the columns. The available operations are:
Link DataBlock / Business entity
Union between DataBlock / Business entity
Formula Builder
Operations (Explode, Stack, Unpivot…)
Filters
Aggregations/Pivot/Partitions
Sorts
Organization of step output columns

To access the list of available functions, click on the icon at the step level.

Link to another DataBlock or another Business entity.

The "DataBlock link" processing of a step is used to add to the current DataBlock the values of the columns of another DataBlock or of another Business entity.

To do this, a link is created between the two DataBlocks or between a DataBlock and a Business entity. This link exists only within the DataBlock where it was created.

Click on the icon
Choose option Links DataBlocks
The icon then appears at the step level.

Add step

Click on the icon in the step or in the menu to access the join configuration window.

Join settings

Click on the Add a link button.

Enter the label for the link being created.

Click on the icon to open the dialog box for selecting the DataBlock with which to establish the link

Click on the DataBlock or Business entity to choose. The search field at the top of the column can help you find it using keywords from its label.

Click Save to close the window.

Choose the characteristics whose values will be added to the data table of the current DataBlock following the creation of the link

All that remains is to define the link to be established between the data blocks using one or more joins between their respective columns. The columns of a join must be of the same type (Character string, Numeric, Date) to be linked.

Choose the link type (Left, Inner, Right).

Choose the characteristic of the current DataBlock (source) to link

Choose the association criterion (operator) between the characteristics.

Choose the characteristic of the other data block constituting the link (target)

It is sometimes necessary to define several link criteria. The New criterion button allows you to add as many criteria as necessary. The multiple joins of a link are cumulative (associated by logical "AND"). Thus, the data resulting from the link will respect each of the conditions expressed by the joins.

Validate the previous actions by clicking on the Save button.

The process is ready to be executed and its result is displayed.

Note that once the link is created, an icon to the right of the link label lets you analyze the performance of the link (statistical details of the join).
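The link semantics described above (Left, Inner or Right link type, several criteria combined with a logical AND) can be illustrated with a small Python sketch. It is not how DataChain executes links: it assumes rows are dictionaries, that every criterion uses the "=" operator, and the function name link is hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple

Row = Dict[str, object]


def link(source: List[Row], target: List[Row],
         criteria: Sequence[Tuple[str, str]], how: str = "left",
         target_cols: Sequence[str] = ()) -> List[Row]:
    """Add target_cols from target to the source rows (hypothetical helper).

    criteria holds (source column, target column) pairs compared with "=";
    all pairs must match, i.e. they are combined with a logical AND.
    """
    def matches(s: Row, t: Row) -> bool:
        return all(s[sc] == t[tc] for sc, tc in criteria)

    out: List[Row] = []
    matched: Set[int] = set()
    for s in source:
        hits = [i for i, t in enumerate(target) if matches(s, t)]
        for i in hits:
            matched.add(i)
            out.append({**s, **{c: target[i][c] for c in target_cols}})
        if not hits and how == "left":
            # Left link: keep the unmatched source row with empty target values.
            out.append({**s, **{c: None for c in target_cols}})
    if how == "right":
        # Right link: also keep unmatched target rows with empty source values.
        source_cols = list(source[0]) if source else []
        for i, t in enumerate(target):
            if i not in matched:
                out.append({**{c: None for c in source_cols},
                            **{c: t[c] for c in target_cols}})
    return out


orders = [{"cust": 1, "country": "FR"}, {"cust": 2, "country": "FR"},
          {"cust": 3, "country": "DE"}]
customers = [{"id": 1, "ctry": "FR", "name": "A"}, {"id": 2, "ctry": "DE", "name": "B"},
             {"id": 9, "ctry": "IT", "name": "Z"}]
criteria = [("cust", "id"), ("country", "ctry")]  # both must hold (AND)
```

With these sample rows, only customer 1 satisfies both criteria, so an Inner link keeps one row, a Left link keeps all three orders, and a Right link keeps all three customers.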

Unions and intersections

The union consists of adding the values of a third-party DataBlock or Business entity to the columns of the data table of the current DataBlock, so as to obtain a table containing all the rows of each of the two data blocks.

The intersection consists of keeping only the rows present in both the current DataBlock and the third-party data block.

Click on the icon
Choose option Unions / Intersections
The icon is then positioned at the step level.

Add union

Click on the icon in the menu or in the step

Add union to DataBlock

Click on the icon

Enter the label of the union or intersection operation being created.

Click on the icon to open the dialog box for selecting the data block (DataBlock or Entity Type) to link to.

Click on the DataBlock or Business entity to choose. The search field at the top of the column can help you find it using keywords from its label.

Click Save to close the window.

Choose the operation to perform (Union or Intersection).

Choose the columns of the third-party (or target) data block (DataBlock or Entity Type) to associate with those of the current DataBlock, using the drop-down lists provided.

Validate the previous actions by clicking on the save button.

Attention: to obtain results from an intersection, each column of the current DataBlock must be associated with a column of the target data block (DataBlock or Entity Type). Otherwise, the intersection between the data blocks returns an empty table.

The process is ready to be executed and its result displayed.
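The following Python sketch illustrates why an incomplete column mapping empties an intersection. It is only an illustration under assumed conventions: rows are dictionaries, the mapping goes from current column to target column, unmapped columns are filled with None, and the names align, union and intersection are hypothetical.

```python
from typing import Dict, List

Row = Dict[str, object]


def align(current_cols: List[str], other: List[Row], mapping: Dict[str, str]) -> List[Row]:
    """Rename the other block's columns to the current ones; unmapped columns become None."""
    return [{c: (r[mapping[c]] if c in mapping else None) for c in current_cols}
            for r in other]


def union(current: List[Row], other: List[Row], mapping: Dict[str, str]) -> List[Row]:
    cols = list(current[0]) if current else list(mapping)
    # Keep all the rows of both data blocks.
    return current + align(cols, other, mapping)


def intersection(current: List[Row], other: List[Row], mapping: Dict[str, str]) -> List[Row]:
    cols = list(current[0]) if current else list(mapping)
    other_keys = {tuple(r.values()) for r in align(cols, other, mapping)}
    # Keep only the current rows whose full set of values also appears in the other block.
    return [r for r in current if tuple(r[c] for c in cols) in other_keys]


current = [{"code": "A", "qty": 1}, {"code": "B", "qty": 2}]
other = [{"c": "B", "q": 2}, {"c": "C", "q": 3}]
```

If "qty" is left unmapped, every aligned row carries None in that column, so no row of the current block can match and the intersection comes back empty.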

Formulas

A step can include processing that creates new columns in the data table of a DataBlock from constants and the values of its other columns. This is the Formulas functionality.

The number of Formulas that can be created per step is not limited.

Click on the icon

Choose option Formulas

">
The icon is then positioned at the step level.

Click on the icon in the menu or the step to open the dialog box for adding a formula to the current step:

Enter Formula

The New formula button starts the creation of a formula.

Sets the formula label.

information Note that the label of the formula will also be used for the label of the new generated column.

Area for entering and building the formula.

A formula can be Simple or Complex (cascading formula).

The structure of a formula is as follows:

A name
Attributes defining the inputs allowing the operation to be carried out.

An attribute may be:
A column available in the step
A manual entry
A choice from a closed list
Another formula

Example: extraction of the first two characters of the postal code to create a new Department column.

Step 1: Click on the "New Formula" button
Step 2: Label: Department
Step 3: Click in the formula input area

Formula search

Find the formula str.extract (either with the list scroll bar or by typing the letters "ext").

Choose the formula str.extract

Click on the formula to select it (this closes the dialog box). Note that you can display help on a formula by hovering over it. The formula then appears in the formula area:

Creation of a formula

The formula str.extract requires 3 input arguments (shown as quotation marks):

The value to process
The direction of the extraction
The number of characters to extract

To specify each of the attributes, click on the quotation mark in question to open a dialog box. To modify the choice made, click again on the argument to open the dialog box again.

Adding formula arguments

information Help on the formula is visible on the right: it specifies the current argument (blue background) and the type of argument expected (in brackets)

Argument 0: The value to process

A String is expected as the value to process; in the example, select the "Common" column.

Argument 1: The direction of the extraction

A String is expected; in the example, select "Left".

Argument 2: The number of characters to extract

A String is expected; in the example, this is a manual entry: type the number 2 in the input area.

The formula is complete.

Validate formula

Click on the icon to validate it.

information To quickly find the formula, you can save it to the catalog by clicking on the icon.

The formula has been added to the current step and is active.

Formula actions

Enables or disables the formula
Click to edit the formula
Click to duplicate the formula
Click to remove the formula
To close the formula manager and return to the DataBlock, click on Return

The processing is ready to be executed and its result displayed.

information Detailed documentation on formulas is available in the "Formulas" dialog box (see Documentation: list of Formulas).
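To make the three arguments of str.extract concrete, here is a hypothetical Python re-implementation of the Department example. It is a sketch of the behaviour described in this section, not the product's code: the function name str_extract mirrors the formula name only for readability, and the "Right" direction is an assumed second value of the closed list.

```python
def str_extract(value: str, direction: str, count: str) -> str:
    """Hypothetical equivalent of the str.extract formula.

    Argument 0: the value to process.
    Argument 1: the direction of the extraction ("Left"; "Right" is assumed).
    Argument 2: the number of characters, entered as text, so it is converted to int.
    """
    n = int(count)
    if n <= 0:
        return ""
    if direction == "Left":
        return value[:n]
    if direction == "Right":  # assumption: not shown in the example above
        return value[-n:]
    raise ValueError(f"unknown direction: {direction}")


# The Department example: the first two characters of a postal code.
department = str_extract("69001", "Left", "2")
```

The explicit n <= 0 guard avoids a Python pitfall: value[-0:] would return the whole string instead of an empty one.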

Filters

The filter is an operation consisting of retaining from the data table of a DataBlock only the rows whose values respect certain conditions.

Click on the icon
Choose the option Filters
The icon is then positioned at the level of the current step

Adding filter to step

Click on the icon in the menu or in the step to open the dialog box for adding a filter to the current step:

Filter Settings

List of available columns that can be integrated as Filter criteria

Click on the arrow to include a column as a filter criterion

List of operators available for a criterion. Note that the list of operators is different depending on the type of column (Character string, Numeric, Date).

Action area on criteria and groups of criteria. The icons allow you:

to delete a criterion
to add a group of criteria
to delete a group of criteria

Note that criteria can be moved from one group to another by drag and drop.

Enables or disables the filter for this step

information When the filter is deactivated, it is no longer taken into account during the execution of the step, but it remains available. If a filter is deactivated, the step's filter icon is red.

Saves the settings made

Details of placing a filter on multiple columns

Configuration of filters on multiple columns

Click on the arrow to add a column as criteria

If the criteria are positioned in groups, the groups can be linked together by AND or OR. Clicking on the area toggles the value between AND and OR.

Areas used to enter the value of the criterion. The value must be consistent with the type of the queried column. Depending on the operator, two values can be provided.

List of operators available for a criterion.

information Note that the list of operators is different depending on the column type (String, Numeric, Date).

Adds a new criteria group to the first group

Adds a new criteria group to the second group

Remove Criteria Group

Remove criterion

Opens a dialog box to search for values in the columns

Opens a dialog to select a column as value for the criterion

To add a criterion, select it with the blue arrow, then drag and drop it into the desired group with a long click.

Once the filters have been edited, click on the button to save them.

The process is ready to be executed and its result is displayed.

Its result will only be visible in the data table after loading the step or the entire DataBlock (and therefore all its steps successively).
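The filter model above (criteria grouped, groups linked by AND or OR, some operators taking two values, and an on/off switch) can be sketched as a small recursive evaluator in Python. The tree layout and operator names below are assumptions for illustration only; DataChain's actual operator list depends on the column type.

```python
import operator
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]

# Hypothetical operator names; the real list depends on the column type.
OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "=": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
    "between": lambda v, bounds: bounds[0] <= v <= bounds[1],  # takes two values
    "contains": lambda v, s: s in v,
}


def evaluate(row: Row, node: Dict[str, Any]) -> bool:
    if "items" in node:  # a group of criteria linked by AND or OR
        results = (evaluate(row, child) for child in node["items"])
        return all(results) if node["logic"] == "AND" else any(results)
    return OPS[node["op"]](row[node["column"]], node["value"])


def apply_filter(rows: List[Row], tree: Dict[str, Any], active: bool = True) -> List[Row]:
    if not active:
        return list(rows)  # a deactivated filter is ignored but kept
    return [r for r in rows if evaluate(r, tree)]


rows = [{"city": "Lyon", "pop": 500}, {"city": "Paris", "pop": 2100},
        {"city": "Nice", "pop": 340}]
# city = "Nice" OR (pop between 400 and 3000 AND city contains "a")
tree = {"logic": "OR", "items": [
    {"column": "city", "op": "=", "value": "Nice"},
    {"logic": "AND", "items": [
        {"column": "pop", "op": "between", "value": (400, 3000)},
        {"column": "city", "op": "contains", "value": "a"},
    ]},
]}
```

Nested dictionaries keep groups and criteria interchangeable, which is what allows a group to contain another group.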

Operations

The process called "Operations" lets you execute three additional functions on the data: Stack, Explode and Redimension.

The Stack function stacks onto several rows values that were previously gathered on a single row. A new column is then created to display these values. This function is configured by indicating the column headers of the values to be stacked as well as the number of rows on which to display them.
The Explode function spreads the elements of list-type values over several rows.
The Redimension function performs an inverse pivot (unpivot):
- Resize by column group
- Resize by column
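The Explode and Redimension (by column) behaviours described above can be sketched in Python. This is an illustration on list-of-dictionary rows, not the product's implementation: the function names, the output column names "column" and "value", and the handling of empty lists (one row with an empty value) are all assumptions.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def explode(rows: List[Row], column: str) -> List[Row]:
    """Spread each element of a list-type value over its own row."""
    out: List[Row] = []
    for r in rows:
        values = r[column]
        if not values:
            # Assumption: an empty or missing list yields one row with an empty value.
            out.append({**r, column: None})
        for v in values or []:
            out.append({**r, column: v})
    return out


def unpivot(rows: List[Row], id_cols: List[str], value_cols: List[str],
            var_name: str = "column", value_name: str = "value") -> List[Row]:
    """Inverse pivot: one output row per (input row, value column) pair."""
    return [
        {**{c: r[c] for c in id_cols}, var_name: col, value_name: r[col]}
        for r in rows
        for col in value_cols
    ]
```

For example, a row {"shop": "S1", "jan": 10, "feb": 12} unpivoted on ["jan", "feb"] becomes two rows, one per month.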

Steps

Click on the icon

Choose option Operations

The icon is then positioned at the level of the current step.

Click on the icon in the menu or in the step to open the dialog for adding an operation to the current step:

Choose the operation from the list of possible operations.

Select the icon to open the dialog box for managing Stack, Explode and Redimension processing.

NOTE: Detailed documentation on operations is available on the Operations page.

Partition / Aggregation

The Partition / Aggregation processing reorganizes the data by gathering rows into groups. It also runs statistical functions (average, count, maximum, minimum) and descriptive functions (list of group values) on these groups.

This processing is available in 3 versions:

Simple aggregation
Multidimensional aggregation
Operation on columns or vertical formulas
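A simple aggregation, as described above, amounts to grouping rows on key columns and applying statistical or descriptive functions per group. The Python sketch below illustrates this with the standard library only; the function names and the (output name, function, source column) specification format are hypothetical, not DataChain's configuration.

```python
from collections import defaultdict
from statistics import mean
from typing import Any, Callable, Dict, List, Sequence, Tuple

Row = Dict[str, Any]

# Statistical functions plus a descriptive one (list of group values).
AGG: Dict[str, Callable[[List[Any]], Any]] = {
    "average": mean, "count": len, "maximum": max, "minimum": min, "list": list,
}


def aggregate(rows: List[Row], keys: Sequence[str],
              specs: Sequence[Tuple[str, str, str]]) -> List[Row]:
    """Group rows on keys; specs are (output column, function name, source column)."""
    groups: Dict[Tuple[Any, ...], List[Row]] = defaultdict(list)
    for r in rows:
        groups[tuple(r[k] for k in keys)].append(r)
    out: List[Row] = []
    for key, members in groups.items():  # groups keep their first-seen order
        record: Row = dict(zip(keys, key))
        for name, fn, col in specs:
            record[name] = AGG[fn]([m[col] for m in members])
        out.append(record)
    return out


sales = [{"dept": "69", "amount": 10}, {"dept": "69", "amount": 30},
         {"dept": "75", "amount": 5}]
```

Calling aggregate(sales, ["dept"], [("avg", "average", "amount"), ("n", "count", "amount")]) produces one row per department.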

To access aggregate functions

Click on the icon
Choose the option Partitions / Aggregations / Pivots
The icon is then positioned at the step level.

Click on the icon

information Detailed documentation on aggregations is available in the Aggregations popup

Sorts

The Sorts process is used to manage the display order of the data in the table of values of the DataBlock.

Click on the icon
Choose the option Sorts
The icon is then positioned at the step level.

Sorts on a DataBlock

Click on the icon to open the following dialog box for sorting the values of the current data table.

Creating a sort

Use the drag handle next to the items of the Attributes panel (columns of the data table) to drag and drop them into the Sorted columns panel in order to define a sort order.

Set ascending (smallest to largest) or descending (largest to smallest) sorting for each column in the Sorted columns panel using its drop-down list.

Once the sort parameters have been defined as desired, click on the button to validate and save these modifications.

The configured sorts will only be visible in the data table after loading the step or the entire DataBlock (and therefore all its steps successively).
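A multi-column sort like the one configured in the Sorted columns panel can be expressed in Python by relying on the stability of list.sort: sorting by the least significant column first and the most significant column last gives the combined order. This is a sketch of the principle; the function name and the (column, "asc"/"desc") format are hypothetical.

```python
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


def sort_rows(rows: List[Row], spec: Sequence[Tuple[str, str]]) -> List[Row]:
    """Sort by several columns; spec lists (column, "asc" or "desc") by priority."""
    result = list(rows)
    # list.sort is stable, so applying the lowest-priority column first
    # and the highest-priority column last yields the combined order.
    for col, direction in reversed(spec):
        result.sort(key=lambda r: r[col], reverse=(direction == "desc"))
    return result


rows = [{"dept": "75", "amount": 5}, {"dept": "69", "amount": 10},
        {"dept": "69", "amount": 30}]
```

Here sort_rows(rows, [("dept", "asc"), ("amount", "desc")]) puts the two "69" rows first, the larger amount leading.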

Step output

The Step output function is used to manage the presentation of the columns at the output of the step. During this operation, the order of the columns can be changed, the column labels can be modified and columns can be deactivated (or activated) to manage their visibility.

A column deactivated in a step is no longer available in the following step.

Click on the icon
The icon is always present in the panel of each step.

step output

List of columns available for output. The order of the columns can be changed (see the drag and drop option below).

Lets you give the column an alias.

Option that allows you to change the column type. Note that for certain conversions such as dates and decimals, a reading format must be specified

For date types, allows you to specify a date display format. Note that a verification of the integrity of the value chain is carried out.

Handling of null values. If this option is activated, you can enter a value that replaces null values.

Allows you to specify whether the column concerned will be available or not in the next step.

information Note that a verification of the integrity of the value chain is performed.

Changes the order of the columns. Position yourself on a row, then drag and drop it to the desired position.

Column search area.

Accesses a screen presenting the origin of the current column.

Saves the changes made.
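The step output options above (order, alias, type conversion with a reading format, null replacement, and activation) can be pictured as a per-column configuration applied to every row. The Python sketch below is a hypothetical model of that behaviour, not the product's configuration format: the keys name, alias, cast, fmt, sep, null_value and active are invented for illustration.

```python
from datetime import date, datetime
from decimal import Decimal
from typing import Any, Dict, List

Row = Dict[str, Any]


def step_output(rows: List[Row], columns: List[Dict[str, Any]]) -> List[Row]:
    """Apply a hypothetical output configuration, one dict per column, in output order.

    Keys: name, alias, cast ("date" or "decimal"), fmt / sep (reading format),
    null_value (replacement for nulls) and active (False hides the column).
    """
    out: List[Row] = []
    for r in rows:
        new: Row = {}
        for c in columns:
            if not c.get("active", True):
                continue  # deactivated: not available in the next step
            v = r.get(c["name"])
            if v is None:
                v = c.get("null_value")  # stays None when no replacement is set
            elif c.get("cast") == "date":
                v = datetime.strptime(v, c["fmt"]).date()  # reading format required
            elif c.get("cast") == "decimal":
                v = Decimal(v.replace(c.get("sep", "."), "."))  # e.g. "12,5" with sep=","
            new[c.get("alias", c["name"])] = v
        out.append(new)
    return out


rows = [{"zip": "69001", "d": "03/01/2024", "amt": "12,5", "tmp": 1},
        {"zip": None, "d": "15/02/2024", "amt": "7", "tmp": 2}]
config = [
    {"name": "d", "alias": "order_date", "cast": "date", "fmt": "%d/%m/%Y"},
    {"name": "zip", "null_value": "00000"},
    {"name": "amt", "cast": "decimal", "sep": ","},
    {"name": "tmp", "active": False},
]
```

The output column order follows the configuration list, which plays the role of the drag and drop ordering.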

Delete step

To delete a step, click on the button that accompanies it.

Attention: after clicking on the delete button of a step, no confirmation message is displayed.

information The re-mapping feature is available at the initialization step.

Column statistics

Statistics on a column

Access statistics window

Click on the icon to access the column statistics popup.

stat for a column

Refresh column statistics
Click to display the number of distinct values in the column (may take a while to display)
Click to display the approximate number of distinct values in the column (faster to display than the previous option)
Click to display statistics on the most frequent values of the column

Statistics on all columns

Access setting global statistics

You can view the statistics for each step: click on the icon to display the menu for choosing the statistics to display.

Display global stats of a step

Click to display the menu
Choose the statistics to display
Click on the icon to execute the step
Representation of the number of "Null" values in the column. On hover, displays the precise statistics as a count and a percentage.
Hover over a column to display its statistics
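The column statistics listed above (distinct values, most frequent values, null count and percentage) can be computed exactly with the Python standard library, as in the sketch below. It is an illustration only: the function name is hypothetical, nulls are assumed to be excluded from the distinct count, and the approximate distinct count offered by the product (typically a probabilistic estimate) is not reproduced here.

```python
from collections import Counter
from typing import Any, Dict, List


def column_stats(values: List[Any], top: int = 3) -> Dict[str, Any]:
    """Exact statistics for one column; None stands for a null value."""
    non_null = [v for v in values if v is not None]
    nulls = len(values) - len(non_null)
    return {
        "distinct": len(set(non_null)),  # assumption: nulls are not counted
        "nulls": nulls,
        "null_pct": 100 * nulls / len(values) if values else 0.0,
        "top": Counter(non_null).most_common(top),  # most frequent values
    }


stats = column_stats(["69", "75", "69", None, "69", "13"])
```

For this sample column there are 3 distinct values, one null (about 16.7 %), and "69" is the most frequent value.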

Re-mapping - Changing the element that feeds a DataBlock (or a HandleData source)

This feature is available in:

The initialization step of the data sources of the HandleData presentations
The initialization step of the DataBlocks of the GenericsData module

The purpose of this functionality is to be able to modify the element (Business Entities or DataBlock) that feeds a DataBlock or a HandleData presentation source.

To perform this operation, two steps are required

A – At the initialization stage, use the Mapping button to access the screen allowing you to choose a new source.

B – Once the new source has been chosen, it is necessary to carry out the mapping between the columns of the new source and the columns already available in the current data block.

An automatic recognition makes it possible to propose a default mapping. This recognition is done on the correspondence of the labels and the type of the columns.

information Note that to be mapped, the linked columns must be of the same type.
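The automatic recognition and the type rule above can be sketched in Python. This is a hypothetical model, not DataChain's algorithm: columns are represented as a label-to-type dictionary, labels are assumed to match exactly (case-sensitive), and the function names are invented for illustration.

```python
from typing import Dict, List

Columns = Dict[str, str]  # column label -> type ("String", "Numeric", "Date")


def propose_mapping(current: Columns, new_source: Columns) -> Dict[str, str]:
    """Default mapping: same label AND same type (exact label match assumed)."""
    return {label: label for label, typ in current.items()
            if new_source.get(label) == typ}


def type_errors(mapping: Dict[str, str], current: Columns, new_source: Columns) -> List[str]:
    """Return the current columns whose mapped source column has a different type."""
    return [c for c, s in mapping.items() if current[c] != new_source.get(s)]


current = {"zip": "String", "amount": "Numeric", "date": "Date"}
new_source = {"zip": "String", "amount": "String", "day": "Date"}
```

Here only "zip" is mapped automatically: "amount" has the same label but a different type, and "date" must be mapped manually to "day", which is accepted because both are Date columns.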

Example of changing the initialization source of a DataBlock

Datablock Initialization Access

DataBlock-1 contains 3 steps, the first of which is the initialization step (gray area at the beginning of the pipeline). Access the DataBlock initialization screen by clicking on the icon.

Datablock initialization screen presentation

Indicates the columns of the original element that feed the DataBlock
Indicates the Entity used to feed the DataBlock
Indicates the additional characteristic added to the Repository linked to the initial Entity created with it

To change the source that feeds DataBlock-1 to another data source, use the Mapping button located at the top right of the initialization screen. The re-mapping popup opens:

Re-mapping a DataBlock

Click to search for a new source to map. This new source may be:

either a business entity
or another DataBlock

Click on the line of the new source
Click on Save to validate and close the window

Once the new source has been chosen (here My Source 2), the columns bearing the same label AND which are of the same type will be mapped automatically. Other columns can be mapped manually if needed.

information Note that the + button lets you add a new column to the current DataBlock, creating a new value fed by a column from the new source.

Matches are made between columns.
Validate the change with the button located at the top right of the screen.
Re-mapping is complete.

The Datablock is then fed by the new source.

Editing a DataBlock

From the list of DataBlocks in the GenericsData module
Click on the DataBlock icon
Search for the target DataBlock
At the level of the targeted DataBlock line, click on the icon or on the label of the DataBlock

Temporary state of a DataBlock during a work session

When editing a DataBlock, the current modifications are saved in a temporary space.

If the user quits editing the DataBlock, its state is saved temporarily.

When the DataBlock is edited again, the temporary state is loaded and the user finds the DataBlock as it was left.

Attention: the temporary save is not a permanent save. Only the Save action performs a definitive save.

A Refresh button lets you return to the state of the last definitive save, made either:

by the current user
or by another user

information The presence of the Refresh button indicates to the user that the current editing session is working on a temporary state.

Clicking on the Refresh button reloads the DataBlock as of the last definitive save. A message indicates the date, time and login of the user who made that save.

Removing a DataBlock

An impact check is performed when deleting a DataBlock. Depending on the impact on the value chain, DataChain may block deletion.

A DataBlock can be deleted:

Either from the list of DataBlocks
- At the level of the targeted DataBlock line, click on the icon to perform the deletion.
Or while editing a DataBlock
- Use the button located on the top left banner.