DataBlocks

Icon DataBlock

General - DataBlock

The DataBlock represents Level 4 of the DataChain value chain (see General Principles).

Chain value

The DataBlock function is a strategic function of the DataChain offer.

The DataBlock module is the DataFabric function of the DataChain offer which allows operations to be carried out on data through a complete process. Lineage

DataBlocks are blocks of data: in other words, an assortment of characteristics from one or more entities. They are thus the central place for exploiting cross-referenced data. A DataBlock is initialized with input data.

Inside the DataBlock, it is possible to create several successive steps. Each step has sub-steps to perform multiple operations. Columns created during a step are available to subsequent steps. The number of steps in a DataBlock is not limited.

pipeline

The operations available at each step are as follows:

Access to DataBlock

1 Access to the list of DataBlocks

2 Creating a new DataBlock

To create a DataBlock, click on New.

Presentation of the general contents of the DataBlock screen

InitDatablock

1 DataBlock initialization block: clicking on the Table icon opens a dialog box to choose an existing DataBlock or a Business entity

2 Steps - Operations tab: step management area (DataFabrique) and data display

3 Linked Entities and DataBlocks tab: presents all the DataBlocks and Business entities that feed the current DataBlock (for initialization or through the links defined within the different steps)

4 Persistence Settings tab: specifies the persistence settings

Warning Note that the parameters needed to achieve persistence 11 are taken into account only after clicking the Save button.

5 Output Data tab: provides the result of the DataBlock (i.e. structure and data) as if consumed externally. There may be a discrepancy between the results displayed in tab 2 and those displayed in tab 5.

Example: the user generates two steps in the "Steps - Operations" tab then saves. The result obtained in the "Output data" tab will be the same as in the "Steps - Operations" tab. On the other hand, if the user adds a third step in the "Steps - Operations" tab but does not save, there will be a difference in result. Step 3 will only be taken into account in the result of the "Steps - Operations" tab (step 3 not yet saved).

6 Linked HandleData elements tab: presents, in the form of a table, all the elements of the HandleData module that consume the current DataBlock

7 Icon +: allows you to add data processing steps. A step can be added either at the end of all the steps or between two steps.

8 Icon that displays a summary of all the steps. The information presented is entered by the user at each step via the tag icon

9 Click on Play icon to run the steps and view the data

10 Check to make search case insensitive

11 Range to search for values

12 General actions on the DataBlock.
  • Delete: Allows you to delete the DataBlock.

    Warning Note that an impact study is carried out to validate the possibility of removing the DataBlock. If the deletion can cause a malfunction on the value chain, this deletion will be blocked.

  • Save: Allows you to save all the parameters of the DataBlock.

  • Back: Returns to the list of DataBlocks.

13 Persistence icons of a DataBlock. Data persistence is available at the level of Business entities and DataBlocks. This option is used to freeze the output data of a DataBlock at a given time. Persistence has two objectives: to freeze the data and to greatly improve performance when the data is made available.

This persistence can be achieved in 3 different ways:

  • Disk Persistence

  • Memory Persistence

  • Disk and Memory Persistence

Warning Depending on the volume of data, the persistence of a DataBlock may have a variable duration.

Creating a DataBlock

There are several methods to create a DataBlock:

  • Direct creation

  • Creation from a Business entity

  • Creation by duplicating an existing DataBlock

Direct creation of a DataBlock

  • Access the GenericsData module

  • Use the left menu of GenericsData to access the DataBlock function

  • Click on the DataBlock icon

  • The list of existing DataBlocks is displayed

  • Click on the New button

As with all DataChain elements, the metadata that will be linked to the DataBlock must be specified before saving it.

ViewMetadata

Entering a label is required.

Optional input fields allow you to refine the information related to a DataBlock (a description, information on the user license and Tags).

Once this information has been entered, save the metadata using the Save button located at the top left of the screen.

Warning A DataBlock must have input data. When creating in direct mode, an additional action must be performed to link an existing DataBlock or a Business entity to the new DataBlock.

Init DataBlock Columns

1 Click on the icon

The dialog box for choosing a Business entity or a DataBlock is displayed on the screen.

DataBlock initialization

1 Allows you to choose the initialization mode of the DataBlock

2 Input box to refine the search, here for example with a search on the Tag "DOC"

3 The Filter drop-down list allows you to choose which elements are selectable in the Characteristics and entities panel (Only characteristics, only entities or both).

4 The characteristics are recognizable by the mention of their original Business entity (in parentheses). To add a column to the DataBlock, click on the corresponding Entity icon.

5 The Business entities are presented as labels without parentheses. To integrate one or more characteristics of an entity into the DataBlock, click on the corresponding Entity icon. The following dialog box opens:

Popup choice of columns of a DataBlock

It allows you to select the columns of the Business entity or the DataBlock to add to the new DataBlock. Once the selection is validated, save the DataBlock.

DataBlock initialization is complete. It is then possible to display the values.

Creation from a Business entity

Creating a DataBlock from a Business entity is fast. This creation mode automatically initializes the DataBlock with the columns of the source Business entity.

information The list function offers filter and sort functions by column and a global search on all the columns.

  • Once the Business entity is found in the table, click on the DataBlock icon in the Actions column

  • Modify the label in the metadata panel if necessary

  • Save the new DataBlock

Creation by duplicating an existing DataBlock

Creating a DataBlock by duplicating an existing DataBlock is fast. This creation mode automatically initializes the DataBlock with the columns AND the steps defined in the source DataBlock.

  • Access the GenericsData module

  • Use the left menu of GenericsData to access the DataBlock function

  • Click on the DataBlock icon

  • The list of existing DataBlocks is displayed

  • Search for the desired DataBlock

information The list function offers filter and sort functions by column and a global search on all columns

  • Once the DataBlock is found in the table, click on the Duplicate icon in the Actions column

  • Modify the label in the metadata panel if necessary

Data processing: DataFabrique

Data Visualization

Data can be visualized either:

  • from the Steps - Operations tab

  • from the Output Data tab

information Note that from each step, it is possible to visualize the data. In this case, all the steps located upstream will be taken into account.

Execution of processes

  • 1 Executes the processing of step 1 (initialization)

  • 2 Executes the processing of step 1 and step 2

  • 3 Executes the processing of step 1, step 2 and step 3

  • 4 Executes the processing of all steps: step 1, step 2 and step 3

Data processing: The steps

The data processing of a DataBlock is performed in the Steps - Operations tab by creating processing sets named "Steps".

A step can consist of several types of operations on the columns. The number of steps that can be generated is not limited.

The steps of a DataBlock are created and executed consecutively. Each step thus runs on the result returned by the previous one.
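The chaining of steps can be illustrated with a short Python sketch. It is only a model of the principle, not DataChain's implementation: the `run_steps` helper, the two step functions and the sample columns are hypothetical names chosen for the example.

```python
def add_full_name(rows):
    """Step 1: create a new column from two existing ones."""
    return [{**row, "full_name": f"{row['first']} {row['last']}"} for row in rows]


def add_initials(rows):
    """Step 2: reuse the column created by step 1."""
    return [
        {**row, "initials": "".join(word[0] for word in row["full_name"].split())}
        for row in rows
    ]


def run_steps(rows, steps, up_to=None):
    """Run the steps in order; each one receives the previous result.

    `up_to` mimics running from a given step: only that step and the
    steps located upstream of it are executed.
    """
    for step in steps[:up_to]:
        rows = step(rows)
    return rows


# Hypothetical input data used to initialize the pipeline.
source = [{"first": "Ada", "last": "Lovelace"}]
steps = [add_full_name, add_initials]

after_first_step = run_steps(source, steps, up_to=1)
after_all_steps = run_steps(source, steps)
```

Running up to the first step yields rows without the `initials` column, while running all steps yields both new columns, because the second step consumes the output of the first.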

Creating a step

Adding a step
  • From the Steps - Operations tab of the DataBlock

  • Click on the plus_small icon

    information A step can be added either

    • at the end of all the steps

    • between two steps

In the latter case, position yourself on a step and use the plus_small icon. The new step will be added to the right of the selected step.

Changing the label of a step

stepinfo

Each step has a label and a description. To access this function, click on the tag icon located at 1

Adding operations in a step

To access the list of available functions, click on the ellipsis-v icon at the step level.

The "DataBlock link" processing of a step is used to add to the current DataBlock the values of the columns of another DataBlock or of another Business entity.

To do this, a link will be created between the two DataBlocks or between a DataBlock and a Business entity. This link will only exist within the DataBlock where it was created.

  • 1 Click on the ellipsis-v icon

  • 2 Choose the option Links DataBlocks

  • The Join icon then appears at the step level.

Add step

  • Click on the Join icon, in the step or in the menu, to access the join configuration window

Join settings

1 Click on the Add a link button.

2 Enter the label for the link being created.

3 Click on the Search icon to open the dialog box for selecting the DataBlock with which to establish the link

4 Click on the DataBlock or the Business entity to choose. The search field located at the head of the column can help to find it using the keywords contained in its label.

5 Save to close the window

6 Choose the characteristics whose values will be added to the data table of the current DataBlock following the creation of the link

All that remains is to define the link to be established between the data blocks using one or more joins between their respective columns. The columns of a join must be of the same type (Character string, Numeric, Date) to be linked.

7 Choose the link type (Left, Inner, Right)

8 Choose the characteristic of the current DataBlock (source) to link

9 Choose the association criterion (operator) between the characteristics

10 Choose the characteristic of the other data block constituting the link (target)

11 It is sometimes necessary to define several link criteria. The New criterion button allows you to add as many criteria as necessary. The multiple joins of a link are cumulative (combined with a logical "AND"). Thus, the data resulting from the link will satisfy each of the conditions expressed by the joins.

13 Validate the previous actions by clicking on the Save button.

The process is ready to be executed and its result is displayed.

Note that once the link has been created, a Statistics icon for analyzing the performance of the link is available to the right of the link label.

Statistical details of the join
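The behaviour of a link with several criteria can be sketched in Python. This is a minimal illustration under stated assumptions, not the engine used by DataChain: the `link` function, its parameters and the sample data are hypothetical, and only the Inner and Left types are modelled (Right is left out for brevity).

```python
import operator


def link(source, target, criteria, how="inner", keep=()):
    """Join `source` rows to `target` rows.

    criteria: list of (source_column, operator, target_column); a target
    row matches only if ALL criteria hold (logical AND).
    keep: target columns whose values are added to the source rows.
    """
    result = []
    for src in source:
        matches = [
            tgt for tgt in target
            if all(op(src[s_col], tgt[t_col]) for s_col, op, t_col in criteria)
        ]
        for tgt in matches:
            result.append({**src, **{col: tgt[col] for col in keep}})
        if not matches and how == "left":
            # Left link: keep the source row, with empty values for the target.
            result.append({**src, **{col: None for col in keep}})
    return result


orders = [
    {"order_id": 1, "country": "FR", "year": 2023},
    {"order_id": 2, "country": "FR", "year": 2024},
    {"order_id": 3, "country": "DE", "year": 2024},
]
rates = [
    {"country": "FR", "year": 2024, "vat": 20},
    {"country": "DE", "year": 2024, "vat": 19},
]
# Two criteria, both of which must be satisfied.
criteria = [("country", operator.eq, "country"), ("year", operator.eq, "year")]

inner_rows = link(orders, rates, criteria, how="inner", keep=["vat"])
left_rows = link(orders, rates, criteria, how="left", keep=["vat"])
```

With the Inner type, order 1 disappears because no rate matches both its country and its year. With the Left type it is kept, with an empty `vat` value.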

Unions and intersections

The union consists of adding the values of a third-party DataBlock or of a Business entity to the columns of the data table of the current DataBlock, to end up with a table containing all the rows of each of the two data blocks.

The intersection consists of keeping only the rows present in both the current DataBlock and the third-party data block.

  • 1 Click on the ellipsis-v icon

  • 2 Choose the option Unions / Intersections

  • The Union icon is then positioned at the step level.

Add union

  • Click on the Union icon in the menu or in the step

Add union to DataBlock

1 Click on the plus_small icon

2 Enter the label of the union or intersection operation being created.

3 Click on the Search icon to open the dialog box for selecting the data block (DataBlock or Entity Type) to link to

4 Click on the DataBlock or the Business entity to choose. The search field located at the head of the column can help to find it using the keywords contained in its label.

5 Save to close the window

6 Choose the operation to perform (Union or Intersection).

7 Choose the columns of the third-party (or target) data block (DataBlock or Entity Type) to associate with those of the current DataBlock, using the drop-down lists provided.

10 Validate the previous actions by clicking on the Save button.

Attention Each column of the current DataBlock must be associated with a column of the target data block (DataBlock or Entity Type) to obtain results in the case of an intersection. Otherwise, the intersection between the DataBlocks will return an empty table.

The process is ready to be executed and its result displayed.
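The two operations, and the effect of an incomplete column association, can be sketched in Python. This is an illustration only: the function names and sample data are hypothetical, and the way unassociated columns are handled (left empty, so that no row can match) is an assumption made to reproduce the behaviour described in the warning above, not a documented implementation detail.

```python
def align(target_rows, mapping, columns):
    """Rename target columns to the current DataBlock's column names.

    mapping: current column -> target column. Columns of the current
    DataBlock without an association are left empty (assumption).
    """
    return [
        {col: (row[mapping[col]] if col in mapping else None) for col in columns}
        for row in target_rows
    ]


def union(current, target, mapping):
    """All the rows of both data blocks."""
    columns = list(current[0])
    return current + align(target, mapping, columns)


def intersection(current, target, mapping):
    """Only the rows present in both data blocks."""
    columns = list(current[0])
    target_keys = {
        tuple(row[col] for col in columns)
        for row in align(target, mapping, columns)
    }
    return [row for row in current if tuple(row[col] for col in columns) in target_keys]


current = [{"id": 1, "city": "Paris"}, {"id": 2, "city": "Lyon"}]
target = [{"ref": 2, "town": "Lyon"}, {"ref": 3, "town": "Nice"}]

full_mapping = {"id": "ref", "city": "town"}
partial_mapping = {"id": "ref"}  # "city" is not associated

all_rows = union(current, target, full_mapping)
common_rows = intersection(current, target, full_mapping)
empty_rows = intersection(current, target, partial_mapping)
```

With the full association the intersection keeps the single shared row. With the partial association it returns nothing, which is why every column needs a counterpart.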

Formulas

A step can include processing consisting in creating new columns in the data table of a DataBlock from constants and the values of its other columns. This is the Formulas functionality.

The number of Formulas that can be created per step is not limited.

1 Click on the icon ellipsis-v

2 Choose option Formulas

The Formulas icon is then positioned at the step level.

imageFormule

  • Click on the Formulas icon in the menu or in the step to open the dialog box for adding a formula to the current step:

Enter Formula

1 The New formula button starts the creation.

2 Sets the formula label.

information Note that the label of the formula will also be used for the label of the new generated column.

3 Area for entering and building the formula.

A formula can be Simple or Complex (cascading formula).

The structure of a formula is as follows:

  • A name

  • Attributes defining the inputs allowing the operation to be carried out.

An attribute may be:

  • A column available in the step

  • A manual entry

  • A choice in a closed list

  • Another formula

Example: Extraction of the first two characters of the postal code to create a new Department column.

  • Step 1: Click on the "New Formula" button

  • Step 2: Label: Department

  • Step 3: Click in the formula input area

Formula search

1 Find the formula str.extract (either using the list scroll bar or by typing the letters "ext")

2 Choose the formula str.extract

Click on the formula to select it (this closes the dialog box). Note that the user can display help on a formula by hovering over it. The formula then appears in the formula area:

Creation of a formula

The formula str.extract requires 3 input arguments (shown as quotation marks).

1 The value to process

2 The direction of the extraction

3 The number of characters to extract

To specify each of the arguments, click on the corresponding quotation mark to open a dialog box. To modify the choice made, click again on the argument to reopen the dialog box.

Adding formula arguments

information Help on the formula is visible on the right: it specifies the current argument (blue background) and the type of argument expected (in brackets)

Argument 0: The value to process

  • A String is expected as the value to be processed. In the example, select the "Common" column 1

Argument 1: The direction of the extraction

  • A String is expected. In the example, select "Left" 2

Argument 2: The number of characters to extract

  • A String is expected. In the example, it is a manual entry: write the number 2 in the input area 1

The formula is complete.
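The effect of this formula can be approximated in Python. The sketch below is not the DataChain implementation of str.extract: the `extract` function, the "Right" direction and the `postal_code` column name are assumptions made for the illustration. All three arguments are passed as strings, as in the dialog box.

```python
def extract(value: str, direction: str, count: str) -> str:
    """Return `count` characters taken from the left or right of `value`."""
    n = int(count)  # the number of characters is entered as text
    if direction == "Left":
        return value[:n]
    if direction == "Right":  # assumed counterpart of "Left"
        return value[-n:] if n else ""
    raise ValueError(f"unknown direction: {direction!r}")


# Hypothetical rows; the formula label "Department" becomes the new column label.
rows = [{"postal_code": "75011"}, {"postal_code": "69003"}]
for row in rows:
    row["Department"] = extract(row["postal_code"], "Left", "2")
```

Each row now carries a Department column holding the first two characters of its postal code.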

Validate formula

Click on 1 to validate it.

information To find the formula again quickly, it is possible to save it to the catalog by clicking on icon 2.

The formula has been added to the current step and is active.

Formula actions

1 Enables or disables the formula

2 Click to edit the formula

3 Click to duplicate the formula

4 Click to remove the formula

5 To close the formula manager and return to the DataBlock, click on Return

Extract_4Formule.png

The processing is ready to be executed 1 and its result displayed 2

information Detailed documentation on formulas is available in the "Formulas" dialog box. See the documentation list of Formulas.

Filters

The filter is an operation consisting of retaining from the data table of a DataBlock only the rows whose values respect certain conditions.

  • Click on the ellipsis-v icon

  • Choose the option Filters

  • The Filter icon is then positioned at the level of the current step

Adding filter to step

  • Click on the Filter icon in the menu or in the step to open the dialog box for adding a filter to the current step:

Filter Settings

1 List of available columns that can be integrated as Filter criteria

2 Click on the arrow to include a column as a filter criterion

3 List of operators available for a criterion. Note that the list of operators is different depending on the type of column (Character string, Numeric, Date).

4 Action area on criteria and groups of criteria. The icons allow you

  • to delete a criterion

  • to add a group of criteria

  • to delete a group of criteria

    information Note that criteria can be moved from group to group by drag and drop

5 Enables or disables the filter for this step

information When the filter is deactivated, it is no longer taken into account during the execution of the step, but it remains available. If a filter is deactivated, the step’s filter icon is red

6 Saves the settings made

Details of placing a filter on multiple columns

Configuration of filters on multiple columns

1 Click on the arrow to add a column as criteria

2 If the criteria are positioned in groups, the groups can be linked together by an AND or an OR. A click on the area toggles the value (OR to AND, or AND to OR)

3 Areas used to enter the value of the criterion. The values must be consistent with the type of the queried column. Depending on the operator, two values can be provided

4 List of operators available for a criterion.

information Note that the list of operators is different depending on the column type (String, Numeric, Date).

5 Adds a new criteria group to the first group

6 Adds a new criteria group to the second group

7 Remove Criteria Group

8 Remove criterion

9 Opens a dialog box to search for values in the columns

10 Opens a dialog to select a column as value for the criterion

11 To add a criterion, select it with the blue arrow 1, then drag and drop it into the desired group with a long click

Once the filters have been edited, click on the validate button to save them.

The process is ready to be executed and its result is displayed.

Its result will only be visible in the data table after loading the step or the entire DataBlock (and therefore all its steps successively).
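The way groups of criteria select rows can be sketched in Python. This is a simplified model with hypothetical names and data: the source states that groups are linked by AND or OR, while combining the criteria inside a group with AND is an assumption of this sketch.

```python
import operator

# A few illustrative operators; the real list depends on the column type.
OPERATORS = {
    "=": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
    "contains": lambda value, text: text in value,
}


def row_matches(row, groups, link="AND"):
    """Evaluate each group of (column, operator, value) criteria on a row."""
    group_results = [
        all(OPERATORS[op](row[column], value) for column, op, value in group)
        for group in groups
    ]
    return all(group_results) if link == "AND" else any(group_results)


def apply_filter(rows, groups, link="AND", active=True):
    """Keep only matching rows; a deactivated filter is ignored but kept."""
    if not active:
        return list(rows)
    return [row for row in rows if row_matches(row, groups, link)]


rows = [
    {"city": "Paris", "amount": 120},
    {"city": "Lyon", "amount": 80},
    {"city": "Nice", "amount": 300},
]
groups = [
    [("city", "=", "Paris"), ("amount", ">", 100)],  # first group
    [("amount", ">", 250)],                          # second group
]

linked_by_or = apply_filter(rows, groups, link="OR")
linked_by_and = apply_filter(rows, groups, link="AND")
deactivated = apply_filter(rows, groups, link="OR", active=False)
```

Switching the link between the groups from OR to AND changes the result from two rows to none, and a deactivated filter returns every row unchanged.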

Operations

The process called "Operations" allows you to execute three additional functions on the data: Stack, Explode and Redimension

  • The Stack function stacks on several rows values previously gathered on a single row. A new column is then created to display these values. This function is configured by indicating the column headers of the values to be stacked as well as the number of rows on which to display them.

  • The Explode function spreads over several rows the elements of list type values.

  • The Redimension function performs an inverted pivot (unpivot)

    • Resize by column group

    • Resize by column
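Two of these functions can be illustrated with a short Python sketch. The function names, parameters and sample data are hypothetical, and the output column names `variable` and `value` are assumptions. The sketch shows the general idea of Explode and of an inverted pivot by column, not the exact DataChain behaviour.

```python
def explode(rows, column):
    """Spread each element of a list-type value over its own row."""
    return [{**row, column: item} for row in rows for item in row[column]]


def unpivot(rows, id_columns, value_columns, name_column="variable", value_column="value"):
    """Inverted pivot: turn several columns into (name, value) rows."""
    result = []
    for row in rows:
        identifiers = {col: row[col] for col in id_columns}
        for col in value_columns:
            result.append({**identifiers, name_column: col, value_column: row[col]})
    return result


tagged = [{"id": 1, "tags": ["a", "b"]}, {"id": 2, "tags": ["c"]}]
exploded = explode(tagged, "tags")

quarterly = [{"id": 1, "q1": 10, "q2": 20}]
long_format = unpivot(quarterly, id_columns=["id"], value_columns=["q1", "q2"])
```

In this sketch a row whose list is empty produces no output row.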

Steps

1 Click on the icon ellipsis-v

2 Choose option Operations

  • The Formulas DataBlock icon is then positioned at the level of the current step.

imageOpe

  • Click on the Formules DataBlock icon in the menu or in the step to open the dialog box for adding an operation to the current step:

  • Choose an operation from the list of possible operations.

  • Select the DataBlock formulas icon to open the dialog box for managing Stack, Explode and Redimension processing.

NOTE

Detailed documentation on operations is available on the Operations page

Partition / Aggregation

The Partition / Aggregation processing aims to reorganize the data by gathering them into groups. It also runs statistical functions (average, count, maximum, minimum) and descriptive functions (list of group values) on these groups.

This processing is available in 3 variants:

  • Simple aggregation

  • Multidimensional aggregation

  • Operation on columns or vertical formulas

To access the aggregate functions:

  • Click on the Formules DataBlock icon

  • Choose the option Partitions / Agrega/ Pivots

  • The Aggregations icon is then positioned at the step level.

imageAgre

  • Click on the icon Formules DataBlock

information Detailed documentation on aggregations is available in the Aggregations popup
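A simple aggregation can be sketched in Python as follows. It only illustrates the principle of grouping rows and computing statistics per group. The `aggregate` function, its output column names and the sample data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def aggregate(rows, group_by, column):
    """Group rows on `group_by` and compute statistics on `column`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_by]].append(row[column])
    return [
        {
            group_by: key,
            "count": len(values),
            "average": mean(values),
            "minimum": min(values),
            "maximum": max(values),
            "values": values,  # descriptive function: list of group values
        }
        for key, values in groups.items()
    ]


sales = [
    {"region": "North", "amount": 100},
    {"region": "South", "amount": 50},
    {"region": "North", "amount": 200},
]
by_region = aggregate(sales, group_by="region", column="amount")
```

The result contains one row per region, in order of first appearance, each carrying the count, average, minimum, maximum and list of amounts for that region.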

Sorts

The Sorts (Tris) process is used to manage the display order of the data in the table of values of the DataBlock.

  • Click on the ellipsis-v icon

  • Choose the option Tris

  • The Sort icon is then positioned at the step level.

Sorts on DataBlock

  • Click on the Sort icon to open the following dialog box for sorting the values of the current data table.

Creating a sort

1 Use the drag handle accompanying the items of the Attributes panel (columns of the data table) to drag and drop them into the Sorted columns panel to define a sort order.

2 Set Ascending (smallest to largest) or Descending (largest to smallest) sorting of column values in the Sorted columns panel using their drop-down list.

3 Once the sort parameters have been defined as desired, click on the Validate button to validate and save these modifications.

The configured sorts will only be visible in the data table after loading the step or the entire DataBlock (and therefore all its steps successively).
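A sort on several columns with a direction per column can be sketched in Python. This is an illustration with hypothetical names and data. It relies on the stability of Python's sort and applies the columns from the lowest priority to the highest.

```python
def sort_rows(rows, sorted_columns):
    """Sort rows by several columns.

    sorted_columns: list of (column, direction) in priority order,
    where direction is "Ascending" or "Descending".
    """
    result = list(rows)
    # Apply the lowest-priority column first; later (stable) sorts
    # preserve the earlier order among equal values.
    for column, direction in reversed(sorted_columns):
        result.sort(key=lambda row: row[column], reverse=(direction == "Descending"))
    return result


rows = [
    {"city": "Lyon", "amount": 80},
    {"city": "Paris", "amount": 120},
    {"city": "Lyon", "amount": 300},
]
ordered = sort_rows(rows, [("city", "Ascending"), ("amount", "Descending")])
```

Here the rows are ordered by city first and, within the same city, by decreasing amount.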

Step output

The Step output function is used to manage the presentation of the columns at the output of the step. During this operation, the order of the columns can be changed, the label of the columns can be modified and columns can be deactivated (or activated) to manage their visibility.

Attention A column deactivated in a step is no longer available in the following step.

  • Click on the Step output icon

  • The Step output icon is always present in the panel of each step.

step output

1 List of columns available for output. It is possible to change the order of the columns (see 7).

2 Allows you to give an alias to the column

3 Option that allows you to change the column type. Note that for certain conversions, such as dates and decimals, a reading format must be specified

4 For date types, allows you to specify a date display format. Note that a verification of the integrity of the value chain is carried out.

5 Handling of null values. If this option is activated, it is possible to enter a value that replaces null values

6 Allows you to specify whether the column concerned will be available in the next step.

information Note that a verification of the integrity of the value chain is performed.

7 Changes the order of the columns. Position yourself on a row, then drag and drop it to the desired position.

8 Column search area.

9 Opens a screen presenting the origin of the current column.

10 Saves the changes made.
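The combined effect of these settings can be sketched in Python. The structure of the `settings` list, its keys (`alias`, `read_format`, `null_replacement`, `active`) and the sample data are all hypothetical. They only model the options described above.

```python
from datetime import datetime


def apply_step_output(rows, settings):
    """Build the output of a step from per-column settings.

    The order of `settings` defines the order of the output columns.
    """
    output = []
    for row in rows:
        new_row = {}
        for setting in settings:
            if not setting.get("active", True):
                continue  # deactivated column: not passed to the next step
            value = row[setting["column"]]
            if value is None:
                # Null handling: use the replacement value when one is set.
                value = setting.get("null_replacement")
            elif "read_format" in setting:
                # Type change to a date: a reading format is required.
                value = datetime.strptime(value, setting["read_format"]).date()
            new_row[setting.get("alias", setting["column"])] = value
        output.append(new_row)
    return output


rows = [{"id": 1, "signup": "31/12/2023", "note": None, "tmp": "x"}]
settings = [
    {"column": "signup", "alias": "signup_date", "read_format": "%d/%m/%Y"},
    {"column": "id"},
    {"column": "note", "null_replacement": "n/a"},
    {"column": "tmp", "active": False},
]
step_output = apply_step_output(rows, settings)
```

The output row has three columns in the configured order: the converted and renamed date, the identifier, and the note with its replacement value. The deactivated column is absent.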

Delete step

To delete a step, click on the delete button that accompanies it.

Attention After clicking on the delete button of a step, no confirmation message is displayed

information The re-mapping feature is available at the initialization step

Column statistics

Statistics on a column

Access statistics window

1 Click on 1 to access the column statistics popup

stat for a column

1 Refresh column statistics

2 Click to show the count of distinct values in the column (may take a while to display)

3 Click to display the approximate number of distinct values in the column (faster than the previous option)

4 Click to display statistics on the most frequent values of the column

stat for a column

Statistics on all columns

Access setting global statistics

1 It is possible to see the statistics for each step: click on 1 to display the menu for choosing the statistics to be displayed

Display global stats of a step

1 Click to show the menu

2 Choose the statistics to display

3 Click on Play to execute the step

4 Representation of the amount of "Null" values in the column. On hover, displays the precise statistics in number and percentage.

5 Hover over to display column statistics
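The kind of figures shown in these statistics can be reproduced with a short Python sketch. The `column_statistics` function and the sample data are hypothetical, excluding null values from the distinct count is an assumption, and the approximate distinct count is not modelled.

```python
from collections import Counter


def column_statistics(rows, column, top=3):
    """Compute exact distinct count, most frequent values and null share."""
    values = [row[column] for row in rows]
    non_null = [value for value in values if value is not None]
    null_count = len(values) - len(non_null)
    return {
        "distinct": len(set(non_null)),
        "most_frequent": Counter(non_null).most_common(top),
        "null_count": null_count,
        "null_percent": 100 * null_count / len(values) if values else 0.0,
    }


rows = [{"country": "FR"}, {"country": "FR"}, {"country": "DE"}, {"country": None}]
stats = column_statistics(rows, "country", top=1)
```

For this sample the column holds two distinct values, its most frequent value appears twice, and one value out of four (25%) is null.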

Re-mapping - Changing the element that feeds a Datablock (or a HandleData source)

Re-mapping is available at:

  • The initialization step of the data sources of the HandleData presentations

  • The DataBlocks initialization step of the GenericsData module

The purpose of this functionality is to modify the element (Business entity or DataBlock) that feeds a DataBlock or a HandleData presentation source.

To perform this operation, two steps are required:

A – At the initialization step, use the Mapping button to access the screen for choosing a new source.

B – Once the new source has been chosen, map the columns of the new source to the columns already available in the current data block.

Automatic recognition proposes a default mapping, based on matching the labels and types of the columns.

information Note that to be mapped, the linked columns must be of the same type.
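The default mapping rule can be sketched in Python. This is an illustration under stated assumptions: the two functions, the representation of columns as label-to-type dictionaries and the sample columns are hypothetical.

```python
def propose_mapping(current_columns, new_source_columns):
    """Propose a default mapping: same label AND same type, else unmapped.

    Both arguments map a column label to its type.
    """
    mapping = {}
    for label, column_type in current_columns.items():
        same_label_and_type = new_source_columns.get(label) == column_type
        mapping[label] = label if same_label_and_type else None
    return mapping


def map_manually(mapping, current_columns, new_source_columns, label, new_label):
    """Map a column by hand; linked columns must be of the same type."""
    if current_columns[label] != new_source_columns[new_label]:
        raise ValueError("linked columns must be of the same type")
    return {**mapping, label: new_label}


current_columns = {"id": "Numeric", "name": "Character string", "created": "Date"}
new_source_columns = {"id": "Numeric", "name": "Numeric", "created_at": "Date"}

proposed = propose_mapping(current_columns, new_source_columns)
completed = map_manually(proposed, current_columns, new_source_columns, "created", "created_at")
```

Only `id` is mapped automatically: `name` has the same label but a different type, and `created` has no column with the same label, so it is mapped by hand to `created_at`.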

Example of changing the initialization source of a DataBlock

Datablock Initialization Access

1 DataBlock-1 contains 3 steps, the first of which is the initialization step (gray area at the beginning of the pipeline)

2 Access the DataBlock initialization screen by clicking on the init datablock icon

Datablock initialization screen presentation

1 Indicates the columns of the original element that feed the DataBlock

2 Indicates the Entity used to feed the DataBlock

3 Indicates the additional characteristic added to the Repository linked to the initial Entity created with it

4 To change the source that feeds DataBlock-1 to another data source, use the Mapping button located at the top right of the initialization screen. The remap popup opens:

remaping DataBlock

1 Click to search for a new source to map. This new source may be:

  • either a business entity

  • or another DataBlock

2 Click on the line of the new source

3 Click on Save to validate and close the window

re-3.png

Once the new source has been chosen (here My Source 2), the columns bearing the same label AND which are of the same type will be mapped automatically. Other columns can be mapped manually if needed.

information Note that it is possible, using the + button, to add a new column to the current DataBlock in order to create a new value fed by a column from the new source

re-4.png

  • Matches are made between columns.

  • Validate the change with the button located at the top right of the screen.

  • Re-mapping is complete.

The Datablock is then fed by the new source.

Editing a DataBlock

  • Access the list of DataBlocks from the GenericsData module

  • Click on the DataBlock icon

  • Search for the target DataBlock

  • At the level of the targeted DataBlock line, click on the edit icon or on the label of the DataBlock

Temporary state of a DataBlock during a work session

When editing a DataBlock, the current modifications are now saved in a temporary space.

If the user quits editing the DataBlock, a temporary save of the state is made.

When the DataBlock is edited again, the temporary state is loaded and the user finds the DataBlock in that temporary state.

Attention The temporary save is not a permanent save. Only the Save action performs a definitive save.

A Refresh button allows you to return to the state of the last definitive save made

  • either by the current user

  • or by another user

information The presence of the Refresh button indicates to the user that the current edition is an edition of a temporary state.

Clicking on the Refresh button restores the DataBlock to its last definitive save. A message indicates the date, time and login of the user who made that save
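The relationship between the temporary state, the definitive save and the Refresh button can be modelled with a small Python sketch. The `EditingSession` class and its members are hypothetical. The sketch assumes, as in the earlier example about the two tabs, that the Steps - Operations tab shows the temporary state while the Output Data tab shows the last definitive save.

```python
class EditingSession:
    """Toy model of a DataBlock being edited."""

    def __init__(self, saved_steps):
        self.saved = list(saved_steps)      # last definitive save
        self.temporary = list(saved_steps)  # working copy being edited

    def add_step(self, step):
        self.temporary.append(step)  # only the temporary state changes

    def save(self):
        self.saved = list(self.temporary)  # definitive save

    def refresh(self):
        self.temporary = list(self.saved)  # back to the last definitive save

    @property
    def steps_operations_view(self):
        return self.temporary

    @property
    def output_data_view(self):
        return self.saved


session = EditingSession(["step 1", "step 2"])
session.add_step("step 3")
unsaved_output = list(session.output_data_view)    # step 3 not visible yet

session.refresh()
after_refresh = list(session.steps_operations_view)  # step 3 discarded

session.add_step("step 3")
session.save()
after_save = list(session.output_data_view)          # step 3 now visible
```

Until Save is used, the added step exists only in the temporary state, and Refresh discards it.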

Removing a DataBlock

An impact check is performed when deleting a DataBlock. Depending on the impact on the value chain, DataChain may block deletion.

Deletion of a DataBlock is done

  • Either from the list of DataBlocks

    • At the level of the targeted DataBlock line, click on the delete icon to perform the deletion.

  • Or while editing a DataBlock

    • Use the Delete button located on the top left banner.