Unions (Set Operations)
1. General Information
In each step of a DataBlock , you can apply a Union to combine two datasets – whether they originate from Entities or other DataBlocks.
This mechanism allows merging, intersecting, or comparing homogeneous sets for analysis, filtering, or restructuring of data flows.
Examples:
-
Merge setA and setB to obtain all their rows.
-
Identify common rows or, conversely, different rows between two sets.
The combined columns must have the same number and the same types (e.g., impossible to merge a text column with a numeric column). SetB cannot have more columns than setA. |
The feature allowing identification of different rows between two sets is only available since version 8.9.0. |
2. Principles
Configuring a Union involves:
-
Selecting the target DataBlocks/Entities
-
Choosing the operation type (see next section).
-
Checking the schema compatibility (types & column order).
The results are injected into the current DataBlock, column by column, in the defined order.
3. Types of Operations
The subsections each describe the four supported operations, illustrated using the two example DataBlocks:
DataBlockA: “Patients”
No | First Name | Last Name | Age |
---|---|---|---|
001 |
Harry |
Potter |
16 |
002 |
John |
Wick |
42 |
002 |
John |
Wick |
42 |
003 |
James |
Bond |
45 |
DataBlockB: “NewPatients”
No | First Name | Last Name | Age |
---|---|---|---|
001 |
Harry |
Potter |
16 |
004 |
Emmett |
Brown |
63 |
005 |
Geralt |
DeRiv |
null |
3.3. UNION (UNIONALL)
Purpose: Merge A and B by stacking all rows, without automatic deduplication.
No | First Name | Last Name | Age |
---|---|---|---|
001 |
Harry |
Potter |
16 |
002 |
John |
Wick |
42 |
003 |
James |
Bond |
45 |
004 |
Emmett |
Brown |
63 |
005 |
Geralt |
DeRiv |
null |
Note: equivalent to SQL UNIONALL. Potential duplicates are preserved. |
3.4. INTERSECTION
Purpose: Retain only rows present in both A and B (comparison based on all columns).
No | First Name | Last Name | Age |
---|---|---|---|
001 |
Harry |
Potter |
16 |
Behavior: Only one instance of each common row is returned (even if duplicated in the sources).