Unions (Set Operations)

1. General Information

In each step of a DataBlock Logo GenericsData, you can apply a Union to combine two datasets – whether they originate from Entities or other DataBlocks.

This mechanism allows merging, intersecting, or comparing homogeneous sets for analysis, filtering, or restructuring of data flows.

Examples:

  • Merge setA and setB to obtain all their rows.

  • Identify common rows or, conversely, different rows between two sets.

The combined columns must have the same number and the same types (e.g., impossible to merge a text column with a numeric column). SetB cannot have more columns than setA.
The feature allowing identification of different rows between two sets is only available since version 8.9.0.

2. Principles

Configuring a Union involves:

  1. Selecting the target DataBlocks/Entities

  2. Choosing the operation type (see next section).

  3. Checking the schema compatibility (types & column order).

The results are injected into the current DataBlock, column by column, in the defined order.

3. Types of Operations

The subsections each describe the four supported operations, illustrated using the two example DataBlocks:

DataBlockA: “Patients”
No First Name Last Name Age

001

Harry

Potter

16

002

John

Wick

42

002

John

Wick

42

003

James

Bond

45

DataBlockB: “NewPatients”
No First Name Last Name Age

001

Harry

Potter

16

004

Emmett

Brown

63

005

Geralt

DeRiv

null

3.3. UNION (UNIONALL)

Purpose: Merge A and B by stacking all rows, without automatic deduplication.

No First Name Last Name Age

001

Harry

Potter

16

002

John

Wick

42

003

James

Bond

45

004

Emmett

Brown

63

005

Geralt

DeRiv

null

Note: equivalent to SQL UNIONALL. Potential duplicates are preserved.

3.4. INTERSECTION

Purpose: Retain only rows present in both A and B (comparison based on all columns).

No First Name Last Name Age

001

Harry

Potter

16

Behavior: Only one instance of each common row is returned (even if duplicated in the sources).

3.5. SUBTRACTION (EXCEPT)

Purpose: Return rows from DataBlockA that are absent from DataBlockB or differ by at least one value.

No First Name Last Name Age

002

John

Wick

42

003

James

Bond

45

Duplicates are removed in the result, consistent with Spark behavior for the except function.

3.6. SUBTRACTION WITH DUPLICATES (EXCEPTALL)

Purpose: Same logic as Subtraction, but preserves the exact number of occurrences remaining after comparison.

No First Name Last Name Age

002

John

Wick

42

002

John

Wick

42

003

James

Bond

45