Repositories
General - Filing
The Repository function of DataChain is accessible from the GenericsData module.
The number of repositories that can be created is not limited.
A repository defines how data is read from a connector (Local, Database, or other).
Each connector must be linked to at least one repository.
In the case of rapid integration, only a repository is created: you will not find any "Local" connector.
The repository represents Level 2 of the DataChain value chain.
This function is essential for data consumption in DataChain.
A repository is always associated with a connector.
The connector defines how data is read.
Depending on the connector, the available reader types vary.
A repository feeds one or more Business Entities.
Creating a Repository
List of Existing Repositories
A repository is created from the GenericsData module.

- Access the GenericsData module.
- In the left-hand GenericsData menu, choose the Repositories option associated with the icon.

Metadata

- Click on the button.
- Each repository has a metadata panel. Entering a label is required.

_Optional input fields allow you to provide additional information. An icon can also be assigned to the repository via the commands present in its metadata panel._
It is advisable to save this panel as soon as you have filled it in.
Use the button located in the right-hand part of the top banner of the screen.
Choice of a connector
Two main types of connector are available in the DataChain offer.

Local connector (or connectorless mode)

DataChain embeds a connector in its base deployment. It allows you to integrate data without needing to create a connector.
Note that when using the local connector, the data will be physically stored in the DataChain context.
To use the local connector, click on the option Without connector.

External Connector

To use the external repository mode, click on the corresponding option button.
The external repository mode requires you to specify a connector that already exists in DataChain.
To choose a connector, use the choice box.
The list proposes all authorized connectors.
Note that when using an external connector, the data will be physically stored outside the DataChain context.
As a reminder, here are the types of connectors present in DataChain:

- Local: without configuration, native and not accessible from connector management.
- SFTP
- HTTP
- HTTPS
- S3 (AWS)
- SQL and NoSQL databases
- HDFS
- ElasticSearch
- …
Depending on the connector chosen, the repository settings may vary.
Repository Types - Setup
Connectors: Local Connector
File with separator

- Text Identifier: specifies the character used as the text escape character.
- Separator: specifies the character used as the field separator.
- Encoding: indicates the character encoding used by the file to be processed, so that special characters are taken into account. It is set to UTF-8 by default; this value can be changed.
- Reading mask: indicates the file reading mask. Only the files matching the reading mask will be taken into account.
- Reading modes: 3 possible modes
  - PERMISSIVE: scans all rows; NULL values are inserted in place of missing values and extra values are ignored.
  - DROPMALFORMED: drops rows with fewer or more values than expected, or values that do not match the pattern.
  - FAILFAST: aborts with a RuntimeException if a malformed row is encountered.
- Header: indicates whether the first line contains the column headers.
- Multilines: option used to manage files containing line breaks within a column.
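As an illustration, the settings above can be sketched in plain Python. DataChain's own reader is internal; the file name, sample content, and variable names below are invented for the example.

```python
# Hypothetical illustration of the "File with separator" settings:
# separator, text identifier (quote character), header, and reading mask.
import csv
import fnmatch
import io

files = {"sales_2024.csv": 'id;"label with ; inside";amount\n1;"a;b";10\n'}

reading_mask = "sales_*.csv"   # only matching files are read
separator = ";"                # "Separator" setting
text_identifier = '"'          # "Text Identifier" (escape) setting
has_header = True              # "Header" setting

for name, content in files.items():
    if not fnmatch.fnmatch(name, reading_mask):
        continue  # file skipped by the reading mask
    reader = csv.reader(io.StringIO(content),
                        delimiter=separator, quotechar=text_identifier)
    rows = list(reader)
    header, data = (rows[0], rows[1:]) if has_header else (None, rows)
    print(header)  # ['id', 'label with ; inside', 'amount']
```

Note how the text identifier lets the separator character appear inside a quoted value without splitting the field.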
Parquet file
- Reading mask: indicates the file reading mask. Only the files matching the reading mask will be taken into account.
Json file
- Reading mask: indicates the file reading mask. Only the files matching the reading mask will be taken into account.
- Encoding: indicates the character encoding used by the file to be processed, so that special characters are taken into account. It is detected automatically by default.
- Multilines: indicates whether the file contains several JSON documents (Multilines set to YES) or a single JSON structure.
- Json Path: determines the level used for header detection.
- Explode(s): indicates whether one or more explode operations must be performed at the JsonPath level (1 by default).
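To illustrate what a Json Path combined with an explode operation amounts to, here is a minimal sketch in plain Python; the path `orders` and the sample document are invented, and this is not DataChain's API.

```python
# Hypothetical illustration: one "explode" at the JSON path turns each
# element of the array into a row, repeating the enclosing fields.
import json

raw = '{"customer": "acme", "orders": [{"id": 1}, {"id": 2}]}'
doc = json.loads(raw)

json_path = "orders"  # level used for header detection

rows = [{"customer": doc["customer"], **item} for item in doc[json_path]]
print(rows)  # [{'customer': 'acme', 'id': 1}, {'customer': 'acme', 'id': 2}]
```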
XML file
- New Line Tag: the tag of your XML files to be treated as a row.
- Reading modes: 3 possible modes
  - PERMISSIVE: scans all rows; NULL values are inserted in place of missing values and extra values are ignored.
  - DROPMALFORMED: drops rows with fewer or more values than expected, or values that do not match the pattern.
  - FAILFAST: aborts with a RuntimeException if a malformed row is encountered.
- Encoding: indicates the character encoding used by the file to be processed, so that special characters are taken into account. It is set to UTF-8 by default; this value can be changed.
- Ignore spaces before or after data: indicates whether white space around read values should be ignored. The default is No.
- Treat empty values as null values: indicates whether the space character should be treated as a null value. The default is No.
- Reading mask: indicates the file reading mask. Only the files matching the reading mask will be taken into account.
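The New Line Tag idea, i.e. which XML element is treated as one row, can be sketched with the standard library; the tag names below are made up for the example and are not part of DataChain.

```python
# Hypothetical illustration: each element matching the "New Line Tag"
# becomes one row, with its child elements as columns.
import xml.etree.ElementTree as ET

xml_data = """<catalog>
  <book><title>A</title><price>10</price></book>
  <book><title>B</title><price>12</price></book>
</catalog>"""

new_line_tag = "book"  # each <book> element becomes one row

root = ET.fromstring(xml_data)
rows = [{child.tag: child.text for child in elem}
        for elem in root.iter(new_line_tag)]
print(rows)  # [{'title': 'A', 'price': '10'}, {'title': 'B', 'price': '12'}]
```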
Excel
- The address of the data: indicates the sheet and the cell range of the Excel file that must be read. Example: My Sheet!A1:K225
- Workbook password: if the Excel file is password-protected, the password must be specified in this input area.
- Reading mask: indicates the file reading mask. Only the files matching the reading mask will be taken into account.

Warning: headers of numeric-type columns (which may come from Excel formulas) are not accepted and generate an error during integration.
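For clarity, a data address such as `My Sheet!A1:K225` splits into a sheet name and a cell range. The parser below is a sketch under that assumption, not DataChain's own implementation.

```python
# Hypothetical sketch: split an Excel data address ("Sheet!Start:End")
# into its sheet name, start cell, and end cell.
def parse_data_address(address: str) -> tuple[str, str, str]:
    sheet, _, cell_range = address.rpartition("!")
    start, _, end = cell_range.partition(":")
    return sheet, start, end

print(parse_data_address("My Sheet!A1:K225"))  # ('My Sheet', 'A1', 'K225')
```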
Connectors: SFTP, HDFS and S3
File with separator

- Text Identifier: specifies the character used as the text escape character.
- Separator: specifies the character used as the field separator.
- Encoding: indicates the character encoding used by the file to be processed, so that special characters are taken into account. It is detected automatically by default.
- Reading mask: indicates the file reading mask. Only the files matching the reading mask will be taken into account.
- Reading modes: 3 possible modes
  - PERMISSIVE: scans all rows; NULL values are inserted in place of missing values and extra values are ignored.
  - DROPMALFORMED: drops rows with fewer or more values than expected, or values that do not match the pattern.
  - FAILFAST: aborts with a RuntimeException if a malformed row is encountered.
- Header: indicates whether the first line contains the column headers.
- Multilines: option used to manage files containing line breaks within a column.
- Path: indicates the location of the files to be processed.
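The behavioral difference between the three reading modes can be sketched as follows; DataChain's actual reader is internal, so this is only an illustration of the semantics described above.

```python
# Hypothetical sketch of PERMISSIVE / DROPMALFORMED / FAILFAST on a
# malformed row (fewer values than expected).
EXPECTED_COLS = 3

def read_rows(rows, mode="PERMISSIVE"):
    out = []
    for row in rows:
        if len(row) == EXPECTED_COLS:
            out.append(row)
        elif mode == "PERMISSIVE":
            # pad missing values with None, drop extra values
            out.append((row + [None] * EXPECTED_COLS)[:EXPECTED_COLS])
        elif mode == "DROPMALFORMED":
            continue  # silently drop the malformed row
        elif mode == "FAILFAST":
            raise RuntimeError(f"malformed row: {row}")
    return out

rows = [["1", "a", "x"], ["2", "b"]]        # second row is malformed
print(read_rows(rows, "PERMISSIVE"))         # [['1', 'a', 'x'], ['2', 'b', None]]
print(read_rows(rows, "DROPMALFORMED"))      # [['1', 'a', 'x']]
```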
Parquet file
- Reading mask: indicates the file reading mask. Only the files matching the reading mask will be taken into account.
- Path: indicates the location of the files to be processed.
Json file
- Reading mask: indicates the file reading mask. Only the files matching the reading mask will be taken into account.
- Encoding: indicates the character encoding used by the file to be processed, so that special characters are taken into account. It is detected automatically by default.
- Multilines: indicates whether the file contains several JSON documents (Multilines set to YES) or a single JSON structure.
- Path: indicates the location of the files to be processed.
- Json Path: indicates the level used for header detection.
- Explode(s): indicates whether one or more explode operations must be performed at the JsonPath level (1 by default).
XML file
- New Line Tag: the tag of your XML files to be treated as a row.
- Reading modes: 3 possible modes
  - PERMISSIVE: scans all rows; NULL values are inserted in place of missing values and extra values are ignored.
  - DROPMALFORMED: drops rows with fewer or more values than expected, or values that do not match the pattern.
  - FAILFAST: aborts with a RuntimeException if a malformed row is encountered.
- Encoding: indicates the character encoding used by the file to be processed, so that special characters are taken into account. It is set to UTF-8 by default; this value can be changed.
- Ignore spaces before or after data: indicates whether white space around read values should be ignored. The default is No.
- Treat empty values as null values: indicates whether the space character should be treated as a null value. The default is No.
- Reading mask: indicates the file reading mask. Only the files matching the reading mask will be taken into account.
- Path: indicates the location of the files to be processed.
Excel
- The address of the data: indicates the sheet and the cell range of the Excel file that must be read. Example: My Sheet!A1:K225
- Workbook password: if the Excel file is password-protected, the password must be specified in this input area.
- Reading mask: indicates the file reading mask. Only the files matching the reading mask will be taken into account.
- Path: indicates the location of the files to be processed.

Warning: headers of numeric-type columns (which may come from Excel formulas) are not accepted and generate an error during integration.
Connectors: Http / Https / REST
File with separator
- Text Identifier: specifies the character used as the text escape character.
- Separator: specifies the character used as the field separator.
- Encoding: indicates the character encoding used by the file to be processed, so that special characters are taken into account. It is detected automatically by default.
- Reading modes: 3 possible modes
  - PERMISSIVE: scans all rows; NULL values are inserted in place of missing values and extra values are ignored.
  - DROPMALFORMED: drops rows with fewer or more values than expected, or values that do not match the pattern.
  - FAILFAST: aborts with a RuntimeException if a malformed row is encountered.
- Header: indicates whether the first line contains the column headers.
- Multilines: option used to manage files containing line breaks within a column.
- Method: method to apply, GET or POST.
- URI: specifies the URL that will be consumed by the Http/Https connector. Use the magnifying glass located at the end of the line for a more structured entry of the URI using the URI Parser function.
- Header: generates key-value pairs for the request header.
- Body: for the POST method, allows you to specify the request body.
Parquet file
- Method: method to apply, GET or POST.
- URI: indicates the URL that will be consumed by the Http/Https connector.
- Header: generates key-value pairs for the request header.
- Body: for the POST method, allows you to specify the request body.
Json file
- Encoding: indicates the character encoding used by the file to be processed, so that special characters are taken into account. It is detected automatically by default.
- Method: method to apply, GET or POST.
- URI: indicates the URL that will be consumed by the Http/Https connector.
- Header: generates key-value pairs for the request header.
- Body: for the POST method, allows you to specify the request body.
- Multilines: indicates whether the file contains several JSON documents (Multilines set to YES) or a single JSON structure.
- Json Path: indicates the level used for header detection.
- Explode(s): indicates whether one or more explode operations must be performed at the JsonPath level (1 by default).
XML file
- New Line Tag: the tag of your XML files to be treated as a row.
- Reading modes: 3 possible modes
  - PERMISSIVE: scans all rows; NULL values are inserted in place of missing values and extra values are ignored.
  - DROPMALFORMED: drops rows with fewer or more values than expected, or values that do not match the pattern.
  - FAILFAST: aborts with a RuntimeException if a malformed row is encountered.
- Encoding: indicates the character encoding used by the file to be processed, so that special characters are taken into account. It is set to UTF-8 by default; this value can be changed.
- Ignore spaces before or after data: indicates whether white space around read values should be ignored. The default is No.
- Treat empty values as null values: indicates whether the space character should be treated as a null value. The default is No.
- Reading mask: indicates the file reading mask. Only the files matching the reading mask will be taken into account.
- Method: method to apply, GET or POST.
- URI: indicates the URL that will be consumed by the Http/Https connector.
- Header: generates key-value pairs for the request header.
- Body: for the POST method, allows you to specify the request body.
Excel
- The address of the data: indicates the sheet and the cell range of the Excel file that must be read. Example: My Sheet!A1:K225
- Workbook password: if the Excel file is password-protected, the password must be specified in this input area.
- Reading mask: indicates the file reading mask. Only the files matching the reading mask will be taken into account.
- Method: method to apply, GET or POST.
- URI: indicates the URL that will be consumed by the Http/Https connector.
- Header: generates key-value pairs for the request header.
- Body: for the POST method, allows you to specify the request body.

Warning: headers of numeric-type columns (which may come from Excel formulas) are not accepted and generate an error during integration.
Advanced repository settings on the Http/Https connector (GET and POST)

- General settings (some settings depend on the expected return type: CSV, JSON, etc.)
- Allows you to add key-value pairs to the request header
- Iteration management
Iterations by Offset

- Variable automatically added to the URI
- Waiting time between each iteration
- Read start line
- Number of line(s) read at each iteration
- Number of iteration(s)
- Content in the body returned by the third party for the end of reading (with or without REGEX)
- Content in the header returned by the third party for the end of reading (with or without REGEX)
Iterations per Page

- Variable automatically added to the URI
- Waiting time between each iteration
- Reading start page
- Reading end page
- Number of iteration(s)
- Content in the body returned by the third party for the end of reading (with or without REGEX)
- Content in the header returned by the third party for the end of reading (with or without REGEX)
Example of configuration with the POST method: a maximum of 4 iterations with a skip of 50.
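That offset-iteration behavior can be sketched in plain Python. The `fetch_page` function and the end-of-read marker below are hypothetical stand-ins for the third-party API; a real call would send a POST request with the offset in the body or URI.

```python
# Hypothetical sketch: at most 4 iterations, reading 50 lines per
# iteration (a "skip" of 50), stopping early on the end-of-read marker.
import time

def fetch_page(offset, limit):
    # Stand-in for the third-party endpoint (120 rows in total).
    total = 120
    return {"rows": list(range(offset, min(offset + limit, total))),
            "end": offset + limit >= total}

max_iterations = 4      # "Number of iteration(s)"
lines_per_read = 50     # "Number of line(s) read at each iteration"
wait_seconds = 0.0      # "Waiting time between each iteration"

all_rows = []
for i in range(max_iterations):
    page = fetch_page(offset=i * lines_per_read, limit=lines_per_read)
    all_rows.extend(page["rows"])
    if page["end"]:     # content returned for read completion
        break
    time.sleep(wait_seconds)

print(len(all_rows))  # 120 (reading stopped at the third iteration)
```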
Save settings
- Once the settings have been made, click on the Save button.

After saving, and in order to deduce the headers available in the read file, it is mandatory to perform the Synchronize headers action.
Click on the button located in the corresponding area to perform this operation.
The synchronization of the headers must be carried out after the creation of the repository.
Functions available for repository management

Connector and reader definition area.
Reader settings area. These parameters vary according to the connector and reader used.
Linked Business Entities tab: lists all the Business Entities that consume the repository.
Area containing the functions available for a repository.
Headers tab: contains the file reference headers.
Remote Files tab: allows you to view the file(s) present in the repository.

Remote file actions:

- View: click on the "Magnifying glass" icon located at the end of the line.
- Download: click on the "Download" icon located at the end of the line.
Extractions tab: allows you to perform time-stamped extractions of source values.
The number of extractions is not limited.
Note that the extractions performed can be consumed by the data blocks.
The Extractions function can be used as a historization function.
Filters tab: allows you to generate filters on the extractions (applied at the level of the link between the Business Entity and the repository) in order to exploit them partially in data blocks.
Example: the last 3 extractions, the extractions of the last 10 days, …
Mandatory action: synchronize the headers.
A repository can supply n Business Entities.
The number of Business Entities that can be fed by the same repository is not limited.
From the list of Business Entities, it is possible to create a Business Entity.
In this case, the new Business Entity will be initialized with the headers of the repository.
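The extraction filters mentioned above ("the last 3 extractions", "the extractions of the last 10 days") can be sketched as simple selections over time-stamped records; the sample data and helper names below are invented for the example.

```python
# Hypothetical sketch of two extraction filters over time-stamped records.
from datetime import datetime, timedelta

extractions = [
    {"id": 1, "stamp": datetime(2023, 12, 20)},
    {"id": 2, "stamp": datetime(2024, 1, 8)},
    {"id": 3, "stamp": datetime(2024, 1, 9)},
    {"id": 4, "stamp": datetime(2024, 1, 10)},
]

def last_n_extractions(items, n):
    # keep only the n most recent extractions
    return sorted(items, key=lambda e: e["stamp"])[-n:]

def extractions_since(items, days, now):
    # keep only the extractions of the last `days` days
    cutoff = now - timedelta(days=days)
    return [e for e in items if e["stamp"] >= cutoff]

print([e["id"] for e in last_n_extractions(extractions, 3)])   # [2, 3, 4]
print([e["id"] for e in extractions_since(extractions, 10, datetime(2024, 1, 11))])
# [2, 3, 4]  (extraction 1 is older than 10 days)
```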
File Explorer
The explorer is available for repositories linked to HDFS, MinIO, S3 and SFTP type connectors. The function makes it easy to view and define the path to the directory containing the remote files to be integrated.
To explore the remote files, click on the magnifying glass located on the "Path" line.
Editing a Repository
- Access the GenericsData module.
- In the left menu bar, choose Repositories.
- Search in the repository list.

The lists of the elements of the DataChain offer have filter and search functions on columns. Use these functions to find the target repository.

- Click on the label of the chosen repository or on the icon at the end of the line.
Deleting a Repository
- Access the GenericsData module.
- In the left menu bar, choose Repositories.
- Search in the repository list.

The lists of the elements of the DataChain offer have filter and search functions on columns. Use these functions to find the target repository.

- Option 1: use the icon button at the end of the line and confirm the action.
- Option 2: click on the label of the chosen repository or on the icon at the end of the line. Once the edit repository page is displayed, click on the button, then confirm the action.
Quick Reference
Creation of a Repository

Access the GenericsData module.

| Steps | Objective | Action | Landmarks |
|---|---|---|---|
| 1 | Access the list of repositories | Click on the Repositories icon | |
| 2 | Create a new repository | Click on the New icon | |
| 4 | Metadata | Enter the information | Description required |
| 5 | Choose a connector | Choose from the list of available connectors, or use the local repository | |
| 6 | Save | Click on the Save button | |
| 7 | Define the parameters | Enter the settings information | |
| 8 | Synchronize the data structure with the repository | Click on the button | |
| 9 | Save | Click on the Save button | |