Loading Metadata and Documentation
When configuring and loading information into Engrafo, it is highly recommended to automate the creation of data catalogs and data usage documentation.
The goal is to minimize the manual effort needed to document data and data usage.
One of the key elements in Engrafo is an API that allows you to extract all kinds of information, place it in a folder, and simply let Engrafo import and store it. That way, you can schedule your extracts and let Engrafo fetch, structure, and store all the information.
However, the extracts you create and store must match the structure expected by the API. The following sections explain how this is done.
On this page of the Engrafo Guide you will find information about:
- 1 The Overall Process for Uploading Metadata
- 2 Generating Data Catalog via .CSV Load
- 3 Updating Data Catalog via .CSV Load
- 4 Generating Data Usage Documentation via .CSV Load
- 5 Generating Sample Space via .CSV Load
- 6 Generating Data Glossary via .CSV Load
- 7 Generating Data Catalog via Server Connection String
If you have questions about alternative methods to automatically load metadata and information into Engrafo, please reach out to us at contact@engrafo.dk
The Overall Process for Uploading Metadata
Metadata loading is what enables Engrafo to automate documentation and data catalogs.
The overall process is: upload files → validate files → load files
Metadata files can be uploaded in three ways:
Manual: Select the file on your computer and upload it manually.
Semi-automated: All metadata files placed in wwwroot\uploads_csv\,\ are automatically imported and made ready for validation. Note that the CSV delimiter used is the name of the subfolder; here a comma (",") is used.
Fully automated: All metadata files placed in wwwroot\uploads_CSVAuto\;\ are automatically imported, validated, and loaded into Engrafo. Note that the CSV delimiter used is the name of the subfolder; here a semicolon (";") is used. Engrafo checks for new files to import, validate, and load every 5 seconds. (If validation fails, error messages for that metadata file are produced and the metadata is not imported.)
Important: when placing metadata in a folder, it is the subfolder name that identifies the delimiter to use.
For instance, if the metadata is delimited with "¤", place it in uploads_csv\¤\
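The folder convention above can be sketched in a few lines of Python. The paths and file name below are assumptions based on the defaults mentioned above; adjust them to your own Engrafo installation:

```python
import csv
from pathlib import Path

# The subfolder name IS the delimiter: files dropped in
# uploads_CSVAuto\;\ are parsed with a semicolon every 5 seconds.
delimiter = ";"
target = Path("wwwroot") / "uploads_CSVAuto" / delimiter
target.mkdir(parents=True, exist_ok=True)

rows = [
    ["Category", "Database", "Schema", "Table", "Column"],
    ["Stage", "DB01", "Schema_A", "Table01", "Column01"],
]

# Write the file using the same delimiter as the folder name.
with open(target / "catalog.csv", "w", newline="", encoding="utf-8") as f:
    csv.writer(f, delimiter=delimiter).writerows(rows)
```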
Generating Data Catalog via .CSV Load
The following describes how to create and upload a metadata file for generating a Data Catalog in Engrafo. It is important that the file follows the exact syntax below.
Mandatory columns
Load a csv file that has the following required columns:
Category
Database
Schema
Table
Column
Additionally, the csv file may have the following optional columns:
If the columns are to be sorted differently than alphabetically
ColumnSort
If custom fields have to be imported
Database_[fieldname]
Schema_[fieldname]
Table_[fieldname]
Column_[fieldname]
IMPORTANT
The [fieldname] fields must exist as Custom fields before it is possible to load content into them.
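As a sketch of the syntax above, the following snippet builds a small catalog file in memory with the five mandatory columns plus one custom field column (the field name "Description" is only an example and must already exist as a Custom field):

```python
import csv
import io

header = ["Category", "Database", "Schema", "Table", "Column",
          "Column_Description"]  # the Column_[fieldname] pattern
rows = [
    ["Stage", "DB01", "Schema_A", "Table01", "Column01", "Customer key"],
    ["Stage", "DB01", "Schema_A", "Table01", "Column02", "Customer name"],
]

# '¤' is a single character, so it works as a csv delimiter.
buf = io.StringIO()
writer = csv.writer(buf, delimiter="¤")
writer.writerow(header)
writer.writerows(rows)
```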
The following is an example of what a CSV file might look like where the character ¤ is selected as the separator:
Category¤Database¤Schema¤Table¤Column¤Database_Description¤Schema_Description¤Table_TableOwner¤Column_Description¤Column_Owner
Stage¤DB01¤Schema_A¤Table01¤Column01¤Important database¤Schema-Description¤Knus¤The 1st Column-Description¤Peter Poulsen
Stage¤DB01¤Schema_A¤Table01¤Column02¤Important database¤Schema-Description¤Jens¤The 2nd Column-Description¤Maria Hansen
Stage¤DB01¤Schema_A¤Table01¤Column03¤Important database¤Schema-Description¤Mads¤My 3rd Column-Description¤Ole Jensen
Stage¤DB01¤Schema_A¤Table01¤Column04¤Important database¤Schema-Description¤Ebbe¤My 4th Column-Description¤Inge Madsen
Stage¤DB01¤Schema_A¤Table01¤Column05¤Important database¤Schema-Description¤Gorm¤My 5th Column-Description¤Jim Jensen
Updating Data Catalog via .CSV Load
If you wish to update your Data Catalog, the file must contain the following:
Mandatory columns
Load a csv file that has the following required columns:
CategoryID
DatabaseID
SchemaID
TableID
ColumnID
Additionally, the csv file may have the following optional columns:
If names need to be updated
Category
Database
Schema
Table
Column
If the columns are to be sorted differently than alphabetically
ColumnSort
If custom fields need to be updated
Database_id_[fieldname]
Database_[fieldname]
Schema_id_[fieldname]
Schema_[fieldname]
Table_id_[fieldname]
Table_[fieldname]
Column_id_[fieldname]
Column_[fieldname]
IMPORTANT
The [fieldname] fields must exist as Custom fields before it is possible to load content into them.
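Because updates address existing entries by their IDs, an update file can be generated from a simple mapping of IDs to new values. The rename below is a sketch using the IDs from the example that follows:

```python
import csv
import io

# (CategoryID, DatabaseID, SchemaID, TableID, ColumnID) -> new column name
renames = {(5, 3, 9, 7, 6): "CVR_NUMMER"}

buf = io.StringIO()
writer = csv.writer(buf, delimiter="¤")
writer.writerow(["CategoryID", "DatabaseID", "SchemaID",
                 "TableID", "ColumnID", "Column"])
for ids, new_name in renames.items():
    writer.writerow(list(ids) + [new_name])
```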
The following is an example of what a CSV file might look like where the character ¤ is selected as the separator:
CategoryID¤DatabaseID¤SchemaID¤TableID¤ColumnID¤Column
5¤3¤9¤7¤6¤CVR_NUMMER
Generating Data Usage Documentation via .CSV Load
If you wish to upload Data Usage Documentation, the file must contain the following:
Mandatory columns
Load a csv file that has the following required columns:
Categoryhierarchy
TemplateName
UnitName
Owner
DocumentationName
Additionally, the csv file may have the following optional columns for loading text fields:
CustomField_text_name
CustomField_text_value
Additionally, the csv file may have the following optional columns for loading links:
CustomField_link_name
CustomField_link_href
CustomField_link_description
Additionally, the csv file may have the following optional columns for loading input data:
Inputdata_CustomField_name
Inputdata_category
Inputdata_database
Inputdata_Schema
Inputdata_table
Inputdata_Column (* can be used if ALL columns shall be imported)
Additionally, the csv file may have the following optional columns for loading output data:
Outputdata_CustomField_name
Outputdata_Category
Outputdata_database
Outputdata_Schema
Outputdata_Table
Outputdata_column (* can be used if ALL columns shall be imported)
IMPORTANT
The category hierarchy must exist before it is possible to load it
The category elements must be specified with ¤ as separator, for example: Level1¤Level2¤Level3
The templates specified in [TemplateName] must exist before it is possible to load them
Units specified in [UnitName] must exist before it is possible to load them
The fields specified in [CustomField_text_name] must exist as Custom fields before it is possible to load content into them
The fields specified in [CustomField_link_name] must exist as Custom fields before it is possible to load content into them
The fields specified in [Inputdata_CustomField_name] must exist as Custom fields before it is possible to load content into them
The fields specified in [Outputdata_CustomField_name] must exist as Custom fields before it is possible to load content into them
The column specified in [Inputdata_column] must exist before it can be loaded as input data
The column specified in [Outputdata_column] must exist before it is possible to load it as output data
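Note that the file delimiter (here ";") and the category-level separator ("¤" inside the Categoryhierarchy field) never clash. A sketch using csv.DictWriter with a subset of the columns above (all values are illustrative):

```python
import csv
import io

header = ["Categoryhierarchy", "TemplateName", "UnitName", "Owner",
          "DocumentationName", "CustomField_text_name",
          "CustomField_text_value"]
rows = [
    {"Categoryhierarchy": "Level1¤Level2", "TemplateName": "ETL",
     "UnitName": "Overview", "Owner": "Ole",
     "DocumentationName": "Solvency",
     "CustomField_text_name": "Description",
     "CustomField_text_value": "Solvency calculation overview"},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=header, delimiter=";")
writer.writeheader()
writer.writerows(rows)
```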
Below is an example of what a CSV file might look like for loading both text fields and links, where the character ";" is selected as the separator:
Categoryhierarchy;TemplateName;UnitName;Owner;DocumentationName;CustomField_text_name;CustomField_text_value;CustomField_link_name;CustomField_link_href;CustomField_link_description
Level1;ETL;Heading;Ole;Dataclean;Description;This is a short description...;Link til info;https://tv2.dk/;TV2
Level1;ETL;Details;Peter;Dataclean;Important fact;Here is a description of the facts;;;
Level1¤Level2;ETL;Overview;Ole;Solvency;Description;This is a description of solvency calculations...;Link for info;https://www.dr.dk/;Danmarks Radio
Level1¤Level2;ETL;Details;Peter;Solvency;Important fact;Information goes here...;;;
Below is an example of what a CSV file might look like for loading input data, where the character ";" is selected as the separator:
Categoryhierarchy;TemplateName;UnitName;Owner;DocumentationName;Inputdata_CustomField_name;Inputdata_Category;Inputdata_database;Inputdata_Schema;Inputdata_Table;Inputdata_column
Level1¤Level2;ETL;Overview;Ole;Datacleaning;Inputdata;Category 1;Database A;Schema A;Table A;column A
Level1¤Level2;ETL;Overview;Ole;Datacleaning;Inputdata;Category 1;Database A;Schema A;Table A;column B
Level1¤Level2;ETL;Overview;Ole;Datacleaning;Inputdata;Category 1;Database A;Schema A;Table A;column C
Level1¤Level2;ETL;Overview;Ole;Datacleaning;Inputdata;Category 1;Database A;Schema A;Table B;*
Generating Sample Space via .CSV Load
If you wish to load a sample space into your Data Catalog, the file must have the following:
Mandatory columns
Load a csv file that has the following required columns:
SampleSpace
Value
Description
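A sample space file can be generated from a plain mapping of values to descriptions, one row per value. A sketch (the postal codes are illustrative):

```python
import csv
import io

# value -> description for the "Postnr" sample space
postal_codes = {
    "1301": "København K",
    "2000": "Frederiksberg",
}

buf = io.StringIO()
writer = csv.writer(buf, delimiter="¤")
writer.writerow(["SampleSpace", "Value", "Description"])
for value, description in postal_codes.items():
    writer.writerow(["Postnr", value, description])
```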
The following is an example of what a CSV file might look like where the character ¤ is selected as the separator:
SampleSpace¤Value¤Description
Postnr¤1301¤København K
Postnr¤2000¤Frederiksberg
Postnr¤2100¤København Ø
Postnr¤2200¤København N
Postnr¤2300¤København S
Postnr¤2400¤København NV
Postnr¤2450¤København SV
Postnr¤2500¤Valby
Postnr¤2600¤Glostrup
Generating Data Glossary via .CSV Load
To generate a Data Glossary automatically via .CSV load, your file must contain the following:
Load a csv file that has the following required columns:
Path
Description
The path dictates how the glossary will be structured, automatically creating folders for your glossary items. Below is an example of what the structure might look like with the character ¤ selected as the separator:
Path¤Description
folder1/folder2/item1¤This is a description for item 1
folder1/item2¤This is a description for item 2
item3¤This is a description for item 3
item4¤This is a description for item 4
The description is simply the explanation of the respective item.
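How the Path column maps to the folder hierarchy can be sketched as follows: every "/"-separated segment before the last becomes a folder, and the last segment is the glossary item itself (this is an illustration of the structure, not Engrafo's actual import code):

```python
from pathlib import PurePosixPath

rows = [
    ("folder1/folder2/item1", "This is a description for item 1"),
    ("folder1/item2", "This is a description for item 2"),
    ("item3", "This is a description for item 3"),
]

parsed = []
for path, description in rows:
    p = PurePosixPath(path)
    # All segments but the last are folders; the last is the item.
    parsed.append((list(p.parts[:-1]), p.name, description))
```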
Generating Data Catalog via Server Connection String
Another way to automatically load data into the Engrafo Data Catalog is via a connection string. To do this, first make sure you have a category created in the Data Catalog. Then head to "Load via Connection String" under "Load Metadata" and create a new connection string. Here, select the database type and category, add a description if desired, and of course enter your connection string. There is also an option to schedule the import in cron format (e.g. 15 14 * * *, meaning every day at 2:15 PM). After creating it, simply load the connection string and your Data Catalog will be populated.
SQL Connection String
Below is an example video of how to generate a Data Catalog from a SQL Server based on a connection string:
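As a rough sketch, a SQL Server connection string typically follows the pattern below; the server, database, and user names are placeholders, and the exact format depends on the database type you select:

```
Server=myserver.example.com;Database=MyDatabase;User Id=engrafo_reader;Password=<secret>;
```

Combined with a cron schedule such as 15 14 * * *, the import then runs every day at 2:15 PM.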