When working with data, datasets often degrade from being a resource or asset to being nothing more than a tool. Data management tends to be neglected, and nothing prevents the creation of meaningless data any longer.
Meaningless data include:
- Useless Data: incomprehensible data, hard-to-access data, data with missing values, etc.
- Volatile Data: data that needs cleaning or lacks stability, etc.
- Duplicate Data: redundant or almost redundant data.
The Data Catalogue feature aims to restore the importance of metadata and to bridge the gap between data quality and data quantity through what is known as Data Mapping.
Within this context, the Data Catalogue feature aims to achieve the following:
- Match and map data with related content
- Track data lineage
- Track data usage patterns
- Monitor data content and quality
- Enrich data with internal and external resources
- Visualize your web of data
In this series of articles, we will go through the different functionalities offered by the Data Catalogue feature and how to use them.
Data Catalogue Overview
The Data Catalogue Overview is the first view the user is directed to upon starting the App. It contains two major sections:
- A dataset-characteristics section presenting a short summary of the data in the form of four charts:
  - Quality: A manually assigned indicator ranging from 1 to 3 stars.
  - Data Category: The category under which the dataset is classified, i.e. what it is used for.
  - Source: How the dataset was created (Connection, Workflow, Manual upload, etc.)
  - Type: The dataset type (Oracle, Parquet, etc.)
- A table listing all existing datasets within a single instance domain, along with relevant information regarding each dataset in the form of different table columns.
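Conceptually, each of the four charts is a count of datasets grouped by a single attribute. The following is a minimal Python sketch of that aggregation, not the product's implementation. The in-memory records and the field names `quality`, `category`, `source`, and `type` are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical catalogue entries; field names and values are illustrative only.
datasets = [
    {"name": "customers", "quality": 3, "category": "Sales", "source": "Connection", "type": "Oracle"},
    {"name": "orders", "quality": 2, "category": "Sales", "source": "Workflow", "type": "Parquet"},
    {"name": "sensors", "quality": 1, "category": "IoT", "source": "Manual upload", "type": "Parquet"},
]


def chart_counts(records, attribute):
    """Count how many datasets share each value of the given attribute."""
    return Counter(record[attribute] for record in records)


# One counter per chart in the overview (Quality, Data Category, Source, Type).
overview = {
    attribute: chart_counts(datasets, attribute)
    for attribute in ("quality", "category", "source", "type")
}
```

Each counter in `overview` maps an attribute value to the number of datasets that carry it, which is the kind of data a pie or bar chart would display.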
Data Catalogue Summary Table
The summary table lists all datasets, across all projects, that are registered in the Data Catalogue within the same instance domain. Its columns combine manually assigned metadata (Quality, Business Owner, and Dataset Description, all set in the Dataset Information tab) with interpreted metadata (Data Category or project, Source, Type, Last Modified, and Row Count).
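To make the split between the two kinds of metadata concrete, one row of the summary table could be modelled as follows. This is only a sketch: the class name `CatalogueRow`, the field names, and the assumption that manually assigned fields may be empty are all hypothetical and not taken from the product.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class CatalogueRow:
    """One row of the summary table (hypothetical model)."""

    dataset_name: str

    # Manually assigned metadata, set in the Dataset Information tab.
    # Assumption: these may be empty until a user fills them in.
    quality: Optional[int]          # 1 to 3 stars
    business_owner: Optional[str]
    description: Optional[str]

    # Interpreted metadata, derived from the dataset itself.
    data_category: str
    source: str                     # e.g. "Connection", "Workflow", "Manual upload"
    type: str                       # e.g. "Oracle", "Parquet"
    last_modified: datetime
    row_count: int

    def __post_init__(self):
        # The Quality indicator ranges from 1 to 3 stars.
        if self.quality is not None and not 1 <= self.quality <= 3:
            raise ValueError("quality must be between 1 and 3 stars")


row = CatalogueRow(
    dataset_name="customers",
    quality=3,
    business_owner="Jane Doe",
    description="Customer master data",
    data_category="Sales",
    source="Connection",
    type="Oracle",
    last_modified=datetime(2021, 1, 15),
    row_count=1200,
)
```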
In addition, the table offers the following functionality:
- Search for datasets using the dataset information shown below along with tags, column names and metadata.
Use "%" as a wildcard for an arbitrary number of characters, e.g. "ONE DATA" or "ONE__DATA" can be found with "ONE%DATA".
- Each column has a sorting and filtering button (except for Link and Source). Clicking this button opens the Sort & Filter window, which lets you sort the values in ascending or descending order according to the column's data type and filter the data by single values or by a range of values. The operation is executed when you click the "APPLY" button.
When the App starts, the "DATASET NAME" column is sorted in ascending order by default.
- After sorting or filtering changes are applied, a corresponding icon appears next to the column name (for example, an icon indicating ascending order). A column that is both sorted and filtered shows both icons.
- Clicking the indicated button opens a detailed view of the selected dataset.
- The table's pagination shows the total number of rows, the number of rows per page, and the number of pages.
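The table operations above (wildcard search, range filtering, sorting, and pagination) can be illustrated with a short Python sketch. It is not the App's implementation. It assumes that "%" matches zero or more characters, that a search term may match anywhere in the text, and that rows are plain dictionaries; the function names and the `dataset_name` and `row_count` keys are hypothetical.

```python
import math
import re


def wildcard_to_regex(term):
    """Translate a search term in which "%" stands for any run of characters."""
    # Escape each literal piece so only "%" acts as a wildcard.
    literal_parts = (re.escape(part) for part in term.split("%"))
    return re.compile(".*".join(literal_parts))


def search(names, term):
    """Return the names that contain a match for the search term."""
    regex = wildcard_to_regex(term)
    return [name for name in names if regex.search(name)]


def filter_range(rows, column, low, high):
    """Keep rows whose value in `column` lies in the inclusive range [low, high]."""
    return [row for row in rows if low <= row[column] <= high]


def sort_rows(rows, column, descending=False):
    """Sort rows by one column, ascending unless `descending` is set."""
    return sorted(rows, key=lambda row: row[column], reverse=descending)


def page_count(total_rows, rows_per_page):
    """Number of pages needed to display `total_rows` rows."""
    if rows_per_page <= 0:
        raise ValueError("rows_per_page must be positive")
    return math.ceil(total_rows / rows_per_page)


names = ["ONE DATA", "ONE__DATA", "ONEDATA", "OTHER DATA"]
matches = search(names, "ONE%DATA")

rows = [
    {"dataset_name": "orders", "row_count": 500},
    {"dataset_name": "customers", "row_count": 10},
]
by_name = sort_rows(rows, "dataset_name")          # default ascending order
large = filter_range(rows, "row_count", 100, 1000)
pages = page_count(total_rows=45, rows_per_page=10)
```

Here `matches` contains the first three names, since "ONE%DATA" also matches when nothing stands between "ONE" and "DATA" under the zero-or-more assumption, and 45 rows at 10 rows per page yield 5 pages.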
In the next article, we will give a detailed overview of single dataset metadata and relevant information.