Purpose

The Processing Library serves as a single point of entry for reusable functions and workflows. It follows the software engineering principles of modularity, separation of concerns, testability, and reusability. A data scientist can use it for the following operations:

  • Creation of reusable functions, workflows or models
  • Creation of complex building blocks on the basis of workflows
  • Sharing of functions in a catalog
  • Versioning of the elements
  • Testing of functions with training data
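
The operations above (creating, sharing, versioning, and testing reusable functions) can be illustrated with a minimal sketch. It is not the ONE DATA API: the names FunctionCatalog, CatalogEntry, register, get, and check are hypothetical and only show, under the assumption of a simple in-memory catalog, how a versioned catalog of functions might behave.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class CatalogEntry:
    """One version of a shared function (hypothetical structure)."""
    name: str
    version: int
    func: Callable[..., Any]


class FunctionCatalog:
    """Hypothetical in-memory catalog; not part of any real product API."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, int], CatalogEntry] = {}

    def latest_version(self, name: str) -> int:
        # Returns 0 when the name has never been registered.
        versions = [v for (n, v) in self._entries if n == name]
        return max(versions, default=0)

    def register(self, name: str, func: Callable[..., Any]) -> CatalogEntry:
        # Registering under an existing name creates the next version
        # instead of overwriting the previous one.
        entry = CatalogEntry(name, self.latest_version(name) + 1, func)
        self._entries[(entry.name, entry.version)] = entry
        return entry

    def get(self, name: str, version: Optional[int] = None) -> Callable[..., Any]:
        # Without an explicit version, the latest one is returned.
        if version is None:
            version = self.latest_version(name)
        return self._entries[(name, version)].func

    def check(self, name: str, cases: List[Tuple[Any, Any]],
              version: Optional[int] = None) -> bool:
        # Runs the function on (input, expected_output) pairs.
        func = self.get(name, version)
        return all(func(given) == expected for given, expected in cases)


def scale_by_max(values: List[float]) -> List[float]:
    peak = max(values)
    return [v / peak for v in values]


def scale_min_max(values: List[float]) -> List[float]:
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]


catalog = FunctionCatalog()
catalog.register("normalize", scale_by_max)    # becomes version 1
catalog.register("normalize", scale_min_max)   # becomes version 2
```

In this sketch, catalog.get("normalize") returns the newest version, while catalog.get("normalize", 1) still returns the first one, so existing workflows that depend on an older version keep working after a new version is shared.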

Returning to the "from garage to production" principle, a data scientist starts working on a project in a development environment. Once the project has reached a mature state, the implemented procedures are transformed into reusable functions wherever possible. Migration functionality then helps the data scientist move the elements into the production environment and set them live.


Within the Processing Library, the data scientist is able to interact with the following resources:

  • Credentials store the sensitive data needed to establish a connection
  • Connections open and maintain the link to a data source that is needed for a standard operation
  • Data Tables hold the data needed for standardized operations such as testing
  • Models that exist in the system can be used in standard operations
  • Workflows define the actual operations
  • Reports generate lightweight visualizations needed for the standard operations
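
The split between Credentials and Connections can be sketched as follows. This is an illustration only, assuming plain Python data classes; the class names, fields, and the example URL are hypothetical and do not describe how ONE DATA stores these resources. The point it shows is that the secret lives in its own object, so a connection can be inspected or logged without exposing it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Credential:
    """Holds the sensitive data (hypothetical fields)."""
    name: str
    username: str
    # repr=False keeps the secret out of printed or logged representations.
    secret: str = field(repr=False)


@dataclass
class Connection:
    """Refers to a credential and tracks the link state (hypothetical)."""
    name: str
    url: str
    credential: Credential
    is_open: bool = False

    def open(self) -> None:
        # A real implementation would contact the data source here.
        self.is_open = True

    def close(self) -> None:
        self.is_open = False


# Example values are placeholders, not real endpoints or accounts.
credential = Credential(name="warehouse-login", username="analyst", secret="s3cr3t")
connection = Connection(
    name="warehouse",
    url="postgresql://db.example.invalid/sales",
    credential=credential,
)
connection.open()
```

Because the credential is a separate, immutable object in this sketch, several connections could reference the same credential, and replacing it would not require touching the connection definitions.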

A specific reusable element within the Processing Library can consist of:

  • A Python script prepared to be operated within the ONE DATA ecosystem
  • An R script prepared to be operated within the ONE DATA ecosystem
  • A workflow forming a complex building block

Standard projects

  • Creation of a generic toolbox