The purpose of the Model Hub is to interact with models that are uploaded to or created in ONE DATA. Besides the upload functionality within ONE DATA, a model can be generated either with specific processors or with the help of R and Python processors.
Especially in production environments, one wants full flexibility while remaining audit-proof. Within the Model Hub, the data scientist has the capabilities to:
- maintain already generated models
- version models to check whether the forecasting quality has changed in the desired way
- quality-check models to meet quality management standards
- run models on sample data or production data
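The versioning and quality-check capabilities above can be illustrated with a minimal sketch. The `ModelRegistry` class and the accuracy threshold used here are illustrative assumptions, not part of the ONE DATA API:

```python
# Minimal sketch of model versioning with a quality check.
# ModelRegistry and the 0.8 accuracy threshold are illustrative
# assumptions, not actual ONE DATA functionality.

class ModelRegistry:
    def __init__(self, quality_threshold=0.8):
        self.versions = []          # list of (model, metrics) tuples
        self.quality_threshold = quality_threshold

    def register(self, model, metrics):
        """Store a new model version together with its evaluation metrics."""
        self.versions.append((model, metrics))
        return len(self.versions)   # 1-based version number

    def quality_changed(self):
        """Accuracy delta between the latest version and its predecessor."""
        if len(self.versions) < 2:
            return None
        return self.versions[-1][1]["accuracy"] - self.versions[-2][1]["accuracy"]

    def passes_quality_gate(self, version):
        """Check whether a version meets the quality management standard."""
        return self.versions[version - 1][1]["accuracy"] >= self.quality_threshold


registry = ModelRegistry()
registry.register("model-v1", {"accuracy": 0.82})
registry.register("model-v2", {"accuracy": 0.86})
print(round(registry.quality_changed(), 2))   # 0.04: quality improved
print(registry.passes_quality_gate(2))        # True
```

The registry keeps every version, so a degradation in forecasting quality between two versions is always visible and auditable.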
In most cases, models are trained in a batch environment on smaller datasets. The training phase is typically a complex task with many iterations. Once the model has reached a certain quality level, it can be applied in a production environment. In such environments, the key to success is speed and the ability to execute within light-weight environments. To enable such a speed layer, ONE DATA uses a model serving technique to serve models in a light-weight environment (e.g., Python). Obviously, this environment provides neither quality checks nor audit-proofness out of the box. By combining the Model Hub with model serving capabilities via the API layer, production readiness is also brought to those light-weight environments.
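The separation between batch training and light-weight serving can be sketched as follows. The `LinearModel` class and the use of `pickle` are illustrative assumptions to show the pattern; they are not the actual ONE DATA serving mechanism:

```python
import pickle

# Sketch of the speed-layer idea: a model trained in the batch
# environment is serialized once, then loaded and applied inside a
# light-weight Python process. LinearModel is a stand-in for a real
# trained model; this is not the actual ONE DATA serving mechanism.

class LinearModel:
    def __init__(self, slope, intercept):
        self.slope = slope
        self.intercept = intercept

    def predict(self, x):
        return self.slope * x + self.intercept


# --- Batch environment: train (here hard-coded) and export the model ---
trained = LinearModel(slope=2.0, intercept=1.0)
blob = pickle.dumps(trained)

# --- Serving environment: load the model and answer requests ---
served = pickle.loads(blob)
print(served.predict(3.0))   # 7.0
```

The serving side needs only the serialized model and a prediction call, which is what keeps it fast; the quality checks and audit trail stay with the Model Hub.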
Within the Model Hub, the data scientist can interact with the following resource types:
- Data Tables hold the training data or the data the model should actually be applied to
- Workflows define the actual processing method as well as pre- or post-processing tasks when the model is applied
- Production Lines define an execution order of workflows and functions and can be equipped with quality gates to implement quality management principles
- Schedules are used to continuously evaluate models on new data portions
- Reports generate first insights into the generated results
- Batch Trainings of models
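How a Production Line chains workflows and enforces quality gates can be sketched as follows. All function names, the step contents, and the minimum-row threshold are hypothetical illustrations, not ONE DATA API calls:

```python
# Sketch of a production line: an ordered list of workflow steps,
# interleaved with a quality gate that stops execution when a quality
# management standard is violated. All names and thresholds here are
# illustrative, not ONE DATA API calls.

def preprocess(data):
    """Pre-processing workflow: drop missing values."""
    return [x for x in data if x is not None]

def apply_model(data):
    """Stand-in for applying the actual model."""
    return [2 * x for x in data]

def quality_gate(data, min_rows=2):
    """Quality gate: abort the line if too little data survives."""
    if len(data) < min_rows:
        raise RuntimeError("quality gate failed: too few rows")
    return data

production_line = [preprocess, quality_gate, apply_model]

def run(line, data):
    """Execute the workflows of a production line in their defined order."""
    for step in line:
        data = step(data)
    return data

print(run(production_line, [1, None, 2, 3]))   # [2, 4, 6]
```

Because the gate sits between the workflows, a failing quality check halts the line before the model is applied, which is the quality management principle the Production Lines implement.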