General

Term | Icon | Description
SYSTEM
ONE DATA
Navigation
Navigation menu within ONE DATA that leads to Resources, ONE DATA configurations, and the Application Health.
Action Bar
Vertical bar within ONE DATA to manage Resources or to configure/execute Workflows and Reports.
Tags
Keywords which can be added to Resources.
Resources
Data Tables, Workflows, Reports, Keyrings and Models

Application Health

Overview of the current status of the ONE DATA instance: versions, start time, remaining/used space, Spark status, and environment information.
Apache Spark
Most of ONE DATA's computation logic is built on this Apache analytics engine.
Apache Hadoop
ONE DATA uses HDFS (Hadoop Distributed File System) to store data when it runs on a cluster.
Docker
Currently, Docker is used only in combination with the Python Integration.


User/Rights Management

Term | Icon | Description

Domain

A Domain is a closed area that can host multiple Projects. It is the highest level within ONE DATA's rights management.

Project

Data science tasks are structured in Projects, each of which contains multiple ONE DATA Resources. Access to Projects is restricted via User and Rights Management.

Access rights are granted only via Groups; it is not possible to grant access rights to a single user without a Group.

Role


  • Admin: Full Access, including user administration inside Domain
  • Normal User: Read/Write/Execute Access
  • Viewer:
    • Report Viewer: Viewing Reports (without execution)
    • Extended Report Viewer: Viewing Reports (without execution) and viewing Jobs
    • Report Runner: Viewing and executing Reports
    • Extended Report Runner: Viewing/executing Reports and viewing Jobs

Group

Bundles users with the same access rights. Each Group is assigned a Role inside a Project. A user can be a member of multiple Groups.
Subgroup
Can be used to structure Groups visually. A Subgroup always has the same access rights as its parent Group.
Access Relation
Role of a Group inside a Project.
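The relationship between Groups, Subgroups, Roles and Access Relations can be sketched as a small data model. This is only an illustration of the definitions above, not ONE DATA's implementation: the class names, the `roles_of_user` helper, and the example group and project names are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional, Set, Tuple


class Role(Enum):
    ADMIN = "Admin"
    NORMAL_USER = "Normal User"
    REPORT_VIEWER = "Report Viewer"
    EXTENDED_REPORT_VIEWER = "Extended Report Viewer"
    REPORT_RUNNER = "Report Runner"
    EXTENDED_REPORT_RUNNER = "Extended Report Runner"


@dataclass
class Group:
    name: str
    members: Set[str] = field(default_factory=set)
    parent: Optional["Group"] = None  # set only for Subgroups


# Access Relation (hypothetical encoding): (group name, project name) -> Role
AccessRelations = Dict[Tuple[str, str], Role]


def roles_of_user(user: str, project: str, groups: List[Group],
                  relations: AccessRelations) -> Set[Role]:
    """Collect the Roles a user holds in a Project through Group membership."""
    roles: Set[Role] = set()
    for group in groups:
        if user not in group.members:
            continue
        # A Subgroup has the same rights as its parent Group,
        # so resolve to the top-level Group before the lookup.
        owner = group
        while owner.parent is not None:
            owner = owner.parent
        role = relations.get((owner.name, project))
        if role is not None:
            roles.add(role)
    return roles


# Example data (made up for illustration).
analysts = Group("analysts", {"alice"})
analysts_eu = Group("analysts-eu", {"bob"}, parent=analysts)
relations: AccessRelations = {("analysts", "churn"): Role.NORMAL_USER}
alice_roles = roles_of_user("alice", "churn", [analysts, analysts_eu], relations)
```

Note that the lookup never consults a per-user entry: a user without any Group membership ends up with an empty set of Roles, mirroring the rule that rights are granted only via Groups.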
Sharing of Resources
All kinds of Resources can be shared between different Projects.
Analysis Authorization

It is possible to define Groups/users with special access rights to the data.

Viewer License
A viewer license user is a user who can only access reports.
Viewer Group
A viewer group bundles viewer license users with the same roles.


Resources

Term | Icon | Description

Data Tables

A table that is stored permanently inside ONE DATA. These tables can be used and changed via Workflows. A Data Table can also be created by uploading a .csv file.

Workflow

Workflows contain the algorithmic computation of data. They combine different Processors under a certain execution logic to produce a specific result. Workflows can be executed in a Spark, R, or Python context.

Report

The Report of a Workflow is a configurable user interface to provide configurations to the Workflow and to visualize the Workflow results. It is also possible to run the Workflow from the Report view.

Model



outdated:

A Machine Learning model that was trained previously and then stored within ONE DATA.

Keyring

Set of authentication keys in ONE DATA.


Data Tables/Data Storage

Term | Icon | Description
ETL Source
A special kind of read-only Data Table. It refers to an actual database table and always contains the same entries as the table in the database.
Parquet
Apache Parquet is a column-oriented data file format. When writing to a Data Table, it is possible to choose between Parquet and CSV.
Representation Type
The data type of a table column. It can be string, int, double, numeric, or datetime.
Scale Type
Specifies the scale of measurement of a table column. It can be nominal, ordinal, interval, or ratio.
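To make the difference between the two column attributes concrete, the following sketch models them as enumerations attached to a column description. The class names and the `mean_is_meaningful` helper are hypothetical; the helper reflects the general statistical convention that an arithmetic mean is meaningful only on interval and ratio scales, and it is not stated here whether ONE DATA enforces such a rule.

```python
from dataclasses import dataclass
from enum import Enum


class RepresentationType(Enum):
    STRING = "string"
    INT = "int"
    DOUBLE = "double"
    NUMERIC = "numeric"
    DATETIME = "datetime"


class ScaleType(Enum):
    NOMINAL = "nominal"
    ORDINAL = "ordinal"
    INTERVAL = "interval"
    RATIO = "ratio"


@dataclass(frozen=True)
class ColumnSpec:
    """Hypothetical description of one Data Table column."""
    name: str
    representation: RepresentationType
    scale: ScaleType


def mean_is_meaningful(column: ColumnSpec) -> bool:
    """An arithmetic mean needs at least an interval scale."""
    return column.scale in {ScaleType.INTERVAL, ScaleType.RATIO}


# The same representation type can carry different scale types:
# a postal code stored as int is nominal, a revenue stored as double is ratio.
postal_code = ColumnSpec("postal_code", RepresentationType.INT, ScaleType.NOMINAL)
revenue = ColumnSpec("revenue", RepresentationType.DOUBLE, ScaleType.RATIO)
```

The example columns show why both attributes are needed: the representation type says how a value is stored, while the scale type says which operations on it make sense.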


Workflows

Term | Icon | Description

Workflow Variable


Variable defined within a Workflow in order to specify ranges of values that can be applied in the configurations of Processors.
Variable Manager

Overview of all Workflow Variables within a single Workflow. Variables can also be added and removed here.

Node/Processor


A Processor is a logical unit that represents a function applied to the data entering it.

A list of all Processors and their documentation can be found here: Processor Documentation

Group 

Possibility to structure a Workflow into different logical parts.
Import/Export

It is possible to import/export certain nodes or entire Workflows.

Schedule

Project-specific resource for scheduling automatic Workflow executions. Inherits data structure from generic resource.

Holds exactly one Schedule Scheme.

Schedule Scheme


Collection of timing specifications that form the future execution plan for a Schedule.
Can contain one or more elements.
The scheme elements, namely Schedule Rules, trigger Schedule executions additively, i.e. when two elements define a future execution at overlapping times, both executions are triggered.

Schedule Rule


A single planned execution rule saved for exactly one Schedule. A rule can be based on a CRON expression. Schedule Rules define the points in time at which the Schedule triggers Workflow execution.
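The additive behaviour of a Schedule Scheme can be illustrated with a small sketch. It assumes a simplified, hypothetical rule type (`IntervalRule`, firing at a fixed interval) instead of CRON expressions, and the function names are made up; the point is only that overlapping trigger times from different rules are not merged.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class IntervalRule:
    """Hypothetical Schedule Rule: fires at `start`, then every `every`."""
    start: datetime
    every: timedelta

    def __post_init__(self) -> None:
        if self.every <= timedelta(0):
            raise ValueError("interval must be positive")

    def fire_times(self, until: datetime) -> List[datetime]:
        """All trigger times from `start` up to and including `until`."""
        times = []
        current = self.start
        while current <= until:
            times.append(current)
            current += self.every
        return times


def planned_executions(rules: List[IntervalRule], until: datetime) -> List[datetime]:
    """Additive scheme: concatenate the trigger times of all rules.

    Duplicates are kept on purpose, so two rules that fire at the
    same time yield two executions.
    """
    executions: List[datetime] = []
    for rule in rules:
        executions.extend(rule.fire_times(until))
    return sorted(executions)


midnight = datetime(2024, 1, 1, 0, 0)
hourly = IntervalRule(midnight, timedelta(hours=1))
two_hourly = IntervalRule(midnight, timedelta(hours=2))
plan = planned_executions([hourly, two_hourly], midnight + timedelta(hours=2))
```

In this example the hourly rule contributes three trigger times and the two-hourly rule contributes two, so `plan` holds five executions, with 00:00 and 02:00 each appearing twice.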

Job

One execution of a Workflow. Via Show Jobs it is possible to list all recent jobs to see execution times, the job status (successful/not successful) and to switch between Workflow versions (main versioning mechanism).
R Integration
It is possible to run R scripts within Workflows.

Python Integration


It is possible to run Python scripts within Workflows. These scripts run in the Python context.
SQL
Several Processors allow the user to enter (Spark) SQL queries and apply them directly to the provided tables, e.g. Query Processor, Double Query Processor, Flexible ETL Load Processor, ...

Microservice


External applications can communicate with ONE DATA via Microservices. There are Microservices Input and Microservices Output Processors. These Processors have to be actively enabled before they can be used.
API Load

It is possible to load data (within Workflows) from REST APIs that can be targeted with GET requests.

GeoJSON
Processor that expects GeoJSON data as input. After the Workflow execution, the Processor result can be visualized within a Report using a Visualization Container and selecting a Geo Chart.
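For readers unfamiliar with the format, the sketch below builds a minimal GeoJSON document using only the Python standard library. The helper names and the sample point are made up for illustration, and which GeoJSON geometry types the Processor accepts is not stated here. In GeoJSON, coordinates are given in longitude, latitude order.

```python
import json
from typing import Any, Dict, Iterable


def point_feature(longitude: float, latitude: float,
                  properties: Dict[str, Any]) -> Dict[str, Any]:
    """Build a GeoJSON Feature with a Point geometry.

    GeoJSON positions are ordered [longitude, latitude].
    """
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [longitude, latitude]},
        "properties": properties,
    }


def feature_collection(features: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Wrap Features in a GeoJSON FeatureCollection."""
    return {"type": "FeatureCollection", "features": list(features)}


# Sample data (illustrative only).
collection = feature_collection([
    point_feature(11.5761, 48.1374, {"name": "Munich"}),
])
geojson_text = json.dumps(collection)
```

The resulting `geojson_text` is a plain JSON string, so it can be stored or passed along like any other text value.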
Production Line


Planned feature: linking Workflows and executing them one after another.

Node/Processor

Term | Icon | Description

Processor Input


Table, which is provided via an edge to the Processor as input (above the Processor). It is possible to have multiple inputs.
Processor Output
Table, which is provided via an edge by the Processor as output (below the Processor). It is possible to have multiple outputs.

Config Element


It is possible to open a Processor (double click) to configure it. All possible configuration elements

Composed Config Element


A Composed Config Element is a special kind of Config Element inside a Processor. This kind can be defined once for each column of the Processor Input. It is used to specify the selected column using multiple configurations.


Reports

Term | Icon | Description

Container


Reports can be created by arranging different kinds of Containers. There are currently three Container types:

  1. Visualization Container: To visualize all kinds of results, e.g. displaying the result table or visualizing a result table as a heat map.
  2. Model Container: Inserts a ONE DATA Model into the Report.
  3. Application element Container: This category comprises multiple container types that mainly serve configuration purposes, e.g. configuration, variable, filter, etc.
Report Grid
It is possible to define the dimensions of a Report via a grid (number of rows and columns can be defined). When Containers are positioned or resized inside a Report they automatically adjust to the grid.
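The snapping behaviour can be pictured with a short sketch. It is an assumption-laden illustration, not the actual Report layout code: positions and sizes are taken as fractions of the Report area, and the `Grid`, `Placement` and `snap` names as well as the rounding and clamping rules are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grid:
    rows: int
    columns: int


@dataclass(frozen=True)
class Placement:
    row: int          # zero-based index of the top grid row
    column: int       # zero-based index of the left grid column
    row_span: int     # number of grid rows covered
    column_span: int  # number of grid columns covered


def snap(grid: Grid, top: float, left: float,
         height: float, width: float) -> Placement:
    """Snap a Container to the grid.

    `top`, `left`, `height` and `width` are fractions (0.0 to 1.0)
    of the Report area. Sizes are rounded to whole cells (at least
    one), and the position is clamped so the Container stays inside
    the grid.
    """
    row_span = min(max(1, round(height * grid.rows)), grid.rows)
    column_span = min(max(1, round(width * grid.columns)), grid.columns)
    row = min(max(0, round(top * grid.rows)), grid.rows - row_span)
    column = min(max(0, round(left * grid.columns)), grid.columns - column_span)
    return Placement(row, column, row_span, column_span)


grid = Grid(rows=4, columns=6)
placed = snap(grid, top=0.27, left=0.52, height=0.45, width=0.3)
```

Here a Container dropped roughly a quarter of the way down and halfway across a 4 by 6 grid is aligned to whole cells, so its final position and size are always expressed in grid rows and columns.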