TABLE OF CONTENTS

General Description

A Virtual Data Table (VDT) is a type of Data Table which is built on top of an existing Data Table. As the name indicates it is a virtual Data Table which implies that it is not stored somewhere but is generated at run time. This is comparable with SQL database views.

The addition of VDTs makes your Data Tables, Apps and Workflows easier to maintain. As an example, when needing SQL transformed data regularly, instead of having to do the same transformation every time using a Workflow or an App, you only need to it once within your VDT. The VDT can then be used in all your Workflows or Apps. Also, VDTs always reflect changes in the original data. It is therefore not necessary to create a new VDT when data gets updated.




A future benefit is based on the possibility to share VDTs between projects. If users don't have access to the underlying datasource (because the Data Table is not shared, and normal users don't have access to its project), it is possible to use VDTs to strip away sensitive data from an underlying datasource and to share this VDT to projects where users should get access to certain parts of the data.

If a user wants to modify the VDT configuration within a shared project, the user needs write privileges for Data Tables and sufficient rights on the underlying datasource to do so.



Where to Find the Functionality

Virtual Data Tables can be found in the Data Tables tab, alongside all other Data Table types. They are indicated by the new type "Virtual".


Creating & Editing A Virtual Data Table

To create a Virtual Data Table, a source Data Table is necessary. In the detail view of the existing Data Table under "Specific Actions", a button "Add New Virtual Data Table" can now be found. An overview of which Data Table types are supported or not can be found under Supported Data Table Types And Their Limitations. If a Data Table type is not supported, the button will not be visible.

Creating a Virtual Data Table

After clicking on the "Add New Virtual Data Table" button in the Data Table detail view, the following dialog opens.

  • The VDT name needs to be unique and cannot be left empty.
  • The source Data Table cannot be changed here. If you want to create a VDT from a different source, then you need to open the dialog from that Data Table.
  • The other fields can be left empty if not necessary. 


After its creation, the VDT will be listed in the Data Tables view with type "Virtual".

Editing the Virtual Data Table Definition

Virtual Data Tables themselves cannot be modified, but their definition can. This definition is written in JSON format, similar to a datasource configuration in Apps.

To edit the definition, open the detail view of the created VDT. In the left panel click on "Edit Virtual Data Definition".


The following dialog will open, containing an editor and a preview.


The editor section has two tabs: SQL and Definition. The SQL tab is always in sync with the "sql" option in the definition, so it is not required to refresh the preview or update the definition when switching between the two editors. There are three scenarios, in which it is not possible to switch from Definition to SQL.

  • If the definition JSON contains errors. This also disables saving. 
  • If the definition contains transformations inside dataOptions. (Transformations overrule SQL by definition. To avoid confusion, SQL tab is disabled.)
  • If the definition contains comments. This also disables saving because it is technically an error.


SQL Tab

The SQL tab contains a SQL editor. Here user can write complex queries while utilizing features like suggestion, autocomplete and error highlighting. This editor is the same as the already known Apps SQL Editor.


Additionally to the editor field, the tab has two button. 

Shortcuts: Opens a shortcut dialog which shows all the shortcuts that are supported by the editor.

Refresh Schema: Fetches the underlying schema i f it has been changed, so all tables/columns can be properly used in the auto-completion.

Definition Editor

The editor shows the entire configuration of the Virtual Data Table. The only thing that should be changed lies under dataOptions. This can be a "sql" option for example, but others can also be defined. The editor is equipped with hint assistance (ctrl + space).


Data Preview

A preview of the datasource config result is shown under Data Preview. When the SQL or the definition contains changes, the preview can be updated using the "Refresh Preview" button.


Example

In this example, a query was written to only return the columns "name" and "country". After writing the query in the SQL tab, dataOptions inside the definition was updated. As expected, the refreshed preview only shows two columns.


Using Virtual Data Tables in Workflows

Virtual Data Tables can be loaded into Workflows via the Data Table Load Processor and then forwarded to any Processor that fulfills your use-case. From the VDT detail view, just like for any other Data Table, a Workflow using the VDT can be created through the "Add Workflow to Project" button. 


You cannot write to a VDT via the Data Table Save Processor!


Boundaries & Current Constraints

There are three things you need to be aware of when using Virtual Data Tables.


Ignoring Unknown JSON Attributes

The virtual datasource configuration editor accepts and ignores unknown JSON attributes silently. Thereby, if ONE DATA Apps introduces new datasource attributes and a datasource configuration from an App is copied into a virtual datasource configuration, it might function despite the unknown attributes. Of course, the editor will still warn the user about the unknown attributes.


Supported Data Table Types And Their Limitations

Current support and limitations based on the type of the underlying "source" data table:


Data Table TypeSupportedSQL Dialect (also holds for any respective data source in Apps)Known Limitations (also holds for any respective data source in Apps)
legacy Parquet (from Dataset Save Processor)no

legacy CSV (from Dataset Save Processor)no

internal PostgreSQLyes

Spark or native PostgreSQL depending on the server-config 
(see 1. under the table)


regular ParquetyesSpark
regular CSVyesSpark
PostgreSQL from Connectionyes

Oracle from Connectionpartlynative Oracle

Custom SQL on top of custom SQL does not work
(see 2. under the table)

Hive from Connectionpartlynative Hive

Custom SQL on top of custom SQL does not work
(see 3. under the table)

Impala from Connectionyesnative Impala
DB2 from ConnectionyesSpark
HANA from Connection?native HANACurrently unable to test.
Microsoft SQL Server (also with Kerberos)yesSpark
MySQLyesSpark
Any dataTable from a File System Connectionno


  1. ONE DATA backend config to enable native SQL:
    frt.postgres.sqlForDataRequests.enable=true (this is the default)
    frt.postgres.sqlForDataRequests.usePostgresCustomSql=true (false by default)

  2. If the VDT config contains a "sql" dataOption, it is not possible to query from the VDT via a SQL editor App anymore. Furthermore, it is not possible to have a "sql" dataOption in both the VDT config and in the datasource config in Apps when loading this VDT. What holds for the "sql" dataOption also holds for SQL transformations in the "transformations" list. An error message from the connected Oracle database will appear, which could look like the following:
    SQL [select * from (select * from (with intermed_res_df62cd36_4e02_4aa as (select * from (select * 
    from (with intermed_res_773da119_ec1d_470 as (select * from (select * from (select * from 
    "ONEDATA"."MIRI_TEST" where 1 = 1) "alias_10457558") "alias_49655345") select * from (SELECT * 
    FROM intermed_res_773da119_ec1d_470 ) "csql_368b22fe_752a_4ece_9f10_d") "alias_35620904") 
    "alias_80852921") select * from (SELECT COUNT(*) FROM intermed_res_df62cd36_4e02_4aa ) 
    "csql_b1c5fabd_5506_44ae_a418_1") "alias_59666985" fetch next ? rows only) "data_query"]; ORA-
    32034: unsupported use of WITH clause

  3. If the VDT config contains a "sql" dataOption, it is not possible to query from the VDT via a SQL editor App anymore. Furthermore, it is not possible to have a "sql" dataOption in both the VDT config and in the datasource config in Apps when loading this VDT. What holds for the "sql" dataOption also holds for SQL transformations in the "transformations" list. An error message from the connected Hive system will appear, which could look like the following: 

    SQL [select * from (select * from (with intermed_res_a16657c6_7f00_4eb as (select 
    `alias_13037245`.`column_1` from (select `alias_48928118`.`column_1` from (select 
    cast(count(`column_1`) as int) as `column_1` from (select * from (with 
    intermed_res_29a32ebd_d080_459 as (select * from (select * from (select * from 
    `default`.`2peters_table` where 1 = 1) as `alias_111749179`) as `alias_32903205`) select * from 
    (SELECT * from intermed_res_29a32ebd_d080_459 ) as `csql_1e89ef2c_4eed_41bb_be2f_2`) as 
    `alias_41482550`) as `alias_13355627`) as `alias_48928118`) as `alias_13037245`) select * from 
    (SELECT count(*) from intermed_res_a16657c6_7f00_4eb ) as 
    `csql_a3af7978_a309_4fa5_b770_e`) as `alias_103801420` limit 200) as `data_query`]; Error while 
    compiling statement: FAILED: SemanticException [Error 10008]: Line 1:230 Ambiguous table 
    alias 'intermed_res_29a32ebd_d080_459'


Installation Instructions

These instructions are only relevant for DevOps tending to ONE DATA instances. 


Virtual Data Tables is a Microservice that needs specific configuration to work with ONE DATA. There are no dependencies to Apps or Portal, but when integrating Virtual Data Tables you need to be careful which ONE DATA version to use. VDTs can be used starting with the following ONE DATA release:


    ONE DATA Release 3.30.0
    Version Server 46.48.0
    Version Client 1.190.0
    Released: 23.12.2020 


The following points need to be set in the docker-compose file. The security code for the JWT token being the same as used for ONE DATA, Portal and Apps.

environment:
      - "ONEDATA_API_PATH=/api/v1"
      - "VIRTUAL_DATATABLE_CLIENT_BASE_PATH=datasource-editor"
      - "VIRTUAL_DATATABLE_CLIENT_PATH=datasource-editor"
      - "security.jwt.secret=${JWT_SECRET}"

If this is configured, you should now be able to use the Virtual Data Table feature inside of ONE DATA. The fastest way to check this, is by the following URL: https://{{od-instance}}/datasource-editor/ (e.g. team.onedata.de/datasource-editor/). If something is still missing, you will get a 404 Page Not Found error. Otherwise, you should see an Edit Virtual Data Table Definition dialog.