Summary

This feature introduces a Home Project for Data Tables. The Home Project is one of the projects the Data Table is part of ("shared to"). It lets the user see which project a shared Data Table originates from. Currently only Data Tables have Home Projects.

Home Projects fix a former problem with sharing Virtual Data Tables (VDT) and their Spark Execution Context (SEC). The Home Project now defines which SEC to use. More on this in the section "Home Project and Spark Execution Context".


All Data Tables that existed prior to this feature were assigned a Home Project based on the following migration priority: Datahub, Modelhub, Processing Library, Use Cases, then alphabetically by project name. If multiple remaining projects had the same name, one was picked at random.


Where to Find the Home Project

The Home Project of a Data Table is shown in two places: the Data Tables list overview and the Data Table Information view.



You can see the name of the Home Project regardless of whether you have access to the project or not. If you have access, the name also links to the actual project.



Home Project for New Data Tables

A newly created or shared Data Table is automatically assigned a Home Project. Which project is chosen depends on how the Data Table was added.


Method of Adding Data Table | Assigned Home Project
Uploading a Data Table | Current project (the project the Data Table was uploaded to)
Creating a Data Table by Workflow | Current project (the project of the Workflow)
Importing a project with Data Tables | Current project (the newly created project)
Fetching from a project / Sharing to another project | Source project (the project the Data Table was fetched from)
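
The mapping above can be sketched as a small lookup. This is only an illustration: the enum `AddMethod`, the function `assign_home_project`, and its parameters are hypothetical names chosen for this example and are not part of the product's API.

```python
from enum import Enum, auto
from typing import Optional


class AddMethod(Enum):
    """Hypothetical names for the ways a Data Table can be added."""
    UPLOAD = auto()
    WORKFLOW = auto()
    PROJECT_IMPORT = auto()
    FETCH_OR_SHARE = auto()


def assign_home_project(method: AddMethod,
                        current_project: str,
                        source_project: Optional[str] = None) -> str:
    """Return the Home Project for a newly added Data Table.

    current_project: the project uploaded to, the Workflow's project,
                     or the newly created project of an import.
    source_project:  the project the Data Table was fetched or shared from.
    """
    if method is AddMethod.FETCH_OR_SHARE:
        if source_project is None:
            raise ValueError("fetching/sharing requires a source project")
        # The Home Project stays with the project the table came from.
        return source_project
    # Upload, Workflow and project import all use the current project.
    return current_project
```

For example, `assign_home_project(AddMethod.FETCH_OR_SHARE, "Target", "Origin")` yields `"Origin"`, while every other method yields the current project.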

Changing the Home Project

Single Data Table Edit

The Home Project of a Data Table can be changed in the Data Table Information via the 'Edit Data Table Meta Information' dialog. It's accessible through the following two buttons.


The Home Project can be found at the bottom of the dialog together with the affiliated projects. To switch the Home Project to a different project, the user needs Edit rights in that project.



Bulk Data Table Edit

Setting an arbitrary new Home Project for multiple Data Tables would first require finding a project that all selected Data Tables are shared to. To avoid filtering for common projects every time, and the risk of not finding one, bulk editing only allows setting the current project as the Home Project. The button for this action is in the sidebar of the Data Tables list. A confirmation dialog appears before the change takes effect. As with a single edit, the user needs Edit rights in the project that becomes the new Home Project.
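
The rules for changing the Home Project, both for a single Data Table and in bulk, can be sketched as follows. All names here (`DataTable`, `set_home_project`, `bulk_set_home_project`, `projects_with_edit_rights`) are hypothetical and only model the behaviour described above; the confirmation dialog is left out.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass
class DataTable:
    name: str
    shared_to: Set[str]   # projects the Data Table is part of
    home_project: str


def set_home_project(table: DataTable, project: str,
                     projects_with_edit_rights: Set[str]) -> None:
    """Single edit: switch the Home Project of one Data Table."""
    # The user needs Edit rights in the target project.
    if project not in projects_with_edit_rights:
        raise PermissionError(f"no Edit rights in project {project!r}")
    # The Home Project is always one of the projects the table is shared to.
    if project not in table.shared_to:
        raise ValueError(f"{table.name!r} is not shared to {project!r}")
    table.home_project = project


def bulk_set_home_project(tables: Iterable[DataTable], current_project: str,
                          projects_with_edit_rights: Set[str]) -> None:
    """Bulk edit: only the current project can become the Home Project."""
    for table in tables:
        set_home_project(table, current_project, projects_with_edit_rights)
```

Restricting the bulk operation to the current project means no search for a common project is needed: every Data Table listed in a project is, by definition, shared to that project.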



Automatic Home Project Update in Other Scenarios

A Data Table can be removed from a project in different ways:

  • Removing the Data Table from the project directly
  • Moving the Data Table to another project
  • Deleting the project

When a Data Table is removed from its Home Project, a new Home Project is automatically selected from the other projects the Data Table is shared to. The selection follows the migration priority:

  1. Datahub
  2. Modelhub
  3. Processing Library
  4. Use Cases
  5. Alphabetically on project name

In the case of multiple remaining projects having the same name, one is picked at random.


Example 1
  • Precondition: DT is shared to A, B; Home Project: B
  • Action: Move DT from B to C
  • Result: DT is shared to A, C; Home Project: A (e.g. alphabetical order)

Example 2
  • Precondition: DT is shared to A, B, C; Home Project: B
  • Action: Remove DT from B, or delete project B
  • Result: DT is shared to A, C; Home Project: A (e.g. alphabetical order)
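
The automatic selection and the examples above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual implementation: it assumes that the first four priority entries are project categories, that names are compared as plain strings, and that `Project`, `pick_home_project`, and `remove_from_project` are hypothetical names. What happens when no project remains is not described here, so the sketch simply raises an error in that case.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Priority of project categories from the list above; lower rank wins.
# Treating these entries as categories is an assumption of this sketch.
CATEGORY_RANK = {"Datahub": 0, "Modelhub": 1,
                 "Processing Library": 2, "Use Cases": 3}
OTHER_RANK = len(CATEGORY_RANK)


@dataclass(frozen=True)
class Project:
    project_id: int
    name: str
    category: str = "Other"


def _priority(project: Project) -> Tuple[int, str]:
    # Category first, then alphabetical order of the project name.
    return (CATEGORY_RANK.get(project.category, OTHER_RANK), project.name)


def pick_home_project(candidates: Sequence[Project]) -> Project:
    """Pick the new Home Project among the remaining projects."""
    if not candidates:
        raise ValueError("no remaining project to pick from")
    best = min(_priority(p) for p in candidates)
    tied = [p for p in candidates if _priority(p) == best]
    # Projects with the same category and name: one is picked at random.
    return random.choice(tied)


def remove_from_project(shared_to: Sequence[Project], home: Project,
                        removed: Project) -> Tuple[List[Project], Project]:
    """Remove a Data Table from a project and update its Home Project."""
    remaining = [p for p in shared_to if p != removed]
    if home == removed:
        home = pick_home_project(remaining)
    return remaining, home
```

With three uncategorised projects A, B and C and Home Project B, removing the Data Table from B leaves it shared to A and C, and A becomes the Home Project because it comes first alphabetically.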

Home Project and Spark Execution Context

For generating the preview of the Data Table in the Data Table Information view, the Spark Execution Context from the Home Project is generally used.

Special Case Virtual Data Tables

When Virtual Data Tables are created, their current project is assigned as Home Project. However, when determining the correct SEC for preview generation, the VDT's own Home Project is ignored.


Virtual Data Tables differ from normal Data Tables in that they are generated at runtime. The source Data Table is needed while the VDT is executed, which is why VDTs use the SEC of the source Data Table. Previously, this caused problems with the SEC when the VDT and the source Data Table were in different projects. Additionally, the original Data Table could be shared into multiple projects, possibly with multiple SECs. The Home Project now allows the system to determine the correct SEC, which is the SEC of the Home Project of the source Data Table.
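
The SEC lookup described in this section can be sketched as follows. The names `DataTable`, `resolve_sec`, and `sec_by_project` are hypothetical and only illustrate the rule; the sketch also assumes each project maps to exactly one SEC and does not cover a VDT whose source is itself a VDT.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class DataTable:
    name: str
    home_project: str
    # Set only for Virtual Data Tables: the Data Table they are built on.
    source: Optional["DataTable"] = None


def resolve_sec(table: DataTable, sec_by_project: Dict[str, str]) -> str:
    """Return the SEC used to generate the preview of a Data Table."""
    if table.source is not None:
        # VDT: its own Home Project is ignored; the SEC of the source
        # Data Table's Home Project is used instead.
        return sec_by_project[table.source.home_project]
    # Normal Data Table: the SEC of its Home Project is used.
    return sec_by_project[table.home_project]
```

For a source Data Table with Home Project A and a VDT created from it in project B, both previews would use the SEC of project A.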