Introduction

It was requested to provide the functionality to only retrieve selected columns from a Data Table. The "/data/content" endpoint allows passing an array of transformations to the request which then returns the respective result.

The following transformations are available:

  • SqlQuery
  • ColumnsSelection
  • Aggregation(count,sum,avg,min,max,group)
  • ColumnsSorting
  • Filter (Boolean, Integer, DateTime, Double, String)
  • Join

The data_transformers parameter is available in the following API methods:

Usage

In general the data_transformers parameter awaits a list of dictionaries. Its elements represent the respective transformations.

### Usage with written-out dictionary
dt: DataTableContent = onedata_api.datatables.get_content(data_id=datatable_id,
data_transformers=[
{
"type": "ColumnsSelection",
"columns": ["Zip"]
},
{
"type": "ColumnsSorting",
"sortConfigs": [
{
"attributeName": "Zip",
"sortOrder": "DESC"
}
]
}
])

Helper Functions

To avoid the cumbersome process of writing these configurations the SDK now provides functions that return the respective dictionary.

Currently, there is no helper function for the join transformation until requested.

### Usage with provided functions
from onedata.datatables.transformers import select, sort, desc

dt: DataTableContent = onedata_api.datatables.get_content(data_id=datatable_id,
data_transformers=[select("Zip"), sort(desc("Zip"))]
)

To be able to use those functions they first have to be imported from onedata.datatables.transformers.

This file offers functions for a guided creation of transformations that can be applied to data tables retrieved by e.g. OneDataDataTableApi.get_content. It allows the user to generate the transformations' parameter assisted by the SDK and makes it more convenient to use. The transformations' parameter in the respective API method expects a dict and therefore all helper methods return formatted dicts for each available transformation.

  • select(*columns: str)
  • sql(sql: str)
  • filter(column_name: str, column_type: Union[RepresentationType, str], search: Union[str, None] = None, values: Union[list, None] = None, range_min=None, range_max=None)
    Note 1: column_type either accepts an enum of type RepresentationType or a string value of "bool", "str", "int", "double" or "datetime"
    Note 2: If the filter values are set to None or the array is empty, all results will be returned.
  • sort(*config: dict) (argument position = sort priority)
    • asc(column_name: str)
    • desc(column_name: str)
  • aggregation(*aggr_config: dict)
    • sum(column_name: str)
    • count(column_name: str)
    • avg(column_name: str)
    • min(column_name: str)
    • max(column_name: str)
    • group(column_name: str)

Usage Examples

### sql
dt: DataTableContent = onedata_api.datatables.get_content(data_id=datatable_id,
data_transformers=[sql("SELECT Zip FROM inputTable")])
print(f"dt sql: {dt.records}")

### select
dt: DataTableContent = onedata_api.datatables.get_content(data_id=datatable_id,
data_transformers=[select("Zip", "PDT")])
print(f"dt select 'Zip' and 'PDT': {dt.records}")

### aggregation
dt: DataTableContent = onedata_api.datatables.get_content(data_id=datatable_id,
data_transformers=[aggregation(sum("Zip"))])
print(f"dt SUM 'Zip': {dt.records}")

dt: DataTableContent = onedata_api.datatables.get_content(data_id=datatable_id,
data_transformers=[aggregation(avg("Zip"))])
print(f"dt AVG 'Zip': {dt.records}")

### sort
dt: DataTableContent = onedata_api.datatables.get_content(data_id=datatable_id,
data_transformers=[sort(asc("Zip"))])
print(f"dt sort ASC 'Zip': {dt.records}")

### filter
dt: DataTableContent = onedata_api.datatables.get_content(data_id=datatable_id,
data_transformers=[
filter("Zip", "int", None, [94041, 94063])])
print(f"dt filter 'Zip': {dt.records}")