Overview

Generates new columns from the values of an existing column.


Input

This processor works on any kind of input dataset.


Configuration

The names of the columns created from the second configuration field follow this schema: columizedColumnName_columizedColumnValue_duplicatedColumnName, where duplicatedColumnName is the name of the aggregated column.
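The naming schema can be illustrated with a small Python sketch. The function name and the example values (`location`, `Berlin`, `price`) are hypothetical and only show how the three parts are joined with underscores.

```python
def columized_column_name(columized_column: str, value: object, duplicated_column: str) -> str:
    """Build an output column name following the schema
    columizedColumnName_columizedColumnValue_duplicatedColumnName."""
    return f"{columized_column}_{value}_{duplicated_column}"


# Hypothetical example: the "location" column is columized and "price" is aggregated.
print(columized_column_name("location", "Berlin", "price"))  # location_Berlin_price
```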


Warning: For the workflow to be executed, at least one of the three aggregation methods (in the blue boxes) must be configured.


Advanced Configuration

Multiple columns can be selected for aggregation. In that case, it is recommended to enable the last configuration field (Use Broadcast Joins). With this toggle enabled, the columization is performed on each selected column separately and the resulting tables are then joined.

When broadcast joins are used, the resulting tables must not exceed the memory limit of the workers, because a broadcast table is held in memory on every worker.
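The idea behind the Use Broadcast Joins toggle can be sketched in plain Python: each aggregated column is columized on its own, and the partial tables are then joined on the grouping columns. This is an in-memory illustration under assumed names (`columize_one`, `columize_and_join`) and hypothetical data; it does not perform a distributed broadcast join and is not the processor's actual implementation.

```python
from collections import defaultdict


def columize_one(rows, group_cols, columize_col, agg_col, agg_fn, default):
    """Columize a single aggregated column: one entry per group key,
    one output column per distinct value of columize_col."""
    values = sorted({r[columize_col] for r in rows})
    buckets = defaultdict(list)
    for r in rows:
        key = tuple(r[c] for c in group_cols)
        buckets[(key, r[columize_col])].append(r[agg_col])
    table = {}
    for key in sorted({tuple(r[c] for c in group_cols) for r in rows}):
        out = {}
        for v in values:
            cell = buckets.get((key, v))
            # Missing combinations receive the configured default value.
            out[f"{columize_col}_{v}_{agg_col}"] = agg_fn(cell) if cell else default
        table[key] = out
    return table


def columize_and_join(rows, group_cols, columize_col, agg_cols, agg_fn, default):
    """Columize each aggregated column separately, then join the partial
    tables on the group key (a dict lookup stands in for the join)."""
    partials = [columize_one(rows, group_cols, columize_col, c, agg_fn, default)
                for c in agg_cols]
    result = []
    for key in partials[0]:
        row = dict(zip(group_cols, key))
        for part in partials:
            row.update(part[key])
        result.append(row)
    return result


# Hypothetical input with two aggregated columns: price and rating.
rows = [
    {"type": "Hotel", "location": "Berlin", "price": 120, "rating": 4},
    {"type": "Hotel", "location": "Paris", "price": 150, "rating": 5},
    {"type": "Hostel", "location": "Berlin", "price": 30, "rating": 3},
]
for row in columize_and_join(rows, ["type"], "location", ["price", "rating"], min, 0):
    print(row)
```

Each partial table has one row per group key, so the join only has to match keys; in a distributed engine, broadcasting the partial tables avoids a shuffle but requires each one to fit into worker memory.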


Output

The result table contains the columns selected in the first configuration field, along with the new columns created from the values of the column selected in the second configuration field.


Example

In the following example, we want to output the cheapest accommodation in each location for each accommodation type.


Example input


Workflow

Example Configuration

Result

Minimum values are selected for each location and missing values are replaced by the chosen default value.
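The minimum aggregation with a default value can be sketched in plain Python. The input rows, the column names and the default of -1 are hypothetical, since the article's actual example data and default are only shown in the screenshots; the output column names follow the naming schema described in the Configuration section.

```python
from collections import defaultdict

# Hypothetical input rows (not the article's example data).
rows = [
    {"type": "Apartment", "location": "Berlin", "price": 90},
    {"type": "Apartment", "location": "Berlin", "price": 70},
    {"type": "Apartment", "location": "Munich", "price": 110},
    {"type": "Hostel", "location": "Berlin", "price": 25},
]

DEFAULT = -1  # hypothetical default for type/location pairs without data


def cheapest_per_location(rows, default):
    """Columize 'location' and keep the minimum 'price' per accommodation type."""
    locations = sorted({r["location"] for r in rows})
    prices = defaultdict(list)
    for r in rows:
        prices[(r["type"], r["location"])].append(r["price"])
    result = []
    for acc_type in sorted({r["type"] for r in rows}):
        out = {"type": acc_type}
        for loc in locations:
            cell = prices.get((acc_type, loc))
            out[f"location_{loc}_price"] = min(cell) if cell else default
        result.append(out)
    return result


for row in cheapest_per_location(rows, DEFAULT):
    print(row)
```

Here no hostel exists in Munich, so that cell is filled with the default value.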

If a second aggregation method is chosen, an additional set of columns is created for it.


Here, the average price is selected as the second aggregation method, with a default value of 250.
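Two aggregation methods, each with its own default, can be sketched as follows. The default of 250 for the average comes from the example above; the input rows, the default of -1 for the minimum, and the `_min`/`_avg` suffixes used to keep the two column sets apart are hypothetical, because the processor's exact naming for additional methods is not described here.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical input rows (not the article's example data).
rows = [
    {"type": "Apartment", "location": "Berlin", "price": 90},
    {"type": "Apartment", "location": "Berlin", "price": 70},
    {"type": "Apartment", "location": "Munich", "price": 110},
    {"type": "Hostel", "location": "Berlin", "price": 25},
]

# Aggregation method -> (function, default for missing cells).
METHODS = {
    "min": (min, -1),   # hypothetical default
    "avg": (mean, 250),  # default taken from the example
}


def columize_multi(rows, methods):
    """Columize 'location' and create one set of price columns per method."""
    locations = sorted({r["location"] for r in rows})
    prices = defaultdict(list)
    for r in rows:
        prices[(r["type"], r["location"])].append(r["price"])
    result = []
    for acc_type in sorted({r["type"] for r in rows}):
        out = {"type": acc_type}
        for name, (fn, default) in methods.items():
            for loc in locations:
                cell = prices.get((acc_type, loc))
                # Hypothetical suffix keeps the column sets of different methods apart.
                out[f"location_{loc}_price_{name}"] = fn(cell) if cell else default
        result.append(out)
    return result


for row in columize_multi(rows, METHODS):
    print(row)
```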


Related Articles

Lexical Columization Processor