The Group By Aggregation Processor groups the dataset by one or more columns and aggregates the remaining columns using a single, specified aggregation function.
The Processor can operate on any kind of input dataset.
Warning: Some aggregation functions, such as AVG or SUM, require numeric input to produce a meaningful result. They still execute on string values, but the result is of no practical use.
The dataset is grouped by the selected column(s), and within each group the entries of every remaining column are combined using the chosen aggregation function. Aggregating all remaining columns does not always make sense. In that case, use the Column Selection Processor beforehand to keep only the relevant columns, as shown in the example below.
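To make the grouping behavior concrete, here is a minimal sketch in plain Python, assuming rows are represented as dictionaries. The function name `group_by_aggregate` and the sample columns are hypothetical and are not part of the Processor; the sketch only mirrors the described semantics: group by the selected columns, then apply one aggregation function to every remaining column.

```python
from collections import defaultdict


def group_by_aggregate(rows, group_cols, agg):
    """Hypothetical sketch: group `rows` (list of dicts) by `group_cols`
    and apply the single aggregation function `agg` to every other column."""
    if not rows:
        return []
    # Every column that is not a group-by column gets aggregated.
    value_cols = [c for c in rows[0] if c not in group_cols]
    # Collect rows per group key; insertion order of keys is preserved.
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[c] for c in group_cols)
        groups[key].append(row)
    result = []
    for key, members in groups.items():
        out = dict(zip(group_cols, key))
        for col in value_cols:
            out[col] = agg([m[col] for m in members])
        result.append(out)
    return result


rows = [
    {"city": "Berlin", "district": "Mitte", "price": 80, "rooms": 2},
    {"city": "Berlin", "district": "Mitte", "price": 100, "rooms": 3},
    {"city": "Berlin", "district": "Kreuzberg", "price": 60, "rooms": 1},
]
summed = group_by_aggregate(rows, ["city", "district"], sum)
print(summed)
```

Grouping by two columns yields one output row per distinct (city, district) pair, with `price` and `rooms` both summed.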
Note that if no group-by column is selected, the aggregation is applied to every column across all entries, so the output contains a single row.
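The single-row case can be sketched as follows, again assuming rows are plain dictionaries. The helper `aggregate_all` is hypothetical, and returning an empty list for empty input is an assumption, not documented Processor behavior.

```python
def aggregate_all(rows, agg):
    """Hypothetical sketch: with no group-by column, all rows form one
    group, so every column is aggregated into a single output row."""
    if not rows:
        # Assumption: empty input yields empty output.
        return []
    return [{col: agg([row[col] for row in rows]) for col in rows[0]}]


rows = [{"price": 80, "rooms": 2}, {"price": 100, "rooms": 3}]
print(aggregate_all(rows, max))
```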
In the following example, we use an accommodation dataset as input. The goal is to output the average accommodation price in each city. Therefore, we first use the Column Selection Processor to keep only the location and price of each entry, and then group by location with AVG as the aggregation function.
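The two steps of this example can be approximated in plain Python. The sample rows and the `name` and `rooms` columns are invented for illustration; only `location` and `price` come from the example. The extra `name` column shows why the column selection step matters: averaging a text column would not produce a useful value.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical accommodation rows; values and extra columns are invented.
accommodations = [
    {"location": "Berlin", "price": 80.0, "rooms": 2, "name": "Flat A"},
    {"location": "Berlin", "price": 100.0, "rooms": 3, "name": "Flat B"},
    {"location": "Munich", "price": 120.0, "rooms": 2, "name": "Flat C"},
]

# Step 1 (like the Column Selection Processor): keep location and price only.
selected = [{"location": r["location"], "price": r["price"]} for r in accommodations]

# Step 2 (like the Group By Aggregation Processor): group by location, AVG price.
prices_by_city = defaultdict(list)
for row in selected:
    prices_by_city[row["location"]].append(row["price"])

avg_price = [
    {"location": city, "price": mean(prices)}
    for city, prices in prices_by_city.items()
]
print(avg_price)
```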
Please note that only one aggregation function can be selected; it is applied to all remaining columns.