This processor calculates the relative frequencies of the distinct values from a set of selected columns in an input Dataset.
Statistical summaries of the data support its analysis, making it easier to understand the data and extract value from it.
For example, consider a clothing company that sells diverse items from different categories. The company targets different regions of Germany and wants to analyse the item categories as well as the target regions. The aim is to find the best-selling categories and the most active target regions, and to concentrate on them.
The Distinct Summary Processor can help achieve this goal, as it will give the frequencies of the categories and the target regions.
This processor works with any given Dataset on which the summaries will be calculated.
The configuration interface of the processor is the following:
This processor provides two kinds of output:
- Within the processor, after the workflow has run, a bar chart shows the frequencies of the distinct values of a column. A drop-down selects the column for which the chart is displayed (for better understanding, please refer to the example below).
- The output node of the processor generates a table with four columns: the first gives the name of the column the value belongs to, the second contains the distinct value, the third contains the number of occurrences of that value, and the fourth contains its relative frequency.
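The four-column output table described above can be approximated outside the platform. The following is a minimal sketch in plain Python, not the processor's actual implementation. It assumes that the relative frequency is the occurrence count divided by the total number of rows. The function name `distinct_summary` and the sample rows are hypothetical and serve only as illustration.

```python
from collections import Counter


def distinct_summary(rows, columns):
    """Return one (column, value, count, relative_frequency) tuple per distinct value.

    Assumption: relative frequency = count / total number of rows.
    """
    summary = []
    total = len(rows)
    for column in columns:
        # Count how often each distinct value occurs in this column.
        counts = Counter(row[column] for row in rows)
        for value, count in counts.items():
            summary.append((column, value, count, count / total))
    return summary


# Hypothetical sample rows, not the Dataset used on this page.
rows = [
    {"region": "Munich", "category": "Shoes"},
    {"region": "Munich", "category": "Shirts"},
    {"region": "Berlin", "category": "Shoes"},
    {"region": "Hamburg", "category": "Shoes"},
]

summary = distinct_summary(rows, ["region", "category"])
for line in summary:
    print(line)
```

Each printed tuple corresponds to one row of the output table: column name, distinct value, number of occurrences, relative frequency.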
To keep the example simple, the Dataset of interest is a small one containing the profit of a company with respect to the region, the month and the item category:
The Dataset used is created via a Custom Input Table, and the result can be visualized with a Filterable Result Table.
Referring to the processor output, we can observe that "Berlin" is the least frequently occurring region (as a result, it is potentially the region with the least profit). The same observation holds for the month "January".