This processor calculates statistical measures (e.g. mean, median) and visualizes the Data in a boxplot.
To gain insights from Data and extract the information embedded in it, it is necessary to understand the Data first. This can be achieved by applying descriptive analysis to the Data of interest.
The Heuristic Summaries Processor helps generate significant statistics from the input Data.
The processor requires a Dataset containing at least one numeric, ratio-scaled column (variable).
The configuration menu of the processor is the following:
Compression Size: the higher the value, the higher the precision, but execution time and memory consumption increase as well.
Merge Interval: defines the interval for merging tree centroids.
- These two configuration fields are experimental.
- The processor does not infer column types: if a column is declared as type "String" but contains only numbers, the processor will NOT take it into account.
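The note on column types can be illustrated with a minimal Python sketch. It is not the processor's implementation: the column representation, the type names in `NUMERIC_TYPES`, and the helpers `numeric_columns` and `cast_to_double` are hypothetical and only show why a numeric-looking column declared as "String" is skipped until its type is changed upstream.

```python
# Hypothetical input: each column has a declared type and a list of values.
columns = {
    "price": {"type": "Double", "values": [9.5, 12.0, 7.25]},
    "quantity": {"type": "String", "values": ["3", "5", "2"]},  # numbers stored as text
}

# Assumed type names, for illustration only.
NUMERIC_TYPES = {"Int", "Double", "Numeric"}


def numeric_columns(cols):
    """Return the names of columns whose *declared* type is numeric.

    The values themselves are never inspected, mirroring the
    'no type inference' behaviour described above.
    """
    return [name for name, col in cols.items() if col["type"] in NUMERIC_TYPES]


def cast_to_double(col):
    """Return a copy of the column with float values and a numeric declared type."""
    return {"type": "Double", "values": [float(v) for v in col["values"]]}


before = numeric_columns(columns)  # "quantity" is skipped
columns["quantity"] = cast_to_double(columns["quantity"])
after = numeric_columns(columns)   # "quantity" is now included

print(before, after)
```

In practice, this means the type of such a column has to be converted before the Dataset reaches the processor.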
This processor provides two outputs:
- Within the processor: a boxplot accompanied by several statistical measures (min, max, median ...) for each numeric column of the input Dataset.
- At the output node: a table with 12 columns: ColumnName, min, max, sum, median, firstQuartile, thirdQuartile, arithmeticMean, geometricMean, lowerWhisker, upperWhisker, numberOfRows
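To make the 12 output columns concrete, the following Python sketch computes one such row exactly for a single column. It relies on several assumptions that the description above does not confirm: quartiles use the "inclusive" interpolation method of Python's `statistics` module, and whiskers follow the common Tukey convention (the most extreme data points within 1.5 times the interquartile range of the quartiles). The processor itself may use approximate quantiles and different definitions, so its numbers can differ. The function name `summarize` is hypothetical.

```python
import statistics


def summarize(column_name, values):
    """Compute one output row for a numeric column (exact, not approximated)."""
    # Assumption: quartiles via the 'inclusive' interpolation method.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")

    # Assumption: Tukey-style whiskers based on 1.5 * IQR fences.
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr

    return {
        "ColumnName": column_name,
        "min": min(values),
        "max": max(values),
        "sum": sum(values),
        "median": median,
        "firstQuartile": q1,
        "thirdQuartile": q3,
        "arithmeticMean": statistics.fmean(values),
        # The geometric mean is only defined for positive values.
        "geometricMean": statistics.geometric_mean(values),
        # Whiskers end at the most extreme data points inside the fences.
        "lowerWhisker": min(v for v in values if v >= lower_fence),
        "upperWhisker": max(v for v in values if v <= upper_fence),
        "numberOfRows": len(values),
    }


row = summarize("price", [1, 2, 3, 4, 100])
for key, value in row.items():
    print(f"{key}: {value}")
```

With this sample, the value 100 lies outside the upper fence, so under these assumptions it would appear as an outlier beyond the upper whisker rather than extend it.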
In this example, the Heuristic Summaries Processor is applied to a simple Dataset to extract statistical measures from it:
Distinct Textual Summary