Generates multiple binarized columns from one nominal-scaled (text) column that has a limited number of unique values. Each binarized column contains a 1 in every row where the corresponding nominal value is present, and a 0 otherwise.
The processor operates on a dataset containing at least one column of type string. Only such columns can be selected for binarization.
The processor has two output ports:
- Lexical Binarization Output: Contains the input dataset along with the new binarized columns, one column for each distinct "word" of the selected attribute (what counts as a "word" depends on the separation pattern).
- Distinct Summary Output: The number of occurrences of each "word" in the selected attribute (the sum of the corresponding binarized column) along with the corresponding fraction.
A bar chart of the percentages is also provided within the processor under the "Result" tab.
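The behavior of the two outputs described above can be illustrated with a minimal Python sketch. It is not the processor's implementation: the function name `lexical_binarize`, the list-of-dictionaries row format, and the use of a regular expression as the separation pattern are assumptions made for illustration only. The fraction is assumed here to be the count divided by the number of rows; the processor may define it differently.

```python
import re


def lexical_binarize(rows, column, pattern=r"\s+"):
    """Hypothetical sketch: binarize one text column of a list of row dicts.

    Returns (binarized_rows, summary), loosely mirroring the two output ports.
    """
    # Split each cell into its set of distinct, non-empty "words".
    tokens_per_row = [
        {t for t in re.split(pattern, row[column]) if t} for row in rows
    ]

    # One new column per distinct word, sorted so the column order is stable.
    vocabulary = sorted(set().union(*tokens_per_row))

    # Input row plus a 0/1 column per word. Note: a word that equals an
    # existing column name would overwrite that column in this sketch.
    binarized = [
        {**row, **{word: int(word in tokens) for word in vocabulary}}
        for row, tokens in zip(rows, tokens_per_row)
    ]

    # Count = sum of the binarized column; fraction = count / number of rows
    # (an assumption, see the text above).
    summary = []
    for word in vocabulary:
        count = sum(r[word] for r in binarized)
        summary.append(
            {"word": word, "count": count, "fraction": count / len(rows)}
        )
    return binarized, summary


rows = [
    {"id": 1, "colors": "red green"},
    {"id": 2, "colors": "green"},
    {"id": 3, "colors": "blue red"},
]
binarized, summary = lexical_binarize(rows, "colors")
```

With whitespace as the separation pattern, the sample data yields three new columns (`blue`, `green`, `red`). A different pattern, such as `r","`, would change what counts as a "word" and therefore which columns are created.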
Lexical Binarization Output
Distinct Summary Output
The previous result is provided by the processor and can be viewed under the "Result" tab.