Generates multiple binarized columns from one nominal-scaled (text) column that has a limited number of unique values. Each binarized column contains a 1 in every row where the corresponding nominal value is present, and a 0 otherwise.


The processor operates on a dataset containing at least one column of type string. Only such columns can be selected for binarization.



The processor has two output ports:

  • Lexical Binarization Output: Contains the input dataset along with the new binarized columns, one column for each distinct "word" of the selected attribute (what counts as a "word" depends on the separation pattern).
  • Distinct Summary Output: The number of occurrences of each "word" in the selected attribute (the sum of the corresponding binarized column) along with the corresponding fraction.

A bar chart of the percentages is also provided within the processor under the "Result" tab.
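The behavior described above can be approximated outside the processor with a few lines of Python. This is only a minimal sketch, not the processor's implementation: the function name `lexical_binarize`, the `column_word` naming of the new columns, the whitespace separation pattern, and the reading of "fraction" as the share of rows containing a word are all assumptions made for illustration.

```python
import re


def lexical_binarize(rows, column, separator=r"\s+"):
    """Return (binarized_rows, summary) for one string column.

    rows:      list of dicts, one per dataset row (hypothetical input shape)
    column:    name of the string column to binarize
    separator: regex that splits a cell into "words" (assumed default: whitespace)
    """
    # Split every cell into its set of distinct "words", dropping empty tokens.
    tokens_per_row = [
        {t for t in re.split(separator, row[column].strip()) if t}
        for row in rows
    ]
    # One new column per distinct word, in sorted order for stable output.
    words = sorted(set().union(*tokens_per_row))

    binarized = []
    for row, tokens in zip(rows, tokens_per_row):
        new_row = dict(row)  # keep the original columns
        for word in words:
            new_row[f"{column}_{word}"] = 1 if word in tokens else 0
        binarized.append(new_row)

    # Summary: column sum per word, plus the share of rows containing it
    # (assumed meaning of "fraction").
    summary = []
    for word in words:
        count = sum(r[f"{column}_{word}"] for r in binarized)
        summary.append({"word": word, "count": count, "fraction": count / len(rows)})
    return binarized, summary


# Hypothetical example data, not taken from the processor documentation.
rows = [
    {"id": 1, "colors": "red blue"},
    {"id": 2, "colors": "blue"},
    {"id": 3, "colors": "green red"},
]
binarized, summary = lexical_binarize(rows, "colors")
for entry in summary:
    print(entry["word"], entry["count"], round(entry["fraction"], 2))
```

In this sketch, `binarized` plays the role of the Lexical Binarization Output and `summary` the role of the Distinct Summary Output. Passing a different `separator` (for example `r","`) changes what counts as a "word".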


Example Input

Example Configuration



  • Lexical Binarization Output

  • Distinct Summary Output

The result above is provided by the processor and can be viewed under the "Result" tab.

Related Articles

Lexical Columization Processor