Overview

Generates multiple binarized columns from a single nominal (text) column that has a limited number of unique values. Each binarized column contains a 1 in every row where the corresponding nominal value is present and a 0 otherwise.
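
The idea can be illustrated with a minimal Python sketch. This is not the processor's implementation: the function name `binarize`, the `separator` parameter, and its whitespace default are assumptions made for this example, and the processor's separation pattern may behave differently.

```python
import re

def binarize(values, separator=r"\s+"):
    """Return (words, rows): the sorted distinct words and one 0/1 row per input value."""
    # Split each cell into "words" using the (assumed) separation pattern.
    tokenized = [
        {w for w in re.split(separator, v.strip()) if w}
        for v in values
    ]
    # One output column per distinct word, sorted so the column order is reproducible.
    words = sorted(set().union(*tokenized))
    # A cell is 1 if the word occurs in that row's value, otherwise 0.
    rows = [[1 if w in tokens else 0 for w in words] for tokens in tokenized]
    return words, rows

words, rows = binarize(["red", "blue", "red green", "blue"])
```

Here `words` lists the new column names and `rows` holds the 0/1 entries, one inner list per input row.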


Input

The processor operates on a dataset containing at least one column of type string. Only such columns can be selected for binarization.


Configuration

Output

The processor has two output ports:

  • Lexical Binarization Output: Contains the input dataset along with the new binarized columns, one for each distinct "word" of the selected attribute (what counts as a word depends on the separation pattern).
  • Distinct Summary Output: The number of occurrences of each "word" in the selected attribute (the sum of the corresponding binarized column) along with the corresponding fraction.

A bar chart of the percentages is also available in the processor under the "Result" tab.
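
The Distinct Summary Output can be approximated with the following sketch, which sums each binarized column and derives a fraction from it. The function name `distinct_summary` is hypothetical, and the fraction is assumed here to be the column sum divided by the number of rows; the processor may define it differently (for example, as a share of all word occurrences).

```python
def distinct_summary(words, rows):
    """Return one (word, count, fraction) tuple per binarized column."""
    n_rows = len(rows)
    summary = []
    for j, word in enumerate(words):
        # The count is the sum of the 0/1 entries in column j.
        count = sum(row[j] for row in rows)
        # Assumption: fraction = count / number of rows.
        fraction = count / n_rows if n_rows else 0.0
        summary.append((word, count, fraction))
    return summary

# Binarized columns for the values "red", "blue", "red green", "blue".
words = ["blue", "green", "red"]
rows = [[0, 0, 1], [1, 0, 0], [0, 1, 1], [1, 0, 0]]
summary = distinct_summary(words, rows)
```

Multiplying each fraction by 100 gives percentages of the kind a bar chart could display.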


Example

Example Input

Example Configuration

Workflow

Result

  • Lexical Binarization Output

  • Distinct Summary Output


The result above is provided by the processor and can be viewed under the "Result" tab.


Related Articles

Lexical Columization Processor