The Data Type Recommendation Processor generates a JSON result containing type recommendations for each column of a selected dataset.
The processor does not accept direct input. The dataset whose types should be determined is selected in the configuration interface under "Data Set". This field is mandatory.
Allow Null Value: Turned off by default, in which case columns containing null values do not receive any recommendations. If activated, null values do not influence the result.
Top-k Sampling: Activated by default, in which case the first k rows of the dataset are used. When disabled, k random rows are chosen instead. For larger datasets, Top-k Sampling is faster, because random sampling requires the whole dataset to be loaded.
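The difference in cost between the two sampling modes can be illustrated with a minimal Python sketch. This is not the processor's implementation; the function names top_k_sample and random_sample and the seed parameter are hypothetical and only serve the illustration.

```python
import random
from itertools import islice


def top_k_sample(rows, k):
    """Return the first k rows; reading stops after k rows."""
    return list(islice(rows, k))


def random_sample(rows, k, seed=None):
    """Return k randomly chosen rows; all rows are materialized first."""
    all_rows = list(rows)  # the whole dataset has to be loaded
    rng = random.Random(seed)
    return rng.sample(all_rows, min(k, len(all_rows)))


# Hypothetical dataset with 1000 rows.
rows = [{"id": i} for i in range(1000)]

row_iter = iter(rows)
first_rows = top_k_sample(row_iter, 5)
next_unread = next(row_iter)  # the iterator was only advanced by 5 rows

random_rows = random_sample(iter(rows), 5, seed=0)
```

In the sketch, top_k_sample leaves the rest of the row iterator unread, while random_sample has to consume it completely before it can pick anything, which is why the random mode becomes slower as the dataset grows.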
Additional Timestamp Masks: Additional timestamp formats to be recognized. The formats have to comply with the pattern syntax of the Java class SimpleDateFormat, for example "yyyy-MM-dd HH:mm:ss".
The processor does not produce any direct output. Type recommendations are shown in a JSON format under "Result" and "JSON Result" in the processor.
The following data is used to create a dataset with the Data Table Save processor in a separate workflow.
The created dataset is then selected in the configuration of the Data Type Recommendation processor. The other configuration fields are set to the default configuration, which is the one shown above.
After running the workflow, only the representation type String is recommended for the column "Cities", while for the column "Date" both String and DateTime are possible. For each recommended representation type, suitable scale types are recommended as well.
The processor does not take the initial column types from the dataset into account, but tries to infer the types by going through column samples (defined by "Sample Size"). As a result, numerical columns can be inferred to also be of type String.
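Sample-based inference of this kind can be sketched in Python as follows. This is only an illustration under stated assumptions, not the processor's actual logic: the function recommend_types, the type name "Numeric", and the masks in DATE_MASKS are hypothetical, and the masks use Python's strptime syntax rather than SimpleDateFormat patterns.

```python
from datetime import datetime

# Assumed example masks in Python strptime syntax (not SimpleDateFormat).
DATE_MASKS = ["%Y-%m-%d", "%d.%m.%Y"]


def _parses_as_number(text):
    try:
        float(text)
        return True
    except ValueError:
        return False


def _parses_as_datetime(text, masks):
    for mask in masks:
        try:
            datetime.strptime(text, mask)
            return True
        except ValueError:
            continue
    return False


def recommend_types(sample, allow_null=False, masks=DATE_MASKS):
    """Infer candidate representation types from sampled values only."""
    if any(value is None for value in sample):
        if not allow_null:
            return []  # columns with nulls get no recommendation
        sample = [value for value in sample if value is not None]
    texts = [str(value) for value in sample]
    types = ["String"]  # every value can be represented as a string
    if texts and all(_parses_as_number(t) for t in texts):
        types.append("Numeric")
    if texts and all(_parses_as_datetime(t, masks) for t in texts):
        types.append("DateTime")
    return types


cities = recommend_types(["Berlin", "Paris"])
dates = recommend_types(["2021-01-01", "2021-02-15"])
numbers = recommend_types([1, 2.5])
```

Because the sketch looks only at the sampled values and ignores any declared column type, the numeric sample is reported as both String and Numeric, which mirrors the behavior described above.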