TABLE OF CONTENTS

  1. Overview
  2. Input
  3. Configuration
  4. Output
  5. Example
    1. Workflow
    2. Example Input
    3. Example Configuration
    4. Result - Everything Parsable
    5. Results - Invalid Rows

Overview

This Processor parses the single cell value of a defined column as CSV and outputs the result as a data table. The column of interest may only contain one cell, i.e. one row.

The Flexible REST API Processor can be used in combination with this Processor, which then parses the data provided by the former.


This is an advanced Processor. The user should know how to work with CSV content and preferably should be familiar with parsing content in general.


Input

The input data can have multiple columns, but only one will be parsed. The data cannot have more than one row; if it does, the CSV-Parsing Processor throws an error. The data to be parsed is therefore contained in one single cell.
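
The single-row constraint can be illustrated with a minimal Python sketch. It is not the Processor's implementation: the function name extract_single_cell and the table representation (a list of dictionaries) are hypothetical, and rejecting an empty table is an assumption, since the text above only mentions the case of multiple rows.

```python
def extract_single_cell(rows, column):
    """Return the one cell to parse from a table given as a list of dicts."""
    # More than one row is an error according to the description above.
    # Rejecting zero rows as well is an assumption made for this sketch.
    if len(rows) != 1:
        raise ValueError(f"expected exactly one row, got {len(rows)}")
    return rows[0][column]


# A table with two columns and a single row; only "payload" is parsed.
table = [{"id": 1, "payload": "a,b\n1,2"}]
cell = extract_single_cell(table, "payload")
```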


A line break in the cell will result in a new row for the parsed data. The Processor determines the schema (column names) of the output by parsing the first values until it encounters a line break. After parsing the column names, the Processor parses the cell values by reading the values until a line break is encountered. If the number of values in a row does not match the number of columns, the row is considered invalid. In the configuration, you can decide how the Processor handles such rows.
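
The parsing steps described above can be sketched in Python as follows. This is only an approximation built on Python's csv module, not the Processor's own parser; the function name split_valid_invalid and the comma delimiter are assumptions. The first line yields the column names, and every following line is valid only if it has as many values as there are columns.

```python
import csv
import io


def split_valid_invalid(cell_value, delimiter=","):
    """Split a CSV cell into header, valid rows and invalid rows."""
    reader = csv.reader(io.StringIO(cell_value), delimiter=delimiter)
    header = next(reader)  # values up to the first line break
    valid, invalid = [], []
    for row in reader:
        # A row is invalid if its value count differs from the column count.
        if len(row) == len(header):
            valid.append(row)
        else:
            invalid.append(row)
    return header, valid, invalid


cell = "project,number_of_employees,budget\nproject3,30\nproject4,25,100000"
header, valid, invalid = split_valid_invalid(cell)
# header  -> ['project', 'number_of_employees', 'budget']
# valid   -> [['project4', '25', '100000']]
# invalid -> [['project3', '30']]
```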

An example input can be found below.


Configuration


String Escape Token: Used to indicate where a column's value starts and where it ends. For example, if the token is a quotation mark ("), then the column value can be defined between two quotation marks: "column content". All the content between the quotation marks will be parsed as a single value, even if it contains the delimiter token. It is not necessary to use the String Escape Token if the value does not contain special characters. An example can be found below.

Escape Token: Used to escape the String Escape Token, in case it is needed inside a cell value also containing the delimiter token. An example can be found below.
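
The interplay of the two tokens can be tried out with Python's csv module, which offers comparable settings (quotechar and escapechar). This is a sketch for illustration, not the Processor's implementation. The function name parse_line is hypothetical, and the backslash as Escape Token is an assumption taken from the example input in this document.

```python
import csv
import io


def parse_line(line, delimiter=",", string_escape_token='"', escape_token="\\"):
    """Parse one CSV line using the given tokens (assumed defaults)."""
    reader = csv.reader(
        io.StringIO(line),
        delimiter=delimiter,
        quotechar=string_escape_token,  # marks start and end of a value
        escapechar=escape_token,        # escapes the quote inside a value
        doublequote=False,              # quotes are escaped, not doubled
    )
    return next(reader)


line = r'"project \"One, Two, Three\"",2,10000,"K. Bauer, A. Maier"'
values = parse_line(line)
# values -> ['project "One, Two, Three"', '2', '10000', 'K. Bauer, A. Maier']
```

The delimiters inside the quoted values do not split them, and the escaped quotation marks end up as literal characters in the first value.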

Parse Mode on Invalid Rows: Determines how invalid rows (e.g. too few/many cells, unparsable numbers or timestamps) are treated during parsing. You can choose between three options.

  • Create warning: A warning will be created in case of invalid rows. The invalid rows will be put into the right side output; the regular (left side) output only contains valid rows. Be aware that this option may impact performance.
  • Drop invalid rows without notification: Invalid rows will be dropped. The right side output will only contain rows that were invalid due to an unparsable timestamp value.
  • Create error: Will fail the loading process on the first invalid line encountered.
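
The three options can be sketched in Python as shown below. This is a simplified illustration, not the Processor's implementation: the function name parse_cell and the mode names "warning", "drop" and "error" are hypothetical, and only the cell count is checked. Number and timestamp parsing is not modelled, so the special handling of unparsable timestamps in the drop mode is not reproduced.

```python
import csv
import io
import warnings


def parse_cell(cell_value, mode="warning"):
    """Return (header, left, right): valid rows left, invalid rows right."""
    if mode not in ("warning", "drop", "error"):
        raise ValueError(f"unknown mode: {mode}")
    reader = csv.reader(io.StringIO(cell_value))
    header = next(reader)
    left, right = [], []
    for line_number, row in enumerate(reader, start=2):
        if len(row) == len(header):
            left.append(row)
        elif mode == "error":
            # Fail on the first invalid line encountered.
            raise ValueError(f"invalid row at line {line_number}: {row}")
        elif mode == "warning":
            right.append(row)
        # mode == "drop": the invalid row is discarded silently.
    if right:
        warnings.warn(f"{len(right)} invalid row(s) found")
    return header, left, right


cell = "project,number_of_employees,budget\nproject3,30\nproject4,25,100000"
header, left, right = parse_cell(cell, mode="drop")
# left  -> [['project4', '25', '100000']]
# right -> []
```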


Output

The Processor has two outputs: one for successfully parsed rows (left output) and one for rows that could not be parsed (right output). The content of the right output depends on the option chosen for the Parse Mode on Invalid Rows configuration.


Example

Workflow


Example Input

In this scenario, the CSV-Parsing Processor is linked to a Custom Input Table that has two columns, Col1 and Col2. Both contain data in CSV format. Note that Col2 has a missing value: "project3" has 30 employees, but there is no value for budget. This column will be used to show the different results for the Parse Mode configuration.


Col1:
project,number_of_employees,budget,employees
"project \"One, Two, Three\"",2,10000,"K. Bauer, A. Maier"
"project \"A, B, C\"",3,100000,"K. Bauer, A. Maier, S. Palm"

Col2:
project,number_of_employees,budget
project3,30
project4,25,100000


Example Configuration


Result - Everything Parsable

The left Result Table shows all successfully parsed entries.



The right Result Table shows rows that are invalid and could therefore not be parsed. Because Col1 does not contain invalid data, it is empty.


Results - Invalid Rows

The input table also has a second column called Col2, which has a value missing. For the following examples, Col2 is set as the CSV column in the Processor configuration.


If the option Drop invalid rows without notification is chosen as the parse mode, the Workflow will be green to show a successful execution. The left Result Table will show the only valid row.


The right Result Table is empty in this case, as no row was invalid because of an unparsable timestamp value.


If the option Create warning is chosen, the Processor will show a warning.

The left output will be the same as above. The right output contains all invalid rows.



If the option Create error is chosen, the Workflow's execution will be stopped and the Processor will show an error.