ONE DATA offers many ways to process data within a workflow using different processors. Sometimes, however, it is necessary or simply easier to use custom computation methods. An R-Script, for instance, is a good way to customize data processing because the R language was designed for statistical computing. More background can be found on the official site of the R-Project.
Since R-Scripts are a very useful tool for data science tasks, they can be included in ONE DATA workflows with the R-Script processors. In this article, we will focus on the R-Script Data Generator Processor.
The R-Script Data Generator Processor can execute an R-Script that produces data. An overview on how R-Scripts are processed in ONE DATA and useful tips on how to install R packages can be found here.
The processor has no input node; it only takes the R-Script entered in the configuration.
Define the R-Script to execute here. The script must have a return() statement, which contains the produced data that is returned to the output node.
Timeout For Script Execution
Time (in seconds) to wait for the R Server to return the calculation results of the script. If this timeout is exceeded, the calculation is interrupted and the connection of this Processor to the R Server is released. The timeout starts when the Processor submits the R script and the data to the R Server.
The default value is 300 seconds.
With this configuration option it is possible to specify what scale and representation type the columns of the output dataset have in order to provide the correct type inference in ONE DATA.
Possible scale types: nominal, interval, ordinal, ratio. Further information on scale types can be found here.
If it is not possible to convert the values of a column to the specified representation type, the processor will take the type that fits best for their representation. If types still do not fit the purpose, it is recommended to use the Data type Conversion Processor.
The output of the R-Script Processor is the dataset that was produced within the R-Script. There are two things to note on how the output has to be specified:
- The output of the R-Script must be a data frame (class data.frame in R). If the result has another type, convert it to a data frame first.
- The last executed statement of the R-Script must be a return() call that contains the data to be returned as a data frame.
return (as.data.frame( "insert name of output data here" ))
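As a rough illustration of both rules, the sketch below builds a result that is not yet a data frame (a matrix) and converts it before returning it. The wrapper function build_output and the column names are hypothetical and exist only so that the snippet runs in a plain R session, where return() is only valid inside a function. The assumption is that inside the processor the body would be entered directly as the script.

```r
# Hypothetical wrapper so the snippet runs outside ONE DATA;
# in standalone R, return() may only be called inside a function.
build_output <- function() {
  # A matrix is not a data frame, so it has to be converted.
  result <- matrix(1:6, nrow = 3, ncol = 2)
  colnames(result) <- c("first", "second")  # illustrative column names

  # Last executed statement: return the data as a data frame.
  return(as.data.frame(result))
}

output <- build_output()
# Quick sanity check that the returned object has the required type.
stopifnot(is.data.frame(output))
```

Checking the result with is.data.frame() before returning it is a cheap way to catch a missing conversion early.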
Example Script and Configuration
In this example, we use the R-Script Data Generator Processor to produce a dataset with two columns and four rows, filled with random values drawn from a uniform distribution. For the timeout we keep the default value, and we add no Manual TI configuration.
The R-Script that was entered in the processor is given below.
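A script matching this description could look like the following sketch; it is not necessarily the exact script from the original example. The column names column_a and column_b and the wrapper function generate_random_data are hypothetical. The wrapper is only there so that return() works in a standalone R session, on the assumption that in the processor the body alone would be entered as the script.

```r
# Hypothetical wrapper so the snippet runs outside ONE DATA.
generate_random_data <- function() {
  # Two columns with four rows each; runif() draws values
  # from a uniform distribution on the interval [0, 1].
  random_data <- data.frame(
    column_a = runif(4),
    column_b = runif(4)
  )

  # Last executed statement: return the data as a data frame.
  return(as.data.frame(random_data))
}

generated <- generate_random_data()
print(generated)
```

Because the values are random, every execution produces a different dataset. Calling set.seed() before runif() would make a run reproducible.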