Overview

The processor performs a linear regression with one dependent variable and one or more independent variables.

The learning objective is to minimize the squared error, with regularization.
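As a rough illustration of this objective, the sketch below fits a regularized least-squares model with NumPy. It assumes an L2 (ridge) penalty, which the text above does not confirm, and the names fit_ridge, predict and lam are hypothetical; it is not the processor's actual implementation.

```python
import numpy as np

def fit_ridge(X, y, lam=1.0):
    """Minimize sum((y - b - X @ w)**2) + lam * sum(w**2).

    Assumption: L2 penalty on the weights only; the intercept b is not penalized.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Center the data so the intercept can be recovered separately.
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    yc = y - y_mean
    n_features = X.shape[1]
    # Closed-form solution of the penalized normal equations.
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(n_features), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b

def predict(X, w, b):
    """Apply the fitted weights and intercept to new rows."""
    return np.asarray(X, dtype=float) @ w + b
```

With lam set to 0 this reduces to ordinary least squares; larger values shrink the weights toward zero.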

To use this processor without forecasting, insert a Replication Processor so that the same dataset is fed to both input ports.

Input

The processor requires two input datasets. The left input port corresponds to the training dataset (this data should already be labeled). The right input port corresponds to the test dataset.

The training and the test datasets must have the same schema.
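A schema mismatch can be caught before the data reaches the processor. The following is a minimal sketch that assumes each schema is available as a mapping from column name to type name; the function schema_mismatches and this representation are hypothetical and not part of the product.

```python
def schema_mismatches(train_schema, test_schema):
    """Compare two {column: type} mappings and list every difference."""
    problems = []
    for col in sorted(set(train_schema) | set(test_schema)):
        if col not in test_schema:
            problems.append(f"{col}: missing from test dataset")
        elif col not in train_schema:
            problems.append(f"{col}: missing from training dataset")
        elif train_schema[col] != test_schema[col]:
            problems.append(
                f"{col}: type {train_schema[col]} vs {test_schema[col]}"
            )
    return problems
```

An empty list means both datasets expose the same columns with the same types.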


Configuration

For more information about the regularization parameter, you may consult the following link.


Output


The Improved Linear Regression Processor returns two different output tables:

  • Output Forecast: Returns the input test dataset with an additional column containing the forecasted values.
  • Output Linear Regression Results: Returns the regression statistics (weights, standard errors, p-values, t-values).

In addition, the "Output Linear Regression Results" table is shown in the result tab within the processor along with other forecasting metrics.
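To make the four statistics concrete, the sketch below computes them with the classical ordinary least squares formulas using NumPy and SciPy. It ignores regularization, and how the processor itself derives these values is not stated here, so treat the function ols_summary as a hypothetical illustration only.

```python
import numpy as np
from scipy import stats

def ols_summary(X, y):
    """Return weights, standard errors, t-values and two-sided p-values.

    Assumption: plain (unregularized) least squares with an intercept.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first weight is the intercept.
    A = np.column_stack([np.ones(len(y)), X])
    n, p = A.shape
    weights, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ weights
    dof = n - p  # residual degrees of freedom
    sigma2 = residuals @ residuals / dof  # unbiased noise variance estimate
    cov = sigma2 * np.linalg.inv(A.T @ A)  # covariance of the weights
    std_err = np.sqrt(np.diag(cov))
    t_values = weights / std_err
    # Two-sided p-value from Student's t distribution.
    p_values = 2.0 * stats.t.sf(np.abs(t_values), dof)
    return weights, std_err, t_values, p_values
```

A small p-value for a weight suggests that the corresponding variable contributes to the fit beyond what noise alone would explain.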


Example

In this example, the input table contains information about YouTube videos (total likes, dislikes, etc.). The processor is used to forecast the number of comments.


Example Input


Workflow

The Horizontal Split Processor is used to generate the training and test datasets from the original input. The Column Selection Processor is used to forward only the dependent and independent variables.
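The two preparation steps can be pictured with plain Python. This sketch assumes that the horizontal split divides the rows in order at a fixed fraction and that tables are lists of dictionaries; the function names, the 0.8 fraction, the column names and the sample values are all made up for illustration.

```python
def horizontal_split(rows, train_fraction=0.8):
    """Split rows in order: the first part for training, the rest for testing."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]

def select_columns(rows, columns):
    """Keep only the named columns in every row."""
    return [{c: row[c] for c in columns} for row in rows]

# Made-up rows standing in for the video table (not real data).
videos = [
    {"title": "a", "likes": 10, "dislikes": 1, "comments": 3},
    {"title": "b", "likes": 20, "dislikes": 2, "comments": 5},
    {"title": "c", "likes": 30, "dislikes": 2, "comments": 9},
    {"title": "d", "likes": 40, "dislikes": 5, "comments": 11},
]

# Forward only the independent variables and the dependent variable.
model_columns = ["likes", "dislikes", "comments"]
train_rows, test_rows = horizontal_split(
    select_columns(videos, model_columns), train_fraction=0.5
)
```

Both resulting tables share the same columns, which satisfies the requirement that the training and test datasets have the same schema.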


Example Configuration


Result

Output Forecast


Output Linear Regression Results


Processor Result


Related Articles

Decision Tree Regression Forecast

Decision Tree Classification Forecast

Random Forest Classification Forecast Processor

Random Forest Regression Forecast Processor