Overview

The Multi-Input Query Processor executes a Spark SQL query statement on multiple input datasets, similar to the Double Input Query Processor.

Spark SQL brings native support for SQL to Spark and streamlines the process of querying data stored both in RDDs (Spark’s distributed datasets) and in external sources. 

More information is available in the official Spark SQL documentation.


Input 

The processor works with three input datasets that can contain any type of data. If you need fewer input ports for a query, you can either connect an empty input table to one of the processor's input nodes, or use the Double Input Query Processor instead.


Configuration

In the processor configuration, you define the SQL statement to execute and the aliases of the input datasets.

Note that the input datasets must be referenced by these aliases within the SQL statement.
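For instance, if one of the inputs is given the alias customers in the configuration (a hypothetical name chosen for illustration), the SQL statement could reference it like this:

```sql
-- 'customers' is a hypothetical alias defined in the processor configuration;
-- the column names are illustrative as well
SELECT id, name
FROM customers
WHERE country = 'DE'
```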


Output

The processor output is the result of the SQL statement specified in the configuration.


Example

In this example we want to join three tables with product, customer, and transaction data to see which customer bought which product.


Input


Workflow

In this workflow we use three Custom Input Tables as input. The result is stored in a Filterable Result Table.


Configuration

In the configuration we insert the SQL query to execute on the input datasets. Once datasets are connected to the processor, the column names of the inputs are also shown, which makes it easier to write the query.
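A query along the following lines could produce the desired join. The aliases product, customer, and transaction, as well as the key columns, are illustrative assumptions; they would need to match the aliases defined in the configuration and the actual input schemas:

```sql
-- Illustrative aliases and column names; adjust to the actual inputs
SELECT customer.name, product.product_name
FROM transaction
JOIN customer ON transaction.customer_id = customer.id
JOIN product  ON transaction.product_id  = product.id
```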


Result



Related Articles

Double Input Query Processor

Query Processor

Query Helper Processor