The Query processor executes a Spark SQL query statement on the input datasets.
Spark SQL brings native support for SQL to Spark and streamlines the process of querying data stored both in RDDs (Spark’s distributed datasets) and in external sources.
More information about Spark SQL is available in the official Spark documentation.
The Query processor operates on two input datasets, which may contain any type of data.
Note that the input tables must be referenced as firstInputTable and secondInputTable, respectively, in the SQL statement.
Supported SQL features can be found in the Spark SQL documentation.
Once the query has executed, the results can be visualised in the output table.
We use the following SQL statement to query our input datasets:
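As a minimal illustrative sketch (the column names id, name, and amount are hypothetical, not taken from any actual input schema), a statement combining the two inputs might look like this, using the required table names firstInputTable and secondInputTable:

```sql
-- Hypothetical example: the columns id, name and amount are assumed for illustration.
SELECT f.id,
       f.name,
       s.amount
FROM firstInputTable f
JOIN secondInputTable s
  ON f.id = s.id
WHERE s.amount > 100
```

Any query using only features supported by Spark SQL can be substituted here; the only fixed requirement is the two input table names.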