The Join Pattern Mining Processor finds common patterns within specific columns of two input datasets.
The processor requires two input datasets.
The processor has two types of output:
- A table of the found patterns and their occurrence counts, shown inside the processor.
- A dataset containing the values of the left input table as well as the values of the right input table that can be assigned to them based on the found patterns.
In this example, two tables containing email information are analyzed.
Inspect the result within the Join Pattern Mining Processor by clicking on it.
Note that the image was made by downloading the CSV (the button is found inside the processor) and uploading it as a new Data Table, because the view inside the processor does not allow changing the width of a column.
Two common patterns, "@today.com" and "@magic.com", were found; each consists of 10 characters.
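To illustrate what a "common pattern" between two columns means, here is a minimal Python sketch. It is not the processor's actual algorithm, which is not described here. It assumes a pattern is a substring of a fixed length that occurs in at least one value of each column. The function names and the "@magic.com" addresses are hypothetical, made up for illustration.

```python
def substrings(value, length):
    """Return all substrings of `value` that have exactly `length` characters."""
    return {value[i:i + length] for i in range(len(value) - length + 1)}


def common_patterns(left_values, right_values, length):
    """Return substrings of the given length that occur in both columns."""
    left = set().union(*(substrings(v, length) for v in left_values))
    right = set().union(*(substrings(v, length) for v in right_values))
    return left & right


# Hypothetical sample columns; only the "@today.com" addresses follow the example text.
left_emails = ["klara@today.com", "merlin@magic.com"]
right_emails = ["carlos@today.com", "houdini@magic.com"]

patterns = sorted(common_patterns(left_emails, right_emails, 10))
print(patterns)  # ['@magic.com', '@today.com']
```

With these sample values, the only 10-character substrings shared by both columns are the two domain parts.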
The rule "table1_Email_Address_5_10->table2_Mail_6_10" means the following:
In table1, the column Email_Address contains an element (klara@today.com) with 5 non-common letters ("klara") and 10 common letters ("@today.com"). This address matches an element in table2 under the column Mail (carlos@today.com) with 6 non-common letters ("carlos") and 10 common letters ("@today.com").
The count listed in the table would be 2 if either table1 or table2 had another email address with five or six non-common letters, respectively.
Inspect the result of the output node of the Join Pattern Mining Processor by using a Result Table or Data Table Save Processor. In addition to the rules and examples already shown in the Join Pattern Mining Processor, it lists the actual value of table1.
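The rule naming and the count can be sketched in Python as follows. This is an illustration under stated assumptions, not the processor's implementation: it assumes the common pattern sits at the end of each value and that the count is the number of value pairs that produce the same rule. The helper names and the address "peter@today.com" are hypothetical.

```python
from collections import Counter


def split_lengths(value, pattern):
    """Return (non-common length, common length) if value ends with pattern, else None."""
    if value.endswith(pattern):
        return len(value) - len(pattern), len(pattern)
    return None


def rule_name(left_col, left_value, right_col, right_value, pattern):
    """Build a rule label such as 'table1_Email_Address_5_10->table2_Mail_6_10'."""
    left = split_lengths(left_value, pattern)
    right = split_lengths(right_value, pattern)
    if left is None or right is None:
        return None  # the pattern does not occur in both values
    return f"{left_col}_{left[0]}_{left[1]}->{right_col}_{right[0]}_{right[1]}"


rule = rule_name("table1_Email_Address", "klara@today.com",
                 "table2_Mail", "carlos@today.com", "@today.com")
print(rule)  # table1_Email_Address_5_10->table2_Mail_6_10

# Hypothetical extra row in table1 with five non-common letters ("peter").
table1 = ["klara@today.com", "peter@today.com"]
table2 = ["carlos@today.com"]

# Count how many (left, right) pairs produce each rule.
counts = Counter(
    rule_name("table1_Email_Address", l, "table2_Mail", r, "@today.com")
    for l in table1
    for r in table2
)
print(counts[rule])  # 2
```

Adding a second address with five non-common letters to table1 yields a second pair for the same rule, which is why the count rises to 2 in this sketch.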