Overview
This processor can be used to compute recommendations for items from a list of known user transactions. It generates association rules with a single item as right hand side, using frequent item sets generated by the FP-Growth algorithm.
To produce recommendations, the generated rules can be applied to a list of transactions using the Association Rule Application Processor.
Motivation
Association rule mining is an important pillar of data mining and machine learning, and it is commonly used in market basket analysis.
Generating association rules from data can help a company arrange its items well, and it can help users save shopping time and make better shopping choices.
Input
Because the processor generates association rules, it studies relations between customers and items. It therefore needs a dataset that contains at least two columns: user IDs and item IDs.
As input, a list of single-item transactions (one item and one user in each row) is needed. In this example, we use a Custom Input Table Processor to generate a small dataset by hand. The column Cust holds the IDs of the users and the column Product holds the IDs of the transaction items.
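As a rough illustration of this input shape, the sketch below builds a few (Cust, Product) rows and collects them into one item set per user. The sample values and the helper name group_transactions are made up for illustration and are not part of the processor.

```python
from collections import defaultdict

# Hypothetical sample data: one (Cust, Product) pair per row.
rows = [
    ("u1", "bread"), ("u1", "butter"),
    ("u2", "bread"), ("u2", "butter"), ("u2", "milk"),
    ("u3", "bread"), ("u3", "milk"),
    ("u3", "milk"),  # repeated purchase, collapses into the same set entry
]

def group_transactions(rows):
    """Collect all items per user into a set, so duplicates count once."""
    baskets = defaultdict(set)
    for cust, product in rows:
        baskets[cust].add(product)
    return dict(baskets)

baskets = group_transactions(rows)
```

Using a set per user mirrors the behaviour described under Algorithmic Background, where repeated transactions of the same item by the same user are not counted more than once.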
Configuration
This processor has two required fields (customer and item columns) and two optional fields:
- Min support: Give the minimal support for the item sets to count as frequent.
- Min confidence: Give the minimal confidence for the association rules to be part of the output.
WARNING: Item names must not contain a ',' because this character is used to separate the items in the output. A check is in place to catch item names that contain a ','.
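If you want to spot offending item names before running the workflow, a pre-check along the following lines may help. This is only a sketch: the function name items_with_comma is hypothetical, and the processor's built-in check may behave differently.

```python
def items_with_comma(items):
    """Return the distinct item names that contain the reserved ',' separator."""
    return sorted({item for item in items if "," in item})

# Hypothetical item names, one of which would clash with the output separator.
bad = items_with_comma(["bread", "salt, coarse", "milk", "salt, coarse"])
```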
To generate rules that recommend users for items instead, select the item column as the customer parameter of the processor and the user column as the item parameter.
Sensible values for the minimal support and confidence vary highly depending on the properties of the input data. If there are (almost) no item sets in the output, lower the minimal support. If nearly all item sets are frequent, increase it. Once there is a sensible number of frequent item sets, tune the minimal confidence for the rules in the same way. Because the rules are derived from the item sets, there cannot be any rules as long as there are no frequent item sets.
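To get a feeling for how the minimal support changes the number of frequent item sets, the following brute-force sketch may help. It assumes that support is measured as the fraction of users whose item set contains the candidate; whether the processor expects a fraction or an absolute count is not stated here. The data and the function name frequent_itemsets are illustrative only, and this is not the FP-Growth algorithm the processor uses internally.

```python
from itertools import combinations

# Hypothetical baskets: one item set per user.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"butter"},
]

def frequent_itemsets(baskets, min_support):
    """Brute force: relative support of every item combination, then filter."""
    items = sorted(set().union(*baskets))
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            # Fraction of baskets that contain every item of the candidate.
            support = sum(set(combo) <= b for b in baskets) / len(baskets)
            if support >= min_support:
                result[combo] = support
    return result

loose = frequent_itemsets(baskets, min_support=0.25)   # nearly everything is frequent
strict = frequent_itemsets(baskets, min_support=0.75)  # only single items remain
```

With the loose threshold every combination counts as frequent, while the strict threshold leaves only single items, from which no rule can be derived.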
Output
The main output of the processor is the list of association rules. Each rule names on its right hand side (RHS) the item to be recommended if all the items listed on its left hand side (LHS) are present. The items on the left hand side are separated by a ",". The right hand side only holds a single item per rule. Additionally, the confidence of each rule (with respect to the input) is given.
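The sketch below shows one way to read such a rule: split the left hand side at the "," separator and check whether a user holds all of those items. The function name recommend and the choice to skip items the user already has are assumptions made for illustration; the Association Rule Application Processor may work differently.

```python
def recommend(rule_lhs, rule_rhs, user_items):
    """Return the RHS item if the rule fires for this user, else None."""
    lhs_items = set(rule_lhs.split(","))  # LHS items are separated by ","
    # Assumption: only recommend items the user does not already have.
    if lhs_items <= user_items and rule_rhs not in user_items:
        return rule_rhs
    return None

# Hypothetical rule "bread,butter" -> "milk" applied to three users.
hit = recommend("bread,butter", "milk", {"bread", "butter"})
miss = recommend("bread,butter", "milk", {"bread"})
owned = recommend("bread,butter", "milk", {"bread", "butter", "milk"})
```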
Output example:
As an additional output the processor also provides the frequent item sets which lead to the association rules with their frequency in the input. The items are again separated by a "," token.
Additional Information
To view the results, add a Result Table Processor.
The processor also outputs rules with confidence 1, that is, rules which always hold in the training data and thus do not lead to recommendations of new items there. These rules are kept because the rules can also be applied to new data and to users who are not in the training set, for whom they might produce valid recommendations.
Algorithmic Background
The algorithm collects all items per user into an item set, finds all subsets which are frequent within the original item sets, and deduces recommendation rules by making each item of a frequent subset in turn the right hand side. It then computes the confidence of a rule by comparing the number of cases where the rule holds with the number of all cases where the left hand side is present.
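The rule derivation and confidence computation described above can be sketched as follows for a single frequent item set. The data, the function name rules_from_itemset and the tuple layout of the result are illustrative assumptions and do not reflect the processor's internal implementation.

```python
# Hypothetical baskets: one item set per user.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"butter"},
]

def rules_from_itemset(itemset, baskets, min_confidence):
    """Make each item the RHS in turn and keep rules above min_confidence."""
    itemset = set(itemset)
    rules = []
    for rhs in sorted(itemset):
        lhs = itemset - {rhs}
        if not lhs:
            continue  # a single item cannot form a rule
        lhs_count = sum(lhs <= b for b in baskets)       # LHS is present
        both_count = sum(itemset <= b for b in baskets)  # rule holds
        if lhs_count == 0:
            continue
        confidence = both_count / lhs_count
        if confidence >= min_confidence:
            rules.append((",".join(sorted(lhs)), rhs, confidence))
    return rules

rules = rules_from_itemset({"bread", "milk"}, baskets, min_confidence=0.8)
```

Here the rule milk -> bread has confidence 1 because every user with milk also has bread, while bread -> milk holds for only two of three users and is dropped by the threshold.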
Internally, the processor uses the FPGrowth algorithm of the spark.mllib.fpm package.
The processor handles multiple transactions of the same item by the same user by counting the item only once.
Example
The following example shows how to use the Association Rule Generation Processor in combination with other processors to generate recommendations for customers depending on the items they already bought.
As a starting point we use a Custom Input Table Processor to generate a small dataset by hand.
The data set is used by the Association Rule Generation Processor to generate rules and item sets. We want to have a look at both outputs to check if we selected a good configuration for the processor.
In order to apply the association rules we need an Association Rule Application Processor. However, the rules need to have an ID to refer to in the output. Thus, we use Indexing to give each rule a unique ID. Now we can apply the rules to the original dataset via the Association Rule Application Processor.
As a final result we receive the recommendations together with the ID of the applied rule and its confidence in the original dataset (which generated the rule).