The Frequent Sequences Mining Processor finds sequences that recur frequently within the input independent columns.
The input dataset should have one or more columns containing observations that can be decomposed into sequences of pairs.
The output table can be viewed in the Result tab of the processor. It contains three columns:
- Frequency [%]: The relative frequency of the generated sequence, as a percentage. It is always greater than or equal to the minimal supported frequency specified in the configuration.
- Frequency: The absolute frequency of the generated sequence, as a count.
- Sequence: The generated sequence or sequence combination, separated by commas. The pairs of elements within a sequence are separated by '|'.
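The relation between the two frequency columns can be sketched as follows. This is a minimal illustration, assuming that Frequency counts the input rows containing the sequence and that Frequency [%] is that count divided by the total number of rows; the helper names and the placeholder sequence label are hypothetical, not part of the processor.

```python
def to_output_row(sequence: str, count: int, total_rows: int) -> dict:
    """Hypothetical: derive both frequency columns from an absolute row count."""
    return {
        "Frequency [%]": 100.0 * count / total_rows,  # assumed: share of input rows
        "Frequency": count,                           # absolute number of rows
        "Sequence": sequence,                         # passed through unchanged
    }


def keep(row: dict, min_support_percent: float) -> bool:
    """A row is reported only if its relative frequency reaches the threshold."""
    return row["Frequency [%]"] >= min_support_percent


row = to_output_row("seq-A", count=3, total_rows=4)
print(row)              # {'Frequency [%]': 75.0, 'Frequency': 3, 'Sequence': 'seq-A'}
print(keep(row, 50.0))  # True
```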
In this example, we generate the sub-sequences, along with their frequencies, from three input columns whose entries are sequences of integers. The input table is shown below.
The output consists of sequences of number pairs whose length ranges from 1 to the maximal sequence length specified in the configuration. The sequences are listed in order of increasing length: first the sequences of length 1 (6 pairs with a frequency above 50%), then the combinations of these pairs that also have a frequency above 50%, displayed as sequences of length 2.
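The level-wise generation described above (length 1 first, then longer combinations that still meet the threshold) can be sketched with an Apriori-style loop. This is a minimal sketch, not the processor's documented algorithm. It assumes that each input row has already been turned into one ordered list of pairs, that a row "contains" a sequence when the pairs appear in the same order with gaps allowed, and that the support threshold is a fraction of rows; the names `contains` and `mine` and the sample rows are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[int, int]
Seq = Tuple[Pair, ...]


def contains(row: Sequence[Pair], seq: Seq) -> bool:
    """True if the pairs of `seq` occur in `row` in the same order (gaps allowed)."""
    it = iter(row)
    return all(p in it for p in seq)  # `in` consumes the iterator, preserving order


def mine(rows: List[List[Pair]], min_support: float, max_len: int) -> Dict[Seq, int]:
    """Level-wise sketch: frequent length-1 sequences, then extend by one pair per level.

    `min_support` is a fraction of rows (0.5 means 50%). Returns absolute counts,
    inserted in order of increasing sequence length.
    """
    n = len(rows)

    def count(seq: Seq) -> int:
        return sum(contains(r, seq) for r in rows)

    result: Dict[Seq, int] = {}
    level: List[Seq] = []
    for q in sorted({q for r in rows for q in r}):
        c = count((q,))
        if c / n >= min_support:
            result[(q,)] = c
            level.append((q,))
    frequent_pairs = [s[0] for s in level]

    # A sequence can only be frequent if its prefix and its last pair are frequent,
    # so length k + 1 candidates are built from the frequent sequences of length k.
    for _ in range(2, max_len + 1):
        next_level: List[Seq] = []
        for prefix in level:
            for p in frequent_pairs:
                cand = prefix + (p,)
                c = count(cand)
                if c / n >= min_support:
                    result[cand] = c
                    next_level.append(cand)
        level = next_level
        if not level:
            break
    return result


rows = [
    [(1, 2), (3, 4), (5, 6)],
    [(1, 2), (5, 6)],
    [(3, 4), (1, 2)],
    [(7, 8)],
]
for seq, c in mine(rows, min_support=0.5, max_len=2).items():
    print(seq, c)
# ((1, 2),) 3
# ((3, 4),) 2
# ((5, 6),) 2
# ((1, 2), (5, 6)) 2
```

Pruning by extending only frequent prefixes with frequent pairs keeps the candidate set small, because any row containing a longer sequence also contains its prefix and its last pair.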
In short, the output sequences are ordered tuples of pairs (the order matters), extracted from all columns combined.
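Two properties from this summary, order sensitivity and combining all columns, can be illustrated briefly. This is a minimal sketch under stated assumptions: it assumes the columns of a row are concatenated left to right into one sequence of pairs, which the documentation does not specify, and the helper names `combine_columns` and `contains` are hypothetical.

```python
from itertools import chain
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]


def combine_columns(columns: Sequence[List[Pair]]) -> List[Pair]:
    """Hypothetical: concatenate one row's columns, left to right, into one sequence."""
    return list(chain.from_iterable(columns))


def contains(row: Sequence[Pair], seq: Sequence[Pair]) -> bool:
    """Ordered containment: the pairs of `seq` must appear in `row` in the same order."""
    it = iter(row)
    return all(p in it for p in seq)


row = combine_columns([[(1, 2)], [(3, 4), (5, 6)], []])
print(row)                              # [(1, 2), (3, 4), (5, 6)]
print(contains(row, [(1, 2), (5, 6)]))  # True
print(contains(row, [(5, 6), (1, 2)]))  # False: same pairs, reversed order
```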
The following example illustrates the case of more than one independent attribute in the configuration. Here, the minimal supported frequency is set to 1, meaning that a sequence must appear in every row of the input dataset, with the content of all columns combined. The result table is as follows:
Only one sequence of length 1 satisfies this condition.
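Setting the minimal supported frequency to 1 means a length-1 sequence is kept only if it occurs in every row, which is the same as intersecting the sets of pairs found in each row. The sketch below assumes each row is already a combined list of pairs and treats the threshold as a fraction of rows; the function name `length1_support` and the sample rows are hypothetical.

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, int]


def length1_support(rows: List[List[Pair]], min_support: float) -> Dict[Pair, float]:
    """Length-1 sequences whose row-level support (a fraction) reaches `min_support`."""
    n = len(rows)
    candidates = {p for r in rows for p in r}
    support = {p: sum(p in r for r in rows) / n for p in candidates}
    return {p: s for p, s in sorted(support.items()) if s >= min_support}


rows = [
    [(1, 2), (3, 4)],
    [(3, 4), (5, 6)],
    [(7, 8), (3, 4)],
]
print(length1_support(rows, 1.0))  # {(3, 4): 1.0}
```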