PAMI is a Python library containing 100+ algorithms to discover useful patterns in various databases across multiple computing platforms. (Active)
Recurrent pattern mining aims to discover all interesting patterns in a temporal database that have periodic support no less than the user-specified minimum periodic support (minPS) constraint, period no greater than the user-specified maximum Interval Time (maxIAT) constraint and Recurrence no less than the user-specified minimum recurrence (minRec). The minSup controls the minimum number of transactions that a pattern must appear in a database. The maxIAT controls the maximum interval time the pattern must reappear. The minRec controls the number of periodic intervals of a pattern.
Reference: R. Uday Kiran, Haichuan Shang, Masashi Toyoda, Masaru Kitsuregawa: Discovering Recurring Patterns in Time Series. EDBT 2015: 97-108. Link
A temporal database is an unordered collection of transactions. A temporal represents a pair constituting of temporal-timestamp and a set of items.
A hypothetical temporal database containing the items a, b, c, d, e, f, and g and its timestamp is shown below
TS | Transactions |
---|---|
1 | a c d f g |
2 | a b c d |
3 | a c d f g |
4 | a b c d e f g |
5 | b c e f g |
7 | c d e f |
8 | a b c d e f g |
9 | c d e f g |
10 | a b c d e f g |
12 | a c d e f g |
13 | a c d f g |
14 | a c e g |
15 | a d f g |
16 | a b c |
Note: Duplicate items must not exist within a transaction.
Each row in a temporal database must contain timestamp and items. A sample transactional database, say sampleInputFile.txt, is provided below.
!cat recurringSample.txt
1 a c d f g
2 a b c d
3 a c d f g
4 a b c d e f g
5 b c e f g
7 c d e f
8 a b c d e f g
9 c d e f g
10 a b c d e f g
12 a c d e f g
13 a c d f g
14 a c e g
15 a d f g
16 a b c
The performance of a pattern mining algorithm primarily depends on the satistical nature of a database. Thus it is important to know the following details of a database:
The below sample code prints the statistical details of a database.
import PAMI.extras.dbStats.TemporalDatabase as stats
obj = stats.TemporalDatabase('sampleInputFile.txt', ' ')
obj.run()
obj.printStats()
The input parameters to a periodic frequent pattern mining algorithm are:
- String : E.g., ‘temporalDatabase.txt’
- URL : E.g., https://u-aizu.ac.jp/~udayrage/datasets/transactionalDatabases/transactional_T10I4D100K.csv
- DataFrame with the header titled ‘TS’ and ‘Transactions’
- count (beween 0 to length of a database) or
- [0, 1]
- count (beween 0 to length of a database) or
- [0, 1]
- [0, 1]
The patterns discovered by a recurring pattern mining algorithm can be saved into a file or a data frame.
syntax: python3 algorithmName.py <path to the input file>
<path to the output file>
<minSup>
<maxPer>
<minRec>
<seperator>
Example: python3 RPGrowth
inputFile.txt
outputFile.txt
4
3
2
' '
import PAMI.recurringPattern.basic.RPGrowth as alg
iFile = 'sampleInputFile.txt' #specify the input temporal database <br>
minSup = 4 #specify the minSup value <br>
maxPer = 4 #specify the maxPer value <br>
minRec = 1.5 #specify the maxRec Value <br>
seperator = ' ' #specify the seperator. Default seperator is tab space. <br>
oFile = 'recurringPatterns.txt' #specify the output file name<br>
obj = alg.RPGrowth(iFile, minSup, maxPer, minRec, seperator) #initialize the algorithm <br>
obj.mine() #start the mining process <br>
obj.save(oFile) #store the patterns in file <br>
df = obj.getPatternsAsDataFrame() #Get the patterns discovered into a dataframe <br>
obj.printResults() #Print the stats of mining process
!cat recurringPatterns.txt # will present the patterns
df #prints the data frame
Patterns | Support | Recurrance | intervals | |
---|---|---|---|---|
0 | a | 11 | 2 | {[1, 4] : 4}{[8, 16] : 7} |
1 | a f | 8 | 2 | {[1, 4] : 3}{[8, 15] : 5} |
2 | a f c | 7 | 2 | {[1, 4] : 3}{[8, 13] : 4} |
3 | a f c d | 7 | 2 | {[1, 4] : 3}{[8, 13] : 4} |
4 | a f c d g | 7 | 2 | {[1, 4] : 3}{[8, 13] : 4} |
5 | a f c g | 7 | 2 | {[1, 4] : 3}{[8, 13] : 4} |
6 | a f d | 8 | 2 | {[1, 4] : 3}{[8, 15] : 5} |
7 | a f d g | 8 | 2 | {[1, 4] : 3}{[8, 15] : 5} |
8 | a f g | 8 | 2 | {[1, 4] : 3}{[8, 15] : 5} |
9 | a d | 9 | 2 | {[1, 4] : 4}{[8, 15] : 5} |
10 | a d g | 8 | 2 | {[1, 4] : 3}{[8, 15] : 5} |
11 | a d g c | 7 | 2 | {[1, 4] : 3}{[8, 13] : 4} |
12 | a d c | 8 | 2 | {[1, 4] : 4}{[8, 13] : 4} |
13 | a g | 9 | 2 | {[1, 4] : 3}{[8, 15] : 6} |
14 | a g c | 8 | 2 | {[1, 4] : 3}{[8, 14] : 5} |
15 | a c | 10 | 2 | {[1, 4] : 4}{[8, 16] : 6} |
16 | d g | 9 | 2 | {[1, 4] : 3}{[8, 15] : 6} |
17 | d g c | 8 | 2 | {[1, 4] : 3}{[8, 13] : 5} |
18 | d g c f | 8 | 2 | {[1, 4] : 3}{[8, 13] : 5} |
19 | d g f | 9 | 2 | {[1, 4] : 3}{[8, 15] : 6} |