Mining Recurring Patterns in Temporal Databases

PAMI is a Python library containing 100+ algorithms to discover useful patterns in various databases across multiple computing platforms. (Active)

Mining Recurring Patterns in Temporal Databases

What is recurrent pattern mining?

Recurrent pattern mining aims to discover all interesting patterns in a temporal database that have periodic support no less than the user-specified minimum periodic support (minPS) constraint, period no greater than the user-specified maximum Interval Time (maxIAT) constraint and Recurrence no less than the user-specified minimum recurrence (minRec). The minSup controls the minimum number of transactions that a pattern must appear in a database. The maxIAT controls the maximum interval time the pattern must reappear. The minRec controls the number of periodic intervals of a pattern.

Reference: R. Uday Kiran, Haichuan Shang, Masashi Toyoda, Masaru Kitsuregawa: Discovering Recurring Patterns in Time Series. EDBT 2015: 97-108. Link

What is a temporal database?

A temporal database is an unordered collection of transactions. A temporal represents a pair constituting of temporal-timestamp and a set of items.
A hypothetical temporal database containing the items a, b, c, d, e, f, and g and its timestamp is shown below

TS Transactions
1 a c d f g
2 a b c d
3 a c d f g
4 a b c d e f g
5 b c e f g
7 c d e f
8 a b c d e f g
9 c d e f g
10 a b c d e f g
12 a c d e f g
13 a c d f g
14 a c e g
15 a d f g
16 a b c

Note: Duplicate items must not exist within a transaction.

What is the acceptable format of a temporal database in PAMI?

Each row in a temporal database must contain timestamp and items. A sample transactional database, say sampleInputFile.txt, is provided below.

!cat recurringSample.txt
1 a c d f g
2 a b c d
3 a c d f g
4 a b c d e f g
5 b c e f g
7 c d e f
8 a b c d e f g
9 c d e f g
10 a b c d e f g
12 a c d e f g
13 a c d f g
14 a c e g
15 a d f g
16 a b c

Understanding the statistics of a temporal database

The performance of a pattern mining algorithm primarily depends on the satistical nature of a database. Thus it is important to know the following details of a database:

The below sample code prints the statistical details of a database.

import PAMI.extras.dbStats.TemporalDatabase as stats

obj = stats.TemporalDatabase('sampleInputFile.txt', ' ')
obj.run()
obj.printStats() 

What are the input parameters?

The input parameters to a periodic frequent pattern mining algorithm are:

How to store the output of a recurring pattern mining algorithm?

The patterns discovered by a recurring pattern mining algorithm can be saved into a file or a data frame.

How to run the recurrent pattern mining algorithms in a terminal?

syntax: python3 algorithmName.py <path to the input file> <path to the output file> <minSup> <maxPer> <minRec> <seperator>

Example: python3 RPGrowth inputFile.txt outputFile.txt 4 3 2 ' '

How to execute a recurring pattern mining algorithm in a Jupyter Notebook?

import PAMI.recurringPattern.basic.RPGrowth as alg 

iFile = 'sampleInputFile.txt'  #specify the input temporal database <br>
minSup = 4  #specify the minSup value <br>
maxPer = 4   #specify the maxPer value <br>
minRec = 1.5    #specify the maxRec Value <br>
seperator = ' ' #specify the seperator. Default seperator is tab space. <br>
oFile = 'recurringPatterns.txt'   #specify the output file name<br>

obj = alg.RPGrowth(iFile, minSup, maxPer, minRec, seperator) #initialize the algorithm <br>
obj.startMine()                       #start the mining process <br>
obj.save(oFile)               #store the patterns in file <br>
df = obj.getPatternsAsDataFrame()     #Get the patterns discovered into a dataframe <br>
obj.printResults()                      #Print the stats of mining process
!cat recurringPatterns.txt  # will present the patterns
df #prints the data frame
Patterns Support Recurrance intervals
0 a 11 2 {[1, 4] : 4}{[8, 16] : 7}
1 a f 8 2 {[1, 4] : 3}{[8, 15] : 5}
2 a f c 7 2 {[1, 4] : 3}{[8, 13] : 4}
3 a f c d 7 2 {[1, 4] : 3}{[8, 13] : 4}
4 a f c d g 7 2 {[1, 4] : 3}{[8, 13] : 4}
5 a f c g 7 2 {[1, 4] : 3}{[8, 13] : 4}
6 a f d 8 2 {[1, 4] : 3}{[8, 15] : 5}
7 a f d g 8 2 {[1, 4] : 3}{[8, 15] : 5}
8 a f g 8 2 {[1, 4] : 3}{[8, 15] : 5}
9 a d 9 2 {[1, 4] : 4}{[8, 15] : 5}
10 a d g 8 2 {[1, 4] : 3}{[8, 15] : 5}
11 a d g c 7 2 {[1, 4] : 3}{[8, 13] : 4}
12 a d c 8 2 {[1, 4] : 4}{[8, 13] : 4}
13 a g 9 2 {[1, 4] : 3}{[8, 15] : 6}
14 a g c 8 2 {[1, 4] : 3}{[8, 14] : 5}
15 a c 10 2 {[1, 4] : 4}{[8, 16] : 6}
16 d g 9 2 {[1, 4] : 3}{[8, 15] : 6}
17 d g c 8 2 {[1, 4] : 3}{[8, 13] : 5}
18 d g c f 8 2 {[1, 4] : 3}{[8, 13] : 5}
19 d g f 9 2 {[1, 4] : 3}{[8, 15] : 6}