Mining Periodic-Frequent Patterns in Temporal Databases

PAMI is a Python library containing 100+ algorithms to discover useful patterns in various databases across multiple computing platforms.

1. What is periodic-frequent pattern mining?

Periodic-frequent pattern mining aims to discover all patterns in a temporal database whose support is no less than the user-specified minimum support (minSup) and whose periodicity is no greater than the user-specified maximum periodicity (maxPer). The minSup constraint controls the minimum number of transactions in which a pattern must appear, and the maxPer constraint controls the maximum time interval within which a pattern must reappear in the database.

Research paper: Tanbeer, Syed & Ahmed, Chowdhury & Jeong, Byeong-Soo. (2009). Discovering Periodic-Frequent Patterns in Transactional Databases. Vol. 5476, pp. 242-253. DOI: 10.1007/978-3-642-01307-2_24.

2. What is a temporal database?

A temporal database is a collection of transactions, where each transaction consists of a timestamp and a set of items.
A hypothetical temporal database containing the items a, b, c, d, e, f, and g is shown below.

TS Transactions
1 a b c g
2 b c d e
3 a b c d
4 a c d f
5 a b c d g
6 c d e f
7 a b c d
8 a e f
9 a b c d
10 b c d e

Note: Duplicate items must not exist in a transaction.
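
To make the two measures from Section 1 concrete, the following minimal sketch computes the support and periodicity of a pattern over the sample database above. The names `database` and `support_and_periodicity` are illustrative and not part of PAMI. The sketch assumes that the first period is measured from timestamp 0 and the last period runs up to the final timestamp in the database, which is consistent with the PFPGrowth results listed later in this document.

```python
# Sample temporal database from the table above: timestamp -> set of items.
database = {
    1: {"a", "b", "c", "g"},
    2: {"b", "c", "d", "e"},
    3: {"a", "b", "c", "d"},
    4: {"a", "c", "d", "f"},
    5: {"a", "b", "c", "d", "g"},
    6: {"c", "d", "e", "f"},
    7: {"a", "b", "c", "d"},
    8: {"a", "e", "f"},
    9: {"a", "b", "c", "d"},
    10: {"b", "c", "d", "e"},
}


def support_and_periodicity(pattern, db):
    """Return (support, periodicity) of `pattern` in `db`.

    Assumption: the first period starts at timestamp 0 and the last period
    ends at the final timestamp of the database.
    """
    pattern = set(pattern)
    # Timestamps of the transactions that contain every item of the pattern.
    occurrences = sorted(ts for ts, items in db.items() if pattern <= items)
    support = len(occurrences)
    # Pad the occurrences with the assumed start (0) and the final timestamp.
    boundaries = [0] + occurrences + [max(db)]
    periods = [later - earlier for earlier, later in zip(boundaries, boundaries[1:])]
    return support, max(periods)


print(support_and_periodicity({"a", "b"}, database))  # (5, 2)
print(support_and_periodicity({"a", "d"}, database))  # (5, 3)
```

With minSup = 5 and maxPer = 3, both patterns qualify: {a, b} appears in 5 transactions and never stays absent for more than 2 time units, while {a, d} first appears at timestamp 3, which gives it a periodicity of 3.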

3. What is the acceptable format of a temporal database in PAMI?

Each row in a temporal database must contain a timestamp followed by the items, separated by a common separator (a space in this example). A sample temporal database, say sampleTemporalDatabase.txt, is shown below.

1 a b c g
2 b c d e
3 a b c d
4 a c d f
5 a b c d g
6 c d e f
7 a b c d
8 a e f
9 a b c d
10 b c d e
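
As a rough illustration of this format, the sketch below writes the sample rows to sampleTemporalDatabase.txt and parses them back into (timestamp, items) pairs. The helper `read_temporal_database` is a hypothetical name used only for this example and is not a PAMI function; the sketch also assumes integer timestamps and a single-space separator.

```python
from pathlib import Path

# Rows of the sample database: a timestamp followed by space-separated items.
rows = [
    "1 a b c g",
    "2 b c d e",
    "3 a b c d",
    "4 a c d f",
    "5 a b c d g",
    "6 c d e f",
    "7 a b c d",
    "8 a e f",
    "9 a b c d",
    "10 b c d e",
]

path = Path("sampleTemporalDatabase.txt")
path.write_text("\n".join(rows) + "\n")


def read_temporal_database(file_name, sep=" "):
    """Parse each row into (timestamp, items); the first field is the timestamp."""
    transactions = []
    with open(file_name) as handle:
        for line in handle:
            fields = line.strip().split(sep)
            if fields == [""]:
                continue  # skip blank lines
            transactions.append((int(fields[0]), fields[1:]))
    return transactions


transactions = read_temporal_database(path)
print(transactions[0])  # (1, ['a', 'b', 'c', 'g'])
```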

4. What is the need for understanding the statistics of a database?

The performance of a pattern mining algorithm depends primarily on the statistical nature of the database. Thus, it is important to examine the statistical details of a database before mining it.

The sample code below prints the statistical details of a database.

import PAMI.extras.dbStats.TemporalDatabase as stats

obj = stats.TemporalDatabase('sampleTemporalDatabase.txt', ' ')
obj.run()
obj.printStats() 
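
For intuition about what such statistics look like, the plain-Python sketch below computes a few common ones for the sample database: the number of transactions, the number of distinct items, and the minimum, average, and maximum transaction lengths. This selection is an assumption made for illustration and is not necessarily the set of values that printStats() reports.

```python
from collections import Counter

# Sample temporal database as (timestamp, items) pairs.
transactions = [
    (1, ["a", "b", "c", "g"]),
    (2, ["b", "c", "d", "e"]),
    (3, ["a", "b", "c", "d"]),
    (4, ["a", "c", "d", "f"]),
    (5, ["a", "b", "c", "d", "g"]),
    (6, ["c", "d", "e", "f"]),
    (7, ["a", "b", "c", "d"]),
    (8, ["a", "e", "f"]),
    (9, ["a", "b", "c", "d"]),
    (10, ["b", "c", "d", "e"]),
]

# Number of items in each transaction.
lengths = [len(items) for _, items in transactions]
# Number of transactions in which each item occurs.
item_counts = Counter(item for _, items in transactions for item in items)

stats = {
    "transactions": len(transactions),
    "distinct_items": len(item_counts),
    "min_length": min(lengths),
    "avg_length": sum(lengths) / len(lengths),
    "max_length": max(lengths),
}
print(stats)
print(dict(item_counts))
```

Item frequencies are especially useful when choosing minSup: here only a, b, c, and d occur in at least 5 transactions, so no pattern containing e, f, or g can satisfy minSup = 5.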

5. What is the input to periodic-frequent pattern mining algorithms?

Periodic-frequent pattern mining algorithms require three inputs: a temporal database and the user-specified minSup and maxPer values.
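
To show how these three inputs determine the result, here is a naive brute-force sketch that checks every possible itemset of the sample database against minSup and maxPer. It is not how PFPGrowth works and is only practical for tiny databases, since the number of candidates grows exponentially with the number of items. The names `measures` and `patterns` are illustrative, and periods are assumed to be measured from timestamp 0 up to the final timestamp in the database.

```python
from itertools import combinations

# Input 1: the temporal database (timestamp -> set of items).
database = {
    1: {"a", "b", "c", "g"}, 2: {"b", "c", "d", "e"},
    3: {"a", "b", "c", "d"}, 4: {"a", "c", "d", "f"},
    5: {"a", "b", "c", "d", "g"}, 6: {"c", "d", "e", "f"},
    7: {"a", "b", "c", "d"}, 8: {"a", "e", "f"},
    9: {"a", "b", "c", "d"}, 10: {"b", "c", "d", "e"},
}
# Inputs 2 and 3: the user-specified thresholds.
min_sup, max_per = 5, 3


def measures(pattern, db):
    """Return (support, periodicity) of `pattern` (assumed start timestamp: 0)."""
    occurrences = sorted(ts for ts, items in db.items() if set(pattern) <= items)
    boundaries = [0] + occurrences + [max(db)]
    periods = [b - a for a, b in zip(boundaries, boundaries[1:])]
    return len(occurrences), max(periods)


all_items = sorted(set().union(*database.values()))
patterns = {}
# Enumerate every non-empty itemset and keep those meeting both constraints.
for size in range(1, len(all_items) + 1):
    for candidate in combinations(all_items, size):
        sup, per = measures(candidate, database)
        if sup >= min_sup and per <= max_per:
            patterns[candidate] = (sup, per)

print(len(patterns))  # 13
```

For minSup = 5 and maxPer = 3, the sketch finds 13 patterns, the same count that PFPGrowth reports for these settings later in this document.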

6. How to run a periodic-frequent pattern algorithm on a terminal?

foo@bar: cd PAMI/periodicFrequentPattern/basic
foo@bar: python3 algorithmName.py inputFile outputFile minSup maxPer separator

Example: python3 PFPGrowth.py inputFile.txt outputFile.txt 3 4 ' '

7. How to execute a periodic-frequent pattern mining algorithm in a Jupyter Notebook?

import PAMI.periodicFrequentPattern.basic.PFPGrowth as alg

iFile = 'sampleTemporalDatabase.txt'     # input temporal database
minSup = 5                               # minSup value
maxPer = 3                               # maxPer value
separator = ' '                          # item separator; the default is a tab
oFile = 'periodicFrequentPatterns.txt'   # output file name

obj = alg.PFPGrowth(iFile, minSup, maxPer, separator)  # initialize the algorithm
obj.startMine()                          # start the mining process
obj.save(oFile)                          # store the patterns in a file
df = obj.getPatternsAsDataFrame()        # get the discovered patterns as a dataframe
obj.printResults()                       # print the statistics of the mining process
Periodic Frequent patterns were generated successfully using PFPGrowth algorithm 
Total number of Periodic Frequent Patterns: 13
Total Memory in USS: 90112000
Total Memory in RSS 127647744
Total ExecutionTime in ms: 0.00040459632873535156

The periodicFrequentPatterns.txt file contains the following patterns (format: pattern:support:periodicity):

!cat periodicFrequentPatterns.txt
#Format is pattern:support:periodicity
a:7:2 
a	b:5:2 
a	b	c:5:2 
a	d:5:3 
a	d	c:5:3 
a	c:6:2 
b:7:2 
b	d:6:2 
b	d	c:6:2 
b	c:7:2 
d:8:2 
d	c:8:2 
c:9:2 
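
Each line of this output can be split back into its three fields. The following sketch assumes the pattern:support:periodicity layout shown above with tab-separated items; `parse_pattern_line` is a hypothetical helper and not part of PAMI.

```python
def parse_pattern_line(line, item_sep="\t"):
    """Split 'items:support:periodicity' into (items, support, periodicity)."""
    # Split from the right so that only the last two fields are read as numbers.
    items_field, support, periodicity = line.strip().rsplit(":", 2)
    return tuple(items_field.split(item_sep)), int(support), int(periodicity)


# Three lines copied from the output above (trailing spaces included).
sample_lines = ["a:7:2 ", "a\tb:5:2 ", "a\td\tc:5:3 "]
parsed = [parse_pattern_line(line) for line in sample_lines]
print(parsed)
# [(('a',), 7, 2), (('a', 'b'), 5, 2), (('a', 'd', 'c'), 5, 3)]
```

Splitting from the right keeps the parser working even if an item name itself contains a colon.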

The dataframe containing the patterns is shown below:

df  # In each pattern, items are separated from each other by a tab (\t).
    Patterns  Support  Periodicity
0          a        7            2
1        a b        5            2
2      a b c        5            2
3        a d        5            3
4      a d c        5            3
5        a c        6            2
6          b        7            2
7        b d        6            2
8      b d c        6            2
9        b c        7            2
10         d        8            2
11       d c        8            2
12         c        9            2