Mining Partial Periodic-Frequent Patterns in Temporal Databases

PAMI is a Python library containing 100+ algorithms to discover useful patterns in various databases across multiple computing platforms.

1. What is partial periodic-frequent pattern mining?

Partial periodic-frequent pattern mining aims to discover all interesting patterns in a temporal database whose support is no less than the user-specified minimum support (minSup), whose periodicity is no greater than the user-specified maximum periodicity (maxPer), and whose periodic ratio is no less than the user-specified minimum periodic ratio (minPR). minSup controls the minimum number of transactions in which a pattern must appear, maxPer controls the maximum time interval within which a pattern must reappear, and minPR controls the minimum periodic ratio, which is the proportion of cyclic repetitions of a pattern in the database.

Research paper: R. Uday Kiran, J.N. Venkatesh, Masashi Toyoda, Masaru Kitsuregawa, P. Krishna Reddy, Discovering partial periodic-frequent patterns in a transactional database, Journal of Systems and Software, Volume 125, 2017, Pages 170-182, ISSN 0164-1212.

2. What is a temporal database?

A temporal database is a collection of transactions, where each transaction contains a timestamp and a set of items.
A hypothetical temporal database containing the items a, b, c, d, e, f, and g is shown below.

TS Transactions
1 a b c g
2 b c d e
3 a b c d
4 a c d f
5 a b c d g
6 c d e f
7 a b c d
8 a e f
9 a b c d
10 b c d e

Note: Duplicate items must not exist in a transaction.
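
Using this database, the three measures (support, periods, and periodic ratio) can be sketched in plain Python. This is only an illustration and not PAMI's implementation. It assumes that the periods of a pattern also include the gap from an initial timestamp of 0 to its first occurrence and the gap from its last occurrence to the final timestamp of the database, and that the periodic ratio is the fraction of those periods that are no greater than maxPer. The names DATABASE and measures are hypothetical.

```python
# Sample temporal database from this page: (timestamp, set of items).
DATABASE = [
    (1, {"a", "b", "c", "g"}),
    (2, {"b", "c", "d", "e"}),
    (3, {"a", "b", "c", "d"}),
    (4, {"a", "c", "d", "f"}),
    (5, {"a", "b", "c", "d", "g"}),
    (6, {"c", "d", "e", "f"}),
    (7, {"a", "b", "c", "d"}),
    (8, {"a", "e", "f"}),
    (9, {"a", "b", "c", "d"}),
    (10, {"b", "c", "d", "e"}),
]


def measures(database, pattern, max_per, ts_start=0):
    """Return (support, periods, periodic_ratio) of `pattern`.

    Assumption: the periods include the gap from `ts_start` to the first
    occurrence and from the last occurrence to the final timestamp.
    """
    pattern = set(pattern)
    ts_end = max(ts for ts, _ in database)
    # Timestamps of the transactions that contain every item of the pattern.
    occurrences = sorted(ts for ts, items in database if pattern <= items)
    support = len(occurrences)
    if support == 0:
        return 0, [], 0.0
    points = [ts_start] + occurrences + [ts_end]
    periods = [later - earlier for earlier, later in zip(points, points[1:])]
    # Periods that satisfy the maxPer constraint.
    interesting = sum(1 for period in periods if period <= max_per)
    return support, periods, interesting / len(periods)


support, periods, ratio = measures(DATABASE, {"a", "b"}, max_per=3)
print(support, periods, ratio)  # 5 [1, 2, 2, 2, 2, 1] 1.0
```

Under these assumptions, the pattern {a, b} appears at timestamps 1, 3, 5, 7 and 9, so it would pass minSup = 5, and every one of its periods is within maxPer = 3, so it would also pass any minPR up to 1.0.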

3. What is the acceptable format of a temporal database in PAMI?

Each row in a temporal database must contain a timestamp followed by the items of the transaction. A sample temporal database, say sampleTemporalDatabase.txt, is shown below.

1 a b c g
2 b c d e
3 a b c d
4 a c d f
5 a b c d g
6 c d e f
7 a b c d
8 a e f
9 a b c d
10 b c d e
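
To make the row format concrete, the following sketch reads rows of this shape into (timestamp, items) pairs and rejects a transaction that repeats an item. It is a hypothetical helper for illustration, not the reader that PAMI uses internally; the function name parse_temporal_database and the assumption that timestamps are integers are not taken from the library.

```python
def parse_temporal_database(lines, sep=" "):
    """Parse rows of the form '<timestamp><sep><item><sep><item>...'.

    Hypothetical helper; assumes integer timestamps in the first field.
    """
    database = []
    for line_no, line in enumerate(lines, start=1):
        fields = [field for field in line.strip().split(sep) if field]
        if not fields:
            continue  # skip blank lines
        timestamp, items = int(fields[0]), fields[1:]
        # Duplicate items must not exist in a transaction.
        if len(items) != len(set(items)):
            raise ValueError(f"line {line_no}: duplicate items in a transaction")
        database.append((timestamp, set(items)))
    return database


sample = """1 a b c g
2 b c d e
3 a b c d
4 a c d f
5 a b c d g
6 c d e f
7 a b c d
8 a e f
9 a b c d
10 b c d e"""

db = parse_temporal_database(sample.splitlines())
print(len(db), db[7])
```

To read the actual file instead of the inline string, the same function can be given an open file object, for example `parse_temporal_database(open('sampleTemporalDatabase.txt'))`.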

4. What is the need for understanding the statistics of a database?

The performance of a pattern mining algorithm primarily depends on the statistical nature of a database. Thus, it is important to know details such as the database size, the number of items, the transaction sizes, the periods between transactions, and the sparsity of a database.

The sample code below prints the statistical details of a database.

import PAMI.extras.dbStats.TemporalDatabase as stats

obj = stats.TemporalDatabase('sampleTemporalDatabase.txt', ' ')
obj.run()
obj.printStats() 
Database size : 10
Number of items : 7
Minimum Transaction Size : 3
Average Transaction Size : 4.0
Maximum Transaction Size : 5
Minimum period : 1
Average period : 1.0
Maximum period : 1
Standard Deviation Transaction Size : 0.4472135954999579
Variance : 0.2222222222222222
Sparsity : 0.42857142857142855
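
The figures printed above can be reproduced by hand, which shows what each one measures. The sketch below is not PAMI's implementation. It assumes that sparsity is one minus the fraction of filled cells in the transaction-by-item matrix, that the reported standard deviation is the population form, that the reported variance is the sample form, and that a period is the gap between consecutive timestamps. These assumptions were chosen because they reproduce the numbers shown for the sample database.

```python
import statistics

# Sample temporal database from this page: timestamp -> set of items.
transactions = {
    1: {"a", "b", "c", "g"},
    2: {"b", "c", "d", "e"},
    3: {"a", "b", "c", "d"},
    4: {"a", "c", "d", "f"},
    5: {"a", "b", "c", "d", "g"},
    6: {"c", "d", "e", "f"},
    7: {"a", "b", "c", "d"},
    8: {"a", "e", "f"},
    9: {"a", "b", "c", "d"},
    10: {"b", "c", "d", "e"},
}

sizes = [len(items) for items in transactions.values()]
all_items = set().union(*transactions.values())
timestamps = sorted(transactions)
# Assumed definition of a period: gap between consecutive timestamps.
periods = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]

stats = {
    "Database size": len(transactions),
    "Number of items": len(all_items),
    "Minimum Transaction Size": min(sizes),
    "Average Transaction Size": statistics.mean(sizes),
    "Maximum Transaction Size": max(sizes),
    "Minimum period": min(periods),
    "Maximum period": max(periods),
    # Population standard deviation and sample variance (assumed forms).
    "Standard Deviation Transaction Size": statistics.pstdev(sizes),
    "Variance": statistics.variance(sizes),
    # Share of empty cells in the transaction-by-item matrix (assumed form).
    "Sparsity": 1 - sum(sizes) / (len(transactions) * len(all_items)),
}

for name, value in stats.items():
    print(f"{name} : {value}")
```

A sparsity of about 0.43 means that roughly 43% of the possible (transaction, item) pairs are absent, so this sample database is fairly dense.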

5. What is the input to the partial periodic-frequent pattern mining algorithms?

Algorithms that mine partial periodic-frequent patterns require a temporal database and the user-specified minSup, maxPer and minPR values.

6. How to run a partial periodic-frequent pattern mining algorithm in a terminal?

Syntax: python3 algorithmName.py <path to the input file> <path to the output file> <minSup> <maxPer> <minPR> <separator>

Example: python3 GPFGrowth.py inputFile.txt outputFile.txt 3 4 0.5 ' '

7. How to execute a partial periodic-frequent pattern mining algorithm in a Jupyter Notebook?

import PAMI.partialPeriodicFrequentPattern.basic.PPF_DFS as alg 

iFile = 'sampleTemporalDatabase.txt'  #specify the input temporal database
minSup = 5                     #specify the minSup value
maxPer = 3                     #specify the maxPer value
minPR = 0.4                    #specify the minPR value
seperator = ' '                #specify the separator. Default separator is tab space.
oFile = 'partialPeriodicFrequentPatterns.txt'   #specify the output file name

obj = alg.PPF_DFS(iFile, minSup, maxPer, minPR, seperator) #initialize the algorithm 
obj.startMine()                       #start the mining process 
obj.savePatterns(oFile)               #store the patterns in file 
df = obj.getPatternsAsDataFrame()     #Get the patterns discovered into a dataframe 
obj.printResults()                      #Print the stats of mining process
When this page was generated, the call to alg.PPF_DFS(...) raised the following error:

TypeError: Can't instantiate abstract class PPF_DFS with abstract method printResults

The partialPeriodicFrequentPatterns.txt file contains the following patterns (format: pattern:[support, periodicity]):

!cat partialPeriodicFrequentPatterns.txt
('d', 'c', 'b'):[6, 1.0] 
('d', 'c', 'a'):[5, 1.0] 
('c', 'd'):[8, 1.0] 
('b', 'c', 'a'):[5, 1.0] 
('c', 'b'):[7, 1.0] 
('c', 'a'):[6, 1.0] 
('c',):[9, 1.0] 
('d', 'b'):[6, 1.0] 
('d', 'a'):[5, 1.0] 
('d',):[8, 1.0] 
('b', 'a'):[5, 1.0] 
('b',):[7, 1.0] 
('a',):[7, 1.0] 
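
The saved patterns can be read back for further processing. The sketch below assumes that every line has exactly the shape shown in the listing above, that is, a Python tuple literal, a colon, and a two-element list. The helper parse_pattern_line is hypothetical and is not part of PAMI.

```python
import ast


def parse_pattern_line(line):
    """Split "('d', 'c', 'b'):[6, 1.0]" into (pattern, support, value).

    Hypothetical helper; assumes the line format shown in the listing.
    """
    # The last colon separates the pattern tuple from the list of measures.
    pattern_text, _, values_text = line.strip().rpartition(":")
    pattern = ast.literal_eval(pattern_text)
    support, value = ast.literal_eval(values_text)
    return pattern, support, value


lines = [
    "('d', 'c', 'b'):[6, 1.0] ",
    "('c', 'd'):[8, 1.0] ",
    "('c',):[9, 1.0] ",
]
parsed = [parse_pattern_line(line) for line in lines]

# Keep only the patterns that contain at least two items.
multi_item = [entry for entry in parsed if len(entry[0]) >= 2]
print(multi_item)
```

Using ast.literal_eval instead of eval keeps the parsing restricted to Python literals, which is the safer choice when reading a file from disk.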

The dataframe containing the patterns is shown below:

df
    Patterns    Support  Periodicity
0   (d, c, b)   6        1.0
1   (d, c, a)   5        1.0
2   (c, d)      8        1.0
3   (b, c, a)   5        1.0
4   (c, b)      7        1.0
5   (c, a)      6        1.0
6   (c,)        9        1.0
7   (d, b)      6        1.0
8   (d, a)      5        1.0
9   (d,)        8        1.0
10  (b, a)      5        1.0
11  (b,)        7        1.0
12  (a,)        7        1.0