Mining periodic correlated patterns in a temporal database

PAMI is a Python library containing 100+ algorithms to discover useful patterns in various databases across multiple computing platforms.

1. What is periodic correlated pattern mining?

Periodic correlated pattern mining aims to discover all interesting patterns in a temporal database that satisfy the user-specified minimum support (minSup), minimum all-confidence (minAllConf), maximum periodicity (maxPer), and maximum periodic all-confidence (maxPerAllConf) constraints.

Reference: Venkatesh, J.N., Uday Kiran, R., Krishna Reddy, P., Kitsuregawa, M. (2018). Discovering Periodic-Correlated Patterns in Temporal Databases. In: Hameurlain, A., Wagner, R., Hartmann, S., Ma, H. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems XXXVIII. Lecture Notes in Computer Science, vol 11250. Springer, Berlin, Heidelberg.

2. What is a temporal database?

A temporal database is a collection of transactions, each associated with a timestamp. A transaction is a pair consisting of a timestamp and a set of items.
A hypothetical temporal database containing the items a, b, c, d, e, f, and g, together with the timestamp (TS) of each transaction, is shown below.

TS Transactions
1 a b c g
2 b c d e
3 a b c d
4 a c d f
5 a b c d g
6 c d e f
7 a b c d
8 a e f
9 a b c d
10 b c d e

Note: Duplicate items must not exist within a transaction.
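
To make the four measures concrete, the following minimal sketch computes them for one pattern on the database above. It is not PAMI code: the function names are hypothetical, and the formulas are assumptions based on the referenced paper (all-confidence as the pattern's support divided by the largest support of its items, periodic all-confidence as the pattern's periodicity divided by the smallest periodicity of its items).

```python
# Sample temporal database from the text: timestamp -> set of items.
DB = {
    1: {"a", "b", "c", "g"},
    2: {"b", "c", "d", "e"},
    3: {"a", "b", "c", "d"},
    4: {"a", "c", "d", "f"},
    5: {"a", "b", "c", "d", "g"},
    6: {"c", "d", "e", "f"},
    7: {"a", "b", "c", "d"},
    8: {"a", "e", "f"},
    9: {"a", "b", "c", "d"},
    10: {"b", "c", "d", "e"},
}


def occurrences(pattern, db):
    """Sorted timestamps of the transactions containing every item of the pattern."""
    wanted = set(pattern)
    return sorted(ts for ts, items in db.items() if wanted <= items)


def support(pattern, db):
    """Number of transactions that contain the pattern."""
    return len(occurrences(pattern, db))


def periodicity(pattern, db):
    """Largest gap between consecutive occurrences of the pattern.

    Assumption: the gap from time 0 to the first occurrence and the gap from
    the last occurrence to the final timestamp in the database both count.
    """
    points = [0] + occurrences(pattern, db) + [max(db)]
    return max(later - earlier for earlier, later in zip(points, points[1:]))


def all_confidence(pattern, db):
    """Support of the pattern relative to its most frequent item."""
    return support(pattern, db) / max(support([item], db) for item in pattern)


def periodic_all_confidence(pattern, db):
    """Periodicity of the pattern relative to its most regular item."""
    return periodicity(pattern, db) / min(periodicity([item], db) for item in pattern)


pattern = ("a", "d")
print(support(pattern, DB), periodicity(pattern, DB),
      all_confidence(pattern, DB), periodic_all_confidence(pattern, DB))
# prints: 5 3 0.625 1.5
```

The pattern {a, d} occurs at timestamps 3, 4, 5, 7, and 9, so its longest gap is the initial one (from 0 to 3), while a and d individually never wait longer than 2 timestamps.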

3. What is the acceptable format of a temporal database in PAMI?

Each row in a temporal database must contain a timestamp followed by the items of the transaction. A sample temporal database, say sampleTemporalDatabase.txt, is shown below.

1 a b c g
2 b c d e
3 a b c d
4 a c d f
5 a b c d g
6 c d e f
7 a b c d
8 a e f
9 a b c d
10 b c d e
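
As an illustration of this layout, the sketch below parses such rows into (timestamp, items) pairs and rejects a row with duplicate items. It is a standalone helper, not PAMI's own reader; the function name is hypothetical, and it assumes integer timestamps and a single-space separator as in the sample.

```python
import io

SAMPLE = """1 a b c g
2 b c d e
3 a b c d
4 a c d f
5 a b c d g
6 c d e f
7 a b c d
8 a e f
9 a b c d
10 b c d e
"""


def read_temporal_database(lines, sep=" "):
    """Parse rows of the form '<timestamp><sep><item><sep><item>...'."""
    database = []
    for line_no, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split(sep)
        timestamp = int(fields[0])  # assumption: timestamps are integers
        items = [field for field in fields[1:] if field]
        if len(items) != len(set(items)):
            raise ValueError(f"duplicate item in row {line_no}: {line!r}")
        database.append((timestamp, items))
    return database


# A file object works the same way: read_temporal_database(open(path))
database = read_temporal_database(io.StringIO(SAMPLE))
print(len(database), database[0])
```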

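4. Why is it necessary to understand the statistics of a database?

The performance of a pattern mining algorithm depends primarily on the statistical nature of the database. Thus, it is important to know details such as the database size, the number of distinct items, the minimum, average, and maximum transaction size, the minimum, average, and maximum period, the standard deviation and variance of the transaction size, and the sparsity of the database.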

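The sample code below prints the statistical details of a database.

import PAMI.extras.dbStats.TemporalDatabase as stats

# arguments: path to the temporal database and the separator used in it
obj = stats.TemporalDatabase('sampleTemporalDatabase.txt', ' ')
obj.run()  # compute the statistics
obj.printStats()  # print the statistics listed below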
Database size : 10
Number of items : 7
Minimum Transaction Size : 3
Average Transaction Size : 4.0
Maximum Transaction Size : 5
Minimum period : 1
Average period : 1.0
Maximum period : 1
Standard Deviation Transaction Size : 0.4472135954999579
Variance : 0.2222222222222222
Sparsity : 0.42857142857142855
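
To show where these figures come from, here is a minimal sketch that recomputes them for the sample database with the standard library only. The formulas are assumptions chosen because they reproduce the printed values (population standard deviation, sample variance, sparsity as the fraction of empty cells in the transaction-by-item matrix); they are not taken from PAMI's source.

```python
import statistics

ROWS = ["a b c g", "b c d e", "a b c d", "a c d f", "a b c d g",
        "c d e f", "a b c d", "a e f", "a b c d", "b c d e"]
timestamps = list(range(1, len(ROWS) + 1))  # the sample uses timestamps 1..10

sizes = [len(row.split()) for row in ROWS]
items = {item for row in ROWS for item in row.split()}
# Period: gap between the timestamps of consecutive transactions.
periods = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]

stats = {
    "database_size": len(ROWS),
    "number_of_items": len(items),
    "min_transaction_size": min(sizes),
    "avg_transaction_size": sum(sizes) / len(sizes),
    "max_transaction_size": max(sizes),
    "min_period": min(periods),
    "avg_period": sum(periods) / len(periods),
    "max_period": max(periods),
    "std_transaction_size": statistics.pstdev(sizes),  # population std (assumed)
    "variance": statistics.variance(sizes),            # sample variance (assumed)
    "sparsity": 1 - sum(sizes) / (len(ROWS) * len(items)),
}

for name, value in stats.items():
    print(f"{name}: {value}")
```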

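5. What are the input parameters?

The input parameters to a periodic correlated pattern mining algorithm are the temporal database (input file), minSup, minAllConf, maxPer, maxPerAllConf, and the separator used to distinguish the items in the input file (tab by default).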

6. How to store the output of a periodic correlated pattern mining algorithm?

The patterns discovered by a periodic correlated pattern mining algorithm can be saved into a file or a data frame.

7. How to run the periodic correlated pattern mining algorithms in a terminal?

syntax: python3 algorithmName.py <path to the input file> <path to the output file> <minSup> <minAllConf> <maxPer> <maxPerAllConf> <separator>

Example: python3 EPCPGrowth.py inputFile.txt outputFile.txt 4 0.5 3 0.4 ' '

8. How to execute a periodic correlated pattern mining algorithm in a Jupyter Notebook?

import PAMI.periodicCorrelatedPattern.basic.EPCPGrowth as alg

iFile = 'sampleTemporalDatabase.txt'  # specify the input temporal database
minSup = 4  # specify the minSup value
minAllConf = 0.6  # specify the minAllConf value
maxPer = 4  # specify the maxPer value
maxPerAllConf = 1.5  # specify the maxPerAllConf value
separator = ' '  # specify the separator. The default separator is a tab.
oFile = 'periodicCorrelatedPatterns.txt'  # specify the output file name

obj = alg.EPCPGrowth(iFile, minSup, minAllConf, maxPer, maxPerAllConf, separator)  # initialize the algorithm
obj.startMine()  # start the mining process
obj.save(oFile)  # store the patterns in a file
df = obj.getPatternsAsDataFrame()  # get the discovered patterns as a dataframe
obj.printResults()  # print the statistics of the mining process
Correlated Periodic-Frequent patterns were generated successfully using EPCPGrowth algorithm 
Total number of Correlated Periodic-Frequent Patterns: 12
Total Memory in USS: 115859456
Total Memory in RSS 156979200
Total ExecutionTime in ms: 0.0005478858947753906
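
As a cross-check on the pattern count reported above, the following brute-force sketch enumerates every itemset of the sample database and keeps those that pass the four thresholds. It is not how EPCPGrowth works internally; the helper names are hypothetical, and the threshold semantics (minSup and minAllConf as lower bounds, maxPer and maxPerAllConf as upper bounds, all inclusive) are assumptions inferred from the output.

```python
from itertools import combinations

ROWS = ["a b c g", "b c d e", "a b c d", "a c d f", "a b c d g",
        "c d e f", "a b c d", "a e f", "a b c d", "b c d e"]
DB = {ts: set(row.split()) for ts, row in enumerate(ROWS, start=1)}


def measures(pattern):
    """Return (support, periodicity) of a pattern in DB."""
    hits = sorted(ts for ts, items in DB.items() if set(pattern) <= items)
    # Assumption: gaps are measured from time 0 and up to the last timestamp.
    points = [0] + hits + [max(DB)]
    return len(hits), max(b - a for a, b in zip(points, points[1:]))


def mine(min_sup, min_all_conf, max_per, max_per_all_conf):
    """Exhaustively test every itemset against the four thresholds."""
    items = sorted(set().union(*DB.values()))
    single = {item: measures((item,)) for item in items}
    found = {}
    for size in range(1, len(items) + 1):
        for pattern in combinations(items, size):
            sup, per = measures(pattern)
            all_conf = sup / max(single[i][0] for i in pattern)
            per_all_conf = per / min(single[i][1] for i in pattern)
            if (sup >= min_sup and all_conf >= min_all_conf
                    and per <= max_per and per_all_conf <= max_per_all_conf):
                found[pattern] = (sup, per, all_conf, per_all_conf)
    return found


patterns = mine(min_sup=4, min_all_conf=0.6, max_per=4, max_per_all_conf=1.5)
print(len(patterns))  # prints: 12
```

Exhaustive enumeration is only practical for a toy database like this one (7 items, 127 candidate itemsets); it is useful here purely to see how each threshold prunes candidates.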

The periodicCorrelatedPatterns.txt file contains the following patterns (format: pattern:support:periodicity:allConfidence:periodicAllConfidence):

!cat periodicCorrelatedPatterns.txt
#format- pattern:support:periodicity:allConfidence:periodicAllConfidence
e:4:4:1:1 
a:7:2:1:1 
a	b:5:2:0.7142857142857143:1.0 
a	d:5:3:0.625:1.5 
a	c:6:2:0.6666666666666666:1.0 
b:7:2:1:1 
b	d:6:2:0.75:1.0 
b	d	c:6:2:0.6666666666666666:1.0 
b	c:7:2:0.7777777777777778:1.0 
d:8:2:1:1 
d	c:8:2:0.8888888888888888:1.0 
c:9:2:1:1 
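
For further processing outside PAMI, lines in this format can be read back with a few lines of Python. The sketch below is a hypothetical helper, not part of the library; it assumes the items of a pattern are separated by whitespace (tabs in the listing above) and that the last four colon-separated fields are the measures named in the format line.

```python
SAVED_LINES = [
    "#format- pattern:support:periodicity:allConfidence:periodicAllConfidence",
    "e:4:4:1:1 ",
    "a\tb:5:2:0.7142857142857143:1.0 ",
    "a\td:5:3:0.625:1.5 ",
]


def parse_pattern_line(line):
    """Split one saved line into its pattern and four measures."""
    pattern, sup, per, all_conf, per_all_conf = line.strip().rsplit(":", 4)
    return {
        "pattern": tuple(pattern.split()),
        "support": int(sup),
        "periodicity": int(per),
        "allConfidence": float(all_conf),
        "periodicAllConfidence": float(per_all_conf),
    }


# Skip the '#format-' header and any blank lines.
records = [parse_pattern_line(line) for line in SAVED_LINES
           if line.strip() and not line.startswith("#")]
for record in records:
    print(record)
```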

The dataframe containing the patterns is shown below:

df
    Patterns  Support  Periodicity  allConf   maxPerAllConf
0   e         4        4            1.000000  1.0
1   a         7        2            1.000000  1.0
2   a b       5        2            0.714286  1.0
3   a d       5        3            0.625000  1.5
4   a c       6        2            0.666667  1.0
5   b         7        2            1.000000  1.0
6   b d       6        2            0.750000  1.0
7   b d c     6        2            0.666667  1.0
8   b c       7        2            0.777778  1.0
9   d         8        2            1.000000  1.0
10  d c       8        2            0.888889  1.0
11  c         9        2            1.000000  1.0