PAMI is a Python library containing 100+ algorithms to discover useful patterns in various databases across multiple computing platforms. (Active)
Partial periodic-frequent pattern mining aims to discover all interesting patterns in a temporal database whose support is no less than the user-specified minimum support (minSup), whose periodicity is no greater than the user-specified maximum periodicity (maxPer), and whose periodic ratio is no less than the user-specified minimum periodic ratio (minPR). minSup controls the minimum number of transactions in which a pattern must appear, maxPer controls the maximum time interval within which a pattern must reappear, and minPR controls the minimum periodic ratio, which is the proportion of cyclic repetitions of a pattern in the database.
Research paper: R. Uday Kiran, J.N. Venkatesh, Masashi Toyoda, Masaru Kitsuregawa, P. Krishna Reddy, Discovering partial periodic-frequent patterns in a transactional database, Journal of Systems and Software, Volume 125, 2017, Pages 170-182, ISSN 0164-1212.
A temporal database is a collection of transactions, where each transaction consists of a timestamp and a set of items.
A hypothetical temporal database containing the items a, b, c, d, e, f, and g is shown below:
TS | Transactions |
---|---|
1 | a b c g |
2 | b c d e |
3 | a b c d |
4 | a c d f |
5 | a b c d g |
6 | c d e f |
7 | a b c d |
8 | a e f |
9 | a b c d |
10 | b c d e |
Note: Duplicate items must not exist in a transaction.
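To make the three measures concrete, the sketch below computes support, periodicity, and periodic ratio for one pattern over the sample database above. It is a plain-Python illustration, not PAMI code. It assumes that periods are the gaps between consecutive occurrences (including the gap from time 0 to the first occurrence and from the last occurrence to the final timestamp), that periodicity is the largest such gap, and that periodic ratio is the fraction of periods no greater than maxPer. The function name `pattern_measures` is hypothetical.

```python
# Sample temporal database from the table above: timestamp -> set of items.
database = {
    1: {"a", "b", "c", "g"},
    2: {"b", "c", "d", "e"},
    3: {"a", "b", "c", "d"},
    4: {"a", "c", "d", "f"},
    5: {"a", "b", "c", "d", "g"},
    6: {"c", "d", "e", "f"},
    7: {"a", "b", "c", "d"},
    8: {"a", "e", "f"},
    9: {"a", "b", "c", "d"},
    10: {"b", "c", "d", "e"},
}


def pattern_measures(database, pattern, max_per):
    """Return (support, periodicity, periodic_ratio) for a pattern.

    Assumed definitions (see the lead-in): periods include the gap from
    time 0 to the first occurrence and from the last occurrence to the
    final timestamp of the database.
    """
    pattern = set(pattern)
    # Timestamps of the transactions that contain every item of the pattern.
    occurrences = sorted(ts for ts, items in database.items() if pattern <= items)
    support = len(occurrences)
    boundaries = [0] + occurrences + [max(database)]
    periods = [later - earlier for earlier, later in zip(boundaries, boundaries[1:])]
    periodicity = max(periods)
    # Fraction of periods that satisfy the maxPer constraint.
    periodic_ratio = sum(1 for p in periods if p <= max_per) / len(periods)
    return support, periodicity, periodic_ratio


support, periodicity, periodic_ratio = pattern_measures(database, {"a", "b"}, max_per=2)
print(support, periodicity, periodic_ratio)
```

Under these assumptions, a pattern would be reported only if its support is at least minSup and its periodic ratio is at least minPR.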
Each row in a temporal database must contain a timestamp followed by the items. A sample temporal database, say sampleTemporalDatabase.txt, is shown below.
1 a b c g
2 b c d e
3 a b c d
4 a c d f
5 a b c d g
6 c d e f
7 a b c d
8 a e f
9 a b c d
10 b c d e
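The row format above can be read with a few lines of plain Python. The following is a minimal sketch, independent of PAMI, that assumes the first field of each row is an integer timestamp and the remaining fields are items separated by a single-character separator. The helper name `read_temporal_database` is hypothetical.

```python
import io


def read_temporal_database(lines, sep=" "):
    """Yield (timestamp, items) pairs from an iterable of text lines."""
    for line in lines:
        fields = [field for field in line.strip().split(sep) if field]
        if not fields:
            continue  # skip blank lines
        timestamp = int(fields[0])
        items = fields[1:]
        # Duplicate items must not exist in a transaction.
        if len(items) != len(set(items)):
            raise ValueError(f"duplicate items at timestamp {timestamp}")
        yield timestamp, items


# An in-memory stand-in for the first rows of sampleTemporalDatabase.txt.
sample = io.StringIO("1 a b c g\n2 b c d e\n3 a b c d\n")
rows = list(read_temporal_database(sample))
print(rows[0])
```

To read a real file, pass an open file object instead of the `io.StringIO` stand-in.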
The performance of a pattern mining algorithm depends primarily on the statistical nature of a database. Thus, it is important to know the statistical details of a database, such as its size, the number of items, the transaction sizes, and the periods.
The sample code below prints the statistical details of a database.
import PAMI.extras.dbStats.TemporalDatabase as stats
obj = stats.TemporalDatabase('sampleTemporalDatabase.txt', ' ')
obj.run()
obj.printStats()
Database size : 10
Number of items : 7
Minimum Transaction Size : 3
Average Transaction Size : 4.0
Maximum Transaction Size : 5
Minimum period : 1
Average period : 1.0
Maximum period : 1
Standard Deviation Transaction Size : 0.4472135954999579
Variance : 0.2222222222222222
Sparsity : 0.42857142857142855
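The figures above can be cross-checked with the standard library alone. The sketch below is not the PAMI implementation; its formulas are assumptions inferred from the printed values: periods are differences between consecutive timestamps, the standard deviation is the population form, the variance is the sample form, and sparsity is one minus the fraction of filled cells in the transaction-by-item matrix.

```python
import statistics

rows = """1 a b c g
2 b c d e
3 a b c d
4 a c d f
5 a b c d g
6 c d e f
7 a b c d
8 a e f
9 a b c d
10 b c d e""".splitlines()

timestamps = []
transactions = []
for row in rows:
    ts, *items = row.split(" ")
    timestamps.append(int(ts))
    transactions.append(items)

sizes = [len(t) for t in transactions]
distinct_items = {item for t in transactions for item in t}
# Assumed definition: a period is the gap between consecutive timestamps.
periods = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]

stats = {
    "Database size": len(transactions),
    "Number of items": len(distinct_items),
    "Minimum Transaction Size": min(sizes),
    "Average Transaction Size": sum(sizes) / len(sizes),
    "Maximum Transaction Size": max(sizes),
    "Minimum period": min(periods),
    "Average period": sum(periods) / len(periods),
    "Maximum period": max(periods),
    # Population standard deviation and sample variance (assumed forms).
    "Standard Deviation Transaction Size": statistics.pstdev(sizes),
    "Variance": statistics.variance(sizes),
    # Share of empty cells in the transactions x items matrix.
    "Sparsity": 1 - sum(sizes) / (len(transactions) * len(distinct_items)),
}

for name, value in stats.items():
    print(f"{name} : {value}")
```

A sparsity near 0.43 means that, on average, a transaction contains a little more than half of the seven items, so this sample database is fairly dense.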
Algorithms that mine partial periodic-frequent patterns require a temporal database and the user-specified minSup, maxPer, and minPR values.
Temporal database, in one of the following formats:
- String : E.g., ‘temporalDatabase.txt’
- URL : E.g., https://u-aizu.ac.jp/~udayrage/datasets/temporalDatabases/temporal_T10I4D100K.csv
- DataFrame. Please note that the dataframe must contain columns titled ‘TS’ and ‘Transactions’
minSup, specified as one of:
- count (between 0 and the length of the database)
- [0, 1]
maxPer, specified as one of:
- count (between 0 and the length of the database)
- [0, 1]
minPR, specified in:
- [0, 1]
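The count and [0, 1] forms listed above suggest that a threshold can be given either as an absolute count or as a fraction of the database. How PAMI interprets these values internally is not shown here, so the following is only a sketch of one plausible rule, with a hypothetical helper `to_count`: an integer is taken as a count, and a float in [0, 1] is taken as a fraction of the database length.

```python
def to_count(threshold, database_size):
    """Hypothetical helper: turn a count or a fraction into a count.

    Assumed rule (not confirmed by the source): int -> absolute count,
    float in [0, 1] -> fraction of the database length.
    """
    if isinstance(threshold, bool):
        raise TypeError("threshold must be an int or a float")
    if isinstance(threshold, int):
        if not 0 <= threshold <= database_size:
            raise ValueError("count must lie between 0 and the database length")
        return threshold
    if isinstance(threshold, float):
        if not 0.0 <= threshold <= 1.0:
            raise ValueError("fraction must lie in [0, 1]")
        return threshold * database_size
    raise TypeError("threshold must be an int or a float")


print(to_count(5, 10))    # absolute count
print(to_count(0.5, 10))  # half of a 10-transaction database
```

Under this rule the type matters: `1` means one transaction, while `1.0` means the whole database, so it is worth writing thresholds with an explicit type.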
syntax: python3 algorithmName.py <path to the input file> <path to the output file> <minSup> <maxPer> <minPR> <separator>
Example: python3 GPFGrowth.py inputFile.txt outputFile.txt 3 4 0.5 ' '
import PAMI.partialPeriodicFrequentPattern.basic.PPF_DFS as alg
iFile = 'sampleTemporalDatabase.txt' #specify the input temporal database
minSup = 5 #specify the minSup value
maxPer = 3 #specify the maxPer value
minPR = 0.4 #specify the minPR value
seperator = ' ' #specify the separator. The default separator is a tab.
oFile = 'partialPeriodicFrequentPatterns.txt' #specify the output file name
obj = alg.PPF_DFS(iFile, minSup, maxPer, minPR, seperator) #initialize the algorithm
obj.mine() #start the mining process
obj.savePatterns(oFile) #store the patterns in file
df = obj.getPatternsAsDataFrame() #Get the patterns discovered into a dataframe
obj.printResults() #Print the stats of mining process
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Input In [5], in <cell line: 10>()
7 seperator = ' ' #specify the seperator. Default seperator is tab space.
8 oFile = 'partialPeriodicFrequentPatterns.txt' #specify the output file name
---> 10 obj = alg.PPF_DFS(iFile, minSup, maxPer, minPR, seperator) #initialize the algorithm
11 obj.mine() #start the mining process
12 obj.savePatterns(oFile) #store the patterns in file
TypeError: Can't instantiate abstract class PPF_DFS with abstract method printResults
The periodicPatterns.txt file contains the following patterns (format: pattern:[support, periodicity]):
!cat periodicPatterns.txt
('d', 'c', 'b'):[6, 1.0]
('d', 'c', 'a'):[5, 1.0]
('c', 'd'):[8, 1.0]
('b', 'c', 'a'):[5, 1.0]
('c', 'b'):[7, 1.0]
('c', 'a'):[6, 1.0]
('c',):[9, 1.0]
('d', 'b'):[6, 1.0]
('d', 'a'):[5, 1.0]
('d',):[8, 1.0]
('b', 'a'):[5, 1.0]
('b',):[7, 1.0]
('a',):[7, 1.0]
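Lines in this format can be loaded back into Python objects for further processing. The sketch below is an illustration only and assumes that each line is a Python tuple literal, a colon, and a Python list literal, and that no item name contains a colon. The helper name `parse_pattern_line` is hypothetical.

```python
import ast


def parse_pattern_line(line):
    """Split "('d', 'c'):[6, 1.0]" into (('d', 'c'), [6, 1.0])."""
    # The last colon separates the pattern from its measures.
    pattern_text, _, measures_text = line.strip().rpartition(":")
    pattern = ast.literal_eval(pattern_text)
    measures = ast.literal_eval(measures_text)
    return pattern, measures


lines = ["('d', 'c', 'b'):[6, 1.0]", "('c',):[9, 1.0]"]
parsed = [parse_pattern_line(line) for line in lines]
for pattern, measures in parsed:
    print(pattern, measures)
```

`ast.literal_eval` only evaluates literals, so it is a safer choice than `eval` when reading a file produced elsewhere.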
The dataframe containing the patterns is shown below:
df
| | Patterns | Support | Periodicity |
---|---|---|---|
0 | (d, c, b) | 6 | 1.0 |
1 | (d, c, a) | 5 | 1.0 |
2 | (c, d) | 8 | 1.0 |
3 | (b, c, a) | 5 | 1.0 |
4 | (c, b) | 7 | 1.0 |
5 | (c, a) | 6 | 1.0 |
6 | (c,) | 9 | 1.0 |
7 | (d, b) | 6 | 1.0 |
8 | (d, a) | 5 | 1.0 |
9 | (d,) | 8 | 1.0 |
10 | (b, a) | 5 | 1.0 |
11 | (b,) | 7 | 1.0 |
12 | (a,) | 7 | 1.0 |