PAMI is a Python library containing 100+ algorithms to discover useful patterns in various databases across multiple computing platforms.
Periodic-frequent pattern mining aims to discover all patterns in a temporal database whose support is no less than the user-specified minimum support (minSup) and whose periodicity is no greater than the user-specified maximum periodicity (maxPer). minSup sets the minimum number of transactions in which a pattern must appear, and maxPer sets the maximum time interval within which a pattern must reappear in the database.
Research paper: Tanbeer, Syed & Ahmed, Chowdhury & Jeong, Byeong-Soo. (2009). Discovering Periodic-Frequent Patterns in Transactional Databases. 5476. 242-253. doi:10.1007/978-3-642-01307-2_24.
A temporal database is a collection of transactions, where each transaction contains a timestamp and a set of items.
A hypothetical temporal database containing the items a, b, c, d, e, f, and g is shown below.
TS | Transactions |
---|---|
1 | a b c g |
2 | b c d e |
3 | a b c d |
4 | a c d f |
5 | a b c d g |
6 | c d e f |
7 | a b c d |
8 | a e f |
9 | a b c d |
10 | b c d e |
Note: Duplicate items must not exist in a transaction.
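To make the two measures concrete, the sketch below computes the support and periodicity of a pattern on the table above. It assumes the period convention of Tanbeer et al. (2009), in which the first period is measured from time 0 and the last one runs up to the final timestamp of the database. The names `database` and `support_and_periodicity` are illustrative and are not part of PAMI.

```python
# Sample temporal database from the table above: timestamp -> set of items.
database = {
    1: {"a", "b", "c", "g"},
    2: {"b", "c", "d", "e"},
    3: {"a", "b", "c", "d"},
    4: {"a", "c", "d", "f"},
    5: {"a", "b", "c", "d", "g"},
    6: {"c", "d", "e", "f"},
    7: {"a", "b", "c", "d"},
    8: {"a", "e", "f"},
    9: {"a", "b", "c", "d"},
    10: {"b", "c", "d", "e"},
}


def support_and_periodicity(pattern, db):
    """Return (support, periodicity) of `pattern` in the temporal database `db`."""
    items = set(pattern)
    # Timestamps of the transactions that contain every item of the pattern.
    occurrences = sorted(ts for ts, transaction in db.items() if items <= transaction)
    # Assumed convention: pad with time 0 and the last timestamp of the database.
    points = [0] + occurrences + [max(db)]
    periods = [later - earlier for earlier, later in zip(points, points[1:])]
    return len(occurrences), max(periods)


print(support_and_periodicity(["a", "b"], database))  # (5, 2)
print(support_and_periodicity(["a", "d"], database))  # (5, 3)
```

The pattern "a d" first appears at timestamp 3, so its initial period of 3 (measured from time 0) is what sets its periodicity.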
Each row in a temporal database file must contain a timestamp followed by the items. A sample temporal database, say sampleTemporalDatabase.txt, is shown below.
1 a b c g
2 b c d e
3 a b c d
4 a c d f
5 a b c d g
6 c d e f
7 a b c d
8 a e f
9 a b c d
10 b c d e
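As a rough illustration of this layout, the following sketch writes the sample rows to a temporary file and reads them back, treating the first field of each row as the timestamp and the remaining fields as items. The helper name `read_temporal_database` is hypothetical; PAMI ships its own readers, and this is not one of them.

```python
import os
import tempfile

SAMPLE_ROWS = """1 a b c g
2 b c d e
3 a b c d
4 a c d f
5 a b c d g
6 c d e f
7 a b c d
8 a e f
9 a b c d
10 b c d e
"""


def read_temporal_database(path, separator=" "):
    """Parse rows of the form: timestamp, then items, split by `separator`."""
    transactions = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            fields = [field for field in line.strip().split(separator) if field]
            if not fields:
                continue  # skip blank lines
            timestamp, items = int(fields[0]), fields[1:]
            # Enforce the rule that a transaction holds no duplicate items.
            if len(items) != len(set(items)):
                raise ValueError(f"duplicate item at timestamp {timestamp}")
            transactions.append((timestamp, items))
    return transactions


# Use a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "sampleTemporalDatabase.txt")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(SAMPLE_ROWS)
    rows = read_temporal_database(path)

print(len(rows))  # 10
print(rows[0])    # (1, ['a', 'b', 'c', 'g'])
```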
The performance of a pattern mining algorithm depends primarily on the statistical nature of a database. Thus, it is important to know the statistical details of a database.
The sample code below prints the statistical details of a database.
import PAMI.extras.dbStats.TemporalDatabase as stats
obj = stats.TemporalDatabase('sampleTemporalDatabase.txt', ' ')
obj.run()
obj.printStats()
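For intuition about what such statistics look like, here is a small pure-Python sketch that computes a few of them for the sample database. Which statistics `printStats()` reports is not listed here, so the selection below (database size, number of distinct items, and transaction lengths) is only an assumption.

```python
from statistics import mean

# Item lists of the ten sample transactions, in timestamp order.
transactions = [
    "a b c g",
    "b c d e",
    "a b c d",
    "a c d f",
    "a b c d g",
    "c d e f",
    "a b c d",
    "a e f",
    "a b c d",
    "b c d e",
]

# Number of items in each transaction.
lengths = [len(row.split()) for row in transactions]
# Every distinct item that occurs anywhere in the database.
distinct_items = {item for row in transactions for item in row.split()}

print("Database size:", len(transactions))
print("Distinct items:", len(distinct_items))
print("Min transaction length:", min(lengths))
print("Average transaction length:", mean(lengths))
print("Max transaction length:", max(lengths))
```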
Algorithms that mine periodic-frequent patterns require a temporal database, minSup, and maxPer (all specified by the user).
The temporal database can be provided as:
- String : E.g., 'temporalDatabase.txt'
- URL : E.g., https://u-aizu.ac.jp/~udayrage/datasets/temporalDatabases/temporal_T10I4D100K.csv
- DataFrame. Please note that the dataframe must contain the headers titled 'TS' and 'Transactions'
minSup can be specified as:
- count (between 0 and the length of the database)
- [0, 1]
maxPer can be specified as:
- count (between 0 and the length of the database)
- [0, 1]
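The two ways of writing a threshold can be reconciled as in the sketch below. It assumes that an integer is read as an absolute count and that a float in [0, 1] is read as a fraction of the number of transactions. The helper `to_count` is hypothetical and may not match PAMI's internal conversion.

```python
def to_count(threshold, database_size):
    """Turn a count or a fraction in [0, 1] into an absolute count (assumed rule)."""
    if isinstance(threshold, bool):
        raise TypeError("threshold must be an int or a float")
    if isinstance(threshold, int):
        # Integers are taken as absolute counts.
        if not 0 <= threshold <= database_size:
            raise ValueError("count must lie between 0 and the database size")
        return threshold
    if isinstance(threshold, float):
        # Floats are taken as a fraction of the database size.
        if not 0.0 <= threshold <= 1.0:
            raise ValueError("fraction must lie in [0, 1]")
        return threshold * database_size
    raise TypeError("threshold must be an int or a float")


print(to_count(5, 10))    # 5
print(to_count(0.5, 10))  # 5.0
```

Under this rule the type of the value matters: the integer `1` means one transaction, while the float `1.0` means the whole database.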
foo@bar: cd PAMI/periodicFrequentPattern/basic
foo@bar: python3 algorithmName.py inputFile outputFile minSup maxPer separator
Example: python3 PFPGrowth.py inputFile.txt outputFile.txt 3 4 ' '
import PAMI.periodicFrequentPattern.basic.PFPGrowth as alg
iFile = 'sampleTemporalDatabase.txt' #specify the input temporal database
minSup = 5 #specify the minSup value
maxPer = 3 #specify the maxPer value
separator = ' ' #specify the separator. The default separator is a tab.
oFile = 'periodicFrequentPatterns.txt' #specify the output file name
obj = alg.PFPGrowth(iFile, minSup, maxPer, separator) #initialize the algorithm
obj.mine() #start the mining process
obj.save(oFile) #store the patterns in the output file
df = obj.getPatternsAsDataFrame() #get the discovered patterns as a dataframe
obj.printResults() #print the statistics of the mining process
Periodic Frequent patterns were generated successfully using PFPGrowth algorithm
Total number of Periodic Frequent Patterns: 13
Total Memory in USS: 90112000
Total Memory in RSS 127647744
Total ExecutionTime in ms: 0.00040459632873535156
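For a database this small, the pattern count reported above can be cross-checked by brute force. The sketch below is not PFPGrowth: it simply enumerates every itemset of the sample database and keeps those with support of at least minSup = 5 and periodicity of at most maxPer = 3, assuming the first period starts at time 0 and the last one ends at the final timestamp.

```python
from itertools import combinations

rows = [
    "a b c g", "b c d e", "a b c d", "a c d f", "a b c d g",
    "c d e f", "a b c d", "a e f", "a b c d", "b c d e",
]
# Timestamps run from 1 to 10, matching the sample database.
database = {ts: set(row.split()) for ts, row in enumerate(rows, start=1)}

min_sup, max_per = 5, 3
last_ts = max(database)
all_items = sorted(set().union(*database.values()))

patterns = {}
for size in range(1, len(all_items) + 1):
    for candidate in combinations(all_items, size):
        # Timestamps at which every item of the candidate occurs together.
        hits = [ts for ts in sorted(database) if set(candidate) <= database[ts]]
        # Assumed convention: periods are padded with 0 and the last timestamp.
        points = [0] + hits + [last_ts]
        periodicity = max(b - a for a, b in zip(points, points[1:]))
        if len(hits) >= min_sup and periodicity <= max_per:
            patterns[candidate] = (len(hits), periodicity)

print(len(patterns))  # 13
```

Exhaustive enumeration grows exponentially with the number of items, so it is useful only as a sanity check on toy inputs.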
The periodicFrequentPatterns.txt file contains the following patterns (format: pattern:support:periodicity):
!cat periodicFrequentPatterns.txt
#Format is pattern:support:periodicity
a:7:2
a b:5:2
a b c:5:2
a d:5:3
a d c:5:3
a c:6:2
b:7:2
b d:6:2
b d c:6:2
b c:7:2
d:8:2
d c:8:2
c:9:2
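A saved file in this format can be loaded back with a few lines of Python. The sketch below parses lines of the form pattern:support:periodicity and skips any line that starts with `#`. It splits the items on whitespace, and `parse_pattern_line` is an illustrative helper rather than a PAMI function.

```python
def parse_pattern_line(line):
    """Split 'pattern:support:periodicity' into (items, support, periodicity)."""
    # Split from the right so only the last two colons act as delimiters.
    pattern, support, periodicity = line.strip().rsplit(":", 2)
    return tuple(pattern.split()), int(support), int(periodicity)


# A few lines copied from the listing above.
sample_output = """a:7:2
a b:5:2
a d c:5:3
c:9:2
"""

parsed = [
    parse_pattern_line(line)
    for line in sample_output.splitlines()
    if line.strip() and not line.startswith("#")
]

print(parsed[1])  # (('a', 'b'), 5, 2)
print(parsed[2])  # (('a', 'd', 'c'), 5, 3)
```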
The dataframe containing the patterns is shown below:
df #In each pattern, items are separated from each other by a tab (\t).
Index | Patterns | Support | Periodicity |
---|---|---|---|
0 | a | 7 | 2 |
1 | a b | 5 | 2 |
2 | a b c | 5 | 2 |
3 | a d | 5 | 3 |
4 | a d c | 5 | 3 |
5 | a c | 6 | 2 |
6 | b | 7 | 2 |
7 | b d | 6 | 2 |
8 | b d c | 6 | 2 |
9 | b c | 7 | 2 |
10 | d | 8 | 2 |
11 | d c | 8 | 2 |
12 | c | 9 | 2 |