PAMI is a Python library containing 100+ algorithms to discover useful patterns in various databases across multiple computing platforms.
Periodic correlated pattern mining aims to discover all the interesting patterns in a temporal database that satisfy the user-specified minimum support (minSup), minimum all-confidence (minAllConf), maximum periodicity (maxPer), and maximum periodic all-confidence (maxPerAllConf) constraints.
Reference: Venkatesh, J.N., Uday Kiran, R., Krishna Reddy, P., Kitsuregawa, M. (2018). Discovering Periodic-Correlated Patterns in Temporal Databases. In: Hameurlain, A., Wagner, R., Hartmann, S., Ma, H. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems XXXVIII. Lecture Notes in Computer Science, vol 11250. Springer, Berlin, Heidelberg.
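To make the four constraints concrete, the sketch below computes them for a single pattern. It assumes these definitions: support is the number of transactions containing the pattern; periodicity is the largest gap between consecutive timestamps at which the pattern occurs, counting the gap from timestamp 0 to the first occurrence and from the last occurrence to the final timestamp of the database; all-confidence is sup(X) / max sup(i) over the items i of X; and periodic all-confidence is taken here as per(X) / max per(i). The database literal is the hypothetical example used later in this tutorial, and all function names are hypothetical, not part of PAMI.

```python
from typing import Dict, FrozenSet, List

# Hypothetical temporal database: timestamp -> set of items.
DB: Dict[int, FrozenSet[str]] = {
    1: frozenset("abcg"), 2: frozenset("bcde"), 3: frozenset("abcd"),
    4: frozenset("acdf"), 5: frozenset("abcdg"), 6: frozenset("cdef"),
    7: frozenset("abcd"), 8: frozenset("aef"), 9: frozenset("abcd"),
    10: frozenset("bcde"),
}


def timestamps(pattern: FrozenSet[str]) -> List[int]:
    """Sorted timestamps of the transactions that contain every item of the pattern."""
    return sorted(ts for ts, items in DB.items() if pattern <= items)


def support(pattern: FrozenSet[str]) -> int:
    return len(timestamps(pattern))


def periodicity(pattern: FrozenSet[str]) -> int:
    """Largest gap between consecutive occurrences, including both database boundaries."""
    points = [0] + timestamps(pattern) + [max(DB)]
    return max(b - a for a, b in zip(points, points[1:]))


def all_confidence(pattern: FrozenSet[str]) -> float:
    return support(pattern) / max(support(frozenset(i)) for i in pattern)


def periodic_all_confidence(pattern: FrozenSet[str]) -> float:
    # Assumption: the denominator is the largest periodicity among the single items.
    return periodicity(pattern) / max(periodicity(frozenset(i)) for i in pattern)


ad = frozenset("ad")
print(support(ad), periodicity(ad), all_confidence(ad), periodic_all_confidence(ad))
# -> 5 3 0.625 1.5
```

Because every occurrence of X is also an occurrence of each of its items, per(X) is never smaller than per(i), so under this definition the periodic all-confidence of any pattern is at least 1.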
A temporal database is an unordered collection of transactions. Each transaction is a pair consisting of a timestamp and a set of items.
A hypothetical temporal database containing the items a, b, c, d, e, f, and g, together with the timestamp (TS) of each transaction, is shown below:
TS | Transactions |
---|---|
1 | a b c g |
2 | b c d e |
3 | a b c d |
4 | a c d f |
5 | a b c d g |
6 | c d e f |
7 | a b c d |
8 | a e f |
9 | a b c d |
10 | b c d e |
Note: Duplicate items must not exist within a transaction.
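In code, the table maps naturally onto a list of (timestamp, items) pairs. The sketch below, whose names are hypothetical, builds that list and flags any transaction that violates the no-duplicate rule.

```python
from typing import List, Tuple

# Hypothetical in-memory copy of the table above: (timestamp, items).
TRANSACTIONS: List[Tuple[int, List[str]]] = [
    (1, "a b c g".split()), (2, "b c d e".split()), (3, "a b c d".split()),
    (4, "a c d f".split()), (5, "a b c d g".split()), (6, "c d e f".split()),
    (7, "a b c d".split()), (8, "a e f".split()), (9, "a b c d".split()),
    (10, "b c d e".split()),
]


def rows_with_duplicates(rows: List[Tuple[int, List[str]]]) -> List[int]:
    """Timestamps of transactions that list the same item more than once."""
    return [ts for ts, items in rows if len(items) != len(set(items))]


print(rows_with_duplicates(TRANSACTIONS))             # -> []
print(rows_with_duplicates([(11, ["a", "a", "b"])]))  # -> [11]
```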
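Each row in a temporal database file must contain a timestamp followed by the items of that transaction, all separated by the same separator (a space in this example). A sample temporal database, say sampleTemporalDatabase.txt, is shown below.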
1 a b c g
2 b c d e
3 a b c d
4 a c d f
5 a b c d g
6 c d e f
7 a b c d
8 a e f
9 a b c d
10 b c d e
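A minimal reader for this format is sketched below. It assumes the first token on each line is an integer timestamp and the remaining tokens, split on the separator, are items; `read_temporal_database` is a hypothetical helper, not PAMI's own loader. The sample rows are written to a temporary file only so that the example runs on its own.

```python
import os
import tempfile
from typing import List, Tuple

SAMPLE = "\n".join([
    "1 a b c g", "2 b c d e", "3 a b c d", "4 a c d f", "5 a b c d g",
    "6 c d e f", "7 a b c d", "8 a e f", "9 a b c d", "10 b c d e",
]) + "\n"


def read_temporal_database(path: str, sep: str = "\t") -> List[Tuple[int, List[str]]]:
    """Hypothetical reader: first token is an integer timestamp, the rest are items."""
    rows: List[Tuple[int, List[str]]] = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            tokens = [tok for tok in line.strip().split(sep) if tok]
            if tokens:  # ignore blank lines
                rows.append((int(tokens[0]), tokens[1:]))
    return rows


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sampleTemporalDatabase.txt")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(SAMPLE)
    database = read_temporal_database(path, sep=" ")

print(len(database), database[7])  # -> 10 (8, ['a', 'e', 'f'])
```

Keeping the rows as a list rather than a dict preserves every transaction even if two of them share a timestamp.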
The performance of a pattern mining algorithm primarily depends on the statistical nature of a database. Thus, it is important to know details such as the database size, the number of items, the transaction sizes, the periods between timestamps, and the sparsity.
The sample code below prints the statistical details of a database.
import PAMI.extras.dbStats.TemporalDatabase as stats
obj = stats.TemporalDatabase('sampleTemporalDatabase.txt', ' ')  # input file and its separator
obj.run()  # compute the statistics
obj.printStats()  # print the statistics
Database size : 10
Number of items : 7
Minimum Transaction Size : 3
Average Transaction Size : 4.0
Maximum Transaction Size : 5
Minimum period : 1
Average period : 1.0
Maximum period : 1
Standard Deviation Transaction Size : 0.4472135954999579
Variance : 0.2222222222222222
Sparsity : 0.42857142857142855
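These figures can be cross-checked with plain Python. The sketch below is an illustration rather than PAMI's code: it assumes a period is the gap between consecutive timestamps and that sparsity is 1 - (total item occurrences) / (number of transactions × number of distinct items). It also reflects that the printed standard deviation matches the population formula (the square root of 2/10) while the printed variance matches the sample formula (2/9).

```python
import statistics
from typing import Dict, List

# The ten sample rows, in '<timestamp> <items...>' form.
ROWS: List[str] = [
    "1 a b c g", "2 b c d e", "3 a b c d", "4 a c d f", "5 a b c d g",
    "6 c d e f", "7 a b c d", "8 a e f", "9 a b c d", "10 b c d e",
]


def database_stats(rows: List[str], sep: str = " ") -> Dict[str, float]:
    parsed = [(int(ts), items) for ts, *items in (row.split(sep) for row in rows)]
    sizes = [len(items) for _, items in parsed]
    stamps = sorted(ts for ts, _ in parsed)
    periods = [b - a for a, b in zip(stamps, stamps[1:])]  # gaps between consecutive timestamps
    n_items = len({item for _, items in parsed for item in items})
    return {
        "size": len(parsed),
        "items": n_items,
        "minSize": min(sizes),
        "avgSize": statistics.mean(sizes),
        "maxSize": max(sizes),
        "minPeriod": min(periods),
        "avgPeriod": statistics.mean(periods),
        "maxPeriod": max(periods),
        "stdSize": statistics.pstdev(sizes),    # population standard deviation
        "varSize": statistics.variance(sizes),  # sample variance (n - 1 denominator)
        "sparsity": 1 - sum(sizes) / (len(parsed) * n_items),
    }


for name, value in database_stats(ROWS).items():
    print(name, ":", value)
```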
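The input parameters to a periodic correlated pattern mining algorithm are:
- iFile : the input temporal database, given as a
  - String : E.g., ‘temporalDatabase.txt’
  - URL : E.g., https://u-aizu.ac.jp/~udayrage/datasets/transactionalDatabases/transactional_T10I4D100K.csv
  - DataFrame with the header titled ‘TS’ and ‘Transactions’
- minSup : a count (between 0 and the length of the database) or a ratio in [0, 1]
- minAllConf : a value in [0, 1]
- maxPer : a count (between 0 and the length of the database) or a ratio in [0, 1]
- maxPerAllConf : a value of 1 or greater (e.g., 1.5), because the periodicity of a pattern is never smaller than the periodicity of any of its items
- sep : the separator used in the input file (default: tab)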
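Because minSup and maxPer accept either a count or a ratio, an implementation has to normalise them to counts before mining. The hypothetical helper below shows one such convention (an int is a count, a float is a fraction of the number of transactions); it is an assumption for illustration, not necessarily how PAMI converts thresholds.

```python
from typing import Union


def to_count(value: Union[int, float, str], db_size: int) -> Union[int, float]:
    """Hypothetical helper: ints are absolute counts, floats in [0, 1] are fractions of db_size."""
    if isinstance(value, str):  # e.g. a value read from the command line
        value = float(value) if "." in value else int(value)
    if isinstance(value, int):
        if not 0 <= value <= db_size:
            raise ValueError("count must lie between 0 and the database size")
        return value
    if not 0.0 <= value <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    return value * db_size


print(to_count(4, 10), to_count(0.4, 10), to_count("3", 10))  # -> 4 4.0 3
```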
The patterns discovered by a periodic correlated pattern mining algorithm can be saved into a file or a data frame.
syntax: python3 algorithmName.py <path to the input file> <path to the output file> <minSup> <minAllConf> <maxPer> <maxPerAllConf> <separator>
Example: python3 EPCPGrowth.py inputFile.txt outputFile.txt 4 0.5 3 1.5 ' '
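The command-line form takes its arguments in a fixed positional order. The sketch below shows one way to parse that interface with Python's argparse; the parser, its description, and the `number` helper are hypothetical and only illustrate the argument order and types, not the actual entry point of EPCPGrowth.py.

```python
import argparse
from typing import List, Optional, Union


def number(text: str) -> Union[int, float]:
    """Keep whole numbers as counts (int) and decimals as ratios (float)."""
    return float(text) if "." in text else int(text)


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # Hypothetical parser mirroring the positional order of the syntax line.
    parser = argparse.ArgumentParser(description="Periodic correlated pattern mining (sketch)")
    parser.add_argument("iFile", help="path to the input temporal database")
    parser.add_argument("oFile", help="path to the output file")
    parser.add_argument("minSup", type=number)
    parser.add_argument("minAllConf", type=float)
    parser.add_argument("maxPer", type=number)
    parser.add_argument("maxPerAllConf", type=float)
    parser.add_argument("sep", nargs="?", default="\t", help="item separator (default: tab)")
    return parser.parse_args(argv)


args = parse_args(["inputFile.txt", "outputFile.txt", "4", "0.5", "3", "1.5", " "])
print(args.minSup, args.maxPer, args.maxPerAllConf, repr(args.sep))  # -> 4 3 1.5 ' '
```

In a shell, the quoted ' ' reaches the program as a single space character, which is why the example wraps the separator in quotes.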
import PAMI.periodicCorrelatedPattern.basic.EPCPGrowth as alg
iFile = 'sampleTemporalDatabase.txt'  # specify the input temporal database
minSup = 4  # specify the minSup value
minAllConf = 0.6  # specify the minAllConf value
maxPer = 4  # specify the maxPer value
maxPerAllConf = 1.5  # specify the maxPerAllConf value
sep = ' '  # specify the separator; the default separator is a tab
oFile = 'periodicCorrelatedPatterns.txt'  # specify the output file name
obj = alg.EPCPGrowth(iFile, minSup, minAllConf, maxPer, maxPerAllConf, sep)  # initialize the algorithm
obj.mine()  # start the mining process
obj.save(oFile)  # store the patterns in a file
df = obj.getPatternsAsDataFrame()  # get the discovered patterns as a dataframe
obj.printResults()  # print the statistics of the mining process
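For intuition about what this call computes, the sketch below solves the same problem by exhaustive search over all 127 non-empty itemsets of the sample database, using the thresholds from the code above. It is a naive baseline for toy data, not EPCP-growth, and it assumes these definitions: periodicity is the largest gap between consecutive occurrences, counted from timestamp 0 up to the last timestamp of the database; all-confidence is sup(X) / max sup(i); periodic all-confidence is per(X) / max per(i) over the items i of X. Under these assumptions it finds the same 12 patterns listed in the output below; the function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

# Sample database: timestamp -> items (one character per item).
DB: Dict[int, str] = {1: "abcg", 2: "bcde", 3: "abcd", 4: "acdf", 5: "abcdg",
                      6: "cdef", 7: "abcd", 8: "aef", 9: "abcd", 10: "bcde"}
LAST_TS = max(DB)


def occurrences(pattern: Sequence[str]) -> List[int]:
    """Sorted timestamps of the transactions containing every item of the pattern."""
    return [ts for ts in sorted(DB) if set(pattern) <= set(DB[ts])]


def periodicity(stamps: List[int]) -> int:
    points = [0] + stamps + [LAST_TS]
    return max(b - a for a, b in zip(points, points[1:]))


def brute_force(min_sup: int, min_all_conf: float, max_per: int,
                max_per_all_conf: float) -> Dict[str, Tuple[int, int, float, float]]:
    items = sorted(set("".join(DB.values())))
    sup = {i: len(occurrences(i)) for i in items}
    per = {i: periodicity(occurrences(i)) for i in items}
    found: Dict[str, Tuple[int, int, float, float]] = {}
    for k in range(1, len(items) + 1):
        for pattern in combinations(items, k):
            stamps = occurrences(pattern)
            s, p = len(stamps), periodicity(stamps)
            all_conf = s / max(sup[i] for i in pattern)
            per_all_conf = p / max(per[i] for i in pattern)
            if (s >= min_sup and all_conf >= min_all_conf
                    and p <= max_per and per_all_conf <= max_per_all_conf):
                found["".join(pattern)] = (s, p, all_conf, per_all_conf)
    return found


patterns = brute_force(min_sup=4, min_all_conf=0.6, max_per=4, max_per_all_conf=1.5)
print(len(patterns), sorted(patterns))
```

The search cost grows as 2^n in the number of items, which is why dedicated algorithms prune the search space instead of testing every itemset.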
Correlated Periodic-Frequent patterns were generated successfully using EPCPGrowth algorithm
Total number of Correlated Periodic-Frequent Patterns: 12
Total Memory in USS: 115859456
Total Memory in RSS 156979200
Total ExecutionTime in ms: 0.0005478858947753906
The periodicCorrelatedPatterns.txt file contains the following patterns (format: pattern:support:periodicity:allConfidence:periodicAllConfidence):
!cat periodicCorrelatedPatterns.txt
#format- pattern:support:periodicity:allConfidence:periodicAllConfidence
e:4:4:1:1
a:7:2:1:1
a b:5:2:0.7142857142857143:1.0
a d:5:3:0.625:1.5
a c:6:2:0.6666666666666666:1.0
b:7:2:1:1
b d:6:2:0.75:1.0
b d c:6:2:0.6666666666666666:1.0
b c:7:2:0.7777777777777778:1.0
d:8:2:1:1
d c:8:2:0.8888888888888888:1.0
c:9:2:1:1
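Each line after the header has the form pattern:support:periodicity:allConfidence:periodicAllConfidence, with the items of a pattern separated by spaces. The hypothetical helper below turns such lines into a dictionary; it assumes header lines start with '#' and splits from the right, so only the last four colons are treated as field delimiters.

```python
from typing import Dict, Iterable, Tuple

Record = Tuple[int, int, float, float]


def parse_patterns(lines: Iterable[str]) -> Dict[Tuple[str, ...], Record]:
    """Map each pattern (as a tuple of items) to (support, periodicity, allConf, periodicAllConf)."""
    records: Dict[Tuple[str, ...], Record] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank and header lines
        pattern, sup, per, all_conf, per_all_conf = line.rsplit(":", 4)
        records[tuple(pattern.split())] = (int(sup), int(per), float(all_conf), float(per_all_conf))
    return records


sample = [
    "#format- pattern:support:periodicity:allConfidence:periodicAllConfidence",
    "e:4:4:1:1",
    "a d:5:3:0.625:1.5",
    "b d c:6:2:0.6666666666666666:1.0",
]
patterns = parse_patterns(sample)
print(patterns[("a", "d")])  # -> (5, 3, 0.625, 1.5)
```

To read the real output instead, pass an open file handle for periodicCorrelatedPatterns.txt to `parse_patterns`, since iterating over a file yields its lines.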
The dataframe containing the patterns is shown below:
df
Index | Patterns | Support | Periodicity | allConf | maxPerAllConf |
---|---|---|---|---|---|
0 | e | 4 | 4 | 1.000000 | 1.0 |
1 | a | 7 | 2 | 1.000000 | 1.0 |
2 | a b | 5 | 2 | 0.714286 | 1.0 |
3 | a d | 5 | 3 | 0.625000 | 1.5 |
4 | a c | 6 | 2 | 0.666667 | 1.0 |
5 | b | 7 | 2 | 1.000000 | 1.0 |
6 | b d | 6 | 2 | 0.750000 | 1.0 |
7 | b d c | 6 | 2 | 0.666667 | 1.0 |
8 | b c | 7 | 2 | 0.777778 | 1.0 |
9 | d | 8 | 2 | 1.000000 | 1.0 |
10 | d c | 8 | 2 | 0.888889 | 1.0 |
11 | c | 9 | 2 | 1.000000 | 1.0 |