PAMI is a Python library containing 100+ algorithms to discover useful patterns in various databases across multiple computing platforms. (Active)
Periodic-Frequent Pattern Mining (PFPM) is an important knowledge discovery technique in data mining with many real-world applications. It involves identifying all patterns that have exhibited perfect periodic behavior in a temporal database. A major limitation of this technique is that it fails to discover those interesting patterns that have exhibited partial periodic behavior in a temporal database. Partial periodic pattern mining (PPPM) has been introduced to tackle this problem.
Partial periodic pattern mining aims to discover all interesting patterns in a temporal database that satisfy the user-specified maximum inter-arrival time (maxIAT) and minimum periodic support (minPS) constraints. The maxIAT is the largest gap between two consecutive occurrences of a pattern for which the reoccurrence is still considered periodic. The minPS is the minimum number of periodic occurrences a pattern must have in a temporal database.
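The two constraints can be illustrated with a few lines of Python. This is a minimal sketch, assuming that the periodic support of a pattern is the number of inter-arrival times (gaps between consecutive occurrences) that do not exceed maxIAT. The function names and the list of timestamps are hypothetical and are not part of PAMI.

```python
def inter_arrival_times(timestamps):
    """Return the gaps between consecutive occurrences of a pattern."""
    ordered = sorted(timestamps)
    return [cur - prev for prev, cur in zip(ordered, ordered[1:])]


def periodic_support(timestamps, max_iat):
    """Count the inter-arrival times that do not exceed max_iat."""
    return sum(1 for iat in inter_arrival_times(timestamps) if iat <= max_iat)


# Hypothetical timestamps at which some pattern occurs.
occurrences = [1, 3, 4, 8, 9]
max_iat = 2   # a reoccurrence within 2 time units counts as periodic
min_ps = 3    # the pattern needs at least 3 periodic reoccurrences

print(inter_arrival_times(occurrences))                 # [2, 1, 4, 1]
print(periodic_support(occurrences, max_iat))           # 3
print(periodic_support(occurrences, max_iat) >= min_ps) # True
```

The gap of 4 between timestamps 4 and 8 exceeds maxIAT, so it does not contribute to the periodic support; the remaining three gaps do.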
Reference: R. Uday Kiran, Haichuan Shang, Masashi Toyoda, and Masaru Kitsuregawa. 2017. Discovering Partial Periodic Itemsets in Temporal Databases. In Proceedings of the 29th International Conference on Scientific and Statistical Database Management (SSDBM '17). Association for Computing Machinery, New York, NY, USA, Article 30, 1–6.
A temporal database is a collection of transactions ordered by timestamp, where each transaction contains a timestamp and a set of items.
A hypothetical temporal database containing the items a, b, c, d, e, f, and g is shown below.
TS | Transactions |
---|---|
1 | a b c g |
2 | b c d e |
3 | a b c d |
4 | a c d f |
5 | a b c d g |
6 | c d e f |
7 | a b c d |
8 | a e f |
9 | a b c d |
10 | b c d e |
Note: Duplicate items must not exist in a transaction.
Each row in a temporal database file must contain a timestamp followed by the items of the transaction. A sample temporal database, say sampleTemporalDatabase.txt, is shown below.
1 a b c g
2 b c d e
3 a b c d
4 a c d f
5 a b c d g
6 c d e f
7 a b c d
8 a e f
9 a b c d
10 b c d e
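The file layout above can be read with a short parser. The following is a rough sketch, not PAMI's own reader: the function name is hypothetical, timestamps are assumed to be integers, and fields are assumed to be separated by a single space as in the sample. It also rejects transactions that contain duplicate items.

```python
import io

SAMPLE = """1 a b c g
2 b c d e
3 a b c d
4 a c d f
5 a b c d g
6 c d e f
7 a b c d
8 a e f
9 a b c d
10 b c d e
"""


def parse_temporal_database(lines, sep=" "):
    """Parse rows of the form '<timestamp><sep><item><sep><item>...'."""
    database = []
    for line_no, line in enumerate(lines, start=1):
        fields = [field for field in line.strip().split(sep) if field]
        if not fields:
            continue  # skip blank lines
        timestamp, items = int(fields[0]), fields[1:]
        if len(set(items)) != len(items):
            raise ValueError(f"duplicate item in transaction on line {line_no}")
        database.append((timestamp, items))
    return database


# io.StringIO stands in for an open file such as sampleTemporalDatabase.txt.
database = parse_temporal_database(io.StringIO(SAMPLE))
print(len(database))   # 10
print(database[0])     # (1, ['a', 'b', 'c', 'g'])
```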
The performance of a pattern mining algorithm primarily depends on the statistical nature of a database. Thus, it is important to know the statistical details of a database before mining it.
The sample code below prints the statistical details of a database.
import PAMI.extras.dbStats.TemporalDatabase as stats
obj = stats.TemporalDatabase('sampleTemporalDatabase.txt', ' ')
obj.run()
obj.printStats()
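To get a feel for the kind of figures such a report is built from, a few basic statistics can be computed by hand. This is only an illustrative sketch on the sample database; the function and the dictionary keys are hypothetical and do not mirror the output of PAMI's printStats().

```python
from collections import Counter

# The sample temporal database as (timestamp, items) pairs.
DATABASE = [
    (1, ["a", "b", "c", "g"]),
    (2, ["b", "c", "d", "e"]),
    (3, ["a", "b", "c", "d"]),
    (4, ["a", "c", "d", "f"]),
    (5, ["a", "b", "c", "d", "g"]),
    (6, ["c", "d", "e", "f"]),
    (7, ["a", "b", "c", "d"]),
    (8, ["a", "e", "f"]),
    (9, ["a", "b", "c", "d"]),
    (10, ["b", "c", "d", "e"]),
]


def basic_stats(database):
    """Compute simple size, length, and item-frequency statistics."""
    lengths = [len(items) for _, items in database]
    frequencies = Counter(item for _, items in database for item in items)
    return {
        "transactions": len(database),
        "unique_items": len(frequencies),
        "min_length": min(lengths),
        "avg_length": sum(lengths) / len(lengths),
        "max_length": max(lengths),
        "item_frequencies": dict(frequencies),
    }


stats = basic_stats(DATABASE)
for name, value in stats.items():
    print(f"{name}: {value}")
```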
The input parameters to a partial periodic pattern mining algorithm are:
- Temporal database. Accepted formats:
  - String : E.g., 'temporalDatabase.txt'
  - URL : E.g., https://u-aizu.ac.jp/~udayrage/datasets/temporalDatabases/temporal_T10I4D100K.csv
  - DataFrame with the header titled 'TS' and 'Transactions'
- minPS, specified as:
  - count (between 0 and the length of a database) or
  - [0, 1]
- maxIAT, specified as:
  - count (between 0 and the length of a database) or
  - [0, 1]
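A threshold can therefore be given either as an absolute count or as a value in [0, 1]. The sketch below shows one plausible way to normalise both forms to a count. It assumes that an integer is taken as a count and a float as a proportion of the database length; the function name is hypothetical, and PAMI's internal conversion may differ in detail.

```python
def to_absolute_count(threshold, database_length):
    """Normalise a threshold given as a count (int) or a proportion (float)."""
    if isinstance(threshold, bool):
        raise TypeError("threshold must be an int or a float")
    if isinstance(threshold, int):
        # Assumption: integers are absolute counts.
        if not 0 <= threshold <= database_length:
            raise ValueError("count must lie between 0 and the database length")
        return threshold
    if isinstance(threshold, float):
        # Assumption: floats are proportions of the database length.
        if not 0.0 <= threshold <= 1.0:
            raise ValueError("proportion must lie in [0, 1]")
        return threshold * database_length
    raise TypeError("threshold must be an int or a float")


print(to_absolute_count(5, 10))    # 5
print(to_absolute_count(0.5, 10))  # 5.0
```

Note that under this assumption the integer 1 and the float 1.0 mean different things: the first is a count of one, the second is the whole database.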
The patterns discovered by a partial periodic pattern mining algorithm can be saved into a file or a data frame.
foo@bar: cd PAMI/partialPeriodicPattern/basic
foo@bar: python3 algorithmName.py inputFile outputFile minPS maxIAT separator
Example: python3 PPPGrowth.py inputFile.txt outputFile.txt 4 3 ' '
import PAMI.partialPeriodicPattern.basic.PPPGrowth as alg
iFile = 'sampleTemporalDatabase.txt' #specify the input temporal database
minPS = 5 #specify the minPS value
maxIAT = 2 #specify the maxIAT value
seperator = ' ' #specify the separator. The default separator is a tab.
oFile = 'partialPatterns.txt' #specify the output file name
obj = alg.PPPGrowth(iFile, minPS, maxIAT, seperator) #initialize the algorithm
obj.mine() #start the mining process
obj.save(oFile) #store the patterns in file
df = obj.getPatternsAsDataFrame() #Get the patterns discovered into a dataframe
obj.printResults() #Print the stats of mining process
Partial Periodic Patterns were generated successfully using 3PGrowth algorithm
Total number of Partial Periodic Patterns: 9
Total Memory in USS: 102936576
Total Memory in RSS 142065664
Total ExecutionTime in ms: 0.001527547836303711
The partialPatterns.txt file contains the following patterns (format: pattern:periodicSupport):
!cat partialPatterns.txt
#Format is pattern:periodic-support
b:6
b d:5
b d c:5
b c:6
a:6
a c:5
d:7
d c:7
c:8
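The patterns listed above can be cross-checked with a brute-force enumeration over the sample database. This is a sanity-check sketch only and is not the PPPGrowth algorithm: it tests every itemset, which is feasible for seven items but not for real data. It assumes that periodic support is the number of inter-arrival times that do not exceed maxIAT, and all names in it are hypothetical.

```python
from itertools import combinations

# Sample temporal database from this section: timestamp -> set of items.
DATABASE = {
    1: {"a", "b", "c", "g"},
    2: {"b", "c", "d", "e"},
    3: {"a", "b", "c", "d"},
    4: {"a", "c", "d", "f"},
    5: {"a", "b", "c", "d", "g"},
    6: {"c", "d", "e", "f"},
    7: {"a", "b", "c", "d"},
    8: {"a", "e", "f"},
    9: {"a", "b", "c", "d"},
    10: {"b", "c", "d", "e"},
}


def periodic_support(timestamps, max_iat):
    """Count the inter-arrival times that do not exceed max_iat."""
    ordered = sorted(timestamps)
    return sum(1 for prev, cur in zip(ordered, ordered[1:]) if cur - prev <= max_iat)


def brute_force_mine(database, min_ps, max_iat):
    """Return {itemset: periodic support} for every qualifying itemset."""
    items = sorted(set().union(*database.values()))
    patterns = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            wanted = set(candidate)
            # Timestamps of the transactions that contain the whole itemset.
            timestamps = [ts for ts, tx in database.items() if wanted <= tx]
            support = periodic_support(timestamps, max_iat)
            if support >= min_ps:
                patterns[candidate] = support
    return patterns


patterns = brute_force_mine(DATABASE, min_ps=5, max_iat=2)
for pattern, support in sorted(patterns.items()):
    print(" ".join(pattern), support, sep=":")
print("Total patterns:", len(patterns))
```

With minPS = 5 and maxIAT = 2 this yields the same nine patterns and periodic supports as the file above; only the order of the items within a pattern differs, because the sketch sorts items alphabetically.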
The dataframe containing the patterns is shown below:
df #In each pattern, items are separated from each other by a tab (\t).
| | Patterns | periodicSupport |
---|---|---|
0 | b | 6 |
1 | b d | 5 |
2 | b d c | 5 |
3 | b c | 6 |
4 | a | 6 |
5 | a c | 5 |
6 | d | 7 |
7 | d c | 7 |
8 | c | 8 |