PAMI is a Python library containing 100+ algorithms to discover useful patterns in various databases across multiple computing platforms.
Geo-referenced periodic-frequent pattern mining aims to discover all interesting patterns in a temporal database that have support no less than the user-specified minimum support (minSup), periodicity no greater than the user-specified maximum periodicity (maxPer), and whose items all lie within the user-specified maximum distance (maxDist) of one another. The minSup controls the minimum number of transactions in which a pattern must appear, and the maxPer controls the maximum time interval within which a pattern must reappear in the database.
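As a minimal sketch of the two temporal measures, the Python below computes the support and periodicity of a pattern from the timestamps at which it occurs. The helper names are hypothetical, and the periodicity convention used here (the largest gap between consecutive occurrences, counting the gap from time 0 to the first occurrence and from the last occurrence to the final timestamp of the database) is an assumption, not necessarily PAMI's exact definition.

```python
# Sketch only: hypothetical helpers with an assumed periodicity convention.

def support(timestamps):
    """Number of transactions (timestamps) that contain the pattern."""
    return len(timestamps)


def periodicity(timestamps, last_ts):
    """Largest gap between consecutive occurrences, including the gap from
    time 0 to the first occurrence and from the last occurrence to last_ts."""
    points = [0] + sorted(timestamps) + [last_ts]
    return max(later - earlier for earlier, later in zip(points, points[1:]))


def is_periodic_frequent(timestamps, last_ts, min_sup, max_per):
    """True if the pattern satisfies both the minSup and maxPer constraints."""
    return (support(timestamps) >= min_sup
            and periodicity(timestamps, last_ts) <= max_per)


# A pattern seen at timestamps 3, 4, 5, 7 and 9 of a database ending at timestamp 10.
occurrences = [3, 4, 5, 7, 9]
print(support(occurrences), periodicity(occurrences, 10))  # 5 3
```

Here the largest gap is the initial one (0 to 3), so with maxPer = 2 this pattern would be rejected even though its support is 5.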
A temporal database is a collection of transactions, where each transaction contains a timestamp and a set of items.
A hypothetical temporal database containing the items a, b, c, d, e, f, and g is shown below:
TS | Transactions |
---|---|
1 | a b c g |
2 | b c d e |
3 | a b c d |
4 | a c d f |
5 | a b c d g |
6 | c d e f |
7 | a b c d |
8 | a e f |
9 | a b c d |
10 | b c d e |
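Note: Duplicate items must not exist in a transaction.
Each row of the temporal database file must contain a timestamp followed by the items of that transaction, separated by the chosen separator (a space in the example below):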
```
1 a b c g
2 b c d e
3 a b c d
4 a c d f
5 a b c d g
6 c d e f
7 a b c d
8 a e f
9 a b c d
10 b c d e
```
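A minimal sketch of reading this file format into a mapping from timestamp to item set, rejecting rows that contain duplicate items. `parse_temporal_db` is a hypothetical helper for illustration, not PAMI's own loader.

```python
def parse_temporal_db(lines, sep=" "):
    """Map each timestamp to its set of items (hypothetical helper)."""
    db = {}
    for line in lines:
        fields = [f for f in line.strip().split(sep) if f]
        if not fields:
            continue  # ignore blank lines
        ts, items = int(fields[0]), fields[1:]
        if len(items) != len(set(items)):
            raise ValueError(f"duplicate item in the transaction at TS {ts}")
        db[ts] = set(items)
    return db


sample = [
    "1 a b c g", "2 b c d e", "3 a b c d", "4 a c d f", "5 a b c d g",
    "6 c d e f", "7 a b c d", "8 a e f", "9 a b c d", "10 b c d e",
]
db = parse_temporal_db(sample)
print(len(db), sorted(db[5]))  # 10 ['a', 'b', 'c', 'd', 'g']
```

With a real file, the lines could come from `open(path)` instead of the inline `sample` list.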
A spatial database contains the spatial (neighbourhood) information of items: each item together with its neighbours, i.e., the items that satisfy the maxDist constraint with respect to it.
Items | neighbours |
---|---|
a | b, c, d |
b | a, e, g |
c | a, d |
d | a, c |
e | b, f |
f | e, g |
g | b, f |
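One way to express the spatial constraint in code, assuming a pattern qualifies only if every pair of its items appears as neighbours in the table above. The `neighbours` dictionary mirrors that table; `is_spatially_valid` is a hypothetical helper.

```python
from itertools import combinations

# Neighbour database from the table: item -> set of neighbouring items.
neighbours = {
    "a": {"b", "c", "d"},
    "b": {"a", "e", "g"},
    "c": {"a", "d"},
    "d": {"a", "c"},
    "e": {"b", "f"},
    "f": {"e", "g"},
    "g": {"b", "f"},
}


def is_spatially_valid(pattern, nbrs):
    """Assumed rule: every pair of items in the pattern must be neighbours."""
    return all(y in nbrs[x] for x, y in combinations(pattern, 2))


print(is_spatially_valid(["a", "c", "d"], neighbours))  # True
print(is_spatially_valid(["a", "b", "c"], neighbours))  # False: b and c are not neighbours
```

A single item has no pairs to check, so it always passes this test.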
To better understand the database, the sample code below prints its statistical details.
```python
import PAMI.extras.dbStats.TemporalDatabase as stats

obj = stats.TemporalDatabase('sampleInputFile.txt', ' ')  # input file and its separator
obj.run()         # compute the statistics
obj.printStats()  # print the statistics
```
```
Database size : 10
Number of items : 7
Minimum Transaction Size : 3
Average Transaction Size : 4.0
Maximum Transaction Size : 5
Minimum period : 1
Average period : 1.0
Maximum period : 1
Standard Deviation Transaction Size : 0.4472135954999579
Variance : 0.2222222222222222
Sparsity : 0.42857142857142855
```
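To see where several of these numbers come from, the sketch below recomputes them for the example database with the Python standard library. The formulas are assumptions chosen because they reproduce the printed values for this example: sparsity as the fraction of empty cells in a transactions-by-items matrix, the standard deviation with the population formula, and the variance with the sample (n - 1) formula.

```python
import statistics

transactions = {
    1: "a b c g", 2: "b c d e", 3: "a b c d", 4: "a c d f", 5: "a b c d g",
    6: "c d e f", 7: "a b c d", 8: "a e f", 9: "a b c d", 10: "b c d e",
}
sizes = [len(t.split()) for t in transactions.values()]
items = {item for t in transactions.values() for item in t.split()}
timestamps = sorted(transactions)
periods = [b - a for a, b in zip(timestamps, timestamps[1:])]

average = sum(sizes) / len(sizes)
cells = len(transactions) * len(items)   # size of the transactions x items matrix
sparsity = (cells - sum(sizes)) / cells  # fraction of empty cells

print(len(transactions), len(items))  # 10 7
print(min(sizes), average, max(sizes))  # 3 4.0 5
print(min(periods), max(periods))  # 1 1
print(statistics.pstdev(sizes), statistics.variance(sizes), sparsity)
```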
The input parameters to a geo-referenced periodic-frequent pattern mining algorithm are:
- iFile, the temporal database, given as one of:
  - String: e.g., 'temporalDatabase.txt'
  - URL: e.g., https://u-aizu.ac.jp/~udayrage/datasets/transactionalDatabases/transactional_T10I4D100K.csv
  - DataFrame with the headers titled 'TS' and 'Transactions'
- nFile, the neighbour database, given as one of:
  - String: e.g., 'NeighbourDatabase.txt'
  - URL: e.g., https://u-aizu.ac.jp/~udayrage/datasets/transactionalDatabases/transactional_T10I4D100K.csv
  - DataFrame with the headers titled 'TS' and 'Transactions'
- minSup, given as either:
  - a count (between 0 and the length of the database), or
  - a value in [0, 1]
- maxPer, given as either:
  - a count (between 0 and the length of the database), or
  - a value in [0, 1]
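Since the thresholds may be given either as counts or as values in [0, 1], the sketch below shows one plausible way to normalise them. It assumes an `int` is an absolute count and a `float` is a fraction of the database size; `to_count` is a hypothetical helper, and PAMI's own conversion rules may differ.

```python
def to_count(value, db_size):
    """Hypothetical helper: int -> absolute count, float in [0, 1] -> fraction
    of the database size (an assumed convention)."""
    if isinstance(value, float):
        if not 0.0 <= value <= 1.0:
            raise ValueError("a fractional threshold must lie in [0, 1]")
        return value * db_size
    if not 0 <= value <= db_size:
        raise ValueError("a count must lie between 0 and the database size")
    return value


print(to_count(5, 10), to_count(0.5, 10))  # 5 5.0
```

Under this assumption, minSup = 5 and minSup = 0.5 select the same threshold for the 10-transaction example database.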
The patterns discovered by a geo-referenced periodic-frequent pattern mining algorithm can be saved into a file or a data frame.
Syntax:
```shell
python3 algorithmName.py <path to the input file> <path to the output file> <path to the neighbour file> <minSup> <maxPer> <separator>
```
Example:
```shell
python3 GPFPMiner.py inputFile.txt outputFile.txt neighbourFile.txt 3 4 ' '
```
```python
import PAMI.geoReferencedPeriodicFrequentPattern.basic.GPFPMiner as alg

iFile = 'sampleInputFile.txt'      # specify the input temporal database
nFile = 'sampleNeighbourFile.txt'  # specify the input neighbour database
minSup = 5                         # specify the minSup value
maxPer = 3                         # specify the maxPer value
separator = ' '                    # specify the separator; the default separator is tab
oFile = 'Patterns.txt'             # specify the output file name

obj = alg.GPFPMiner(iFile, nFile, minSup, maxPer, separator)  # initialize the algorithm
obj.mine()                         # start the mining process
obj.save(oFile)                    # store the patterns in a file
df = obj.getPatternsAsDataFrame()  # get the discovered patterns as a dataframe
obj.printResults()                 # print the stats of the mining process
```
```
Spatial Periodic Frequent patterns were generated successfully using SpatialEclat algorithm
Total number of Spatial Periodic-Frequent Patterns: 9
Total Memory in USS: 115994624
Total Memory in RSS 156839936
Total ExecutionTime in seconds: 0.0010027885437011719
```
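To cross-check the result on the example data, the brute-force sketch below enumerates every itemset whose items are pairwise neighbours and keeps those with support no less than minSup and periodicity no greater than maxPer. It is not the GPFPMiner algorithm: it is exponential in the number of items and only meant for toy data. The pairwise-neighbour rule and the periodicity convention (gaps measured from time 0 and up to the last timestamp) are assumptions.

```python
from itertools import combinations

# Example data from this page: timestamp -> items, item -> neighbours.
TEMPORAL_DB = {
    1: {"a", "b", "c", "g"}, 2: {"b", "c", "d", "e"}, 3: {"a", "b", "c", "d"},
    4: {"a", "c", "d", "f"}, 5: {"a", "b", "c", "d", "g"}, 6: {"c", "d", "e", "f"},
    7: {"a", "b", "c", "d"}, 8: {"a", "e", "f"}, 9: {"a", "b", "c", "d"},
    10: {"b", "c", "d", "e"},
}
NEIGHBOURS = {
    "a": {"b", "c", "d"}, "b": {"a", "e", "g"}, "c": {"a", "d"}, "d": {"a", "c"},
    "e": {"b", "f"}, "f": {"e", "g"}, "g": {"b", "f"},
}


def periodicity(ts_list, last_ts):
    """Largest gap, counting from time 0 and up to last_ts (assumed convention)."""
    points = [0] + ts_list + [last_ts]
    return max(b - a for a, b in zip(points, points[1:]))


def brute_force_gpfp(db, nbrs, min_sup, max_per):
    """Return {pattern tuple: (support, periodicity)} by exhaustive search."""
    items = sorted(nbrs)
    last_ts = max(db)
    found = {}
    for k in range(1, len(items) + 1):
        for cand in combinations(items, k):
            # Spatial constraint: every pair of items must be neighbours.
            if not all(y in nbrs[x] for x, y in combinations(cand, 2)):
                continue
            ts_list = sorted(ts for ts, t in db.items() if set(cand) <= t)
            if len(ts_list) < min_sup:
                continue
            per = periodicity(ts_list, last_ts)
            if per <= max_per:
                found[cand] = (len(ts_list), per)
    return found


patterns = brute_force_gpfp(TEMPORAL_DB, NEIGHBOURS, min_sup=5, max_per=3)
for pattern, (sup, per) in sorted(patterns.items()):
    print(" ".join(pattern), sup, per)
print(len(patterns))  # 9
```

With minSup = 5 and maxPer = 3 this sketch finds 9 patterns, the same count reported by `printResults()` above. Note that {a, b, c} is never a candidate because b and c are not neighbours.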
The Patterns.txt file contains the following patterns (format: pattern:support:periodicity):
```
!cat Patterns.txt
```

```
d c a : 5: 3
c d : 8: 2
c a : 6: 2
c : 9: 2
d a : 5: 3
d : 8: 2
b a : 5: 2
b : 7: 2
a : 7: 2
```
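If the saved file needs post-processing, each line can be split into its three fields. A minimal sketch, assuming exactly the `pattern : support: periodicity` layout shown above; `parse_pattern_line` is a hypothetical helper.

```python
def parse_pattern_line(line):
    """Split a 'pattern : support: periodicity' line into its three fields."""
    pattern, support, period = line.rsplit(":", 2)
    return pattern.split(), int(support), int(period)


lines = ["d c a : 5: 3", "c d : 8: 2", "c : 9: 2"]
parsed = [parse_pattern_line(line) for line in lines]

# Keep only multi-item patterns, highest support first.
multi = sorted((p for p in parsed if len(p[0]) > 1), key=lambda p: -p[1])
print(multi[0])  # (['c', 'd'], 8, 2)
```

Splitting from the right with `rsplit(":", 2)` keeps the pattern intact even if an item name were to contain a colon.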
The dataframe containing the patterns is shown below:
```python
df
```
Index | Patterns | Support | Period |
---|---|---|---|
0 | d\tc\ta\t | 5 | 3 |
1 | c\td\t | 8 | 2 |
2 | c\ta\t | 6 | 2 |
3 | c\t | 9 | 2 |
4 | d\ta\t | 5 | 3 |
5 | d\t | 8 | 2 |
6 | b\ta\t | 5 | 2 |
7 | b\t | 7 | 2 |
8 | a\t | 7 | 2 |