PAMI is a Python library containing 100+ algorithms to discover useful patterns in various databases across multiple computing platforms.
Weighted uncertain frequent pattern mining aims to discover all interesting patterns in an uncertain transactional database that have support no less than the user-specified minimum support (minSup) and weight no less than the user-specified minimum weight (minWeight). minSup sets the minimum (expected) number of transactions in which a pattern must appear, and minWeight sets the minimum weight of an item.
An uncertain transactional database is a collection of transactions, where each transaction contains a transaction identifier and a set of items, each with its respective uncertain value (existential probability).
A hypothetical uncertain transactional database containing the items A, B, C, D, E, and F is shown below:
tid | Transactions |
---|---|
1 | B(0.5) C(0.45) F(1.0) |
2 | A(0.7) B(0.82) D(0.3) F(0.75) |
3 | C(0.9) D(1.0) E(0.7) |
4 | A(0.48) B(0.8) C(0.6) D(1.0) |
5 | B(0.7) D(0.3) E(1.0) |
6 | B(0.65) C(1.0) D(0.8) |
7 | C(0.9) D(0.5) F(1.0) |
8 | A(0.4) E(0.4) |
9 | A(0.8) B(1.0) D(0.8) F(0.7) |
10 | B(0.4) C(0.9) D(1.0) |
Note: Duplicate items must not exist in a transaction.
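In an uncertain database, the support of a pattern is commonly taken to be its expected support: the sum, over the transactions that contain every item of the pattern, of the product of those items' probabilities (assuming items occur independently). The sketch below assumes that definition and stores each transaction as a dict, which also rules out duplicate items. It agrees with several values in the sample output further down, such as E:2.1 and C D:3.65, but it is an illustration, not PAMI's implementation.

```python
from math import prod

# The example database above: each transaction maps an item to its
# existential probability (a dict cannot hold the same item twice).
DATABASE = [
    {"B": 0.5, "C": 0.45, "F": 1.0},
    {"A": 0.7, "B": 0.82, "D": 0.3, "F": 0.75},
    {"C": 0.9, "D": 1.0, "E": 0.7},
    {"A": 0.48, "B": 0.8, "C": 0.6, "D": 1.0},
    {"B": 0.7, "D": 0.3, "E": 1.0},
    {"B": 0.65, "C": 1.0, "D": 0.8},
    {"C": 0.9, "D": 0.5, "F": 1.0},
    {"A": 0.4, "E": 0.4},
    {"A": 0.8, "B": 1.0, "D": 0.8, "F": 0.7},
    {"B": 0.4, "C": 0.9, "D": 1.0},
]


def expected_support(itemset, database):
    """Sum, over transactions containing every item, of the product of the
    items' probabilities (assumes the items are independent)."""
    total = 0.0
    for transaction in database:
        if all(item in transaction for item in itemset):
            total += prod(transaction[item] for item in itemset)
    return total


print(round(expected_support({"C", "D"}, DATABASE), 3))  # 3.65
```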
Each row in a transactional database must contain only items (with their uncertain values). The frequent pattern mining algorithms in PAMI implicitly use the row number of a transaction as its transaction identifier to reduce storage and processing costs. A sample transactional database, say sample.txt, is provided below.
B(0.5) C(0.45) F(1.0)
A(0.7) B(0.82) D(0.3) F(0.75)
C(0.9) D(1.0) E(0.7)
A(0.48) B(0.8) C(0.6) D(1.0)
B(0.7) D(0.3) E(1.0)
B(0.65) C(1.0) D(0.8)
C(0.9) D(0.5) F(1.0)
A(0.4) E(0.4)
A(0.8) B(1.0) D(0.8) F(0.7)
B(0.4) C(0.9) D(1.0)
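A minimal sketch of reading this format in plain Python, assuming each token is an item name followed by its probability in parentheses and that blank lines are skipped. `parse_transaction` and `read_uncertain_database` are hypothetical helpers, not PAMI functions; the row number (starting at 1) plays the role of the implicit transaction identifier, and duplicate items are rejected.

```python
import re

# One token per item, e.g. "B(0.5)": an item name followed by its probability.
TOKEN = re.compile(r"^(?P<item>[^()\s]+)\((?P<prob>[0-9]*\.?[0-9]+)\)$")


def parse_transaction(line, sep=" "):
    """Turn 'B(0.5) C(0.45) F(1.0)' into {'B': 0.5, 'C': 0.45, 'F': 1.0}."""
    transaction = {}
    for token in line.strip().split(sep):
        if not token:
            continue  # tolerate repeated separators
        match = TOKEN.match(token)
        if match is None:
            raise ValueError(f"malformed token: {token!r}")
        item = match.group("item")
        if item in transaction:
            raise ValueError(f"duplicate item {item!r} in transaction")
        transaction[item] = float(match.group("prob"))
    return transaction


def read_uncertain_database(path, sep=" "):
    """Return (row number, transaction) pairs; the row number serves as the
    implicit transaction identifier."""
    with open(path) as handle:
        rows = [line for line in handle if line.strip()]
    return [(tid, parse_transaction(line, sep)) for tid, line in enumerate(rows, start=1)]
```

For example, `read_uncertain_database('sample.txt')` would return ten pairs, the first being `(1, {'B': 0.5, 'C': 0.45, 'F': 1.0})`.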
A weight database is a collection of items with their weights.
A hypothetical weight database, say HEWIWeightSample.txt, containing the items A, B, C, D, E, and F is shown below:
A 0.40
B 0.70
C 1.00
D 0.55
E 0.85
F 0.30
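A weight database like this maps naturally onto a dictionary. The sketch assumes one `item weight` pair per line, split at the first separator; `parse_weights` is a hypothetical helper, not part of PAMI.

```python
def parse_weights(lines, sep=" "):
    """Read 'item weight' lines such as 'A 0.40' into {'A': 0.4, ...}."""
    weights = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        item, value = line.split(sep, 1)
        weights[item] = float(value.strip())
    return weights


sample = ["A 0.40", "B 0.70", "C 1.00", "D 0.55", "E 0.85", "F 0.30"]
print(parse_weights(sample)["D"])  # 0.55
```

Because a file object iterates over its lines, `with open('HEWIWeightSample.txt') as f: weights = parse_weights(f)` would load the file directly.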
The following sample code prints statistical details of the transactional database.
import PAMI.extras.dbStats.TransactionalDatabase as stats

obj = stats.TransactionalDatabase('sample.txt', ' ')  # input file and its separator
obj.run()  # compute the statistics
obj.printStats()  # print the statistics
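For a quick look at the file without PAMI, a few basic statistics can be computed directly. This is a minimal sketch under our own assumptions, not a reproduction of `printStats()`: the helper name and the chosen statistics are illustrative, and it strips the `(probability)` suffix so that `B(0.5)` and `B(0.8)` count as the same item.

```python
from statistics import mean


def basic_stats(lines, sep=" "):
    """Summarise lines of 'item(prob)' tokens: counts and transaction lengths."""
    lengths = []
    items = set()
    for line in lines:
        tokens = [token for token in line.strip().split(sep) if token]
        if not tokens:
            continue  # skip blank lines
        lengths.append(len(tokens))
        # Keep only the item name, dropping the "(probability)" suffix.
        items.update(token.split("(", 1)[0] for token in tokens)
    if not lengths:
        raise ValueError("no transactions found")
    return {
        "transactions": len(lengths),
        "distinct_items": len(items),
        "min_length": min(lengths),
        "max_length": max(lengths),
        "avg_length": mean(lengths),
    }
```

Usage: `with open('sample.txt') as f: print(basic_stats(f))`. For sample.txt this reports 10 transactions over 6 distinct items, with lengths between 2 and 4.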
The input parameters to a weighted uncertain frequent pattern mining algorithm are:
- Transactional database, given as one of:
  - String: e.g., 'transactionalDatabase.txt'
  - URL: e.g., https://u-aizu.ac.jp/~udayrage/datasets/transactionalDatabases/transactional_T10I4D100K.csv
  - DataFrame with the header titled 'Transactions'
- minSup, given as either:
  - a count (between 0 and the length of the database), or
  - a ratio in [0, 1]
- minWeight, given as either:
  - a count (between 0 and the length of the database), or
  - a ratio in [0, 1]
The patterns discovered by a weighted uncertain frequent pattern mining algorithm can be saved to a file or to a data frame.
Syntax: python3 algorithmName.py <path to the input file> <path to the output file> <path to the weight file> <minSup> <minWeight> <separator>
Example: python3 WUFIM.py inputFile.txt outputFile.txt weightSample.txt 3 2 ' '
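A sketch of how a script could unpack the positional arguments in the order given above. `parse_cli` is hypothetical and not taken from PAMI's source; it assumes the thresholds arrive as plain numbers and keeps whole numbers as integers (counts).

```python
def _number(text):
    """Keep whole numbers as int (counts) and everything else as float."""
    try:
        return int(text)
    except ValueError:
        return float(text)


def parse_cli(argv):
    """Unpack [script, input, output, weights, minSup, minWeight, separator]."""
    if len(argv) != 7:
        raise SystemExit(
            "usage: python3 algorithmName.py <inputFile> <outputFile> "
            "<weightFile> <minSup> <minWeight> <separator>"
        )
    _, i_file, o_file, w_file, min_sup, min_weight, sep = argv
    return {
        "iFile": i_file,
        "oFile": o_file,
        "wFile": w_file,
        "minSup": _number(min_sup),
        "minWeight": _number(min_weight),
        "sep": sep,
    }
```

A script would call `parse_cli(sys.argv)`. Note that the shell turns the quoted `' '` in the example into a single space character before Python sees it.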
import PAMI.weightedUncertainFrequentPattern.basic.WUFIM as alg

iFile = 'sample.txt'  # specify the input uncertain transactional database
wFile = 'HEWIWeightSample.txt'  # specify the input weight database
minSup = 1.4  # specify the minSup value
minWeight = 1.5  # specify the minWeight value
sep = ' '  # specify the separator (the default separator is a tab)
oFile = 'weightedPatterns.txt'  # specify the output file name

obj = alg.WUFIM(iFile, wFile, minSup, minWeight, sep)  # initialize the algorithm
obj.mine()  # start the mining process
obj.savePatterns(oFile)  # store the patterns in a file
df = obj.getPatternsAsDataFrame()  # get the discovered patterns as a data frame
obj.printStats()  # print the statistics of the mining process
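To see how the two thresholds interact on the toy data, here is a naive brute-force sketch that enumerates every itemset. It computes expected support as the sum of per-transaction probability products and then applies a weighting rule that is purely our assumption (expected support multiplied by the mean weight of the pattern's items); WUFIM's actual rule and pruning are not shown on this page, so its results can differ from the output below. With minSup = 1.4 and minWeight = 1.5 the sketch keeps B, C, D, E, B D and C D.

```python
from itertools import combinations
from math import prod

# Example data from this page: item -> existential probability per transaction.
DATABASE = [
    {"B": 0.5, "C": 0.45, "F": 1.0},
    {"A": 0.7, "B": 0.82, "D": 0.3, "F": 0.75},
    {"C": 0.9, "D": 1.0, "E": 0.7},
    {"A": 0.48, "B": 0.8, "C": 0.6, "D": 1.0},
    {"B": 0.7, "D": 0.3, "E": 1.0},
    {"B": 0.65, "C": 1.0, "D": 0.8},
    {"C": 0.9, "D": 0.5, "F": 1.0},
    {"A": 0.4, "E": 0.4},
    {"A": 0.8, "B": 1.0, "D": 0.8, "F": 0.7},
    {"B": 0.4, "C": 0.9, "D": 1.0},
]
WEIGHTS = {"A": 0.40, "B": 0.70, "C": 1.00, "D": 0.55, "E": 0.85, "F": 0.30}


def naive_weighted_uncertain_mining(database, weights, min_sup, min_weight):
    """Keep itemsets with expected support >= min_sup and an ASSUMED weighted
    score (expected support * mean item weight) >= min_weight."""
    items = sorted({item for transaction in database for item in transaction})
    patterns = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            # Expected support: sum of per-transaction probability products.
            exp_sup = sum(
                prod(t[i] for i in itemset)
                for t in database
                if all(i in t for i in itemset)
            )
            if exp_sup < min_sup:
                continue
            # Hypothetical weighting rule, not taken from WUFIM.
            mean_weight = sum(weights[i] for i in itemset) / len(itemset)
            if exp_sup * mean_weight >= min_weight:
                patterns[itemset] = round(exp_sup, 3)
    return patterns


for itemset, support in naive_weighted_uncertain_mining(DATABASE, WEIGHTS, 1.4, 1.5).items():
    print(" ".join(itemset), support, sep=":")
```

Enumerating all itemsets is exponential in the number of items and is only reasonable for toy data like this; practical miners prune the search space instead.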
The weightedPatterns.txt file contains the following patterns (format: pattern:support):
!cat weightedPatterns.txt
E:2.1
C:4.75
C B:2.525
C B D:2.3
C D:3.65
B:4.870000000000001
B D:2.976
D:5.699999999999999
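To work with the saved file in Python, each line can be split at its last colon. The sketch assumes the items in a pattern are whitespace-separated, as in the output above; `parse_pattern_line` and `load_patterns` are hypothetical helpers. Returning the items as a tuple mirrors the `Patterns` column of the data frame.

```python
def parse_pattern_line(line):
    """Split 'C B D:2.3' into (('C', 'B', 'D'), 2.3)."""
    pattern, support = line.strip().rsplit(":", 1)
    return tuple(pattern.split()), float(support)


def load_patterns(path):
    """Read a saved pattern file into {items_tuple: support}."""
    with open(path) as handle:
        return dict(parse_pattern_line(line) for line in handle if line.strip())
```

For example, `load_patterns('weightedPatterns.txt')[('C', 'D')]` would return the stored support of C D.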
The dataframe containing the patterns is shown below:
df
Index | Patterns | Support |
---|---|---|
0 | (E,) | 2.100 |
1 | (C,) | 4.750 |
2 | (C, B) | 2.525 |
3 | (C, B, D) | 2.300 |
4 | (C, D) | 3.650 |
5 | (B,) | 4.870 |
6 | (B, D) | 2.976 |
7 | (D,) | 5.700 |