PAMI is a Python library containing 100+ algorithms to discover useful patterns in various databases across multiple computing platforms.
Weighted frequent regular pattern mining aims to discover all interesting patterns in a temporal database that have a weighted support no less than the user-specified minimum weighted support (minWS) and a regularity no greater than the user-specified maximum regularity (regularity). The weighted support of a pattern combines how often the pattern occurs with the weights of its items, so minWS filters out patterns that are rare or made of low-weight items. The regularity of a pattern is the maximum interval between consecutive occurrences of the pattern, so the regularity constraint filters out patterns that are absent from the database for long stretches.
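The definitions can be written compactly as follows. This is a restatement under two assumptions. First, a pattern's weight is taken to be the average weight of its items, which reproduces the weighted-support values printed later in this section (for example, c b: (0.35 + 0.50)/2 × 5 = 2.125). Second, regularity follows the usual regular-pattern convention: the first interval is measured from timestamp 0 and the last one to the final timestamp $ts_{\max}$ of the database.

$$
w(X) = \frac{1}{|X|}\sum_{i \in X} w(i), \qquad WS(X) = w(X)\cdot sup(X)
$$

$$
reg(X) = \max_{1 \le k \le m+1}\left(ts_k - ts_{k-1}\right), \qquad ts_0 = 0,\; ts_{m+1} = ts_{\max}
$$

Here $ts_1 < \dots < ts_m$ are the timestamps of the $m = sup(X)$ transactions containing $X$. A pattern $X$ is reported when $WS(X) \ge minWS$ and $reg(X) \le regularity$.

For example, item c in the sample database below occurs at timestamps 1, 2, 4, 6, 7 and 9, and has weight 0.35. Its weighted support is therefore $0.35 \cdot 6 = 2.1$. Its intervals are 1, 1, 2, 2, 1, 2, 1, so $reg(c) = 2$, which agrees with the entry `c:[6, 2, 2.0999999999999996]` in the mining output.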
Reference: K. Klangwisan and K. Amphawan, “Mining weighted-frequent-regular itemsets from transactional database,” 2017 9th International Conference on Knowledge and Smart Technology (KST), Chonburi, Thailand, 2017, pp. 66-71, doi: 10.1109/KST.2017.7886090.
A temporal database is a collection of transactions, where each transaction contains a timestamp and a set of items.
A hypothetical temporal database containing the items a, b, c, d, e, f, and g is shown below:
tid | Transactions |
---|---|
1 | a b c d |
2 | c e f |
3 | a b e f g |
4 | a b c f g |
5 | d e g |
6 | a b c e g |
7 | a b c e |
8 | a b d e |
9 | b c e |
10 | a e g |
Note: Duplicate items must not exist in a transaction.
Each row of a temporal database starts with the timestamp of the transaction, followed by the items the transaction contains, all separated by the chosen separator. A sample temporal database, say WFRISample.txt, is provided below.
```
1 a b c d
2 c e f
3 a b e f g
4 a b c f g
5 d e g
6 a b c e g
7 a b c e
8 a b d e
9 b c e
10 a e g
```
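The following sketch shows one way to read this format in plain Python. It assumes a single-space separator and also enforces the no-duplicate-items rule. `parse_temporal` and `occurrence_lists` are hypothetical helpers, not PAMI functions. The per-item timestamp lists the sketch builds are the raw material for both support and regularity.

```python
from typing import Dict, FrozenSet, List, Tuple

# Sample rows copied from the temporal database above (timestamp first, then items).
SAMPLE = """1 a b c d
2 c e f
3 a b e f g
4 a b c f g
5 d e g
6 a b c e g
7 a b c e
8 a b d e
9 b c e
10 a e g"""


def parse_temporal(text: str, sep: str = " ") -> List[Tuple[int, FrozenSet[str]]]:
    """Parse 'timestamp item item ...' rows into (timestamp, itemset) pairs."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = [f for f in line.strip().split(sep) if f]
        ts, items = int(fields[0]), fields[1:]
        if len(items) != len(set(items)):
            raise ValueError(f"duplicate item in transaction at timestamp {ts}")
        rows.append((ts, frozenset(items)))
    return rows


def occurrence_lists(rows: List[Tuple[int, FrozenSet[str]]]) -> Dict[str, List[int]]:
    """Map each item to the sorted timestamps of the transactions containing it."""
    occ: Dict[str, List[int]] = {}
    for ts, items in rows:
        for item in items:
            occ.setdefault(item, []).append(ts)
    return {item: sorted(tss) for item, tss in occ.items()}


db = parse_temporal(SAMPLE)
occ = occurrence_lists(db)
print(len(db), occ["c"])
```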
A weight database is a collection of items with their weights.
A hypothetical weight database, say WFRIWeightSample.txt, containing the items a, b, c, d, e, f, and g is shown below:
```
a 0.60
b 0.50
c 0.35
d 0.45
e 0.45
f 0.3
g 0.4
```
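A minimal sketch of reading this file and computing a pattern's weight is shown below. It assumes one `item weight` pair per line, and it assumes that a pattern's weight is the average of its item weights, which is consistent with the weighted supports in the mining output later on. `parse_weights` and `pattern_weight` are hypothetical helper names.

```python
from typing import Dict, Iterable

# Weight file contents copied from WFRIWeightSample.txt above ("item weight" per line).
WEIGHTS_TEXT = """a 0.60
b 0.50
c 0.35
d 0.45
e 0.45
f 0.3
g 0.4"""


def parse_weights(text: str, sep: str = " ") -> Dict[str, float]:
    """Parse 'item weight' rows into a dictionary."""
    weights = {}
    for line in text.splitlines():
        if line.strip():
            item, value = line.strip().split(sep)
            weights[item] = float(value)
    return weights


def pattern_weight(pattern: Iterable[str], weights: Dict[str, float]) -> float:
    """Average item weight of a pattern (assumed definition, see lead-in)."""
    items = list(pattern)
    return sum(weights[i] for i in items) / len(items)


weights = parse_weights(WEIGHTS_TEXT)
print(pattern_weight(["c", "b"], weights))  # average of 0.35 and 0.50
```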
The sample code below prints statistical details of the temporal database:
```python
import PAMI.extras.dbStats.TemporalDatabase as stats

obj = stats.TemporalDatabase('WFRISample.txt', ' ')
obj.run()
obj.printStats()
```
```
Database size : 10
Number of items : 7
Minimum Transaction Size : 3
Average Transaction Size : 4.0
Maximum Transaction Size : 5
Minimum period : 1
Average period : 1.0
Maximum period : 1
Standard Deviation Transaction Size : 0.8944271909999159
Variance : 0.8888888888888888
Sparsity : 0.44285714285714284
```
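To show where some of these numbers come from, the sketch below recomputes a subset of them from the ten rows listed earlier. It is plain Python, not PAMI's implementation. The sparsity formula is our assumption: one minus the fraction of filled cells in a transactions × items matrix. It agrees with the printed value for these rows.

```python
from typing import List, Set, Tuple

# Sample temporal database from above: (timestamp, items).
DB: List[Tuple[int, Set[str]]] = [
    (1, {"a", "b", "c", "d"}), (2, {"c", "e", "f"}), (3, {"a", "b", "e", "f", "g"}),
    (4, {"a", "b", "c", "f", "g"}), (5, {"d", "e", "g"}), (6, {"a", "b", "c", "e", "g"}),
    (7, {"a", "b", "c", "e"}), (8, {"a", "b", "d", "e"}), (9, {"b", "c", "e"}),
    (10, {"a", "e", "g"}),
]

sizes = [len(items) for _, items in DB]
num_items = len(set().union(*(items for _, items in DB)))
timestamps = [ts for ts, _ in DB]
# A "period" here is the gap between consecutive transaction timestamps.
periods = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]

stats = {
    "database size": len(DB),
    "number of items": num_items,
    "min transaction size": min(sizes),
    "max transaction size": max(sizes),
    "min period": min(periods),
    "max period": max(periods),
    # Assumed formula: share of empty cells in the transactions x items matrix.
    "sparsity": 1 - sum(sizes) / (len(DB) * num_items),
}
for name, value in stats.items():
    print(f"{name}: {value}")
```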
The input parameters to a weighted frequent regular pattern mining algorithm are:
- Temporal database, in one of the following formats:
  - String : E.g., ‘temporalDatabase.txt’
  - URL : E.g., https://u-aizu.ac.jp/~udayrage/datasets/temporalDatabases/temporal_T10I4D100K.csv
  - DataFrame with the header titled ‘TS’ and ‘Transactions’
- minWS, given as:
  - a count (between 0 and the length of the database) or
  - a value in [0, 1]
- regularity, given as:
  - a count (between 0 and the length of the database) or
  - a value in [0, 1]
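Both thresholds accept either a count or a value in [0, 1]. The hypothetical helper below sketches how such a value could be normalised to an absolute threshold. It assumes that integers are counts and that fractions are scaled by the number of transactions. It is an illustration of that assumed convention, not PAMI's internal conversion code.

```python
def to_absolute(value, db_size: int) -> float:
    """Convert a threshold given as a count or as a fraction in [0, 1].

    Hypothetical helper (not a PAMI function). Assumed convention:
    ints are counts; floats in [0, 1] are multiplied by the database size.
    Numeric strings are accepted too ("0.5" -> float, "4" -> int).
    """
    if isinstance(value, str):
        value = float(value) if "." in value else int(value)
    if isinstance(value, float):
        if not 0.0 <= value <= 1.0:
            raise ValueError("fractional thresholds must lie in [0, 1]")
        return value * db_size
    if isinstance(value, int):
        if not 0 <= value <= db_size:
            raise ValueError("count thresholds must lie between 0 and the database size")
        return value
    raise TypeError("threshold must be an int, a float, or a numeric string")


# With the 10-transaction sample, 0.5 and 5 describe the same threshold.
print(to_absolute(0.5, 10), to_absolute(5, 10))
```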
The patterns discovered by a weighted frequent regular pattern mining algorithm can be saved into a file or a data frame.
Syntax:
```
python3 algorithmName.py <path to the input file> <path to the output file> <path to the weight file> <minWS> <regularity> <separator>
```
Example:
```
python3 WFRIM.py inputFile.txt outputFile.txt weightSample.txt 3 2 ' '
```
```python
from PAMI.weightedFrequentRegularPattern.basic import WFRIMiner as alg

iFile = 'WFRISample.txt'  # input temporal database
wFile = 'WFRIWeightSample.txt'  # input weight database
minWS = 2  # minimum weighted support (minWS)
regularity = 3  # maximum regularity
seperator = ' '  # separator; the default separator is a tab
oFile = 'weightedFrequentRegularPatterns.txt'  # output file name

obj = alg.WFRIMiner(iFile, wFile, minWS, regularity, seperator)  # initialize the algorithm
obj.mine()  # start the mining process
obj.save(oFile)  # store the patterns in a file
df = obj.getPatternsAsDataFrame()  # get the discovered patterns as a DataFrame
obj.printResults()  # print the statistics of the mining process
```
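To make the two constraints concrete, the sketch below checks the definitions by brute force. It enumerates every itemset of the sample data and computes support, regularity and weighted support. It keeps the itemsets with weighted support ≥ minWS and regularity ≤ regularity. It assumes the average-weight definition of a pattern's weight. It also assumes that the first interval is measured from timestamp 0 and the last one to the final timestamp. The function names are hypothetical. This is an exponential illustration of the definitions, not the WFRIM algorithm.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

# Sample data copied from the temporal database and weight file above.
DB: Dict[int, FrozenSet[str]] = {
    1: frozenset("abcd"), 2: frozenset("cef"), 3: frozenset("abefg"),
    4: frozenset("abcfg"), 5: frozenset("deg"), 6: frozenset("abceg"),
    7: frozenset("abce"), 8: frozenset("abde"), 9: frozenset("bce"),
    10: frozenset("aeg"),
}
WEIGHTS = {"a": 0.60, "b": 0.50, "c": 0.35, "d": 0.45, "e": 0.45, "f": 0.3, "g": 0.4}


def regularity(timestamps: List[int], last_ts: int) -> int:
    """Largest gap between consecutive occurrences, bounded by 0 and last_ts (assumed)."""
    points = [0] + timestamps + [last_ts]
    return max(later - earlier for earlier, later in zip(points, points[1:]))


def brute_force_wfri(db: Dict[int, FrozenSet[str]], weights: Dict[str, float],
                     min_ws: float, max_reg: int) -> Dict[Tuple[str, ...], list]:
    """Enumerate every itemset and keep those meeting both constraints."""
    items = sorted(weights)
    last_ts = max(db)
    result = {}
    for size in range(1, len(items) + 1):
        for pattern in combinations(items, size):
            ts = sorted(t for t, tx in db.items() if tx.issuperset(pattern))
            if not ts:
                continue  # pattern never occurs
            support = len(ts)
            reg = regularity(ts, last_ts)
            # Weighted support = support x average item weight.
            ws = support * sum(weights[i] for i in pattern) / size
            if ws >= min_ws and reg <= max_reg:
                result[pattern] = [support, reg, ws]
    return result


patterns = brute_force_wfri(DB, WEIGHTS, min_ws=2, max_reg=3)
for pattern, (support, reg, ws) in sorted(patterns.items()):
    print(" ".join(pattern), support, reg, round(ws, 4))
```

On the ten rows shown above, this finds the same nine patterns that WFRIMiner reports. The values for c, c b, a, a e and e match exactly. For b, a b, e b and a e b the support comes out one lower (b occurs in 7 of the listed rows, whereas the output reports 8). This suggests the WFRISample.txt used for that run differs slightly from the table shown.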
```
Weighted Frequent Regular patterns were generated successfully using WFRIM algorithm
Total number of Weighted Frequent Regular Patterns: 9
Total Memory in USS: 99409920
Total Memory in RSS 140251136
Total ExecutionTime in ms: 0.00046825408935546875
```
The weightedFrequentRegularPatterns.txt file contains the following patterns (format: pattern:[support, regularity, weighted support]):
```
!cat weightedFrequentRegularPatterns.txt
```
```
c:[6, 2, 2.0999999999999996]
c b:[5, 3, 2.125]
a:[7, 2, 4.2]
a e:[5, 3, 2.625]
a e b:[5, 3, 2.5833333333333335]
a b:[7, 2, 3.8500000000000005]
e:[8, 2, 3.6]
e b:[6, 3, 2.8499999999999996]
b:[8, 2, 4.0]
```
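To post-process this file, each line can be split at its last colon, and the bracketed part can be parsed as a Python list literal. The sketch below does that with `ast.literal_eval` and ranks a few patterns by weighted support. The field order [support, regularity, weighted support] is inferred from the values (for example, c occurs in 6 transactions with weight 0.35). `parse_pattern_line` is a hypothetical helper.

```python
import ast
from typing import Tuple

# A few lines in the pattern:[support, regularity, weighted support] format shown above.
LINES = """c:[6, 2, 2.0999999999999996]
c b:[5, 3, 2.125]
a e b:[5, 3, 2.5833333333333335]"""


def parse_pattern_line(line: str) -> Tuple[Tuple[str, ...], int, int, float]:
    """Split 'items:[support, regularity, weightedSupport]' into its parts."""
    pattern, values = line.rsplit(":", 1)  # last colon separates pattern from values
    support, reg, ws = ast.literal_eval(values)
    return tuple(pattern.split()), support, reg, ws


parsed = [parse_pattern_line(line) for line in LINES.splitlines()]
# Example: rank patterns by weighted support, highest first.
for items, support, reg, ws in sorted(parsed, key=lambda p: p[3], reverse=True):
    print(" ".join(items), support, reg, round(ws, 3))
```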
The dataframe containing the patterns is shown below:
```python
df
```
Index | Patterns | Support |
---|---|---|
0 | c | [6, 2, 2.0999999999999996] |
1 | c b | [5, 3, 2.125] |
2 | a | [7, 2, 4.2] |
3 | a e | [5, 3, 2.625] |
4 | a e b | [5, 3, 2.5833333333333335] |
5 | a b | [7, 2, 3.8500000000000005] |
6 | e | [8, 2, 3.6] |
7 | e b | [6, 3, 2.8499999999999996] |
8 | b | [8, 2, 4.0] |
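Because the Support column holds a three-element list, it is often handy to expand it into separate numeric columns before sorting or filtering. The pandas sketch below does this on a few rows rebuilt by hand. It assumes, as in the table above, a `Patterns` column and a `Support` column holding [support, regularity, weighted support] lists. The new column names are our own choice.

```python
import pandas as pd

# Rebuild a few rows of the DataFrame shown above (columns 'Patterns' and 'Support').
df = pd.DataFrame({
    "Patterns": ["c", "c b", "a"],
    "Support": [[6, 2, 2.0999999999999996], [5, 3, 2.125], [7, 2, 4.2]],
})

# Expand the list column into three named numeric columns (names chosen for this sketch).
expanded = pd.DataFrame(
    df["Support"].tolist(),
    columns=["support", "regularity", "weightedSupport"],
    index=df.index,
)
result = pd.concat([df[["Patterns"]], expanded], axis=1)
print(result.sort_values("weightedSupport", ascending=False))
```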