Mining Weighted Frequent Regular Patterns in Transactional Databases

PAMI is a Python library containing 100+ algorithms to discover useful patterns in various databases across multiple computing platforms.

1. What is weighted frequent regular pattern mining?

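Weighted frequent regular pattern mining aims to discover all interesting patterns in a temporal database that have a weighted support no less than the user-specified minimum weighted support (minWS) and a regularity no greater than the user-specified maximum regularity (regularity). The support of a pattern is the number of transactions containing it, and its weighted support is this support multiplied by the average weight of the pattern's items. The regularity of a pattern is the largest interval between consecutive transactions containing it. Hence, minWS discards patterns that are infrequent or made of low-weight items, while regularity discards patterns that are absent from the database for long stretches.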
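The two measures can be computed directly from the timestamps at which a pattern occurs. The minimal sketch below does this for a single pattern; `pattern_measures` is a hypothetical helper, not part of PAMI, and it assumes the usual periodic-pattern convention that the database starts at timestamp 0 and ends at its last timestamp, so the gaps before the first and after the last occurrence also count toward regularity.

```python
def pattern_measures(timestamps, item_weights, db_end):
    """Return [support, regularity, weighted support] for one pattern.

    timestamps   -- sorted timestamps of the transactions containing the pattern
    item_weights -- weights of the items that make up the pattern
    db_end       -- last timestamp of the database (assumed end point)
    """
    support = len(timestamps)
    # Gaps between consecutive occurrences; the database is assumed to start
    # at timestamp 0 and end at db_end, so both boundary gaps are included.
    points = [0] + list(timestamps) + [db_end]
    regularity = max(b - a for a, b in zip(points, points[1:]))
    # Weighted support: support scaled by the average weight of the items.
    avg_weight = sum(item_weights) / len(item_weights)
    weighted_support = support * avg_weight
    return [support, regularity, weighted_support]


# Pattern {a, e}: it occurs at timestamps 3, 6, 7, 8 and 10 of the sample
# database in Section 2, and a and e weigh 0.60 and 0.45.
print(pattern_measures([3, 6, 7, 8, 10], [0.60, 0.45], 10))  # [5, 3, 2.625]
```

The result agrees with the `a e` entry in the mining output at the end of this tutorial.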

Reference: K. Klangwisan and K. Amphawan, “Mining weighted-frequent-regular itemsets from transactional database,” 2017 9th International Conference on Knowledge and Smart Technology (KST), Chonburi, Thailand, 2017, pp. 66-71, doi: 10.1109/KST.2017.7886090.

2. What is a temporal database?

A temporal database is a collection of transactions, where each transaction contains a timestamp and a set of items.
A hypothetical temporal database containing the items a, b, c, d, e, f, and g is shown below:

Timestamp Items
1 a b c d
2 c e f
3 a b e f g
4 a b c f g
5 d e g
6 a b c e g
7 a b c e
8 a b d e
9 b c e
10 a e g

Note: Duplicate items must not exist in a transaction.

3. What is the acceptable format of a temporal database in PAMI?

Each row of the input file must contain a timestamp followed by the items of that transaction, all separated by the same separator (a space in the sample below). The weighted frequent regular pattern mining algorithms read the first value of each row as the transaction's timestamp. A sample temporal database, say sample.txt, is provided below.

1 a b c d
2 c e f
3 a b e f g
4 a b c f g
5 d e g
6 a b c e g
7 a b c e
8 a b d e
9 b c e
10 a e g
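The format is simple to produce and read with plain Python. The sketch below writes the sample above to sample.txt with a single space as separator and parses it back, treating the first column as the timestamp and rejecting transactions with duplicate items. `write_temporal_db` and `read_temporal_db` are hypothetical helpers; PAMI's own reader may differ in details such as whitespace handling.

```python
SAMPLE = {
    1: ["a", "b", "c", "d"],
    2: ["c", "e", "f"],
    3: ["a", "b", "e", "f", "g"],
    4: ["a", "b", "c", "f", "g"],
    5: ["d", "e", "g"],
    6: ["a", "b", "c", "e", "g"],
    7: ["a", "b", "c", "e"],
    8: ["a", "b", "d", "e"],
    9: ["b", "c", "e"],
    10: ["a", "e", "g"],
}


def write_temporal_db(path, db, sep=" "):
    """Write one line per transaction: the timestamp, then its items."""
    with open(path, "w") as f:
        for ts in sorted(db):
            f.write(sep.join([str(ts)] + db[ts]) + "\n")


def read_temporal_db(path, sep=" "):
    """Parse the file back into a list of (timestamp, items) pairs."""
    transactions = []
    with open(path) as f:
        for line in f:
            fields = [x for x in line.strip().split(sep) if x]
            if not fields:
                continue  # skip blank lines
            ts, items = int(fields[0]), fields[1:]
            if len(items) != len(set(items)):
                raise ValueError(f"duplicate items in transaction {ts}")
            transactions.append((ts, items))
    return transactions


write_temporal_db("sample.txt", SAMPLE)
db = read_temporal_db("sample.txt")
print(len(db), db[0])  # 10 (1, ['a', 'b', 'c', 'd'])
```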

4. What is a weight database?

A weight database is a collection of items with their weights; each row holds an item followed by its weight.
A hypothetical weight database, say WFRIWeightSample.txt, containing the weights of the items a, b, c, d, e, f, and g is shown below:

a 0.60
b 0.50
c 0.35
d 0.45
e 0.45
f 0.3
g 0.4
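A weight file in this layout maps naturally to a dictionary. The minimal sketch below writes the sample weights to WFRIWeightSample.txt and reads them back; `read_weights` is a hypothetical helper that assumes exactly one item and one weight per line, separated by a space.

```python
WEIGHTS_TEXT = """a 0.60
b 0.50
c 0.35
d 0.45
e 0.45
f 0.3
g 0.4
"""


def read_weights(path, sep=" "):
    """Return {item: weight} from a file with one 'item weight' pair per line."""
    weights = {}
    with open(path) as f:
        for line in f:
            fields = [x for x in line.strip().split(sep) if x]
            if not fields:
                continue  # skip blank lines
            item, weight = fields  # raises ValueError on a malformed line
            weights[item] = float(weight)
    return weights


with open("WFRIWeightSample.txt", "w") as f:
    f.write(WEIGHTS_TEXT)

weights = read_weights("WFRIWeightSample.txt")
print(weights["a"], weights["f"])  # 0.6 0.3
```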

5. Understanding the statistics of the database

The sample code below prints statistical details of the temporal database, such as its size, the number of distinct items, the transaction sizes, and the periods between consecutive timestamps.

import PAMI.extras.dbStats.TemporalDatabase as stats

obj = stats.TemporalDatabase('WFRISample.txt', ' ')  # input file and its separator
obj.run()         # compute the statistics
obj.printStats()  # print the statistics
Database size : 10
Number of items : 7
Minimum Transaction Size : 3
Average Transaction Size : 4.0
Maximum Transaction Size : 5
Minimum period : 1
Average period : 1.0
Maximum period : 1
Standard Deviation Transaction Size : 0.8944271909999159
Variance : 0.8888888888888888
Sparsity : 0.44285714285714284
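The simpler figures can be cross-checked by hand. This sketch is not PAMI's implementation: it computes the database size, the number of distinct items, the minimum, average and maximum transaction sizes, and the minimum and maximum periods for the sample database shown in Section 2. PAMI's formulas for the remaining figures (standard deviation, variance, sparsity) are not reproduced; `basic_stats` is a hypothetical helper.

```python
# Sample temporal database from Section 2: timestamp -> items.
DB = {
    1: ["a", "b", "c", "d"], 2: ["c", "e", "f"], 3: ["a", "b", "e", "f", "g"],
    4: ["a", "b", "c", "f", "g"], 5: ["d", "e", "g"], 6: ["a", "b", "c", "e", "g"],
    7: ["a", "b", "c", "e"], 8: ["a", "b", "d", "e"], 9: ["b", "c", "e"],
    10: ["a", "e", "g"],
}


def basic_stats(db):
    """Compute simple size and period statistics of a temporal database."""
    sizes = [len(items) for items in db.values()]
    stamps = sorted(db)
    # Periods are the differences between consecutive timestamps.
    periods = [b - a for a, b in zip(stamps, stamps[1:])]
    return {
        "Database size": len(db),
        "Number of items": len({i for items in db.values() for i in items}),
        "Minimum Transaction Size": min(sizes),
        "Average Transaction Size": sum(sizes) / len(sizes),
        "Maximum Transaction Size": max(sizes),
        "Minimum period": min(periods),
        "Maximum period": max(periods),
    }


for name, value in basic_stats(DB).items():
    print(name, ":", value)
```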

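The input parameters to a weighted frequent regular pattern mining algorithm are: the input temporal database (iFile), the weight database (wFile), the minimum weighted support (minWS), the maximum regularity (regularity), and the separator used in the input files (sep, a tab by default).

6. How to store the output of a weighted frequent regular pattern mining algorithm?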

The patterns discovered by a weighted frequent regular pattern mining algorithm can be saved into a file or a data frame.
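To illustrate the two output forms, the sketch below writes a dictionary of patterns to a text file in the same layout as the output file shown at the end of this tutorial (items separated by tabs, then a colon and the [support, regularity, weighted support] list) and also builds row tuples of the kind a data frame would hold. The values are copied from that output; `save_patterns` and `patterns_as_rows` are hypothetical helpers, not PAMI's `save` or `getPatternsAsDataFrame` methods.

```python
def save_patterns(patterns, path):
    """Write each pattern as tab-separated items followed by ':' and its values."""
    with open(path, "w") as f:
        for items, values in patterns.items():
            f.write("\t".join(items) + ":" + str(values) + "\n")


def patterns_as_rows(patterns):
    """Return (pattern string, values) rows, e.g. for building a data frame."""
    return [(" ".join(items), values) for items, values in patterns.items()]


# items -> [support, regularity, weighted support]
patterns = {
    ("a",): [7, 2, 4.2],
    ("a", "e"): [5, 3, 2.625],
    ("c", "b"): [5, 3, 2.125],
}

save_patterns(patterns, "patterns.txt")
with open("patterns.txt") as f:
    print(f.read())
print(patterns_as_rows(patterns))
```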

7. How to run the weighted frequent regular pattern mining algorithms in a terminal?

syntax: python3 algorithmName.py <path to the input file> <path to the output file> <path to the weight file> <minWS> <regularity> <separator>

8. Sample command to execute WFRIMiner from the weightedFrequentRegularPattern/basic folder

Example: python3 WFRIMiner.py inputFile.txt outputFile.txt weightSample.txt 3 2 ' '
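When the script is launched from Python instead of a shell, passing the arguments as a list avoids quoting the space separator. This is a minimal sketch: `build_command` and `run_command` are hypothetical helpers, the file names are the placeholders from the example command, and running it assumes WFRIMiner.py and the input files are in the current directory.

```python
import subprocess
import sys


def build_command(script, input_file, output_file, weight_file,
                  min_ws, regularity, sep):
    """Assemble argv in the order given by the syntax line for the terminal."""
    return [sys.executable, script, input_file, output_file, weight_file,
            str(min_ws), str(regularity), sep]


def run_command(cmd):
    """Run the miner; raises CalledProcessError if the script fails."""
    subprocess.run(cmd, check=True)


cmd = build_command("WFRIMiner.py", "inputFile.txt", "outputFile.txt",
                    "weightSample.txt", 3, 2, " ")
print(cmd[1:])
# run_command(cmd)  # uncomment once the script and input files are present
```

Using `sys.executable` runs the script with the same interpreter as the calling program, which helps when PAMI is installed in a virtual environment.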

9. How to execute a weighted frequent regular pattern mining algorithm in a Jupyter Notebook?

from PAMI.weightedFrequentRegularPattern.basic import WFRIMiner as alg

iFile = 'WFRISample.txt'  # input temporal database
wFile = 'WFRIWeightSample.txt'  # input weight database
minWS = 2  # minimum weighted support
regularity = 3  # maximum regularity
sep = ' '  # separator used in the input files (PAMI's default is a tab)
oFile = 'weightedFrequentRegularPatterns.txt'  # output file name

obj = alg.WFRIMiner(iFile, wFile, minWS, regularity, sep)  # initialize the algorithm
obj.startMine()  # start the mining process
obj.save(oFile)  # store the patterns in a file
df = obj.getPatternsAsDataFrame()  # get the discovered patterns as a data frame
obj.printResults()  # print the statistics of the mining process
Weighted Frequent Regular patterns were generated successfully using WFRIM algorithm
Total number of  Weighted Frequent Regular Patterns: 9
Total Memory in USS: 99409920
Total Memory in RSS 140251136
Total ExecutionTime in ms: 0.00046825408935546875
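To see what the miner computes, here is a brute-force sketch that enumerates every itemset of the sample database from Section 2, computes [support, regularity, weighted support] for each, and keeps those with weighted support >= minWS and regularity <= regularity. It is not the WFRIM algorithm: it simply tries all itemsets, which is only feasible for a handful of items. As before, it assumes the database starts at timestamp 0 and ends at its last timestamp when measuring gaps; `mine_brute_force` is a hypothetical helper. With minWS = 2 and regularity = 3 it reports the same nine itemsets (a, b, c, e, ab, ae, bc, be, abe) listed in the output file.

```python
from itertools import combinations

# Sample temporal database from Section 2 (timestamp -> items) and item weights.
DB = {
    1: {"a", "b", "c", "d"}, 2: {"c", "e", "f"}, 3: {"a", "b", "e", "f", "g"},
    4: {"a", "b", "c", "f", "g"}, 5: {"d", "e", "g"}, 6: {"a", "b", "c", "e", "g"},
    7: {"a", "b", "c", "e"}, 8: {"a", "b", "d", "e"}, 9: {"b", "c", "e"},
    10: {"a", "e", "g"},
}
WEIGHTS = {"a": 0.60, "b": 0.50, "c": 0.35, "d": 0.45, "e": 0.45, "f": 0.3, "g": 0.4}


def mine_brute_force(db, weights, min_ws, max_regularity):
    """Return {itemset: [support, regularity, weighted support]} for every
    itemset with weighted support >= min_ws and regularity <= max_regularity."""
    items = sorted(weights)
    end = max(db)  # assumed end of the database
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            stamps = [ts for ts in sorted(db) if db[ts].issuperset(itemset)]
            if not stamps:
                continue  # itemset never occurs
            points = [0] + stamps + [end]
            regularity = max(b - a for a, b in zip(points, points[1:]))
            avg_weight = sum(weights[i] for i in itemset) / size
            ws = len(stamps) * avg_weight
            if ws >= min_ws and regularity <= max_regularity:
                result[itemset] = [len(stamps), regularity, ws]
    return result


# Same thresholds as the notebook example: minWS = 2, regularity = 3.
for itemset, values in mine_brute_force(DB, WEIGHTS, 2, 3).items():
    print(" ".join(itemset), values)
```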

The weightedFrequentRegularPatterns.txt file contains the following patterns, one per line, in the format pattern:[support, regularity, weighted support], where the items of a pattern are separated by tabs:

!cat weightedFrequentRegularPatterns.txt
c:[6, 2, 2.0999999999999996] 
c	b:[5, 3, 2.125] 
a:[7, 2, 4.2] 
a	e:[5, 3, 2.625] 
a	e	b:[5, 3, 2.5833333333333335] 
a	b:[7, 2, 3.8500000000000005] 
e:[8, 2, 3.6] 
e	b:[6, 3, 2.8499999999999996] 
b:[8, 2, 4.0] 
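Because each line of the output file consists of tab-separated items, a colon, and a Python-style list, the file can be loaded back for further analysis. The sketch below assumes exactly that layout, parses the list with `ast.literal_eval`, and ranks patterns by weighted support; `parse_pattern_line` is a hypothetical helper, and the sample lines are copied from the output above.

```python
import ast

# Lines copied from the output file shown above (note the trailing spaces).
LINES = [
    "c:[6, 2, 2.0999999999999996] ",
    "c\tb:[5, 3, 2.125] ",
    "a:[7, 2, 4.2] ",
    "a\te:[5, 3, 2.625] ",
]


def parse_pattern_line(line):
    """Split 'item<TAB>item:[support, regularity, ws]' into its parts."""
    pattern, _, values = line.strip().rpartition(":")
    support, regularity, weighted_support = ast.literal_eval(values)
    return tuple(pattern.split("\t")), support, regularity, weighted_support


records = [parse_pattern_line(line) for line in LINES]
# Rank patterns from highest to lowest weighted support.
for items, sup, reg, ws in sorted(records, key=lambda r: r[3], reverse=True):
    print(" ".join(items), sup, reg, round(ws, 3))
```

To read the real file, replace `LINES` with the non-blank lines of weightedFrequentRegularPatterns.txt.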

The data frame containing the patterns is shown below; its Support column holds the same [support, regularity, weighted support] lists:

df
Patterns Support
0 c [6, 2, 2.0999999999999996]
1 c b [5, 3, 2.125]
2 a [7, 2, 4.2]
3 a e [5, 3, 2.625]
4 a e b [5, 3, 2.5833333333333335]
5 a b [7, 2, 3.8500000000000005]
6 e [8, 2, 3.6]
7 e b [6, 3, 2.8499999999999996]
8 b [8, 2, 4.0]