Mining fuzzy correlated patterns in transactional database

PAMI is a Python library containing 100+ algorithms to discover useful patterns in various databases across multiple computing platforms.


What is fuzzy correlated pattern mining?

Correlated pattern mining is a crucial knowledge discovery technique in big data analytics. Because this technique is designed to find interesting patterns in a binary transactional database, it fails to discover the interesting patterns that may exist in a quantitative transactional database. To tackle this problem, fuzzy correlated pattern mining was introduced to discover regularities in a quantitative transactional database.

In fuzzy correlated pattern mining, a quantitative transactional database is first transformed into a fuzzy transactional database using a set of fuzzy (membership) functions. Then, interesting patterns, called fuzzy correlated patterns, are discovered from the fuzzy transactional database using the minimum support and minimum all-confidence constraints.
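The transformation step can be sketched in plain Python. This is a minimal sketch that assumes three triangular membership functions per item, named L (low), M (middle) and H (high). The region boundaries in `REGIONS` and the helper names `triangular` and `fuzzify` are hypothetical; they are not the fuzzy functions used inside PAMI.

```python
# Hypothetical fuzzy regions (left, peak, right) for every item; not PAMI's defaults.
REGIONS = {"L": (0, 1, 3), "M": (1, 3, 5), "H": (3, 5, 7)}


def triangular(x, left, peak, right):
    """Membership degree of quantity x in a triangular fuzzy region, in [0, 1]."""
    if x == peak:
        return 1.0
    if left < x < peak:
        return (x - left) / (peak - left)    # rising edge of the triangle
    if peak < x < right:
        return (right - x) / (right - peak)  # falling edge of the triangle
    return 0.0                               # outside the region


def fuzzify(transaction):
    """Turn [(item, quantity), ...] into {'item.term': membership degree}."""
    fuzzy = {}
    for item, quantity in transaction:
        for term, (left, peak, right) in REGIONS.items():
            degree = triangular(quantity, left, peak, right)
            if degree > 0:                   # keep only terms the quantity belongs to
                fuzzy[f"{item}.{term}"] = degree
    return fuzzy


# First transaction of the hypothetical database: (a,2) (b,3) (c,1) (g,1)
print(fuzzify([("a", 2), ("b", 3), ("c", 1), ("g", 1)]))
```

With these regions, a quantity can belong to more than one fuzzy term. For example, a quantity of 2 is half L and half M. Choosing narrower or wider regions changes how quantities are shared between terms.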

Reference: Lin, N.P., & Chueh, H. (2007). Fuzzy correlation rules mining. https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.416.6053&rep=rep1&type=pdf

What is a fuzzy transactional database?

A fuzzy transactional database is a collection of transactions, where each transaction contains a set of items (or fuzzy terms) and their fuzzy values.
A hypothetical fuzzy database with items a, b, c, d, e, f and g is shown below.

Transactions
(a,2) (b,3) (c,1) (g,1)
(b,3) (c,2) (d,3) (e,2)
(a,2) (b,1) (c,3) (d,4)
(a,3) (c,2) (d,1) (f,2)
(a,3) (b,1) (c,2) (d,1) (g,2)
(c,2) (d,2) (e,3) (f,1)
(a,2) (b,1) (c,1) (d,2)
(a,1) (e,2) (f,2)
(a,2) (b,2) (c,4) (d,2)
(b,3) (c,2) (d,2) (e,2)

Note: Duplicate items must not exist within a transaction.
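The duplicate-item rule can be checked while reading transactions written in the (item,value) notation used above. This is a minimal sketch. The regular expression and the `parse_transaction` helper are hypothetical illustrations, not part of PAMI.

```python
import re

# Matches one "(item,value)" pair, e.g. "(a,2)".
PAIR = re.compile(r"\((\w+),\s*([0-9.]+)\)")


def parse_transaction(line):
    """Return {item: value} for one transaction, rejecting duplicate items."""
    pairs = [(item, float(value)) for item, value in PAIR.findall(line)]
    items = [item for item, _ in pairs]
    if len(items) != len(set(items)):
        raise ValueError(f"duplicate item in transaction: {line!r}")
    return dict(pairs)


print(parse_transaction("(a,2) (b,3) (c,1) (g,1)"))
```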

What is the acceptable format of a fuzzy transactional database in PAMI?

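Each row in a fuzzy transactional database must contain the fuzzy items, a colon, and the corresponding fuzzy values. The items, and likewise the values, are separated from one another by a separator (a space in the sample below).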

A sample fuzzy transactional database file, say fuzzyTransactionalDatabase.txt, is provided below:

a.L b.M c.H g.M:0.2 0.3 0.1 0.1
b.M c.H d.L e.H:0.13 0.2 0.3 0.2
a.L b.M c.H d.L:0.2 0.1 0.3 0.4
a.L c.H d.L f.M:0.3 0.2 0.1 0.2
a.L b.M c.H d.L g.M:0.3 0.1 0.2 0.1 0.2
c.H d.L e.H f.M:0.2 0.2 0.3 0.1
a.L b.M c.H d.L:0.2 0.1 0.1 0.2
a.L e.H f.M:0.1 0.2 0.2
a.L b.M c.H d.H:0.2 0.2 0.4 0.2
b.M c.H d.L e.H:0.3 0.2 0.2 0.2
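A reader for this row layout splits each line at the colon and pairs the items with their values. This is a minimal sketch that assumes exactly one colon per line and the same separator on both sides of it. `parse_fuzzy_line` and `read_fuzzy_database` are hypothetical helpers, and PAMI's own readers may handle whitespace and malformed rows differently.

```python
def parse_fuzzy_line(line, sep=" "):
    """Return {fuzzy item: fuzzy value} for one row such as 'a.L b.M:0.2 0.3'."""
    items_part, colon, values_part = line.strip().partition(":")
    if not colon:
        raise ValueError(f"missing ':' in row: {line!r}")
    items = items_part.split(sep)
    values = [float(value) for value in values_part.split(sep)]
    if len(items) != len(values):
        raise ValueError(f"{len(items)} items but {len(values)} values in row: {line!r}")
    if len(set(items)) != len(items):
        raise ValueError(f"duplicate item in row: {line!r}")
    return dict(zip(items, values))


def read_fuzzy_database(path, sep=" "):
    """Read every non-empty row of a fuzzy transactional database file."""
    with open(path) as handle:
        return [parse_fuzzy_line(line, sep) for line in handle if line.strip()]


print(parse_fuzzy_line("a.L b.M c.H g.M:0.2 0.3 0.1 0.1"))
```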

For more information on how to create a fuzzy transactional database from a quantitative (or utility) transactional database, please refer to the manual utility2FuzzyDB.pdf.

Understanding the statistics of a transactional database

The performance of a pattern mining algorithm primarily depends on the statistical nature of a database. Thus, it is important to know the statistical details of a database before mining it.

The sample code below prints the statistical details of a database.

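import PAMI.extras.dbStats.FuzzyDatabase as stats

# read the fuzzy transactional database; ' ' is the separator used in the file
obj = stats.FuzzyDatabase('fuzzyTransactionalDatabase.txt', ' ')
obj.run()          # compute the statistics of the database
obj.printStats()   # print the statistics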
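The sketch below computes a few such statistics directly from the rows of a fuzzy transactional database, to show what they involve. It is an illustration only and assumes the row layout shown earlier. `fuzzy_db_stats` is a hypothetical helper, and PAMI's `printStats` may report a different set of statistics.

```python
from statistics import mean, pstdev


def fuzzy_db_stats(lines, sep=" "):
    """Size, distinct fuzzy items and transaction-length statistics of a fuzzy database."""
    lengths = []
    distinct_items = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue                             # skip blank rows
        items = line.split(":")[0].split(sep)    # fuzzy items are left of the colon
        lengths.append(len(items))
        distinct_items.update(items)
    return {
        "databaseSize": len(lengths),
        "distinctItems": len(distinct_items),
        "minLength": min(lengths),
        "averageLength": mean(lengths),
        "maxLength": max(lengths),
        "stdLength": pstdev(lengths),            # population standard deviation
    }


rows = [
    "a.L b.M c.H g.M:0.2 0.3 0.1 0.1",
    "b.M c.H d.L e.H:0.13 0.2 0.3 0.2",
    "a.L b.M c.H d.L:0.2 0.1 0.3 0.4",
    "a.L e.H f.M:0.1 0.2 0.2",
]
print(fuzzy_db_stats(rows))
```

For these four rows, the sketch reports 4 transactions, 7 distinct fuzzy items and an average transaction length of 3.75.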

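The input parameters to a fuzzy correlated pattern mining algorithm are the input file, the minimum support (minSup), the minimum all-confidence (minAllConf), and the separator used in the input file (the default separator is a tab space).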
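The sketch below evaluates an itemset against minSup and minAllConf to make the two thresholds concrete. It assumes one common formulation, which is not necessarily the one implemented in FCPGrowth:

- The fuzzy support of an itemset is the sum, over all transactions containing it, of the minimum fuzzy value of its items.
- The all-confidence is that support divided by the largest support of any single item in the itemset.

The three transactions are the first three rows of the sample file, and all function names are hypothetical.

```python
def fuzzy_support(itemset, database):
    """Sum of the minimum fuzzy value of the itemset over transactions containing it."""
    total = 0.0
    for transaction in database:
        if all(item in transaction for item in itemset):
            total += min(transaction[item] for item in itemset)
    return total


def all_confidence(itemset, database):
    """Fuzzy support of the itemset divided by the largest single-item support."""
    largest = max(fuzzy_support([item], database) for item in itemset)
    return fuzzy_support(itemset, database) / largest if largest > 0 else 0.0


def is_fuzzy_correlated(itemset, database, min_sup, min_all_conf):
    """True when the itemset satisfies both the minSup and minAllConf constraints."""
    return (fuzzy_support(itemset, database) >= min_sup
            and all_confidence(itemset, database) >= min_all_conf)


# First three rows of the sample fuzzy transactional database, as dictionaries.
database = [
    {"a.L": 0.2, "b.M": 0.3, "c.H": 0.1, "g.M": 0.1},
    {"b.M": 0.13, "c.H": 0.2, "d.L": 0.3, "e.H": 0.2},
    {"a.L": 0.2, "b.M": 0.1, "c.H": 0.3, "d.L": 0.4},
]
print(fuzzy_support(["b.M", "c.H"], database))
print(all_confidence(["b.M", "c.H"], database))
print(is_fuzzy_correlated(["b.M", "c.H"], database, min_sup=0.3, min_all_conf=0.5))
```

Here {b.M, c.H} has support 0.1 + 0.13 + 0.1 = 0.33. Its all-confidence is 0.33 / 0.6 = 0.55, because c.H (support 0.6) is the more frequent of its two items.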

How to store the output of a fuzzy correlated pattern mining algorithm?

The patterns discovered by a fuzzy correlated pattern mining algorithm can be saved into a file or a data frame.
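The two storage options can be mimicked as follows. This sketch assumes the patterns are kept in a dictionary that maps an itemset to its (support, all-confidence) pair. It writes them in a `pattern : support : all-confidence` layout. `format_pattern_line`, `save_patterns` and `patterns_as_rows` are hypothetical functions, not PAMI methods, and the pattern values are made up for illustration.

```python
def format_pattern_line(itemset, support, all_conf):
    """One output line in the 'pattern : support : all-confidence' layout."""
    return f"{' '.join(itemset)} : {support} : {all_conf}"


def save_patterns(patterns, path):
    """Write every pattern to a text file, one pattern per line."""
    with open(path, "w") as out:
        for itemset, (support, all_conf) in patterns.items():
            out.write(format_pattern_line(itemset, support, all_conf) + "\n")


def patterns_as_rows(patterns):
    """Rows (pattern, support, all-confidence) ready to be loaded into a data frame."""
    return [(" ".join(itemset), support, all_conf)
            for itemset, (support, all_conf) in patterns.items()]


# Hypothetical mined patterns: itemset tuple -> (support, all-confidence)
patterns = {("a.L",): (5.4, 0.77), ("b.L", "c.L"): (4.6, 0.51)}
print(patterns_as_rows(patterns))
```

If pandas is installed, `pandas.DataFrame(patterns_as_rows(patterns), columns=['Patterns', 'Support', 'AllConfidence'])` turns the rows into a data frame. The column names are illustrative.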

How to run the fuzzy correlated pattern mining algorithms in a terminal?

syntax: python3 algorithmName.py <path to the input file> <path to the output file> <minSup> <minAllConf> <separator>

Example: python3 FCPGrowth.py inputFile.txt outputFile.txt 4 0.5 ' '
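The arguments are positional, so their order matters. The sketch below maps a command line in the syntax above onto named parameters. It is a hypothetical illustration: the algorithm scripts may parse and type-convert their arguments differently (for example, treating minSup as an integer). The shell removes the quotes, so `' '` arrives as a single space character.

```python
USAGE = ("python3 algorithmName.py <path to the input file> <path to the output file> "
         "<minSup> <minAllConf> <separator>")


def parse_arguments(argv):
    """Map positional command-line arguments (argv[0] is the script) onto named parameters."""
    if len(argv) != 6:
        raise SystemExit("usage: " + USAGE)
    _, input_file, output_file, min_sup, min_all_conf, separator = argv
    return {
        "inputFile": input_file,
        "outputFile": output_file,
        "minSup": float(min_sup),
        "minAllConf": float(min_all_conf),
        "separator": separator,
    }


# The example command above corresponds to this argument list:
print(parse_arguments(["FCPGrowth.py", "inputFile.txt", "outputFile.txt", "4", "0.5", " "]))
```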

How to execute a fuzzy correlated pattern mining algorithm in a Jupyter Notebook?

import PAMI.fuzzyCorrelatedPattern.basic.FCPGrowth as alg

iFile = 'sampleUtility.txt'             # specify the input database file
minSup = 4                              # specify the minimum support (minSup) value
minAllConf = 0.5                        # specify the minimum all-confidence (minAllConf) value
separator = ' '                         # specify the separator; the default separator is a tab space
oFile = 'FuzzyCorrelatedPatterns.txt'   # specify the output file name

obj = alg.FCPGrowth(iFile, minSup, minAllConf, separator)  # initialize the algorithm
obj.startMine()                         # start the mining process
obj.save(oFile)                         # store the patterns in a file
df = obj.getPatternsAsDataFrame()       # get the discovered patterns as a data frame
obj.printResults()                      # print the statistics of the mining process

The FuzzyCorrelatedPatterns.txt file contains the following patterns (format: pattern : support : all-confidence):

!cat FuzzyCorrelatedPatterns.txt
a.L : 5.3999999999999995 : 0.7714285714285714
b.L : 5.599999999999999 : 0.6222222222222221
b.L c.L : 4.6 : 0.5111111111111111
d.L : 6.199999999999999 : 0.6888888888888888
d.L c.L : 5.3999999999999995 : 0.6
c.L : 6.999999999999999 : 0.7777777777777777
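Each line of the output file can be turned back into Python values for further analysis. This is a minimal sketch that assumes the colon-separated layout shown above, item names without colons, and an all-confidence value in the third field. `parse_pattern_line` and `read_patterns` are hypothetical helpers.

```python
def parse_pattern_line(line):
    """Split 'b.L c.L : 4.6 : 0.51' into (('b.L', 'c.L'), 4.6, 0.51)."""
    pattern, support, all_conf = (part.strip() for part in line.split(":"))
    return tuple(pattern.split()), float(support), float(all_conf)


def read_patterns(path):
    """Parse every non-empty line of a saved pattern file."""
    with open(path) as handle:
        return [parse_pattern_line(line) for line in handle if line.strip()]


print(parse_pattern_line("b.L c.L : 4.6 : 0.5111111111111111"))
# To keep only patterns with more than one item, one could write:
# [p for p in read_patterns('FuzzyCorrelatedPatterns.txt') if len(p[0]) > 1]
```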

The dataframe containing the patterns is shown below:

df
   Patterns  Support
0  a.L       5.3999999999999995 : 0.7714285714285714\n
1  b.L       5.599999999999999 : 0.6222222222222221\n
2  b.L c.L   4.6 : 0.5111111111111111\n
3  d.L       6.199999999999999 : 0.6888888888888888\n
4  d.L c.L   5.3999999999999995 : 0.6\n
5  c.L       6.999999999999999 : 0.7777777777777777\n