Mining Fuzzy Frequent Patterns in Fuzzy Transactional Databases

PAMI is a Python library containing 100+ algorithms to discover useful patterns in various databases across multiple computing platforms.

1. What is fuzzy frequent pattern mining?

Frequent pattern mining is a well-known data mining technique that aims to discover frequently occurring patterns in a (binary) transactional database. A fundamental limitation of this model is that it cannot discover interesting patterns that may exist in a quantitative (or non-binary) transactional database. To work around this limitation in real-world applications, researchers convert a quantitative transactional database into a fuzzy transactional database using a set of fuzzy functions and then mine frequent patterns from it. The frequent patterns generated from the fuzzy transactional database are known as fuzzy frequent patterns.

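Formally, fuzzy frequent pattern mining aims to discover all patterns that satisfy the user-specified minimum support (minSup) in a fuzzy transactional database. The minSup sets the minimum support that a pattern must reach in the database. In the fuzzy setting, the support is accumulated from the fuzzy values of the pattern in the transactions that contain it, so it need not be an integer count of transactions (see the supports reported in Section 8).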
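One common way to write this support formally, and one that the supports printed in Section 8 are consistent with, is sketched below. The symbols D (the database), T (a transaction), X (a pattern) and \mu_T(i) (the fuzzy value of item i in T) are notation introduced here for illustration; they do not appear elsewhere in this manual.

```latex
\mathrm{sup}(X) \;=\; \sum_{\substack{T \in D \\ X \subseteq T}} \; \min_{i \in X} \mu_T(i),
\qquad
X \text{ is a fuzzy frequent pattern} \iff \mathrm{sup}(X) \ge \mathit{minSup}
```

For a single item the minimum is just that item's fuzzy value, so its support is the sum of its fuzzy values over the database.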

Reference: Lin, C.-W., Li, T., Fournier-Viger, P., & Hong, T.-P. (2015). A fast algorithm for mining fuzzy frequent itemsets. Journal of Intelligent & Fuzzy Systems, 29, 2373-2379. doi:10.3233/IFS-151936

2. What is a fuzzy transactional database?

A fuzzy transactional database is a collection of transactions, where each transaction contains a set of fuzzy items and their respective fuzzy (or probability) values. Please note that the fuzzy value of a fuzzy item always lies between 0 and 1 (equivalently, between 0% and 100%).

Given a quantitative transactional database containing the items, a, b, c, d, e, f and g, and a set of fuzzy membership labels, Low (L), Medium (M), and High (H), a generated hypothetical fuzzy database is shown below.

Transactions
(a.L,0.2) (b.M,0.3) (c.H,0.1) (g.M,0.1)
(b.M,0.3) (c.H,0.2) (d.L,0.3) (e.H,0.2)
(a.L,0.2) (b.M,0.1) (c.H,0.3) (d.L,0.4)
(a.L,0.3) (c.H,0.2) (d.L,0.1) (f.M,0.2)
(a.L,0.3) (b.M,0.1) (c.H,0.2) (d.L,0.1) (g.M,0.2)
(c.H,0.2) (d.L,0.2) (e.H,0.3) (f.M,0.1)
(a.L,0.2) (b.M,0.1) (c.H,0.1) (d.L,0.2)
(a.L,0.1) (e.H,0.2) (f.M,0.2)
(a.L,0.2) (b.M,0.2) (c.H,0.4) (d.L,0.2)
(b.M,0.3) (c.H,0.2) (d.L,0.2) (e.H,0.2)

Note: Duplicate items must not exist in a transaction.
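The two constraints stated in this section (fuzzy values within range, no duplicate items in a transaction) can be checked before mining. The sketch below is only an illustration: validate_transaction is a hypothetical helper, not part of PAMI, and it treats the bounds 0 and 1 themselves as allowed, which is an assumption.

```python
def validate_transaction(items, values):
    """Return a list of problems found in one fuzzy transaction (empty if none).

    Hypothetical helper for illustration; not part of PAMI.
    """
    problems = []
    if len(items) != len(values):
        problems.append("items and values differ in length")
    if len(set(items)) != len(items):
        problems.append("duplicate fuzzy item")
    for item, value in zip(items, values):
        # Assumption: the bounds 0 and 1 are accepted as valid fuzzy values.
        if not 0.0 <= value <= 1.0:
            problems.append(f"value {value} of {item} is outside [0, 1]")
    return problems


# First transaction of the table above: no problems expected.
print(validate_transaction(["a.L", "b.M", "c.H", "g.M"], [0.2, 0.3, 0.1, 0.1]))  # []
```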

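3. What is the acceptable format of a fuzzy transactional database in PAMI?

Each row in a fuzzy transactional database must contain a list of fuzzy items, a colon as the separator, and the list of their fuzzy values, in that order. Within each list, entries are separated by the database separator, which is a space in the sample below.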

A sample fuzzy transactional database file, say fuzzyTransactionalDatabase.txt, is provided below:

a.L b.M c.H g.M:0.2 0.3 0.1 0.1
b.M c.H d.L e.H:0.13 0.2 0.3 0.2
a.L b.M c.H d.L:0.2 0.1 0.3 0.4
a.L c.H d.L f.M:0.3 0.2 0.1 0.2
a.L b.M c.H d.L g.M:0.3 0.1 0.2 0.1 0.2
c.H d.L e.H f.M:0.2 0.2 0.3 0.1
a.L b.M c.H d.L:0.2 0.1 0.1 0.2
a.L e.H f.M:0.1 0.2 0.2
a.L b.M c.H d.H:0.2 0.2 0.4 0.2
b.M c.H d.L e.H:0.3 0.2 0.2 0.2
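To make the row format concrete, here is a minimal sketch that splits one row into its fuzzy items and fuzzy values. parse_fuzzy_line is a hypothetical helper written for this manual, not PAMI's own reader, and it assumes that item names never contain a colon.

```python
def parse_fuzzy_line(line, sep=" "):
    """Parse 'item1 item2:val1 val2' into a dict {item: fuzzy value}.

    Hypothetical helper for illustration; not PAMI's parser.
    """
    items_part, values_part = line.strip().split(":")
    items = [tok for tok in items_part.split(sep) if tok]
    values = [float(tok) for tok in values_part.split(sep) if tok]
    if len(items) != len(values):
        raise ValueError(f"{len(items)} items but {len(values)} values: {line!r}")
    return dict(zip(items, values))


row = parse_fuzzy_line("a.L b.M c.H g.M:0.2 0.3 0.1 0.1")
print(row)  # {'a.L': 0.2, 'b.M': 0.3, 'c.H': 0.1, 'g.M': 0.1}
```

The i-th value belongs to the i-th item, so a row whose two lists differ in length is rejected rather than silently truncated.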

For more information on how to create a fuzzy transactional database from a quantitative (or utility) transactional database, please refer to the manual utility2FuzzyDB.pdf

4. What is the need for understanding the statistics of a fuzzy transactional database?

The performance of a pattern mining algorithm primarily depends on the statistical nature of a database. Thus, it is important to know the statistical details of a database before mining it.

The sample code below prints the statistical details of a database.

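import PAMI.extras.dbStats.FuzzyDatabase as stats

# the second argument is the separator used between the entries of a row
obj = stats.FuzzyDatabase('fuzzyTransactionalDatabase.txt', ' ')
obj.run()  # compute the statistics
obj.printStats()  # print the statistics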
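To give a feel for what "statistical details" can mean, the sketch below computes a few basic figures for the sample database of Section 3 by hand. This manual does not list which figures printStats() reports, so the selection here is an assumption, and basic_stats is a hypothetical helper rather than a PAMI function.

```python
from statistics import mean

# Sample database from Section 3, copied verbatim.
SAMPLE = """\
a.L b.M c.H g.M:0.2 0.3 0.1 0.1
b.M c.H d.L e.H:0.13 0.2 0.3 0.2
a.L b.M c.H d.L:0.2 0.1 0.3 0.4
a.L c.H d.L f.M:0.3 0.2 0.1 0.2
a.L b.M c.H d.L g.M:0.3 0.1 0.2 0.1 0.2
c.H d.L e.H f.M:0.2 0.2 0.3 0.1
a.L b.M c.H d.L:0.2 0.1 0.1 0.2
a.L e.H f.M:0.1 0.2 0.2
a.L b.M c.H d.H:0.2 0.2 0.4 0.2
b.M c.H d.L e.H:0.3 0.2 0.2 0.2
"""


def basic_stats(text, sep=" "):
    """Count transactions, distinct fuzzy items, and transaction lengths."""
    lengths, items = [], set()
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        tokens = [tok for tok in line.split(":")[0].split(sep) if tok]
        lengths.append(len(tokens))
        items.update(tokens)
    return {
        "transactions": len(lengths),
        "distinct items": len(items),
        "min length": min(lengths),
        "avg length": mean(lengths),
        "max length": max(lengths),
    }


result = basic_stats(SAMPLE)
for name, value in result.items():
    print(f"{name}: {value}")
```

Figures such as the number of distinct items and the average transaction length indicate how large the search space is likely to be for a given minSup.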

5. What are the input parameters to be specified for a fuzzy frequent pattern mining algorithm?

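The input parameters to a fuzzy frequent pattern mining algorithm are the input fuzzy transactional database (iFile), the minimum support (minSup), and the separator used in the database (a tab by default).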

6. How to store the output of a fuzzy frequent pattern mining algorithm?

The patterns discovered by a fuzzy frequent pattern mining algorithm can be saved into a file or a data frame.

7. How to run a fuzzy frequent pattern mining algorithm in a terminal?

foo@bar: cd PAMI/fuzzyFrequentPattern/basic
foo@bar: python3 algorithmName.py inputFile outputFile minSup separator

Example: python3 FFIMiner.py inputFile.txt outputFile.txt 5 ' '

8. How to execute a fuzzy frequent pattern mining algorithm in a Jupyter Notebook?

import PAMI.fuzzyFrequentPattern.basic.FFIMiner as alg

iFile = 'fuzzyTransactionalDatabase.txt'  # specify the input fuzzy transactional database
minSup = 0.9  # specify the minSup value
separator = ' '  # specify the separator. The default separator is a tab.
oFile = 'fuzzyPatterns.txt'  # specify the output file name

obj = alg.FFIMiner(iFile, minSup, separator)  # initialize the algorithm
obj.mine()  # start the mining process
obj.save(oFile)  # store the patterns in a file
df = obj.getPatternsAsDataFrame()  # get the discovered patterns as a dataframe
obj.printResults()  # print the statistics of the mining process
Total number of Fuzzy Frequent Patterns: 7
Total Memory in USS: 99786752
Total Memory in RSS 137846784
Total ExecutionTime in seconds: 0.0011456012725830078
!cat fuzzyPatterns.txt
#format: fuzzyFrequentPattern:support
b.M:1.23 
b.M	c.H:0.9299999999999999 
d.L:1.4999999999999998 
d.L	c.H:1.2 
a.L:1.5 
a.L	c.H:1.0 
c.H:1.9000000000000001 

The dataframe contains the following information:

df
   Patterns   Support
0  b.M        1.23
1  b.M c.H    0.9299999999999999
2  d.L        1.4999999999999998
3  d.L c.H    1.2
4  a.L        1.5
5  a.L c.H    1.0
6  c.H        1.9000000000000001
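The supports listed above can be reproduced by hand from the sample database of Section 3. The sketch below is a brute-force check, not the FFIMiner algorithm: it assumes that the support of a pattern is the sum, over the transactions containing all of its items, of the smallest fuzzy value among those items. The helper names load and fuzzy_support are hypothetical.

```python
# Sample database from Section 3, copied verbatim.
SAMPLE = """\
a.L b.M c.H g.M:0.2 0.3 0.1 0.1
b.M c.H d.L e.H:0.13 0.2 0.3 0.2
a.L b.M c.H d.L:0.2 0.1 0.3 0.4
a.L c.H d.L f.M:0.3 0.2 0.1 0.2
a.L b.M c.H d.L g.M:0.3 0.1 0.2 0.1 0.2
c.H d.L e.H f.M:0.2 0.2 0.3 0.1
a.L b.M c.H d.L:0.2 0.1 0.1 0.2
a.L e.H f.M:0.1 0.2 0.2
a.L b.M c.H d.H:0.2 0.2 0.4 0.2
b.M c.H d.L e.H:0.3 0.2 0.2 0.2
"""


def load(text, sep=" "):
    """Turn each 'items:values' row into a dict {item: fuzzy value}."""
    db = []
    for line in text.splitlines():
        items, values = line.split(":")
        db.append(dict(zip(items.split(sep), map(float, values.split(sep)))))
    return db


def fuzzy_support(db, pattern):
    """Sum the smallest fuzzy value of `pattern` over the transactions containing it."""
    return sum(
        min(t[item] for item in pattern)
        for t in db
        if all(item in t for item in pattern)
    )


db = load(SAMPLE)
for pattern in (("b.M",), ("b.M", "c.H"), ("d.L", "c.H"), ("a.L", "c.H")):
    print(" ".join(pattern), round(fuzzy_support(db, pattern), 2))
# b.M 1.23
# b.M c.H 0.93
# d.L c.H 1.2
# a.L c.H 1.0
```

The rounding only hides floating-point noise such as 0.9299999999999999. Because of that noise, a pattern whose exact support equals minSup can end up just below the threshold, so the check is best used to confirm reported supports rather than to predict the full pattern list.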