PAMI is a Python library containing 100+ algorithms to discover useful patterns in various databases across multiple computing platforms.
Frequent pattern mining is a renowned data mining technique that aims to discover frequently occurring patterns in a (binary) transactional database. A fundamental limitation of this model is that it cannot discover interesting patterns that may exist in a quantitative (or non-binary) transactional database. To overcome this limitation in real-world applications, researchers convert a quantitative transactional database into a fuzzy transactional database using a set of fuzzy membership functions, and then find frequent patterns in the fuzzy database. The frequent patterns generated from a fuzzy transactional database are known as fuzzy frequent patterns.
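As a rough illustration of the conversion step, the sketch below maps a purchase quantity to the labels Low (L), Medium (M), and High (H) using triangular and shoulder-shaped membership functions. The breakpoints (1, 5, 10) and the helper names `low`, `medium`, `high`, and `fuzzify` are hypothetical choices for illustration; they are not the membership functions used by PAMI or by the reference paper.

```python
# Hypothetical fuzzy membership functions for a purchase quantity q.
# The breakpoints 1, 5, and 10 are illustrative assumptions only.

def low(q: float) -> float:
    """Full membership up to q = 1, falling linearly to 0 at q = 5."""
    if q <= 1:
        return 1.0
    if q >= 5:
        return 0.0
    return (5 - q) / 4


def medium(q: float) -> float:
    """Triangle that rises from q = 1, peaks at q = 5, and falls to 0 at q = 10."""
    if q <= 1 or q >= 10:
        return 0.0
    if q <= 5:
        return (q - 1) / 4
    return (10 - q) / 5


def high(q: float) -> float:
    """Zero up to q = 5, rising linearly to full membership at q = 10."""
    if q <= 5:
        return 0.0
    if q >= 10:
        return 1.0
    return (q - 5) / 5


def fuzzify(transaction: dict) -> dict:
    """Map {item: quantity} to {'item.label': membership}, dropping zero memberships."""
    fuzzy = {}
    for item, q in transaction.items():
        for label, membership in (("L", low), ("M", medium), ("H", high)):
            mu = membership(q)
            if mu > 0:
                fuzzy[f"{item}.{label}"] = round(mu, 2)
    return fuzzy


print(fuzzify({"a": 3, "b": 7}))  # {'a.L': 0.5, 'a.M': 0.5, 'b.M': 0.6, 'b.H': 0.4}
```

Note that a single quantity can belong to two labels at once (item `a` above is both Low and Medium to degree 0.5), which is why each fuzzy item carries its label in its name.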
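Formally, fuzzy frequent pattern mining aims to discover all patterns that satisfy the user-specified minimum support (minSup) in a fuzzy transactional database. Unlike in binary frequent pattern mining, the support of a fuzzy pattern is not a transaction count: it accumulates the pattern's fuzzy values over the transactions that contain it. Thus, minSup controls the minimum accumulated fuzzy value that a pattern must have in the database.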
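Following the reference paper, the fuzzy support of a pattern $X$ can be written as below, where $D$ is the fuzzy transactional database, $T_q$ is a transaction, and $\mu_q(i)$ is the fuzzy value of fuzzy item $i$ in $T_q$ (this notation is introduced here for illustration). In each transaction that contains every item of $X$, the smallest fuzzy value among those items is taken, and these minima are summed:

$$
\mathrm{sup}(X) = \sum_{T_q \in D,\ X \subseteq T_q} \ \min_{i \in X} \mu_q(i)
$$

For example, in the sample database of this section, the pattern $\{\text{a.L}, \text{c.H}\}$ occurs in transactions 1, 3, 4, 5, 7, and 9, so

$$
\mathrm{sup}(\{\text{a.L}, \text{c.H}\}) = \min(0.2, 0.1) + \min(0.2, 0.3) + \min(0.3, 0.2) + \min(0.3, 0.2) + \min(0.2, 0.1) + \min(0.2, 0.4) = 0.1 + 0.2 + 0.2 + 0.2 + 0.1 + 0.2 = 1.0,
$$

which agrees with the support of 1.0 reported for a.L c.H in the mining results.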
Reference: Lin, Chun-Wei, Li, Ting, Fournier-Viger, Philippe, & Hong, Tzung-Pei. (2015). A fast algorithm for mining fuzzy frequent itemsets. Journal of Intelligent & Fuzzy Systems, 29, 2373-2379. doi:10.3233/IFS-151936
A fuzzy transactional database is a collection of transactions, where each transaction contains a set of fuzzy items and their respective fuzzy (membership) values. Please note that the fuzzy value of a fuzzy item always lies between 0 and 1 (i.e., between 0% and 100%).
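Consider a quantitative transactional database containing the items a, b, c, d, e, f, and g, and a set of fuzzy membership labels: Low (L), Medium (M), and High (H). A hypothetical fuzzy transactional database generated from it is shown below, where each fuzzy item is written as item.label (e.g., a.L denotes item a with the label Low).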
| Transactions |
|---|
| (a.L,0.2) (b.M,0.3) (c.H,0.1) (g.M,0.1) |
| (b.M,0.13) (c.H,0.2) (d.L,0.3) (e.H,0.2) |
| (a.L,0.2) (b.M,0.1) (c.H,0.3) (d.L,0.4) |
| (a.L,0.3) (c.H,0.2) (d.L,0.1) (f.M,0.2) |
| (a.L,0.3) (b.M,0.1) (c.H,0.2) (d.L,0.1) (g.M,0.2) |
| (c.H,0.2) (d.L,0.2) (e.H,0.3) (f.M,0.1) |
| (a.L,0.2) (b.M,0.1) (c.H,0.1) (d.L,0.2) |
| (a.L,0.1) (e.H,0.2) (f.M,0.2) |
| (a.L,0.2) (b.M,0.2) (c.H,0.4) (d.H,0.2) |
| (b.M,0.3) (c.H,0.2) (d.L,0.2) (e.H,0.2) |
Note: Duplicate items must not exist in a transaction.
Each row of a fuzzy transactional database file must contain a list of fuzzy items, a colon as the separator, and the list of their fuzzy values in the same order.
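As a minimal sketch of this row format, the hypothetical helper `parse_fuzzy_line` below (not a PAMI function) splits a row at the colon, pairs each fuzzy item with its value, and rejects rows that break the rules above. It assumes that the same separator is used between items and between values, and that every value must lie between 0 and 1.

```python
def parse_fuzzy_line(line: str, sep: str = " ") -> dict:
    """Parse one 'items:values' row into {fuzzy_item: fuzzy_value}."""
    items_part, colon, values_part = line.strip().partition(":")
    if not colon:
        raise ValueError("missing ':' between fuzzy items and fuzzy values")
    items = items_part.split(sep)
    values = [float(v) for v in values_part.split(sep)]
    if len(items) != len(values):
        raise ValueError("number of fuzzy items and fuzzy values differ")
    if len(set(items)) != len(items):
        raise ValueError("duplicate fuzzy items in a transaction")
    if any(not 0.0 <= v <= 1.0 for v in values):
        raise ValueError("fuzzy values must lie between 0 and 1")
    return dict(zip(items, values))


row = parse_fuzzy_line("a.L b.M c.H g.M:0.2 0.3 0.1 0.1")
print(row)  # {'a.L': 0.2, 'b.M': 0.3, 'c.H': 0.1, 'g.M': 0.1}
```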
A sample fuzzy transactional database file, say fuzzyTransactionalDatabase.txt, is provided below:
```text
a.L b.M c.H g.M:0.2 0.3 0.1 0.1
b.M c.H d.L e.H:0.13 0.2 0.3 0.2
a.L b.M c.H d.L:0.2 0.1 0.3 0.4
a.L c.H d.L f.M:0.3 0.2 0.1 0.2
a.L b.M c.H d.L g.M:0.3 0.1 0.2 0.1 0.2
c.H d.L e.H f.M:0.2 0.2 0.3 0.1
a.L b.M c.H d.L:0.2 0.1 0.1 0.2
a.L e.H f.M:0.1 0.2 0.2
a.L b.M c.H d.H:0.2 0.2 0.4 0.2
b.M c.H d.L e.H:0.3 0.2 0.2 0.2
```
For more information on how to create a fuzzy transactional database from a quantitative (or utility) transactional database, please refer to the manual utility2FuzzyDB.pdf.
The performance of a pattern mining algorithm primarily depends on the statistical nature of a database. Thus, it is important to know the statistical details of a database before mining it.
The sample code below prints the statistical details of a database.
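```python
import PAMI.extras.dbStats.FuzzyDatabase as stats

obj = stats.FuzzyDatabase('fuzzyTransactionalDatabase.txt', ' ')  # input file and separator
obj.run()         # compute the statistics of the database
obj.printStats()  # print the computed statistics
```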
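The exact list of statistics reported by `printStats()` is not reproduced here, so the plain-Python sketch below computes a few commonly used ones (number of transactions, number of distinct fuzzy items, and minimum, maximum, and average transaction length) for the sample file above. The helper `basic_stats` is hypothetical, and its output is not meant to match `printStats()` exactly.

```python
# Content of fuzzyTransactionalDatabase.txt as shown above.
SAMPLE = """\
a.L b.M c.H g.M:0.2 0.3 0.1 0.1
b.M c.H d.L e.H:0.13 0.2 0.3 0.2
a.L b.M c.H d.L:0.2 0.1 0.3 0.4
a.L c.H d.L f.M:0.3 0.2 0.1 0.2
a.L b.M c.H d.L g.M:0.3 0.1 0.2 0.1 0.2
c.H d.L e.H f.M:0.2 0.2 0.3 0.1
a.L b.M c.H d.L:0.2 0.1 0.1 0.2
a.L e.H f.M:0.1 0.2 0.2
a.L b.M c.H d.H:0.2 0.2 0.4 0.2
b.M c.H d.L e.H:0.3 0.2 0.2 0.2"""


def basic_stats(text: str, sep: str = " ") -> dict:
    """Count transactions, distinct fuzzy items, and transaction lengths."""
    lengths, items = [], set()
    for line in text.strip().splitlines():
        fuzzy_items = line.split(":", 1)[0].split(sep)  # part before the colon
        lengths.append(len(fuzzy_items))
        items.update(fuzzy_items)
    return {
        "transactions": len(lengths),
        "distinctItems": len(items),
        "minLength": min(lengths),
        "maxLength": max(lengths),
        "avgLength": sum(lengths) / len(lengths),
    }


summary = basic_stats(SAMPLE)
print(summary)
# {'transactions': 10, 'distinctItems': 8, 'minLength': 3, 'maxLength': 5, 'avgLength': 4.0}
```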
The input parameters to a fuzzy frequent pattern mining algorithm are:
- Input file (iFile):
  - String : E.g., 'fuzzyDatabase.txt'
  - URL : E.g., https://u-aizu.ac.jp/~udayrage/datasets/fuzzyDatabases/fuzzy_T10I4D100K.csv
  - DataFrame with the headers 'Transactions' and 'fuzzyValues'
- Minimum support (minSup):
  - count (between 0 and the length of a database) or
  - [0, 1]
- Separator: the character that separates the items and the values (default: tab space)
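The DataFrame input can be prepared, for example, with pandas. The sketch below builds one from two rows of the sample file; it assumes that each cell of 'Transactions' holds a list of fuzzy items and each cell of 'fuzzyValues' holds the matching list of floats. That cell format is an assumption, not something stated in this manual.

```python
import pandas as pd

# Two rows copied from the sample fuzzy transactional database file.
rows = [
    "a.L b.M c.H g.M:0.2 0.3 0.1 0.1",
    "a.L e.H f.M:0.1 0.2 0.2",
]

transactions, fuzzy_values = [], []
for row in rows:
    items, values = row.split(":")
    transactions.append(items.split(" "))                      # list of fuzzy items
    fuzzy_values.append([float(v) for v in values.split(" ")])  # matching fuzzy values

# Assumed cell format: one Python list per cell in each column.
df = pd.DataFrame({"Transactions": transactions, "fuzzyValues": fuzzy_values})
print(df)
```

Such a DataFrame would then be passed in place of the file name as the input of the algorithm.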
The patterns discovered by a fuzzy frequent pattern mining algorithm can be saved into a file or a data frame.

An algorithm can be executed from a terminal as follows:

```
foo@bar: cd PAMI/fuzzyFrequentPattern/basic
foo@bar: python3 algorithmName.py inputFile outputFile minSup separator
```

Example:

```
python3 FFIMiner.py inputFile.txt outputFile.txt 5 ' '
```
An algorithm can also be executed inside a Python program:

```python
import PAMI.fuzzyFrequentPattern.basic.FFIMiner as alg

iFile = 'fuzzyTransactionalDatabase.txt'  # specify the input fuzzy transactional database
minSup = 0.9                               # specify the minSup value
separator = ' '                            # specify the separator; the default separator is tab space
oFile = 'fuzzyPatterns.txt'                # specify the output file name

obj = alg.FFIMiner(iFile, minSup, separator)  # initialize the algorithm
obj.mine()                                     # start the mining process
obj.save(oFile)                                # store the patterns in a file
df = obj.getPatternsAsDataFrame()              # get the discovered patterns as a DataFrame
obj.printResults()                             # print the statistics of the mining process
```
```text
Total number of Fuzzy Frequent Patterns: 7
Total Memory in USS: 99786752
Total Memory in RSS 137846784
Total ExecutionTime in seconds: 0.0011456012725830078
```
```
!cat fuzzyPatterns.txt
```

```text
#format: fuzzyFrequentPattern:support
b.M:1.23
b.M c.H:0.9299999999999999
d.L:1.4999999999999998
d.L c.H:1.2
a.L:1.5
a.L c.H:1.0
c.H:1.9000000000000001
```
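Each line of this file has the form pattern:support. The hypothetical helper `read_patterns` below (not part of PAMI) is a minimal sketch that loads this content back into a dictionary keyed by frozensets of fuzzy items. It assumes exactly the format shown: space-separated items, a colon before the support, and a `#` header line. Values such as 0.9299999999999999 are floating-point rounding noise from summing fuzzy values, so rounding them for display is harmless.

```python
# Content of fuzzyPatterns.txt as shown above; in practice it could be read
# with open('fuzzyPatterns.txt').read().
OUTPUT = """\
#format: fuzzyFrequentPattern:support
b.M:1.23
b.M c.H:0.9299999999999999
d.L:1.4999999999999998
d.L c.H:1.2
a.L:1.5
a.L c.H:1.0
c.H:1.9000000000000001"""


def read_patterns(text: str, sep: str = " ") -> dict:
    """Map each pattern (as a frozenset of fuzzy items) to its support."""
    patterns = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank and header lines
            continue
        pattern, _, support = line.rpartition(":")
        patterns[frozenset(pattern.split(sep))] = float(support)
    return patterns


patterns = read_patterns(OUTPUT)
for pattern, support in sorted(patterns.items(), key=lambda kv: -kv[1]):
    print(" ".join(sorted(pattern)), round(support, 2))
```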
The DataFrame contains the following information:

```python
df
```
| | Patterns | Support |
|---|---|---|
| 0 | b.M | 1.23 |
| 1 | b.M c.H | 0.9299999999999999 |
| 2 | d.L | 1.4999999999999998 |
| 3 | d.L c.H | 1.2 |
| 4 | a.L | 1.5 |
| 5 | a.L c.H | 1.0 |
| 6 | c.H | 1.9000000000000001 |
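To cross-check the supports in this table, the sketch below enumerates fuzzy frequent patterns in the sample file with a naive, Apriori-style level-wise search. It is not the FFI-Miner algorithm itself; it only assumes that the support of a pattern is the sum, over the transactions containing it, of the minimum fuzzy value of its items, and that a pattern is kept when its support is at least minSup. Because the sums are floating-point, a pattern whose support equals minSup on paper (for example, e.H, whose fuzzy values 0.2 + 0.3 + 0.2 + 0.2 add up to 0.9) may fall on either side of the threshold depending on the summation order.

```python
from itertools import combinations

# Content of fuzzyTransactionalDatabase.txt as shown earlier.
SAMPLE = """\
a.L b.M c.H g.M:0.2 0.3 0.1 0.1
b.M c.H d.L e.H:0.13 0.2 0.3 0.2
a.L b.M c.H d.L:0.2 0.1 0.3 0.4
a.L c.H d.L f.M:0.3 0.2 0.1 0.2
a.L b.M c.H d.L g.M:0.3 0.1 0.2 0.1 0.2
c.H d.L e.H f.M:0.2 0.2 0.3 0.1
a.L b.M c.H d.L:0.2 0.1 0.1 0.2
a.L e.H f.M:0.1 0.2 0.2
a.L b.M c.H d.H:0.2 0.2 0.4 0.2
b.M c.H d.L e.H:0.3 0.2 0.2 0.2"""


def parse(text, sep=" "):
    """Turn 'items:values' rows into a list of {fuzzy_item: fuzzy_value} dicts."""
    db = []
    for line in text.strip().splitlines():
        items, _, values = line.partition(":")
        db.append(dict(zip(items.split(sep), map(float, values.split(sep)))))
    return db


def fuzzy_support(itemset, db):
    """Sum, over transactions containing every item, of the smallest fuzzy value."""
    total = 0.0
    for t in db:
        if all(i in t for i in itemset):
            total += min(t[i] for i in itemset)
    return total


def mine(db, min_sup):
    """Level-wise search over sorted tuples of fuzzy items."""
    result, level = {}, []
    for item in sorted({i for t in db for i in t}):
        sup = fuzzy_support((item,), db)
        if sup >= min_sup:
            result[frozenset([item])] = sup
            level.append((item,))
    k = 2
    while level:
        frequent_prev = set(level)
        candidates = set()
        for a, b in combinations(level, 2):
            # Join two sorted (k-1)-itemsets that share their first k-2 items.
            if a[:-1] == b[:-1]:
                cand = a + (b[-1],)
                # Prune candidates having an infrequent (k-1)-subset.
                if all(sub in frequent_prev for sub in combinations(cand, k - 1)):
                    candidates.add(cand)
        level = []
        for cand in sorted(candidates):
            sup = fuzzy_support(cand, db)
            if sup >= min_sup:
                result[frozenset(cand)] = sup
                level.append(cand)
        k += 1
    return result


patterns = mine(parse(SAMPLE), 0.9)
for pattern, support in sorted(patterns.items(), key=lambda kv: -kv[1]):
    print(" ".join(sorted(pattern)), round(support, 2))
```

The pruning step is safe because adding an item to a pattern can only keep or lower the per-transaction minimum, and the larger pattern occurs in no more transactions than the smaller one, so fuzzy support never increases as a pattern grows.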