PAMI is a Python library containing 100+ algorithms to discover useful patterns in various databases across multiple computing platforms. (Active)
Correlated pattern mining is a crucial knowledge discovery technique in big data analytics. Because this technique is designed to find interesting patterns in a binary transactional database, it cannot discover interesting patterns that may exist in a quantitative transactional database. To tackle this problem, fuzzy correlated pattern mining was introduced to discover regularities in a quantitative transactional database.
In fuzzy correlated pattern mining, a quantitative transactional database is first transformed into a fuzzy transactional database using a set of fuzzy functions. Interesting patterns, called fuzzy correlated patterns, are then discovered from the fuzzy transactional database using minimum support and minimum all-confidence constraints.
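The two constraints can be illustrated with a minimal sketch. It assumes a common formulation that is not taken from the PAMI source: the fuzzy support of a pattern is the sum, over the transactions containing every item of the pattern, of the smallest fuzzy value among those items, and the all-confidence is that support divided by the largest single-item support. The function names and the toy database are hypothetical, and the values FCPGrowth reports may be computed differently.

```python
from typing import Dict, Iterable, List

# A fuzzy transaction maps each fuzzy item (e.g. "a.L") to its fuzzy value.
Transaction = Dict[str, float]


def fuzzy_support(db: List[Transaction], pattern: Iterable[str]) -> float:
    """Sum the minimum fuzzy value of the pattern's items over every
    transaction that contains all of them (assumed combination rule: min)."""
    items = list(pattern)
    total = 0.0
    for transaction in db:
        if all(item in transaction for item in items):
            total += min(transaction[item] for item in items)
    return total


def all_confidence(db: List[Transaction], pattern: Iterable[str]) -> float:
    """Pattern support divided by the largest support of any single item."""
    items = list(pattern)
    largest = max(fuzzy_support(db, [item]) for item in items)
    return fuzzy_support(db, items) / largest if largest > 0 else 0.0


def is_fuzzy_correlated(db: List[Transaction], pattern: Iterable[str],
                        min_sup: float, min_all_conf: float) -> bool:
    """A pattern qualifies only if it satisfies both constraints."""
    items = list(pattern)
    return (fuzzy_support(db, items) >= min_sup
            and all_confidence(db, items) >= min_all_conf)


# Made-up database, used only to exercise the functions above.
toy_db = [
    {"a.L": 0.5, "b.M": 0.25},
    {"a.L": 0.5},
    {"b.M": 0.75, "c.H": 0.5},
]
print(fuzzy_support(toy_db, ["a.L", "b.M"]))
print(all_confidence(toy_db, ["b.M", "c.H"]))
```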
Reference: Lin, N.P., & Chueh, H. (2007). Fuzzy correlation rules mining. https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.416.6053&rep=rep1&type=pdf
A fuzzy transactional database is a collection of transactions, where each transaction contains a set of items (or fuzzy terms) and their fuzzy values.
A hypothetical fuzzy database with items a, b, c, d, e, f and g is shown below.
Transactions |
---|
(a,2) (b,3) (c,1) (g,1) |
(b,3) (c,2) (d,3) (e,2) |
(a,2) (b,1) (c,3) (d,4) |
(a,3) (c,2) (d,1) (f,2) |
(a,3) (b,1) (c,2) (d,1) (g,2) |
(c,2) (d,2) (e,3) (f,1) |
(a,2) (b,1) (c,1) (d,2) |
(a,1) (e,2) (f,2) |
(a,2) (b,2) (c,4) (d,2) |
(b,3) (c,2) (d,2) (e,2) |
Note: Duplicate items must not exist within a transaction.
Each row in a fuzzy transactional database must contain fuzzy items, a separator, and their fuzzy values.
A sample fuzzy transactional database file, say fuzzyTransactionalDatabase.txt, is provided below:
a.L b.M c.H g.M:0.2 0.3 0.1 0.1
b.M c.H d.L e.H:0.13 0.2 0.3 0.2
a.L b.M c.H d.L:0.2 0.1 0.3 0.4
a.L c.H d.L f.M:0.3 0.2 0.1 0.2
a.L b.M c.H d.L g.M:0.3 0.1 0.2 0.1 0.2
c.H d.L e.H f.M:0.2 0.2 0.3 0.1
a.L b.M c.H d.L:0.2 0.1 0.1 0.2
a.L e.H f.M:0.1 0.2 0.2
a.L b.M c.H d.H:0.2 0.2 0.4 0.2
b.M c.H d.L e.H:0.3 0.2 0.2 0.2
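The row layout above can be checked with a small reader. This is only a sketch: `parse_fuzzy_transaction` is a hypothetical helper, not part of PAMI, and it assumes that a single colon splits the fuzzy items from their fuzzy values and that both sides use the same separator (a space here, as in the listing).

```python
from typing import Dict


def parse_fuzzy_transaction(line: str, sep: str = " ") -> Dict[str, float]:
    """Parse one row such as 'a.L b.M:0.2 0.3' into {'a.L': 0.2, 'b.M': 0.3}."""
    if line.count(":") != 1:
        raise ValueError("expected exactly one ':' between items and values")
    items_part, values_part = line.strip().split(":")
    # Drop empty tokens produced by repeated or trailing separators.
    items = [token for token in items_part.split(sep) if token]
    values = [float(token) for token in values_part.split(sep) if token]
    if len(items) != len(values):
        raise ValueError("number of items and fuzzy values differ")
    if len(set(items)) != len(items):
        raise ValueError("duplicate items within a transaction")
    return dict(zip(items, values))


print(parse_fuzzy_transaction("a.L b.M c.H g.M:0.2 0.3 0.1 0.1"))
```

The duplicate check mirrors the note that duplicate items must not exist within a transaction.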
For more information on how to create a fuzzy transactional database from a quantitative (or utility) transactional database, please refer to the manual utility2FuzzyDB.pdf
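As a rough illustration of what such a transformation involves, the sketch below maps a quantity to fuzzy terms with triangular membership functions. Everything in it is an assumption made for illustration: the term names L, M and H follow the sample file, but the breakpoints, the `triangular` and `fuzzify` helpers, and the choice of triangular shapes are hypothetical and are not taken from the utility2FuzzyDB manual.

```python
from typing import Dict, Tuple


def triangular(x: float, left: float, peak: float, right: float) -> float:
    """Triangular membership: 0 at or outside left/right, 1 at the peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical fuzzy regions (left, peak, right) for quantities around 1 to 5.
REGIONS: Dict[str, Tuple[float, float, float]] = {
    "L": (0.0, 1.0, 3.0),
    "M": (1.0, 3.0, 5.0),
    "H": (3.0, 5.0, 6.0),
}


def fuzzify(item: str, quantity: float) -> Dict[str, float]:
    """Return the non-zero fuzzy terms of an item, named like 'a.L'."""
    terms = {}
    for label, (left, peak, right) in REGIONS.items():
        degree = triangular(quantity, left, peak, right)
        if degree > 0:
            terms[f"{item}.{label}"] = degree
    return terms


print(fuzzify("a", 2))
```

With these assumed regions, a quantity that falls between two peaks belongs partly to both neighbouring terms.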
The performance of a pattern mining algorithm primarily depends on the statistical nature of a database, so it is important to know the statistical details of the database being mined.
The sample code below prints these details.
import PAMI.extras.dbStats.FuzzyDatabase as stats
obj = stats.FuzzyDatabase('fuzzyTransactionalDatabase.txt', ' ')
obj.run()
obj.printStats()
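The input parameters to a fuzzy correlated pattern mining algorithm are:
- Input fuzzy transactional database, given as one of:
  - String : E.g., 'FuzzyDatabase.txt'
  - URL : E.g., https://u-aizu.ac.jp/~udayrage/datasets/fuzzyDatabases/fuzzy_T10I4D100K.csv
  - DataFrame with the header titled
- minSup (minimum support), specified as:
  - count (between 0 and the length of the database) or
  - [0, 1]
- minAllConf (minimum all-confidence), specified in:
  - [0, 1]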
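The list above allows a threshold to be written either as a count or as a value in [0, 1]. One way to read the two forms of the minimum support is sketched below. `resolve_min_sup` is a hypothetical helper, and the rule it applies (an integer is a count, a float is a fraction of the database length) is an assumption rather than a description of how PAMI interprets the value internally.

```python
from typing import Union


def resolve_min_sup(min_sup: Union[int, float], database_size: int) -> float:
    """Turn a count or a fraction in [0, 1] into an absolute support threshold."""
    if isinstance(min_sup, bool):
        raise TypeError("min_sup must be an int or a float, not a bool")
    if isinstance(min_sup, int):
        # Assumed rule: an integer is already a count of transactions.
        if not 0 <= min_sup <= database_size:
            raise ValueError("count must lie between 0 and the database length")
        return float(min_sup)
    if isinstance(min_sup, float):
        # Assumed rule: a float is a fraction of the database length.
        if not 0.0 <= min_sup <= 1.0:
            raise ValueError("fraction must lie in [0, 1]")
        return min_sup * database_size
    raise TypeError("min_sup must be an int or a float")


print(resolve_min_sup(4, 10))
print(resolve_min_sup(0.5, 10))
```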
The patterns discovered by a fuzzy correlated pattern mining algorithm can be saved into a file or a data frame.
syntax: python3 algorithmName.py <path to the input file> <path to the output file> <minSup> <minAllConf> <separator>
Example: python3 FCPGrowth.py inputFile.txt outputFile.txt 4 0.5 ' '
import PAMI.fuzzyCorrelatedPattern.basic.FCPGrowth as alg
iFile = 'sampleUtility.txt' #specify the input fuzzy transactional database
minSup = 4 #specify the minSup value
minAllConf = 0.5 #specify the minAllConf value
separator = ' ' #specify the separator. Default separator is tab space.
oFile = 'FuzzyCorrelatedPatterns.txt' #specify the output file name
obj = alg.FCPGrowth(iFile, minSup, minAllConf, separator) #initialize the algorithm
obj.mine() #start the mining process
obj.save(oFile) #store the patterns in file
df = obj.getPatternsAsDataFrame() #get the discovered patterns as a dataframe
obj.printResults() #print the stats of the mining process
The FuzzyCorrelatedPatterns.txt file contains the following patterns (format: pattern : support : all-confidence):
!cat FuzzyCorrelatedPatterns.txt
a.L : 5.3999999999999995 : 0.7714285714285714
b.L : 5.599999999999999 : 0.6222222222222221
b.L c.L : 4.6 : 0.5111111111111111
d.L : 6.199999999999999 : 0.6888888888888888
d.L c.L : 5.3999999999999995 : 0.6
c.L : 6.999999999999999 : 0.7777777777777777
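Rows in this layout can be read back into Python values with a short helper. The sketch below assumes only what the listing shows: three fields separated by colons, with the items of the pattern separated by whitespace. `parse_pattern_line` is a hypothetical name, and the third field is returned under the neutral name `measure`.

```python
from typing import List, Tuple


def parse_pattern_line(line: str) -> Tuple[List[str], float, float]:
    """Split 'b.L c.L : 4.6 : 0.51' into (items, support, measure)."""
    pattern_part, support_part, measure_part = (
        field.strip() for field in line.split(":")
    )
    items = pattern_part.split()  # items of the pattern are whitespace-separated
    return items, float(support_part), float(measure_part)


items, support, measure = parse_pattern_line("b.L c.L : 4.6 : 0.5111111111111111")
print(items, support, measure)
```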
The dataframe containing the patterns is shown below:
df
 | Patterns | Support |
---|---|---|
0 | a.L | 5.3999999999999995 : 0.7714285714285714\n |
1 | b.L | 5.599999999999999 : 0.6222222222222221\n |
2 | b.L c.L | 4.6 : 0.5111111111111111\n |
3 | d.L | 6.199999999999999 : 0.6888888888888888\n |
4 | d.L c.L | 5.3999999999999995 : 0.6\n |
5 | c.L | 6.999999999999999 : 0.7777777777777777\n |