PAMI is a Python library containing 100+ algorithms to discover useful patterns in various databases across multiple computing platforms. (Active)
Frequent pattern mining aims to discover all interesting patterns in a transactional database that satisfy the user-specified minimum support (minSup) constraint. The minSup controls the minimum number of transactions that a pattern must cover in a database. Since only a single minSup is employed for the entire database, this technique implicitly assumes that all items in a database have uniform frequencies or similar occurrence behavior. However, this is seldom not the case in many real-world applications. In many applications, some items appear frequently in the data, while others occur rarely. If the frequencies of the items in a database vary a great deal, then finding frequent patterns with a single minSup leads to the following two problems:
This dillema is known as the ‘‘rare item problem.’’
When confronted with the above problem in the real-world applications, researchers tried to tackle it by finding frequent patterns using other measures, such as relative support. According to this extended model of frequent pattern, a pattern is said to be frequent if its support no less than the user-specified minimum support (minSup) constraint and relative support no less than the user-specified minimum relative support (minRSup). The minSup controls the minimum number of transactions that a pattern must appear in a database. minRSup is the conditional minimum support calculated as ratio of pattern support to the minimum support of all items in pattern.
References :
H. Yun, D. Ha, B. Hwang, and K. H. Ryu. Mining association rules on significant rare data using relative support. J. Syst. Softw., 67:181–191, September 2003. https://dl.acm.org/doi/10.1016/S0164-1212%2802%2900128-0
R. Uday Kiran and Masaru Kitsuregawa. 2012. Towards efficient discovery of frequent patterns with relative support. In Proceedings of the 18th International Conference on Management of Data (COMAD ‘12). Computer Society of India, Mumbai, Maharashtra, IND, 92–99. https://dl.acm.org/doi/10.5555/2694443.2694460
A transactional database is a collection of transactions, where each transaction contains a transaction-identifier and a set of items.
A hypothetical transactional database containing the items a, b, c, d, e, f, and g as shown below
tid | Transactions |
---|---|
1 | a b c g |
2 | b c d e |
3 | a b c d |
4 | a c d f |
5 | a b c d g |
6 | c d e f |
7 | a b c d |
8 | a e f |
9 | a b c d |
10 | b c d e |
Note: Duplicate items must not exist in a transaction.
Each row in a transactional database must contain only items. The frequent pattern mining algorithms in PAMI implicitly assume the row number of a transaction as its transactional-identifier to reduce storage and processing costs. A sample transactional database, say sampleInputFile.txt, is provided below.
a b c g
b c d e
a b c d
a c d f
a b c d g
c d e f
a b c d
a e f
a b c d
b c d e
The performance of a pattern mining algorithm primarily depends on the satistical nature of a database. Thus, it is important to know the following details of a database:
The below sample code prints the statistical details of a database.
import PAMI.extras.dbStats.TransactionalDatabase as stats
obj = stats.TransactionalDatabase('sampleTransactionalDatabase.txt', ' ')
obj.run()
obj.printStats()
Algorithms to mine the frequent patterns with relative support requires transactional database and minSup (specified by user).
- String : E.g., ‘transactionalDatabase.txt’
- URL : E.g., https://u-aizu.ac.jp/~udayrage/datasets/transactionalDatabases/transactional_T10I4D100K.csv
- DataFrame. Please note that dataframe must contain the header titled ‘Transactions’
- count (beween 0 to length of a database) or
- [0, 1]
- [0, 1]
The patterns discovered by a relative frequent pattern mining algorithm can be saved into a file or a data frame.
foo@bar: cd PAMI/relativeFrequentPattern/basic
foo@bar: python3 algorithmName.py inputFile outputFile minSup minRatio seperator
Example: python3 RSFPGrowth.py
inputFile.txt
outputFile.txt
4
0.7
' '
import PAMI.relativeFrequentPattern.basic.RSFPGrowth as alg
iFile = 'sampleTransactionalDatabase.txt' # specify the input transactional database
minSup = 5 # specify the minSupvalue
minRS = 0.6 # specify the minimum relative support
seperator = ' ' # specify the seperator. Default seperator is tab space.
oFile = 'frequentPatterns.txt' # specify the output file name
obj = alg.RSFPGrowth(iFile, minSup, minRS, seperator) # initialize the algorithm
obj.mine() # start the mining process
obj.save(oFile) # store the patterns in file
df = obj.getPatternsAsDataFrame() # Get the patterns discovered into a dataframe
obj.printResults() # Print the results of mining process
Relative support frequent patterns were generated successfully using RSFPGrowth algorithm
Total number of Relative Frequent Patterns: 15
Total Memory in USS: 98238464
Total Memory in RSS 135536640
Total ExecutionTime in ms: 0.0007281303405761719
!cat frequentPatterns.txt
#format: relativeFrequentPattern:support:relativeRatio
d: 8 : 1.0
d c: 16 : 2.0
c: 9 : 1.0
b: 7 : 1.0
b d: 12 : 1.7142857142857142
b d c: 12 : 1.7142857142857142
b c: 14 : 2.0
b a: 10 : 1.4285714285714286
b a c: 8 : 1.1428571428571428
b a d: 8 : 1.1428571428571428
b a c d: 8 : 1.1428571428571428
a: 7 : 1.0
a c: 10 : 1.4285714285714286
a d: 10 : 1.4285714285714286
a c d: 10 : 1.4285714285714286
The dataframe contains the following information:
df
#The dataframe containing the patterns is shown below. In each pattern, items were seperated from each other with a tab space (or \t).
Patterns | Support | |
---|---|---|
0 | d | 8 : 1.0 |
1 | d c | 16 : 2.0 |
2 | c | 9 : 1.0 |
3 | b | 7 : 1.0 |
4 | b d | 12 : 1.7142857142857142 |
5 | b d c | 12 : 1.7142857142857142 |
6 | b c | 14 : 2.0 |
7 | b a | 10 : 1.4285714285714286 |
8 | b a c | 8 : 1.1428571428571428 |
9 | b a d | 8 : 1.1428571428571428 |
10 | b a c d | 8 : 1.1428571428571428 |
11 | a | 7 : 1.0 |
12 | a c | 10 : 1.4285714285714286 |
13 | a d | 10 : 1.4285714285714286 |
14 | a c d | 10 : 1.4285714285714286 |