Mining Frequent Patterns With Multiple Minimum Supports in Transactional Databases

PAMI is a Python library containing 100+ algorithms to discover useful patterns in various databases across multiple computing platforms. (Active)

Mining Frequent Patterns With Multiple Minimum Supports in Transactional Databases

1. What is frequent pattern mining with multiple minimum support?

Frequent pattern mining aims to discover all interesting patterns in a transactional database that satisfy the user-specified minimum support (minSup) constraint. The minSup controls the minimum number of transactions that a pattern must cover in a database. Since only a single minSup is employed for the entire database, this technique implicitly assumes that all items in a database have uniform frequencies or similar occurrence behavior. However, this is seldom not the case in many real-world applications. In many applications, some items appear frequently in the data, while others occur rarely. If the frequencies of the items in a database vary a great deal, then finding frequent patterns with a single minSup leads to the following two problems:

This dillema is known as the ‘‘rare item problem.’’

When confronted with the above problem in the real-world applications, researchers tried to tackle it by finding the frequent patterns using multiple minimum support values. In this extended model, each item in the database is specified with a minSup-like constraint known as minimum item support, and the minSup of a pattern is expressed with the lowest minimum support of its items.

References:

2. What is a transactional database?

A transactional database is a collection of transactions, where each transaction contains a transaction-identifier and a set of items.
A hypothetical transactional database containing the items a, b, c, d, e, f, and g as shown below.

tid Transactions
1 a b c g
2 b c d e
3 a b c d
4 a c d f
5 a b c d g
6 c d e f
7 a b c d
8 a e f
9 a b c d
10 b c d e

Note: Duplicate items must not exist in a transaction.

3. What is acceptable format of a transactional databases in PAMI?

Each row in a transactional database must contain only items. The frequent pattern mining algorithms in PAMI implicitly assume the row number of a transaction as its transactional-identifier to reduce storage and processing costs. A sample transactional database, say sampleInputFile.txt, is provided below.

a b c g
b c d e
a b c d
a c d f
a b c d g
c d e f
a b c d
a e f
a b c d
b c d e

4. What is the need for understanding the statisctics of a transactional database?

The performance of a pattern mining algorithm primarily depends on the satistical nature of a database. Thus, it is important to know the following details of a database:

The below sample code prints the statistical details of a database.

import PAMI.extras.dbStats.TransactionalDatabase as stats    
obj = stats.TransactionalDatabase('sampleTransactionalDatabase.txt', ' ') 
obj.run() 
obj.printStats() 

5. What is a multiple minimum support file?

A multiple minimum support file contains a set of items and its respective minimum support value.
A hypothetical multiple minSup database containing the items a, b, c, d, e, f, and g as shown below.

Item MIS
a 4
b 4
c 3
d 4
e 4
f 5
g 5

In many real-world applications, it is often difficult for the users to specify minimum item supports for every item in a database. Different methods have been described in the literature to derive the items’ minimum item support values. The procedures to specify the items’ MIS values is provided at derivingItemsMISValues.

6. What is the acceptable format of multiple minimum support in PAMI?

a 4
b 4
c 3
d 4
e 4
f 5
g 5

7. What are the input parameters to a multiple minimum support based frequent pattern mining algorithm?

The input parameters to a frequent pattern mining algorithm are:

8. How to store the output of a frequent pattern mining with multiple minimum support algorithm?

The patterns discovered by a frequent pattern mining with multiple minimum support algorithm can be saved into a file or a data frame.

Example: python3 CFPGrowth.py inputFile.txt outputFile.txt MIS.txt ' '

9. How to execute a frequent pattern mining with multiple minimum supports algorithm in a Jupyter Notebook?

import PAMI.multipleMinimumSupportBasedFrequentPattern.basic.CFPGrowth as alg 

iFile = 'sampleTransactionalDatabase.txt'     #specify the input transactional database 
mFile = 'MIS.txt'                 #specify the multiple minsup file
seperator = ' '                  #specify the seperator. Default seperator is tab space. 
oFile = 'frequentPatterns.txt'   #specify the output file name

obj = alg.CFPGrowth(iFile, mFile, seperator) #initialize the algorithm 
obj.startMine()                       #start the mining process 
obj.save(oFile)               #store the patterns in file 
df = obj.getPatternsAsDataFrame()     #Get the patterns discovered into a dataframe 
obj.printResults()                      #Print the stats of mining process
Frequent patterns were generated successfully using CFPGrowth algorithm
Total number of  Frequent Patterns: 10
Total Memory in USS: 100147200
Total Memory in RSS 138452992
Total ExecutionTime in ms: 0.0017397403717041016
!cat frequentPatterns.txt
#format: frequentPattern:support
e:4 
e	d	c:3 
e	c:3 
b:5 
b	d:6 
b	d	c:6 
b	c:7 
d:8 
d	c:8 
c:8 

The dataframe containing the patterns is shown below:

df 
Patterns Support
0 e 4
1 e d c 3
2 e c 3
3 b 5
4 b d 6
5 b d c 6
6 b c 7
7 d 8
8 d c 8
9 c 8