PAMI is a Python library containing 100+ algorithms to discover useful patterns in various databases across multiple computing platforms. (Active)
Frequent pattern mining aims to discover all patterns in a transactional database whose support is no less than the user-specified minimum support (minSup). The support of a pattern is the number of transactions that contain it, so minSup sets the minimum number of transactions in which a pattern must appear to be reported as frequent.
Reference: Rakesh Agrawal, Tomasz Imieliński, and Arun Swami. 1993. Mining association rules between sets of items in large databases. In Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data (SIGMOD '93). Association for Computing Machinery, New York, NY, USA, 207–216.
A transactional database is an unordered collection of transactions. A transaction is a pair consisting of a transaction identifier (tid) and a set of items.
A hypothetical transactional database containing the items a, b, c, d, e, f, and g is shown below:
tid | Transactions |
---|---|
1 | a b c g |
2 | b c d e |
3 | a b c d |
4 | a c d f |
5 | a b c d g |
6 | c d e f |
7 | a b c d |
8 | a e f |
9 | a b c d |
10 | b c d e |
Note: Duplicate items must not exist within a transaction.
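As a minimal sketch of the support definition, the snippet below hard-codes the hypothetical database from the table and counts the transactions that contain a given pattern. The names `DATABASE` and `support` are illustrative and are not part of PAMI.

```python
# Hypothetical database from the table above, keyed by tid.
DATABASE = {
    1: {"a", "b", "c", "g"},
    2: {"b", "c", "d", "e"},
    3: {"a", "b", "c", "d"},
    4: {"a", "c", "d", "f"},
    5: {"a", "b", "c", "d", "g"},
    6: {"c", "d", "e", "f"},
    7: {"a", "b", "c", "d"},
    8: {"a", "e", "f"},
    9: {"a", "b", "c", "d"},
    10: {"b", "c", "d", "e"},
}


def support(pattern, database=DATABASE):
    """Count the transactions that contain every item of `pattern`."""
    items = set(pattern)
    return sum(1 for transaction in database.values() if items <= transaction)


min_sup = 5  # illustrative threshold, given as a transaction count
for pattern in ("a", "ab", "abd"):
    count = support(pattern)
    label = "frequent" if count >= min_sup else "infrequent"
    print(pattern, count, label)
```

With minSup = 5, the pattern {a, b} (support 5) is frequent, while {a, b, d} (support 4) is not.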
Each row in a transactional database must contain only items. To reduce storage and processing costs, the frequent pattern mining algorithms in PAMI implicitly use the row number of a transaction as its transaction identifier.
A sample transactional database, say sampleTransactionalDatabase.txt, is provided below.
a b c g
b c d e
a b c d
a c d f
a b c d g
c d e f
a b c d
a e f
a b c d
b c d e
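Because the file stores only items, the transaction identifiers can be recovered from the row numbers. The following sketch uses only the standard library: it writes the sample file to the system temporary directory and loads it back with 1-based row numbers as tids. The helper name `read_transactions` is hypothetical and is not PAMI's loader.

```python
import tempfile
from pathlib import Path

SAMPLE = """a b c g
b c d e
a b c d
a c d f
a b c d g
c d e f
a b c d
a e f
a b c d
b c d e
"""


def read_transactions(path, sep=" "):
    """Return {tid: [items]}, using the 1-based row number as the tid."""
    database = {}
    with open(path, encoding="utf-8") as handle:
        for tid, line in enumerate(handle, start=1):
            # Drop the newline and ignore empty fields from repeated separators.
            database[tid] = [item for item in line.strip().split(sep) if item]
    return database


# Write the sample to a temporary location (an assumption made for this demo).
sample_path = Path(tempfile.gettempdir()) / "sampleTransactionalDatabase.txt"
sample_path.write_text(SAMPLE, encoding="utf-8")

db = read_transactions(sample_path)
print(len(db), db[8])
```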
The performance of a pattern mining algorithm depends primarily on the statistical nature of the database. Thus, it is important to examine the statistical details of a database before mining it.
The sample code below prints the statistical details of a database.
import PAMI.extras.dbStats.TransactionalDatabase as stats
obj = stats.TransactionalDatabase('sampleTransactionalDatabase.txt', ' ')  # input file and item separator
obj.run()         # compute the statistics
obj.printStats()  # print the statistics
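For readers without PAMI installed, a few basic statistics can be approximated with the standard library. This is a rough sketch only: the quantities chosen here (database size, number of distinct items, and minimum, average, and maximum transaction length) are an assumption about what is useful and may not match what `printStats()` reports.

```python
from collections import Counter

# The sample database from above, one transaction per row.
transactions = [
    row.split()
    for row in """
a b c g
b c d e
a b c d
a c d f
a b c d g
c d e f
a b c d
a e f
a b c d
b c d e
""".strip().splitlines()
]

lengths = [len(t) for t in transactions]
item_counts = Counter(item for t in transactions for item in t)

basic_stats = {
    "database_size": len(transactions),
    "distinct_items": len(item_counts),
    "min_transaction_length": min(lengths),
    "avg_transaction_length": sum(lengths) / len(lengths),
    "max_transaction_length": max(lengths),
}

for name, value in basic_stats.items():
    print(f"{name}: {value}")
```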
The input parameters to a frequent pattern mining algorithm are:
- Transactional database, given as one of:
  - String : E.g., 'transactionalDatabase.txt'
  - URL : E.g., https://u-aizu.ac.jp/~udayrage/datasets/transactionalDatabases/transactional_T10I4D100K.csv
  - DataFrame with the header titled 'Transactions'
- minSup, specified as either:
  - a count (between 0 and the length of the database), or
  - a value in [0, 1]
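The two forms of minSup can be reconciled by converting a value in [0, 1] into a transaction count. The helper below is a hypothetical sketch, not PAMI's code; it assumes that a float is read as a fraction of the database size and that an integer is read as an absolute count.

```python
def to_min_count(min_sup, database_size):
    """Convert minSup into an absolute transaction count (illustrative only)."""
    if isinstance(min_sup, float):
        # Assumption: floats are fractions of the database size.
        if not 0.0 <= min_sup <= 1.0:
            raise ValueError("a fractional minSup must lie in [0, 1]")
        return min_sup * database_size
    if isinstance(min_sup, int):
        # Assumption: integers are absolute transaction counts.
        if not 0 <= min_sup <= database_size:
            raise ValueError("a count minSup must lie in [0, database size]")
        return min_sup
    raise TypeError("minSup must be an int (count) or a float (fraction)")


print(to_min_count(5, 10))    # count form
print(to_min_count(0.5, 10))  # fraction form for a 10-transaction database
```

Under this assumption, the integer 1 means one transaction, while the float 1.0 means every transaction in the database.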
The patterns discovered by a frequent pattern mining algorithm can be saved into a file or a data frame.
foo@bar: cd PAMI/frequentPattern/basic
foo@bar: python3 algorithmName.py inputFile outputFile minSup separator
Example: python3 Apriori.py inputFile.txt outputFile.txt 3 ' '
import PAMI.frequentPattern.basic.Apriori as alg

iFile = 'sampleTransactionalDatabase.txt'  # input transactional database
minSup = 5                                 # minSup value, given as a count
separator = ' '                            # item separator; the default separator is a tab
oFile = 'frequentPatterns.txt'             # output file name

obj = alg.Apriori(iFile, minSup, separator)  # initialize the algorithm
obj.mine()                                   # start the mining process
obj.save(oFile)                              # store the patterns in a file
df = obj.getPatternsAsDataFrame()            # get the discovered patterns as a dataframe
obj.printResults()                           # print the statistics of the mining process
Frequent patterns were generated successfully using Apriori algorithm
Total number of Frequent Patterns: 13
Total Memory in USS: 81133568
Total Memory in RSS 119091200
Total ExecutionTime in ms: 0.00026297569274902344
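To illustrate what the mining step computes, here is a compact level-wise (Apriori-style) sketch in plain Python. It is not PAMI's implementation; the function name `apriori_sketch` and its structure are assumptions made for illustration only.

```python
from itertools import combinations

# The sample database from above, one set of items per transaction.
transactions = [
    set(row.split())
    for row in (
        "a b c g", "b c d e", "a b c d", "a c d f", "a b c d g",
        "c d e f", "a b c d", "a e f", "a b c d", "b c d e",
    )
]


def apriori_sketch(transactions, min_sup):
    """Return {frozenset(pattern): support} for patterns with support >= min_sup."""
    # Level 1: count the single items.
    counts = {}
    for transaction in transactions:
        for item in transaction:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {p: s for p, s in counts.items() if s >= min_sup}
    frequent = dict(current)

    k = 1
    while current:
        k += 1
        # Join step: unite pairs of frequent (k-1)-patterns into k-patterns.
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Count step: keep the candidates that meet the threshold.
        current = {}
        for candidate in candidates:
            count = sum(1 for t in transactions if candidate <= t)
            if count >= min_sup:
                current[candidate] = count
        frequent.update(current)
    return frequent


patterns = apriori_sketch(transactions, min_sup=5)
print(len(patterns))
```

With minSup = 5, the sketch yields 13 patterns for the sample database, which agrees with the count reported above.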
!cat frequentPatterns.txt
#format: frequentPattern:support
a:7
b:7
c:9
d:8
b a:5
c a:6
c b:7
b d:6
c d:8
d a:5
c b a:5
c b d:6
c d a:5
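The saved file can be read back for further processing. The sketch below parses lines in the `pattern:support` format shown above. `parse_patterns` is a hypothetical helper; it assumes that items are separated by whitespace (space or tab) and that the support follows the last colon on each line.

```python
def parse_patterns(lines):
    """Parse 'item item ...:support' lines into (items, support) tuples."""
    parsed = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines such as '#format: ...'
        pattern, _, support = line.rpartition(":")
        parsed.append((tuple(pattern.split()), int(support)))
    return parsed


# A few lines copied from the listing above.
sample_lines = ["#format: frequentPattern:support", "a:7", "c b a:5", "c d:8"]
for items, support in parse_patterns(sample_lines):
    print(items, support)
```

Since the function accepts any iterable of lines, an open file handle for `frequentPatterns.txt` can be passed to it directly.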
The dataframe containing the patterns is shown below:
df  # in each pattern, the items are separated from each other by a tab (\t)
index | Patterns | Support |
---|---|---|
0 | a | 7 |
1 | b | 7 |
2 | c | 9 |
3 | d | 8 |
4 | b a | 5 |
5 | c a | 6 |
6 | c b | 7 |
7 | b d | 6 |
8 | c d | 8 |
9 | d a | 5 |
10 | c b a | 5 |
11 | c b d | 6 |
12 | c d a | 5 |