Mining High-Utility Frequent Patterns in Utility Databases

PAMI is a Python library containing 100+ algorithms to discover useful patterns in various databases across multiple computing platforms.

1. What is High-Utility Frequent pattern mining?

The high-utility pattern mining model disregards the frequency of a pattern in a database. However, in many real-world applications, the interestingness of a pattern is determined by both its value and its frequency. High-utility frequent pattern mining was therefore introduced to discover only those patterns that have high value and occur at least a certain number of times in a database.

High-utility frequent pattern mining aims to discover all patterns whose utility is no less than a user-specified minimum utility (minUtil) and whose support is no less than a user-specified minimum support (minSup).

Reference: R. Uday Kiran, T. Yashwanth Reddy, Philippe Fournier-Viger, Masashi Toyoda, P. Krishna Reddy, Masaru Kitsuregawa: Efficiently Finding High Utility-Frequent Itemsets Using Cutoff and Suffix Utility. PAKDD (2) 2019: 191-203
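
The two conditions in the definition above can be written compactly as follows. The symbols D (the database), T (a transaction), u(i, T) (the utility of item i in T), and sup are notation introduced here for illustration. The formula for u(X) follows the usual convention in utility mining and is an assumption, since the text above does not spell it out.

```latex
\[
\mathrm{sup}(X) = \bigl|\{\, T \in D : X \subseteq T \,\}\bigr|,
\qquad
u(X) = \sum_{T \in D,\; X \subseteq T} \;\sum_{i \in X} u(i, T)
\]
\[
X \text{ is a high-utility frequent pattern}
\iff
u(X) \ge \mathit{minUtil} \ \text{ and } \ \mathrm{sup}(X) \ge \mathit{minSup}
\]
```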

2. What is the utility database?

A utility database consists of an ‘internal utility database’ and an ‘external utility database’.

In an internal utility database, every transaction contains a set of items, each paired with a positive integer called its internal utility.

In an external utility database, every entry contains an item and its external utility value.

A hypothetical internal utility database is shown in the table below.

Transactions
(a,2) (b,3) (c,1) (g,1)
(b,3) (c,2) (d,3) (e,2)
(a,2) (b,1) (c,3) (d,4)
(a,3) (c,2) (d,1) (f,2)
(a,3) (b,1) (c,2) (d,1) (g,2)
(c,2) (d,2) (e,3) (f,1)
(a,2) (b,1) (c,1) (d,2)
(a,1) (e,2) (f,2)
(a,2) (b,2) (c,4) (d,2)
(b,3) (c,2) (d,2) (e,2)

A hypothetical external utility database is shown in the table below.

Item Profit
a 4
b 3
c 6
d 2
e 5
f 2
g 3

Note: Duplicate items must not exist in a transaction.
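
A common convention in utility mining, assumed here because the text above does not state it, is that the utility of an item in a transaction is its internal utility multiplied by its external utility. The sketch below applies that assumption to the first transaction of the internal utility table and the profits of the external utility table. The function name `transaction_utility` is hypothetical and is not part of PAMI.

```python
# Profits from the external utility table above.
external_utility = {"a": 4, "b": 3, "c": 6, "d": 2, "e": 5, "f": 2, "g": 3}

# First transaction of the internal utility table above.
transaction = [("a", 2), ("b", 3), ("c", 1), ("g", 1)]


def transaction_utility(transaction, external_utility):
    """Sum internal utility * external utility over all items (assumed definition)."""
    items = [item for item, _ in transaction]
    # Duplicate items must not exist in a transaction.
    if len(items) != len(set(items)):
        raise ValueError("duplicate items are not allowed in a transaction")
    return sum(internal * external_utility[item] for item, internal in transaction)


# 2*4 + 3*3 + 1*6 + 1*3 = 26
print(transaction_utility(transaction, external_utility))
```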

3. What is the acceptable format of a utility database in PAMI?

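Each row in a utility database must contain three fields: the items of the transaction, the total utility of the transaction, and the utility of each item, listed in the same order as the items.

The three fields must be separated by the colon symbol (:). In the sample below, the values within a field are separated by a space.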

A sample utility database, say sampleUtility.txt, is shown below:

a b c g:7:2 3 1 1
b c d e:10:3 2 3 2
a b c d:10:2 1 3 4
a c d f:8:3 2 1 2
a b c d g:9:3 1 2 1 2
c d e f:8:2 2 3 1
a b c d:6:2 1 1 2
a e f:5:1 2 2
a b c d:10:2 2 4 2
b c d e:9:3 2 2 2
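
As an illustration of the row format, the following minimal sketch splits one row into its three colon-separated fields and runs two basic sanity checks. The function `parse_utility_row` is a hypothetical helper written for this page; it is not PAMI's own parser.

```python
def parse_utility_row(line, sep=" "):
    """Split one 'items:totalUtility:itemUtilities' row into its three fields."""
    fields = line.strip().split(":")
    if len(fields) != 3:
        raise ValueError(f"expected 3 colon-separated fields, got {len(fields)}")
    items = fields[0].split(sep)
    total_utility = int(fields[1])
    utilities = [int(u) for u in fields[2].split(sep)]
    # Every item needs exactly one utility value.
    if len(items) != len(utilities):
        raise ValueError("number of items and number of utilities differ")
    # Duplicate items must not exist in a transaction.
    if len(set(items)) != len(items):
        raise ValueError("duplicate items are not allowed in a transaction")
    return items, total_utility, utilities


items, total, utilities = parse_utility_row("a b c g:7:2 3 1 1")
print(items, total, utilities)  # ['a', 'b', 'c', 'g'] 7 [2, 3, 1, 1]
```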

4. What is the need for understanding the statistics of a database?

The performance of a pattern mining algorithm depends primarily on the statistical nature of a database. It is therefore important to inspect the statistics of a database before mining it.

Sample code is provided below:

import PAMI.extras.dbStats.UtilityDatabase as stats

obj = stats.UtilityDatabase('sampleUtility.txt', ' ')  # input file and separator
obj.run()  # compute the statistics
obj.printStats()  # print the statistics
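
To give a feel for what such statistics look like, here is a small standalone sketch that computes a few of them by hand from rows in the format described earlier. It does not use PAMI and does not claim to reproduce the output of `printStats()`. The function `basic_stats` and the choice of statistics are assumptions made for illustration.

```python
from statistics import mean


def basic_stats(lines, sep=" "):
    """Compute a few simple statistics from 'items:total:utilities' rows."""
    lengths = []       # number of items per transaction
    distinct = set()   # all items seen so far
    totals = []        # total utility per transaction
    for line in lines:
        items_field, total_field, _ = line.strip().split(":")
        items = items_field.split(sep)
        lengths.append(len(items))
        distinct.update(items)
        totals.append(int(total_field))
    return {
        "transactions": len(lengths),
        "distinct_items": len(distinct),
        "min_length": min(lengths),
        "avg_length": mean(lengths),
        "max_length": max(lengths),
        "min_total_utility": min(totals),
        "max_total_utility": max(totals),
    }


# Three rows taken from the sample database.
rows = ["a b c g:7:2 3 1 1", "b c d e:10:3 2 3 2", "a e f:5:1 2 2"]
summary = basic_stats(rows)
for name, value in summary.items():
    print(name, value)
```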

5. What are the input parameters to a high utility frequent pattern mining algorithm?

Algorithms that mine high-utility frequent patterns require a utility database, a minUtil value, and a minSup value (both thresholds are specified by the user).

6. How to store the output of a high-utility frequent pattern mining algorithm?

The patterns discovered by a high-utility frequent pattern mining algorithm can be saved to a file or a data frame.

7. How to execute a high-utility frequent pattern algorithm in a terminal?

Example: python3 HUFIM.py inputFile.txt outputFile.txt 20 5 ' '

8. How to execute a high-utility frequent pattern mining algorithm in a Jupyter Notebook?

import PAMI.highUtilityFrequentPattern.basic.HUFIM as alg

iFile = 'sampleUtility.txt'  # specify the input utility database
minUtil = 25  # specify the minUtil value
minSup = 5  # specify the minSup value
separator = ' '  # specify the separator. The default separator is a tab.
oFile = 'utilityfrequentPatterns.txt'  # specify the output file name

obj = alg.HUFIM(iFile, minUtil, minSup, separator)  # initialize the algorithm
obj.startMine()  # start the mining process
obj.save(oFile)  # store the patterns in a file
df = obj.getPatternsAsDataFrame()  # get the discovered patterns as a dataframe
obj.printResults()  # print the statistics of the mining process
High Utility Frequent patterns were generated successfully using HUFIM algorithm
Total number of High Utility Frequent Patterns: 7
Total Memory in USS: 81223680
Total Memory in RSS 119382016
Total ExecutionTime in seconds: 0.0004372596740722656
!cat utilityfrequentPatterns.txt
# The format of the file is pattern:utility:support
c	d:35:8 
c	d	a:34:5 
c	d	b:39:6 
c	a:27:6 
c	a	b:30:5 
c	b:29:7 
d	b:25:6 
df
# The dataframe containing the patterns is shown below.
   Patterns  Utility  Support
0  c d       35       8
1  c d a     34       5
2  c d b     39       6
3  c a       27       6
4  c a b     30       5
5  c b       29       7
6  d b       25       6
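
The result above can be cross-checked with a brute-force sketch that enumerates every itemset and keeps those meeting both thresholds. This is not how HUFIM works internally; it is only a slow reference computation, feasible here because the sample has seven items. It assumes that the utility of a pattern is the sum of the per-item utilities listed in the third field of each row that contains the pattern. The function `brute_force_hufp` is a hypothetical name.

```python
from itertools import combinations

# Items and per-item utilities from the sample database
# (the total-utility field is not needed for this check).
ROWS = [
    ("a b c g", "2 3 1 1"),
    ("b c d e", "3 2 3 2"),
    ("a b c d", "2 1 3 4"),
    ("a c d f", "3 2 1 2"),
    ("a b c d g", "3 1 2 1 2"),
    ("c d e f", "2 2 3 1"),
    ("a b c d", "2 1 1 2"),
    ("a e f", "1 2 2"),
    ("a b c d", "2 2 4 2"),
    ("b c d e", "3 2 2 2"),
]


def brute_force_hufp(rows, min_util, min_sup):
    """Return {pattern: (utility, support)} for all patterns meeting both thresholds."""
    # One dict per transaction: item -> utility of that item in the transaction.
    db = [dict(zip(items.split(), map(int, utils.split()))) for items, utils in rows]
    all_items = sorted(set().union(*db))
    result = {}
    for size in range(1, len(all_items) + 1):
        for pattern in combinations(all_items, size):
            # Transactions that contain every item of the pattern.
            covering = [t for t in db if all(item in t for item in pattern)]
            support = len(covering)
            utility = sum(t[item] for t in covering for item in pattern)
            if support >= min_sup and utility >= min_util:
                result[pattern] = (utility, support)
    return result


patterns = brute_force_hufp(ROWS, min_util=25, min_sup=5)
for pattern, (utility, support) in sorted(patterns.items()):
    print(" ".join(pattern), utility, support)
```

With minUtil = 25 and minSup = 5, this sketch yields the same seven patterns, utilities, and supports as the HUFIM output listed above, with items printed in alphabetical order.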