PAMI.highUtilityPattern.parallel package

Submodules

PAMI.highUtilityPattern.parallel.abstract module

PAMI.highUtilityPattern.parallel.efimparallel module

class PAMI.highUtilityPattern.parallel.efimparallel.efimParallel(iFile, minUtil, sep='\t', threads=1)[source]

Bases: _utilityPatterns

Description:

EFIM is one of the fastest algorithms for mining high-utility itemsets from transactional databases.

Reference:

Zida, S., Fournier-Viger, P., Lin, J.CW. et al. EFIM: a fast and memory efficient algorithm for high-utility itemset mining. Knowl Inf Syst 51, 595–625 (2017). https://doi.org/10.1007/s10115-016-0986-0
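To make the notion of a high-utility itemset concrete, the following minimal sketch computes itemset utilities by brute force on a toy database. The database, the item names, and the helpers `itemset_utility` and `high_utility_itemsets` are illustrative assumptions. This is not how EFIM searches; it only states what a utility miner is expected to find.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps an item to its utility
# (for example, purchased quantity multiplied by unit profit).
DATABASE = [
    {"a": 5, "b": 2, "c": 1},
    {"a": 10, "c": 6},
    {"b": 4, "c": 3, "d": 8},
]


def itemset_utility(itemset, database):
    """Sum the utilities of `itemset` over the transactions that contain it."""
    total = 0
    for transaction in database:
        if all(item in transaction for item in itemset):
            total += sum(transaction[item] for item in itemset)
    return total


def high_utility_itemsets(database, min_util):
    """Enumerate every itemset whose utility is at least `min_util`."""
    items = sorted({item for transaction in database for item in transaction})
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            utility = itemset_utility(itemset, database)
            if utility >= min_util:
                result[itemset] = utility
    return result


if __name__ == "__main__":
    for itemset, utility in high_utility_itemsets(DATABASE, min_util=15).items():
        print(itemset, utility)
```

With a threshold of 15, the itemset (b, c, d) qualifies although none of its subsets does. Utility is therefore not anti-monotone, which is why algorithms such as EFIM rely on upper bounds rather than on the itemset utility itself for pruning.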

Parameters:
  • iFile – str : Name of the input file from which to mine the complete set of high-utility patterns.

  • minUtil – int : The minimum utility threshold specified by the user.

  • sep – str : The separator that distinguishes items from one another in a transaction. The default separator is a tab; users can override it.

  • threads – int : The number of threads to use. The default is 1.
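The constructor arguments can be exercised with a short wrapper such as the one below. It is a sketch that assumes the import path shown in the class signature above and uses only methods documented on this page. The wrapper name `mine_high_utility` is hypothetical.

```python
def mine_high_utility(iFile, minUtil, sep="\t", threads=1):
    """Run efimParallel on `iFile` and return the discovered patterns."""
    # Imported lazily so that this sketch can be loaded without PAMI installed.
    from PAMI.highUtilityPattern.parallel.efimparallel import efimParallel

    miner = efimParallel(iFile, minUtil, sep=sep, threads=threads)
    miner.mine()

    # Report the resource usage recorded by the miner.
    print("Runtime (s):", miner.getRuntime())
    print("RSS (bytes):", miner.getMemoryRSS())
    print("USS (bytes):", miner.getMemoryUSS())

    return miner.getPatterns()
```

A call such as `mine_high_utility("utility_db.txt", minUtil=30, threads=4)` would mine a tab-separated file with four threads. The file name and threshold here are placeholders.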

Attributes:
inputFile (str):

The input file path.

minUtil (int):

The minimum utility threshold.

sep (str):

The separator used in the input file.

threads (int):

The number of threads to use.

Patterns (dict):

A dictionary containing the discovered patterns.

rename (dict):

A dictionary containing the mapping between the item IDs and their names.

runtime (float):

The runtime of the algorithm in seconds.

memoryRSS (int):

The Resident Set Size (RSS) memory usage of the algorithm in bytes.

memoryUSS (int):

The Unique Set Size (USS) memory usage of the algorithm in bytes.

Methods:
read_file():

Read the input file and return the filtered transactions, primary items, and secondary items.

binarySearch(arr, item):

Perform a binary search on the given array to find the given item.

project(beta, file_data, secondary):

Project the given beta itemset on the given database.

search(collections):

Search for high utility itemsets in the given collections.

mine():

Start the EFIM algorithm.

save(outFile):

Save the patterns discovered by the algorithm to an output file.

getPatterns():

Get the patterns discovered by the algorithm.

getRuntime():

Get the runtime of the algorithm.

getMemoryRSS():

Get the Resident Set Size (RSS) memory usage of the algorithm.

getMemoryUSS():

Get the Unique Set Size (USS) memory usage of the algorithm.

printResults():

Print the results of the algorithm.
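The helper methods `binarySearch` and `project` are only summarized above. As a rough illustration of the ideas they name, and not the library's implementation, the sketch below locates an item in a sorted transaction with a binary search and then keeps the items that follow it, which is the usual meaning of projecting a database on an item in EFIM-style mining. The function names, the return convention (index or -1), and the omission of utilities are assumptions made for brevity.

```python
from bisect import bisect_left


def binary_search(arr, item):
    """Return the index of `item` in the sorted list `arr`, or -1 if absent."""
    index = bisect_left(arr, item)
    if index < len(arr) and arr[index] == item:
        return index
    return -1


def project(item, transactions):
    """Project sorted transactions on `item`.

    For every transaction that contains `item`, keep the items that come
    after it in the sort order. Empty projections are dropped.
    """
    projected = []
    for transaction in transactions:
        position = binary_search(transaction, item)
        if position == -1:
            continue  # the transaction does not contain the item
        suffix = transaction[position + 1:]
        if suffix:
            projected.append(suffix)
    return projected


if __name__ == "__main__":
    # Hypothetical transactions with items encoded as sorted integers.
    toy = [[1, 2, 3], [2, 4], [1, 3, 4], [2]]
    print(project(2, toy))
```

Because each projected transaction is shorter than its source, repeated projection shrinks the database as the search goes deeper.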

getMemoryRSS()[source]

Get the Resident Set Size (RSS) memory usage of the algorithm.

Returns:

The RSS memory usage in bytes.

Return type:

int

getMemoryUSS()[source]

Get the Unique Set Size (USS) memory usage of the algorithm.

Returns:

The USS memory usage in bytes.

Return type:

int

getPatterns()[source]

Get the patterns discovered by the algorithm.

Returns:

A dictionary containing the discovered patterns.

Return type:

dict
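Once the dictionary is available it can be post-processed with ordinary Python. The sketch below ranks patterns by value, assuming that each key is a pattern and each value is its utility. Both the sample dictionary and the helper `top_k_patterns` are hypothetical.

```python
def top_k_patterns(patterns, k):
    """Return the k (pattern, utility) pairs with the highest utility."""
    ranked = sorted(patterns.items(), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical result; the real key format depends on the library.
    sample = {"a": 15, "a\tc": 22, "b\tc\td": 15}
    for pattern, utility in top_k_patterns(sample, 2):
        print(repr(pattern), utility)
```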

getPatternsAsDataFrame()[source]

Store the final patterns in a DataFrame.

Returns:

The discovered patterns as a DataFrame.

Return type:

pd.DataFrame

getRuntime()[source]

Get the runtime of the algorithm.

Returns:

The runtime in seconds.

Return type:

float

mine()[source]

Start the EFIM algorithm.

printResults()[source]

Print the results of the algorithm.

save(outFile)[source]

Write the complete set of discovered patterns to an output file.

Parameters:
  • outFile – csv file : Name of the output file.

startMine()[source]

Start the EFIM algorithm.

Module contents