EFIM
- class PAMI.highUtilityPattern.basic.EFIM.EFIM(iFile, minUtil, sep='\t')
Bases: _utilityPatterns
- Description:
EFIM is one of the fastest algorithms for mining high utility itemsets from transactional databases.
- Reference:
Zida, S., Fournier-Viger, P., Lin, J.CW. et al. EFIM: a fast and memory efficient algorithm for high-utility itemset mining. Knowl Inf Syst 51, 595–625 (2017). https://doi.org/10.1007/s10115-016-0986-0
- Parameters:
iFile – str : Name of the input file from which the complete set of high utility patterns is mined
oFile – str : Name of the output file in which the complete set of high utility patterns is stored
minUtil – int : The user-specified minimum utility threshold
candidateCount – int : Number of candidates specified by the user
maxMemory – int : Maximum memory used by this program while running
sep – str : Separator used to distinguish items from one another in a transaction. The default separator is a tab; users can override it.
- Attributes:
- iFile: file
Name of the input file to mine the complete set of high utility patterns
- oFile: file
Name of the output file to store the complete set of high utility patterns
- memoryRSS: float
Total amount of RSS memory consumed by the program
- startTime: float
Start time of the mining process
- endTime: float
Completion time of the mining process
- minUtil: int
The user-specified minUtil value
- highUtilityitemSets: map
Set of high utility itemsets
- candidateCount: int
Number of candidates
- utilityBinArrayLU: list
A map to hold the local utility values of the items in the database
- utilityBinArraySU: list
A map to hold the subtree utility values of the items in the database
- oldNamesToNewNames: list
A map from old item names (keys) to new item names (values)
- newNamesToOldNames: list
A map from new item names (keys) to old item names (values)
- maxMemory: float
Maximum memory used by this program for running
- patternCount: int
Number of high utility itemsets (HUIs)
- itemsToKeep: list
The promising items, i.e., items whose local utility is greater than or equal to minUtil
- itemsToExplore: list
Items whose subtree utility is greater than or equal to minUtil
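The itemsToKeep filter can be illustrated with a minimal sketch. It assumes that, for single items, the local utility equals the sum of the utilities of all transactions containing the item, and that kept items are ordered by ascending local utility, as described in the EFIM paper. The in-memory database and the function names local_utilities and promising_items are hypothetical and are not part of PAMI.

```python
from collections import defaultdict

# Hypothetical in-memory database: each transaction maps item -> utility.
transactions = [
    {"a": 5, "b": 10, "c": 1},
    {"a": 10, "c": 6, "e": 3},
    {"b": 4, "d": 2},
]


def local_utilities(transactions):
    """For every item, sum the total utility of each transaction containing it."""
    lu = defaultdict(int)
    for trans in transactions:
        transaction_utility = sum(trans.values())
        for item in trans:
            lu[item] += transaction_utility
    return dict(lu)


def promising_items(transactions, min_util):
    """Keep items whose local utility reaches min_util, in ascending order."""
    lu = local_utilities(transactions)
    kept = [item for item, value in lu.items() if value >= min_util]
    # Assumed ordering: ascending local utility, ties broken by item name.
    return sorted(kept, key=lambda item: (lu[item], item))


items_to_keep = promising_items(transactions, min_util=20)
print(items_to_keep)  # ['b', 'a', 'c']
```

Items d and e are dropped here because their local utilities (6 and 19) fall below the threshold of 20, so no itemset containing them can be a high utility itemset.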
- Methods:
- mine()
Starts the mining process
- getPatterns()
Returns the complete set of patterns
- save(oFile)
Writes the complete set of patterns to an output file
- getPatternsAsDataFrame()
Returns the complete set of patterns as a dataframe
- getMemoryUSS()
Returns the total amount of USS memory consumed by the mining process
- getMemoryRSS()
Returns the total amount of RSS memory consumed by the mining process
- getRuntime()
Returns the total runtime taken by the mining process
- backTrackingEFIM(transactionsOfP, itemsToKeep, itemsToExplore, prefixLength)
A method to mine the HUIs recursively
- useUtilityBinArraysToCalculateUpperBounds(transactionsPe, j, itemsToKeep)
A method to calculate the subtree utility and local utility of all items that can extend itemSet P and e
- output(tempPosition, utility)
A method to output a high utility itemSet to file or memory, depending on what the user chose
- is_equal(transaction1, transaction2)
A method to check whether two transactions are identical
- useUtilityBinArrayToCalculateSubtreeUtilityFirstTime(dataset)
A method to calculate the subtree utility values for single items
- sortDatabase(transactions)
A method to sort the transactions in the database
- sort_transaction(trans1, trans2)
A method to compare two transactions when sorting
- useUtilityBinArrayToCalculateLocalUtilityFirstTime(dataset)
A method to calculate the local utility values for single items
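The subtree utility computed for single items can be sketched as follows. The sketch assumes the definition from the EFIM paper: for an item z, sum over every transaction containing z the utility of z plus the utilities of the kept items that follow z in the processing order. The function name subtree_utilities, the sample database, and the given order are hypothetical and do not reproduce PAMI's internal data structures.

```python
# Hypothetical in-memory database: each transaction maps item -> utility.
transactions = [
    {"a": 5, "b": 10, "c": 1},
    {"a": 10, "c": 6, "e": 3},
    {"b": 4, "d": 2},
]


def subtree_utilities(transactions, order):
    """Subtree utility of each item in `order` (the kept items, already sorted)."""
    rank = {item: position for position, item in enumerate(order)}
    su = {item: 0 for item in order}
    for trans in transactions:
        # Drop unpromising items and sort the rest by the processing order.
        kept = sorted((item for item in trans if item in rank), key=rank.__getitem__)
        suffix = 0
        # Walk backwards so `suffix` holds the utility of the item plus all later items.
        for item in reversed(kept):
            suffix += trans[item]
            su[item] += suffix
    return su


order = ["b", "a", "c"]  # assumed: kept items in ascending local utility
su = subtree_utilities(transactions, order)
items_to_explore = [item for item in order if su[item] >= 20]
print(su)                # {'b': 20, 'a': 22, 'c': 7}
print(items_to_explore)  # ['b', 'a']
```

With a threshold of 20, item c stays in the transactions (it can still extend other itemsets) but is not used as a starting point for the search, which is the distinction between itemsToKeep and itemsToExplore.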
Executing the code on terminal:
Format:
(.venv) $ python3 EFIM.py <inputFile> <outputFile> <minUtil> <sep>
Example Usage:
(.venv) $ python3 EFIM.py sampleTDB.txt output.txt 35
Note
maxMemory is the maximum memory used by this program while running
Sample run of importing the code:
from PAMI.highUtilityPattern.basic import EFIM as alg
obj = alg.EFIM("input.txt", 35)
obj.mine()
Patterns = obj.getPatterns()
print("Total number of high utility Patterns:", len(Patterns))
obj.save("output")
memUSS = obj.getMemoryUSS()
print("Total Memory in USS:", memUSS)
memRSS = obj.getMemoryRSS()
print("Total Memory in RSS", memRSS)
run = obj.getRuntime()
print("Total ExecutionTime in seconds:", run)
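To sanity-check the number of patterns reported by a run on a very small database, a brute-force reference can help. The sketch below does not use PAMI and is not the EFIM algorithm: it enumerates every itemset and keeps those whose total utility reaches the threshold. The function name brute_force_huis and the sample database are hypothetical, and the tuple keys used here are a choice made for this sketch, since this page does not specify the key format of the dictionary returned by getPatterns().

```python
from itertools import combinations

# Hypothetical in-memory database: each transaction maps item -> utility.
transactions = [
    {"a": 5, "b": 10, "c": 1},
    {"a": 10, "c": 6, "e": 3},
    {"b": 4, "d": 2},
]


def brute_force_huis(transactions, min_util):
    """Return {itemset: utility} for every itemset with utility >= min_util."""
    items = sorted({item for trans in transactions for item in trans})
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            # Utility of an itemset: sum of its item utilities over the
            # transactions that contain all of its items.
            utility = sum(
                sum(trans[item] for item in itemset)
                for trans in transactions
                if all(item in trans for item in itemset)
            )
            if utility >= min_util:
                result[itemset] = utility
    return result


print(brute_force_huis(transactions, min_util=20))  # {('a', 'c'): 22}
```

The enumeration is exponential in the number of distinct items, so it is only practical for toy inputs used as a cross-check.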
Credits:
The complete program was written by Pradeep Pallikila under the supervision of Professor Rage Uday Kiran.
- getMemoryRSS() -> float
Returns the total amount of RSS memory consumed by the mining process.
Returns: RSS memory consumed by the mining process
Return type: float
- getMemoryUSS() -> float
Returns the total amount of USS memory consumed by the mining process.
Returns: USS memory consumed by the mining process
Return type: float
- getPatterns() -> dict
Returns the set of patterns after completion of the mining process.
Returns: the patterns
Return type: dict
- getPatternsAsDataFrame() -> _pd.DataFrame
Stores the final patterns in a dataframe.
Returns: the patterns in a dataframe
Return type: pd.DataFrame
- getRuntime() -> float
Calculates the total runtime taken by the mining process.
Returns: total runtime taken by the mining process
Return type: float
- save(outFile: str) -> None
Writes the complete set of high utility patterns to an output file.
Parameters: outFile (csv file) – name of the output file
Returns: None