PAMI is a Python library containing 100+ algorithms to discover useful patterns in various databases across multiple computing platforms. (Active)
The performance of a mining algorithm primarily depends on the following two key factors:
Thus, it is important to know the statistical details of a database. PAMI provides built-in classes and methods to obtain these details. This page describes the methods for obtaining statistical details from a transactional database.
The TransactionalDatabase.py program is located in the PAMI.extras.dbStats folder. Execute the lines below to run the program.
#import the program
import PAMI.extras.dbStats.TransactionalDatabase as tds
#initialize the program
obj = tds.TransactionalDatabase(inputFile)
#obj = tds.TransactionalDatabase(inputFile,sep=',') #override default tab separator
#execute the program
obj.run()
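To make the input format concrete, the following is a minimal sketch of how a separator-delimited transactional file could be read into memory. It is not PAMI's own reader: the function name `read_transactions` is hypothetical, and the layout (one transaction per line, items split on the separator, tab by default) is an assumption based on the `sep` parameter shown above.

```python
import io


def read_transactions(handle, sep="\t"):
    """Parse one transaction per line; items are split on `sep` (assumed layout)."""
    transactions = []
    for line in handle:
        items = [item.strip() for item in line.strip().split(sep) if item.strip()]
        if items:  # skip blank lines
            transactions.append(items)
    return transactions


# Hypothetical three-transaction database held in memory instead of a file.
sample = io.StringIO("a\tb\tc\nb\td\na\tb\td\te\n")
transactions = read_transactions(sample)
print(transactions)
```

Passing `sep=','` to this helper mirrors the override shown in the commented line above.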
Once the program is executed, users can call different methods to get the statistical details of a database. We now describe the available methods.
This method returns the total number of transactions in a database.
print(f'Database size : {obj.getDatabaseSize()}')
This method returns the total number of items in a database.
print(f'Total number of items : {obj.getTotalNumberOfItems()}')
#### getSparsity()
This method returns the sparsity (i.e., the portion of empty values) of the database.
print(f'Database sparsity : {obj.getSparsity()}')
This method returns the length of the smallest transaction in a database. In other words, this function returns the minimum number of items in a transaction.
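The notion of sparsity as "the portion of empty values" can be illustrated with a small hand computation. The sketch below assumes the database is viewed as a transactions-by-distinct-items table and that sparsity is one minus the fraction of filled cells; the helper name `sparsity` is hypothetical, and the library's exact formula may differ.

```python
def sparsity(transactions):
    """Fraction of empty cells in the transactions x distinct-items table (assumed definition)."""
    distinct_items = {item for t in transactions for item in t}
    total_cells = len(transactions) * len(distinct_items)
    if total_cells == 0:
        return 0.0  # an empty database has no cells to be empty
    filled_cells = sum(len(set(t)) for t in transactions)
    return 1 - filled_cells / total_cells


# Hypothetical database: 3 transactions over 5 distinct items, 9 filled cells out of 15.
transactions = [["a", "b", "c"], ["b", "d"], ["a", "b", "d", "e"]]
print(f"Database sparsity : {sparsity(transactions):.2f}")
```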
print(f'Minimum Transaction Size : {obj.getMinimumTransactionLength()}')
This method returns the length of an average transaction in a database. In other words, this function returns the average number of items in a transaction.
print(f'Average Transaction Size : {obj.getAverageTransactionLength()}')
This method returns the variance of the lengths of transactions in a database.
print(f'Variance of Transaction Size :{obj.getVarianceTransactionLength()}')
This method returns the sparsity of the database.
print(f'Database sparsity :{obj.getSparsity()}')
This method returns the length of the largest transaction in a database. In other words, this function returns the maximum number of items in a transaction.
print(f'Maximum Transaction Size : {obj.getMaximumTransactionLength()}')
This method returns the standard deviation of the lengths of transactions in a database.
print(f'Standard Deviation Transaction Size : {obj.getStandardDeviationTransactionLength()}')
This method returns the variance of the lengths of transactions in a database.
print(f'Variance in Transaction Sizes : {obj.getVarianceTransactionLength()}')
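The transaction-length statistics described above can be reproduced by hand with Python's standard `statistics` module. This is a sketch on a hypothetical in-memory database; it uses the population variance and standard deviation, which is an assumption, since the source does not say whether the library uses the population or the sample convention.

```python
import statistics

# Hypothetical database; each inner list is one transaction.
transactions = [["a", "b", "c"], ["b", "d"], ["a", "b", "d", "e"]]

lengths = [len(t) for t in transactions]  # number of items per transaction

minimum = min(lengths)
maximum = max(lengths)
average = statistics.mean(lengths)
variance = statistics.pvariance(lengths)  # population variance (assumed convention)
std_dev = statistics.pstdev(lengths)      # population standard deviation

print(f"Minimum Transaction Size : {minimum}")
print(f"Average Transaction Size : {average}")
print(f"Maximum Transaction Size : {maximum}")
print(f"Standard Deviation Transaction Size : {std_dev:.4f}")
print(f"Variance in Transaction Sizes : {variance:.4f}")
```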
This method returns the minimum utility of all items in a database.
print(f'Minimum utility : {obj.getMinimumUtility()}')
This method returns the average utility of all items in a database.
print(f'Average utility : {obj.getAverageUtility()}')
This method returns the maximum utility of all items in a database.
print(f'Maximum utility : {obj.getMaximumUtility()}')
This method returns a sorted dictionary of items and their frequencies in the database. The format of this dictionary is {item:frequency}. The items in this dictionary are sorted in descending order of frequency.
itemFrequencies = obj.getSortedListOfItemFrequencies()
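The shape of the returned {item:frequency} dictionary can be illustrated with `collections.Counter`. The sketch below is not the library's implementation: the helper name `item_frequencies` is hypothetical, each item is counted once per transaction, and ties are broken alphabetically, both of which are assumptions.

```python
from collections import Counter


def item_frequencies(transactions):
    """Return {item: frequency}, sorted by descending frequency (ties by item name)."""
    counts = Counter(item for t in transactions for item in set(t))
    return dict(sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])))


# Hypothetical database.
transactions = [["a", "b", "c"], ["b", "d"], ["a", "b", "d", "e"]]
frequencies = item_frequencies(transactions)
print(frequencies)
```

Because Python dictionaries preserve insertion order, iterating over the result visits the most frequent items first.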
This method returns a sorted dictionary of transaction lengths and their occurrence frequencies in the database. The format of this dictionary is {transactionalLength:frequency}. The entries in this dictionary are sorted in ascending order of transaction length.
transactionLength = obj.getTransanctionalLengthDistribution()
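As an illustration of the {transactionalLength:frequency} format, the following sketch counts how many transactions have each length and orders the result by ascending length. The helper name `length_distribution` and the sample data are hypothetical.

```python
from collections import Counter


def length_distribution(transactions):
    """Return {transaction length: number of transactions}, sorted by ascending length."""
    counts = Counter(len(t) for t in transactions)
    return dict(sorted(counts.items()))


# Hypothetical database with lengths 3, 2, 4 and 2.
transactions = [["a", "b", "c"], ["b", "d"], ["a", "b", "d", "e"], ["c", "e"]]
distribution = length_distribution(transactions)
print(distribution)
```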
This method returns the sorted dictionary of items and their sum of utility values in a database. The format of this dictionary is {item:sumOfItsUtilities}.
utility = obj.getSortedUtilityValuesOfItem()
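The {item:sumOfItsUtilities} dictionary, together with the minimum, average, and maximum utility, can be sketched on in-memory data as follows. Several points here are assumptions rather than the library's documented behaviour: transactions are represented as lists of (item, utility) pairs, the dictionary is sorted in descending order of utility, and the three summary values are taken over the per-item totals. The helper name `utility_per_item` is hypothetical.

```python
from collections import defaultdict


def utility_per_item(transactions):
    """Sum utilities per item and sort by descending total (assumed order)."""
    totals = defaultdict(int)
    for transaction in transactions:
        for item, item_utility in transaction:
            totals[item] += item_utility
    return dict(sorted(totals.items(), key=lambda kv: (-kv[1], kv[0])))


# Hypothetical utility database: each transaction lists (item, utility) pairs.
utility_transactions = [
    [("a", 5), ("b", 2), ("c", 1)],
    [("b", 4), ("d", 3)],
    [("a", 1), ("b", 2), ("d", 6), ("e", 2)],
]

item_utilities = utility_per_item(utility_transactions)
minimum_utility = min(item_utilities.values())
maximum_utility = max(item_utilities.values())
average_utility = sum(item_utilities.values()) / len(item_utilities)

print(item_utilities)
print(minimum_utility, average_utility, maximum_utility)
```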
This method stores the dictionary in a file. In the output file, the key-value pairs of the dictionary are separated by a tab.
obj.save(itemFrequencies, 'itemFrequency.csv')
obj.save(transactionLength, 'transactionSize.csv')
obj.save(utility, 'utility.csv')
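The output layout described for the save step (one key-value pair per line, separated by a tab) can be sketched with plain file I/O. This is an illustration only, not the library's `save` method: the helper name `save_dictionary` is hypothetical, and the file is written to a temporary directory to keep the example self-contained.

```python
import os
import tempfile


def save_dictionary(data, path):
    """Write each key-value pair on its own line, separated by a tab."""
    with open(path, "w", encoding="utf-8") as out:
        for key, value in data.items():
            out.write(f"{key}\t{value}\n")


# Hypothetical item-frequency dictionary.
item_frequencies = {"b": 3, "a": 2, "d": 2}
path = os.path.join(tempfile.mkdtemp(), "itemFrequency.csv")
save_dictionary(item_frequencies, path)

# Read the file back to confirm the layout.
with open(path, encoding="utf-8") as handle:
    saved_lines = handle.read().splitlines()
print(saved_lines)
```

Although the file name ends in .csv, the separator in this layout is a tab, so a reader that expects commas needs the delimiter set explicitly.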
import PAMI.extras.dbStats.UtilityDatabase as uds
obj = uds.UtilityDatabase(inputFile)
#obj = uds.UtilityDatabase(inputFile,sep=',') #override default tab separator
obj.run()
print(f'Database size : {obj.getDatabaseSize()}')
print(f'Total number of items : {obj.getTotalNumberOfItems()}')
print(f'Database sparsity : {obj.getSparsity()}')
print(f'Minimum Transaction Size : {obj.getMinimumTransactionLength()}')
print(f'Average Transaction Size : {obj.getAverageTransactionLength()}')
print(f'Maximum Transaction Size : {obj.getMaximumTransactionLength()}')
print(f'Standard Deviation Transaction Size : {obj.getStandardDeviationTransactionLength()}')
print(f'Variance in Transaction Sizes : {obj.getVarianceTransactionLength()}')
print(f'Total utility : {obj.getTotalUtility()}')
print(f'Minimum utility : {obj.getMinimumUtility()}')
print(f'Average utility : {obj.getAverageUtility()}')
print(f'Maximum utility : {obj.getMaximumUtility()}')
itemFrequencies = obj.getSortedListOfItemFrequencies()
transactionLength = obj.getTransanctionalLengthDistribution()
utility = obj.getSortedUtilityValuesOfItem()
obj.save(itemFrequencies, 'itemFrequency.csv')
obj.save(transactionLength, 'transactionSize.csv')
obj.save(utility, 'utility.csv')