PAMI - An Open Source PAttern MIning Python Library

PAMI is a Python library containing 100+ algorithms to discover useful patterns in various databases across multiple computing platforms. (Active)


Statistical details of a transactional database

The performance of a mining algorithm primarily depends on the following two key factors:

  1. The distribution of item frequencies, and
  2. The distribution of transaction lengths.

Thus, it is important to know the statistical details of a database. PAMI provides built-in classes and methods to obtain them. This page describes the methods that return the statistical details of a transactional database.

Executing TransactionalDatabase program

The program is located in the PAMI.extras.dbStats package. Execute the lines below to run it.

#import the program
import PAMI.extras.dbStats.TransactionalDatabase as tds

#initialize the program
obj = tds.TransactionalDatabase(inputFile)
#obj = tds.TransactionalDatabase(inputFile,sep=',') #override default tab separator
#execute the program
obj.run()
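
For readers who want a file to experiment with, the following is a minimal sketch that writes a small tab-separated file of the kind `inputFile` is assumed to point to: one transaction per line, with items separated by the default tab separator. The helper `writeTransactions`, the file name `sampleTransactions.txt`, and the item names are hypothetical and are not part of PAMI.

```python
import os
import tempfile

def writeTransactions(transactions, path, sep='\t'):
    """Write one transaction per line, joining its items with `sep`."""
    with open(path, 'w') as f:
        for transaction in transactions:
            f.write(sep.join(transaction) + '\n')

# Hypothetical toy data: four transactions over the items a, b, c and d
transactions = [['a', 'b', 'c'], ['a', 'c'], ['b', 'd'], ['a', 'b', 'c', 'd']]

# Assumption: a temporary location is good enough for experimenting
inputFile = os.path.join(tempfile.gettempdir(), 'sampleTransactions.txt')
writeTransactions(transactions, inputFile)

# Read the file back to check what was written
with open(inputFile) as f:
    lines = f.read().splitlines()
print(lines)
```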

Once the program is executed, users can call different methods to get the statistical details of a database. We now describe the available methods.


getDatabaseSize()

This method returns the total number of transactions in a database.

print(f'Database size : {obj.getDatabaseSize()}')


getTotalNumberOfItems()

This method returns the total number of items in a database.

print(f'Total number of items : {obj.getTotalNumberOfItems()}')


getSparsity()

This method returns the sparsity (i.e., the proportion of empty values) of the database.

print(f'Database sparsity : {obj.getSparsity()}')
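
To make "proportion of empty values" concrete, the sketch below treats the database as a matrix with one row per transaction and one column per distinct item, and reports the fraction of cells that are empty. This is one common definition and is offered as an illustration only; it is not necessarily how `getSparsity()` is implemented. The function name `sparsity` and the toy data are hypothetical.

```python
def sparsity(transactions):
    """Fraction of empty cells in the transactions x distinct-items matrix."""
    items = set()
    filled = 0
    for transaction in transactions:
        unique = set(transaction)      # count an item once per transaction
        items.update(unique)
        filled += len(unique)
    totalCells = len(transactions) * len(items)
    # Assumption: an empty database is reported as 0.0 rather than raising
    return 1 - filled / totalCells if totalCells else 0.0

# Hypothetical toy database: 4 transactions, 4 distinct items, 11 filled cells
toyDatabase = [['a', 'b', 'c'], ['a', 'c'], ['b', 'd'], ['a', 'b', 'c', 'd']]
print(f'Database sparsity : {sparsity(toyDatabase)}')
```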


getMinimumTransactionLength()

This method returns the length of the smallest transaction in a database. In other words, this function returns the minimum number of items in a transaction.

print(f'Minimum Transaction Size : {obj.getMinimumTransactionLength()}')


getAverageTransactionLength()

This method returns the average length of the transactions in a database. In other words, this function returns the average number of items in a transaction.

print(f'Average Transaction Size : {obj.getAverageTransactionLength()}')


getMaximumTransactionLength()

This method returns the length of the largest transaction in a database. In other words, this function returns the maximum number of items in a transaction.

print(f'Maximum Transaction Size : {obj.getMaximumTransactionLength()}')


getStandardDeviationTransactionLength()

This method returns the standard deviation of the lengths of transactions in a database.

print(f'Standard Deviation Transaction Size : {obj.getStandardDeviationTransactionLength()}')


getVarianceTransactionLength()

This method returns the variance of the lengths of transactions in a database.

print(f'Variance in Transaction Sizes : {obj.getVarianceTransactionLength()}')
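
The transaction-length statistics described above can be reproduced by hand on a small example, which is a useful way to sanity-check the reported values. The sketch below uses Python's standard `statistics` module on a hypothetical toy database. It assumes the population variance and standard deviation (dividing by the number of transactions); whether PAMI uses the population or the sample form is not stated here, so treat that choice as an assumption.

```python
import statistics

# Hypothetical toy database; each inner list is one transaction
toyDatabase = [['a', 'b', 'c'], ['a', 'c'], ['b', 'd'], ['a', 'b', 'c', 'd']]

# Length of a transaction = number of items it contains
lengths = [len(transaction) for transaction in toyDatabase]

minimumLength = min(lengths)
averageLength = statistics.mean(lengths)
maximumLength = max(lengths)
# Assumption: population (not sample) variance and standard deviation
varianceLength = statistics.pvariance(lengths)
stdDevLength = statistics.pstdev(lengths)

print(f'Minimum Transaction Size : {minimumLength}')
print(f'Average Transaction Size : {averageLength}')
print(f'Maximum Transaction Size : {maximumLength}')
print(f'Standard Deviation Transaction Size : {stdDevLength}')
print(f'Variance in Transaction Sizes : {varianceLength}')
```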


getMinimumUtility()

This method returns the minimum utility of all items in a database.

print(f'Minimum utility : {obj.getMinimumUtility()}')


getAverageUtility()

This method returns the average utility of all items in a database.

print(f'Average utility : {obj.getAverageUtility()}')


getMaximumUtility()

This method returns the maximum utility of all items in a database.

print(f'Maximum utility : {obj.getMaximumUtility()}')


getSortedListOfItemFrequencies()

This method returns a sorted dictionary of items and their frequencies in the database. The format of this dictionary is {item:frequency}. The items in this dictionary are sorted in descending order of frequency.

itemFrequencies = obj.getSortedListOfItemFrequencies()
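
As an illustration of the {item:frequency} format, the sketch below builds such a dictionary for a hypothetical toy database using `collections.Counter`. It assumes that an item's frequency is the number of transactions containing it and that ties are broken alphabetically; both are assumptions made for this example rather than documented PAMI behaviour.

```python
from collections import Counter

# Hypothetical toy database; each inner list is one transaction
toyDatabase = [['a', 'b', 'c'], ['a', 'c'], ['b', 'd'], ['a', 'b', 'c', 'd']]

# Count each item at most once per transaction
counts = Counter(item for transaction in toyDatabase for item in set(transaction))

# Sort by descending frequency; ties are broken by item name (assumption)
itemFrequencies = dict(sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])))
print(itemFrequencies)
```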


getTransanctionalLengthDistribution()

This method returns a sorted dictionary of transaction lengths and their occurrence frequencies in the database. The format of this dictionary is {transactionalLength:frequency}. The entries in this dictionary are sorted in ascending order of transaction length.

transactionLength = obj.getTransanctionalLengthDistribution()
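
The {transactionalLength:frequency} format can be illustrated the same way. The following sketch counts how many transactions of each length occur in a hypothetical toy database and orders the result by ascending length; it shows the shape of the result and is not PAMI's own code.

```python
from collections import Counter

# Hypothetical toy database; each inner list is one transaction
toyDatabase = [['a', 'b', 'c'], ['a', 'c'], ['b', 'd'], ['a', 'b', 'c', 'd']]

# Map each transaction length to the number of transactions of that length
lengthCounts = Counter(len(transaction) for transaction in toyDatabase)

# Sort by ascending transaction length
transactionLength = dict(sorted(lengthCounts.items()))
print(transactionLength)
```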


getSortedUtilityValuesOfItem()

This method returns a sorted dictionary of items and the sum of their utility values in a database. The format of this dictionary is {item:sumOfItsUtilities}.

utility = obj.getSortedUtilityValuesOfItem()
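
The {item:sumOfItsUtilities} format is sketched below on hypothetical data. Each transaction is represented here as a list of (item, utility) pairs, which is a representation chosen for this example and not PAMI's input format. The text above does not state the sort order, so descending order of summed utility is an assumption.

```python
from collections import defaultdict

# Hypothetical toy utility database: (item, utility) pairs per transaction
toyUtilityDatabase = [
    [('a', 5), ('b', 2)],
    [('a', 1), ('c', 4)],
    [('b', 3), ('c', 6), ('d', 1)],
]

# Sum the utility of every item over all transactions
totals = defaultdict(int)
for transaction in toyUtilityDatabase:
    for item, value in transaction:
        totals[item] += value

# Assumption: sort by descending summed utility, ties broken by item name
utility = dict(sorted(totals.items(), key=lambda kv: (-kv[1], kv[0])))
print(utility)
```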

save(dictionary, returnFileName)

This method stores the dictionary in a file. In the output file, each key and its value are separated by a tab.

obj.save(itemFrequencies, 'itemFrequency.csv')
obj.save(transactionLength, 'transactionSize.csv')
obj.save(utility, 'utility.csv')
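
For reference, the output layout described above (one key-value pair per line, separated by a tab) can be sketched in a few lines of plain Python. The function `saveDictionary` is a hypothetical stand-in written for this example and is not PAMI's `save` implementation; the temporary output location is likewise an assumption.

```python
import os
import tempfile

def saveDictionary(dictionary, path, sep='\t'):
    """Write each key-value pair on its own line, separated by `sep`."""
    with open(path, 'w') as f:
        for key, value in dictionary.items():
            f.write(f'{key}{sep}{value}\n')

# Hypothetical dictionary in the {item:frequency} format
itemFrequencies = {'a': 3, 'b': 3, 'c': 3, 'd': 2}

outputFile = os.path.join(tempfile.gettempdir(), 'itemFrequency.csv')
saveDictionary(itemFrequencies, outputFile)

# Read the file back to inspect the stored lines
with open(outputFile) as f:
    savedLines = f.read().splitlines()
print(savedLines)
```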

Sample code

import PAMI.extras.dbStats.UtilityDatabase as uds
obj = uds.UtilityDatabase(inputFile)
#obj = uds.UtilityDatabase(inputFile,sep=',') #override default tab separator
obj.run()
print(f'Database size : {obj.getDatabaseSize()}')
print(f'Total number of items : {obj.getTotalNumberOfItems()}')
print(f'Database sparsity : {obj.getSparsity()}')
print(f'Minimum Transaction Size : {obj.getMinimumTransactionLength()}')
print(f'Average Transaction Size : {obj.getAverageTransactionLength()}')
print(f'Maximum Transaction Size : {obj.getMaximumTransactionLength()}')
print(f'Standard Deviation Transaction Size : {obj.getStandardDeviationTransactionLength()}')
print(f'Variance in Transaction Sizes : {obj.getVarianceTransactionLength()}')
print(f'Total utility : {obj.getTotalUtility()}')
print(f'Minimum utility : {obj.getMinimumUtility()}')
print(f'Average utility : {obj.getAverageUtility()}')
print(f'Maximum utility : {obj.getMaximumUtility()}')
itemFrequencies = obj.getSortedListOfItemFrequencies()
transactionLength = obj.getTransanctionalLengthDistribution()
utility = obj.getSortedUtilityValuesOfItem()
obj.save(itemFrequencies, 'itemFrequency.csv')
obj.save(transactionLength, 'transactionSize.csv')
obj.save(utility, 'utility.csv')