PAMI is a Python library containing 100+ algorithms to discover useful patterns in various databases across multiple computing platforms. (Active)
The performance of a mining algorithm primarily depends on the following two key factors:
Thus, it is important to know the statistical details of a database. PAMI provides built-in classes and methods to obtain these details. This page describes the methods for obtaining the statistical details of a temporal database.
The TemporalDatabase.py program is located in the PAMI.extras.dbStats folder. Execute the lines below to run the program.
# import the program
import PAMI.extras.dbStats.TemporalDatabase as tds
inputFile = "fileName"
# initialize the program
obj = tds.TemporalDatabase(inputFile)
# obj = tds.TemporalDatabase(inputFile, sep=',')  # override the default tab separator
# execute the program
obj.run()
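The lines above expect `inputFile` to point to a temporal database file. As a minimal sketch, assuming a layout in which each line holds a timestamp followed by the items of that transaction, all separated by tabs, a small test file could be created as shown here. The file name `toyTemporalDatabase.txt` and its contents are made up for illustration.

```python
import os
import tempfile

# Hypothetical toy data: (timestamp, items) pairs, made up for illustration.
toy_transactions = [
    (1, ["a", "b", "c"]),
    (2, ["a", "c"]),
    (4, ["b", "d"]),
    (7, ["a", "b", "c", "d"]),
]

# Hypothetical file name, placed in the system's temporary directory.
input_file = os.path.join(tempfile.gettempdir(), "toyTemporalDatabase.txt")

# Assumed layout: timestamp<TAB>item1<TAB>item2... (one transaction per line).
with open(input_file, "w", encoding="utf-8") as handle:
    for timestamp, items in toy_transactions:
        handle.write("\t".join([str(timestamp), *items]) + "\n")

# Read the file back to confirm the layout.
with open(input_file, encoding="utf-8") as handle:
    lines = handle.read().splitlines()
print(lines)
```

The resulting path can then be used in place of "fileName" when initializing the program.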
Once the program is executed, users can call different methods to get the statistical details of a database. We now describe the available methods.
This method returns the total number of transactions in a database.
print(f'Database size : {obj.getDatabaseSize()}')
This method returns the total number of items in a database.
print(f'Total number of items : {obj.getTotalNumberOfItems()}')
#### getSparsity()
This method returns the sparsity (i.e., the portion of empty values) of the database.
print(f'Database sparsity : {obj.getSparsity()}')
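To make the notion of sparsity concrete, the sketch below computes it by hand for a toy database. It assumes that sparsity is one minus the fraction of filled cells in the transaction-by-item matrix. This definition and the toy data are assumptions for illustration, and `getSparsity()` may compute the value differently.

```python
# Hypothetical toy database: each inner list is one transaction.
toy_transactions = [
    ["a", "b", "c"],
    ["a", "c"],
    ["b", "d"],
    ["a", "b", "c", "d"],
]

# Distinct items define the columns of the transaction-by-item matrix.
distinct_items = {item for transaction in toy_transactions for item in transaction}

# A cell is filled when the item occurs in the transaction.
filled_cells = sum(len(set(transaction)) for transaction in toy_transactions)
total_cells = len(toy_transactions) * len(distinct_items)

# Assumed definition: the portion of empty cells.
sparsity = 1 - filled_cells / total_cells
print(f"Sparsity : {sparsity}")
```

Here 11 of the 16 cells are filled, so the sketch reports a sparsity of 0.3125.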
This method returns the length of the smallest transaction in a database. In other words, this function returns the minimum number of items in a transaction.
print(f'Minimum Transaction Size : {obj.getMinimumTransactionLength()}')
This method returns the average length of the transactions in a database. In other words, this function returns the average number of items in a transaction.
print(f'Average Transaction Size : {obj.getAverageTransactionLength()}')
This method returns the length of the largest transaction in a database. In other words, this function returns the maximum number of items in a transaction.
print(f'Maximum Transaction Size : {obj.getMaximumTransactionLength()}')
This method returns the standard deviation of the lengths of transactions in a database.
print(f'Standard Deviation Transaction Size : {obj.getStandardDeviationTransactionLength()}')
This method returns the variance of the lengths of transactions in a database.
print(f'Variance in Transaction Sizes : {obj.getVarianceTransactionLength()}')
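The transaction-length statistics described above can be reproduced by hand on a toy database. The sketch below uses Python's `statistics` module and assumes population (rather than sample) variance and standard deviation. Which variant the library uses is not stated on this page, so treat that choice and the toy data as assumptions.

```python
import statistics

# Hypothetical toy database: each inner list is one transaction.
toy_transactions = [
    ["a", "b", "c"],
    ["a", "c"],
    ["b", "d"],
    ["a", "b", "c", "d"],
]

# The length of a transaction is the number of items it contains.
lengths = [len(transaction) for transaction in toy_transactions]

minimum_length = min(lengths)
average_length = statistics.mean(lengths)
maximum_length = max(lengths)
# Assumption: population statistics (divide by n, not n - 1).
variance = statistics.pvariance(lengths)
standard_deviation = statistics.pstdev(lengths)

print(f"Minimum : {minimum_length}")
print(f"Average : {average_length}")
print(f"Maximum : {maximum_length}")
print(f"Variance : {variance}")
print(f"Standard deviation : {standard_deviation}")
```

If the values returned by the library differ slightly, switching to `statistics.variance` and `statistics.stdev` (the sample versions) is the first thing to check.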
This method returns the minimum period between two transactions in a database.
print(f'Minimum period : {obj.getMinimumPeriod()}')
This method returns the average period between two transactions in a database.
print(f'Average period : {obj.getAveragePeriod()}')
This method returns the maximum period between two transactions in a database.
print(f'Maximum period : {obj.getMaximumPeriod()}')
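As a rough illustration of the period statistics, the sketch below treats a period as the gap between the timestamps of two consecutive transactions. This interpretation and the toy timestamps are assumptions. How the library handles the start and end of the database is not covered here.

```python
# Hypothetical timestamps of four transactions, made up for illustration.
timestamps = [1, 2, 4, 7]

# Assumed definition: gap between consecutive timestamps in sorted order.
ordered = sorted(timestamps)
periods = [later - earlier for earlier, later in zip(ordered, ordered[1:])]

minimum_period = min(periods)
average_period = sum(periods) / len(periods)
maximum_period = max(periods)

print(f"Periods : {periods}")
print(f"Minimum period : {minimum_period}")
print(f"Average period : {average_period}")
print(f"Maximum period : {maximum_period}")
```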
This method returns a sorted dictionary of items and their frequencies in the database. The format of this dictionary is {item:frequency}. The items in this dictionary are sorted in descending order of frequency.
itemFrequencies = obj.getSortedListOfItemFrequencies()
This method returns a sorted dictionary of transaction lengths and their occurrence frequencies in the database. The format of this dictionary is {temporalLength:frequency}. The transaction lengths in this dictionary are sorted in ascending order of their temporal lengths.
transactionLength = obj.getTransanctionalLengthDistribution()
This method returns a sorted dictionary of timestamps and the number of transactions occurring at the corresponding timestamp. The format of this dictionary is {timestamp:frequency}.
numberOfTransactionPerTimeStamp = obj.getNumberOfTransactionsPerTimestamp()
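The shape of the three dictionaries described above can be illustrated with a toy temporal database and `collections.Counter`. This is only a sketch of the described formats, not the library's implementation, and the toy data and variable names are hypothetical.

```python
from collections import Counter

# Hypothetical toy temporal database: (timestamp, items) pairs.
toy_transactions = [
    (1, ["a", "b", "c"]),
    (2, ["a", "c"]),
    (4, ["b", "d"]),
    (4, ["a", "b", "c", "d"]),
]

# {item: frequency}, sorted in descending order of frequency.
item_counter = Counter(item for _, items in toy_transactions for item in items)
item_frequencies = dict(item_counter.most_common())

# {transactionLength: frequency}, sorted in ascending order of length.
length_counter = Counter(len(items) for _, items in toy_transactions)
length_distribution = dict(sorted(length_counter.items()))

# {timestamp: frequency}, sorted in ascending order of timestamp.
timestamp_counter = Counter(timestamp for timestamp, _ in toy_transactions)
transactions_per_timestamp = dict(sorted(timestamp_counter.items()))

print(item_frequencies)
print(length_distribution)
print(transactions_per_timestamp)
```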
This method stores the dictionary in a file. In the output file, the key and value of each dictionary entry are separated by a tab character.
obj.save(itemFrequencies, 'itemFrequency.csv')
obj.save(transactionLength, 'transactionSize.csv')
obj.save(numberOfTransactionPerTimeStamp, 'numberOfTransaction.csv')
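The output format described above (one key-value pair per line, separated by a tab) can be sketched as follows. The helper `save_dictionary`, the file name, and the toy dictionary are hypothetical and only mimic the described behaviour. They are not part of the library.

```python
import os
import tempfile


def save_dictionary(data, output_file, sep="\t"):
    """Hypothetical helper: write one 'key<sep>value' pair per line."""
    with open(output_file, "w", encoding="utf-8") as handle:
        for key, value in data.items():
            handle.write(f"{key}{sep}{value}\n")


# Hypothetical item frequencies, made up for illustration.
toy_item_frequencies = {"a": 3, "b": 3, "c": 3, "d": 2}
output_file = os.path.join(tempfile.gettempdir(), "toyItemFrequency.csv")
save_dictionary(toy_item_frequencies, output_file)

# Read the file back to inspect the stored pairs.
with open(output_file, encoding="utf-8") as handle:
    saved_lines = handle.read().splitlines()
print(saved_lines)
```

Because the separator is a tab rather than a comma, a file written this way should be read back with a tab delimiter even though its name ends in `.csv`.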
```Python
# import the program
import PAMI.extras.dbStats.TemporalDatabase as tds

inputFile = "fileName"

# initialize the program
obj = tds.TemporalDatabase(inputFile)
# obj = tds.TemporalDatabase(inputFile, sep=',')  # override the default tab separator

# execute the program
obj.run()

# print the basic statistics
print(f'Database size : {obj.getDatabaseSize()}')
print(f'Total number of items : {obj.getTotalNumberOfItems()}')
print(f'Database sparsity : {obj.getSparsity()}')
print(f'Minimum Transaction Size : {obj.getMinimumTransactionLength()}')
print(f'Average Transaction Size : {obj.getAverageTransactionLength()}')
print(f'Maximum Transaction Size : {obj.getMaximumTransactionLength()}')
print(f'Standard Deviation Transaction Size : {obj.getStandardDeviationTransactionLength()}')
print(f'Variance in Transaction Sizes : {obj.getVarianceTransactionLength()}')
print(f'Minimum period : {obj.getMinimumPeriod()}')
print(f'Average period : {obj.getAveragePeriod()}')
print(f'Maximum period : {obj.getMaximumPeriod()}')

# collect the distributions
itemFrequencies = obj.getSortedListOfItemFrequencies()
transactionLength = obj.getTransanctionalLengthDistribution()
numberOfTransactionPerTimeStamp = obj.getNumberOfTransactionsPerTimestamp()

# store the distributions in files
obj.save(itemFrequencies, 'itemFrequency.csv')
obj.save(transactionLength, 'transactionSize.csv')
obj.save(numberOfTransactionPerTimeStamp, 'numberOfTransaction.csv')
```