PAMI is a Python library containing 100+ algorithms to discover useful patterns in various databases across multiple computing platforms.
This page describes how to create synthetic transactional databases of varying sizes. Note that this code is different from the widely used IBM synthetic data generator.
A synthetic transactional database can be created by calling the generateTransactionalDatabase class in PAMI.extras.generateDatabase:
import PAMI.extras.generateDatabase.generateTransactionalDatabase as dbGenerator
totalNumberOfItems = 500  # Total number of items that must exist in the database. Symbol: I
totalNumberOfTransactions = 1000  # Number of transactions that must exist in the database. Symbol: D
probabilityOfOccurrenceOfAnItem = 20  # Probability (in percent, from 0 to 100) with which an item occurs in a transaction. Symbol: P
outputFile = 'D1000I500P20.tsv'  # Output file name. 'D' is the database size (number of transactions), 'I' is the total number of items, and 'P' is the probability of occurrence of an item
data = dbGenerator.generateTransactionalDatabase(totalNumberOfTransactions, totalNumberOfItems, probabilityOfOccurrenceOfAnItem, outputFile)
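To make the roles of D, I and P concrete, here is a minimal standalone sketch of one way such a generator could work. It is not PAMI's implementation. It assumes that each item is included in each transaction independently with probability P/100, and that the output is tab-separated with one transaction per line. The names generate_transactions, format_tsv and database_file_name are hypothetical helpers introduced only for this illustration.

```python
import random


def generate_transactions(num_transactions, num_items, probability_percent, seed=None):
    """Return a list of transactions (lists of item ids).

    Assumption: every item 1..num_items is included in a transaction
    independently with probability probability_percent / 100.
    """
    if not 0 <= probability_percent <= 100:
        raise ValueError("probability_percent must be between 0 and 100")
    rng = random.Random(seed)  # seeded generator for reproducible output
    p = probability_percent / 100
    transactions = []
    for _ in range(num_transactions):
        items = [item for item in range(1, num_items + 1) if rng.random() < p]
        transactions.append(items)
    return transactions


def format_tsv(transactions):
    """Render one transaction per line, items separated by tabs (assumed layout)."""
    return "".join("\t".join(map(str, items)) + "\n" for items in transactions)


def database_file_name(num_transactions, num_items, probability_percent):
    """Build a file name following the D/I/P naming convention used above."""
    return f"D{num_transactions}I{num_items}P{probability_percent}.tsv"


# Same parameters as the example above: D=1000, I=500, P=20.
db = generate_transactions(1000, 500, 20, seed=42)
average_length = sum(len(t) for t in db) / len(db)
print(database_file_name(1000, 500, 20), "average transaction length:", average_length)
```

Under this assumption the expected transaction length is I × P / 100, which is 100 items for the parameters above. The printed average can therefore serve as a quick sanity check on a generated database. To save the result, write the string returned by format_tsv(db) to a file opened in write mode.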