PAMI is a Python library containing 100+ algorithms to discover useful patterns in various databases across multiple computing platforms. (Active)
A sparse dataframe is basically a (non-sparse) matrix in which the first column represents the row-identifier/timestamp, the second column represents the item, and the third column represents the value of the corresponding item. The format of the sparse dataframe is as follows:
rowIdentifier/timestamp Item1 Value
An example of a sparse dataframe generated from the customer purchase database is as follows:
timestamp | Item | Value |
---|---|---|
1 | Bread | 3 |
1 | Jam | 1 |
1 | Butter | 2 |
2 | Bread | 7 |
2 | Jam | 2 |
… | … | … |
Currently, PAMI supports converting a sparse dataframe into a transactional database, a temporal database, and a utility database.
Users can access this support through the methods of the SparseFormatDF class (imported from PAMI.extras.DF2DB in the sample programs).
We now present these three methods.
A transactional database is a sparse, binary representation of the items occurring in a dataframe. The steps to convert a dataframe into a transactional database are as follows:
A sample program to convert a dataframe into a transactional database and use it in a pattern mining algorithm, say FP-growth, is provided below:
from PAMI.extras.DF2DB import SparseFormatDF as pro
from PAMI.frequentPattern.basic import FPGrowth as alg
import pandas as pd
# Objective: convert the above dataframe into a transactional database with items whose value is greater than or equal to 1.
db = pro.SparseFormatDF(inputDataFrame=pd.DataFrame('mentionDataFrame'), thresholdValue=1, condition='>=')
# Convert and store the dataframe as a transactional database file
db.createTransactional(outputFile='/home/userName/transactionalDB.txt')
# Getting the fileName of the transactional database
print('The output file is saved at ' + db.getFileName())
# Using the generated transactional database in FP-growth algorithm to discover frequent patterns
obj = alg.FPGrowth(iFile=db.getFileName(), minSup='10.0')
obj.mine()
patternsDF = obj.getPatternsAsDataFrame()
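To make the transactional conversion concrete, the following is a minimal pure-Python sketch of the idea, not PAMI's actual implementation. It groups the example rows by row identifier and keeps each item whose value satisfies the condition. The helper name `to_transactions`, the `CONDITIONS` table, and the tab-separated output layout are assumptions made for illustration.

```python
import operator
from collections import OrderedDict

# Rows of the example sparse dataframe: (timestamp, item, value)
rows = [
    (1, "Bread", 3),
    (1, "Jam", 1),
    (1, "Butter", 2),
    (2, "Bread", 7),
    (2, "Jam", 2),
]

# Comparison functions keyed by their condition string
CONDITIONS = {
    ">=": operator.ge, ">": operator.gt,
    "<=": operator.le, "<": operator.lt,
    "==": operator.eq, "!=": operator.ne,
}

def to_transactions(rows, threshold, condition):
    """Hypothetical helper: return one list of items per row identifier."""
    keep = CONDITIONS[condition]
    grouped = OrderedDict()
    for row_id, item, value in rows:
        grouped.setdefault(row_id, [])
        if keep(value, threshold):
            grouped[row_id].append(item)
    # Drop row identifiers that are left with no items
    return [items for items in grouped.values() if items]

transactions = to_transactions(rows, threshold=1, condition=">=")
# Assumed layout: one transaction per line, items separated by tabs
lines = ["\t".join(items) for items in transactions]
print("\n".join(lines))
```

For the example rows, every value is at least 1, so the sketch yields two transactions: Bread, Jam, Butter and Bread, Jam. The values themselves are discarded, which is what makes the representation binary.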
A temporal database is a sparse, binary representation of the items occurring at a particular timestamp in a dataframe. The steps to convert a dataframe into a temporal database are as follows:
A sample program to convert a dataframe into a temporal database and use it in a pattern mining algorithm, say PFP-growth++, is provided below:
from PAMI.extras.DF2DB import SparseFormatDF as pro
from PAMI.periodicFrequentPattern.basic import PFPGrowthPlus as alg
import pandas as pd
# Objective: convert the above dataframe into a temporal database with items whose value is greater than or equal to 1.
db = pro.SparseFormatDF(inputDataFrame=pd.DataFrame('mentionDataFrame'), thresholdValue=1, condition='>=')
# Convert and store the dataframe as a temporal database file
db.createTemporal(outputFile='/home/userName/temporalDB.txt')
# Getting the fileName of the temporal database
print('The output file is saved at ' + db.getFileName())
obj = alg.PFPGrowthPlus(db.getFileName(), minSup="2", maxPer="6")
obj.mine()
patternsDF = obj.getPatternsAsDataFrame()
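The difference between a temporal and a transactional database is that each row keeps its timestamp. The sketch below illustrates this in pure Python under stated assumptions: the helper name `to_temporal` is hypothetical, the condition is fixed to `>=`, and the layout (timestamp first, then tab-separated items) is assumed for illustration rather than taken from PAMI's source.

```python
# Rows of the example sparse dataframe: (timestamp, item, value)
rows = [
    (1, "Bread", 3),
    (1, "Jam", 1),
    (1, "Butter", 2),
    (2, "Bread", 7),
    (2, "Jam", 2),
]

def to_temporal(rows, threshold):
    """Hypothetical helper: keep items with value >= threshold, per timestamp."""
    grouped = {}
    for timestamp, item, value in rows:
        if value >= threshold:
            grouped.setdefault(timestamp, []).append(item)
    # Emit the rows in increasing timestamp order
    return [(timestamp, grouped[timestamp]) for timestamp in sorted(grouped)]

temporal = to_temporal(rows, threshold=2)
# Assumed layout: timestamp first, then the items, all separated by tabs
lines = ["\t".join([str(timestamp)] + items) for timestamp, items in temporal]
print("\n".join(lines))
```

A threshold of 2 is used here to show the filtering: Jam is dropped at timestamp 1 because its value is 1, while Bread and Butter remain.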
A utility database is a sparse, non-binary representation of the items occurring in each row of a dataframe. The steps to convert a dataframe into a utility database are as follows:
A sample program to convert a dataframe into a utility database and use it in a pattern mining algorithm, say EFIM, is provided below:
from PAMI.extras.DF2DB import SparseFormatDF as pro
from PAMI.highUtilityPattern.basic import EFIM as alg
import pandas as pd
# Objective: convert the above dataframe into a utility database with items whose value is greater than or equal to 1.
db = pro.SparseFormatDF(inputDataFrame=pd.DataFrame('mentionDataFrame'), thresholdValue=1, condition='>=')
# Convert and store the dataframe as a utility database file
db.createUtility(outputFile='/home/userName/utilityDB.txt')
# Getting the fileName of the utility database
print('The output file is saved at ' + db.getFileName())
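Unlike the two binary representations, a utility database retains the value of each item. The pure-Python sketch below illustrates the idea only. The helper name `to_utility` is hypothetical, and the `items:totalUtility:utilities` line layout is an assumption that should be checked against the PAMI documentation before relying on it.

```python
# Rows of the example sparse dataframe: (timestamp, item, value)
rows = [
    (1, "Bread", 3),
    (1, "Jam", 1),
    (1, "Butter", 2),
    (2, "Bread", 7),
    (2, "Jam", 2),
]

def to_utility(rows, threshold):
    """Hypothetical helper: keep (item, value) pairs with value >= threshold."""
    grouped = {}
    for row_id, item, value in rows:
        if value >= threshold:
            grouped.setdefault(row_id, []).append((item, value))
    return grouped

utility = to_utility(rows, threshold=1)

lines = []
for row_id in sorted(utility):
    items = [item for item, _ in utility[row_id]]
    values = [value for _, value in utility[row_id]]
    # Assumed layout: items, their total utility, then the individual utilities
    lines.append(
        "\t".join(items) + ":" + str(sum(values)) + ":" + "\t".join(map(str, values))
    )
print("\n".join(lines))
```

For the first row identifier, the items Bread, Jam, and Butter carry the values 3, 1, and 2, giving a total utility of 6.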