PAMI is a Python library containing 100+ algorithms to discover useful patterns in various databases across multiple computing platforms. (Active)
Weighted frequent neighbourhood pattern mining aims to discover all interesting patterns in a transactional database whose weighted sum is no less than the user-specified minimum weighted sum (minWS) constraint, and whose distance between items is no greater than the user-specified maximum distance (maxDist). The minWS controls the minimum weighted sum that a pattern must have in a database. The minWeight controls the minimum weight of an item. The maxDist controls the maximum distance between two items.
Reference: R. U. Kiran, P. P. C. Reddy, K. Zettsu, M. Toyoda, M. Kitsuregawa and P. K. Reddy, “Efficient Discovery of Weighted Frequent Neighborhood Itemsets in Very Large Spatiotemporal Databases,” in IEEE Access, vol. 8, pp. 27584-27596, 2020, doi: 10.1109/ACCESS.2020.2970181.
A transactional database is a collection of transactions, where each transaction contains a transaction-identifier and a set of items with their respective weights.
A hypothetical transactional database containing the items a, b, c, d, e, f, and g is shown below.
| tid | Transactions | Weights |
|---|---|---|
| 1 | a b f g | 20 15 20 20 |
| 2 | a c f g | 5 30 20 10 |
| 3 | d f g | 30 20 15 |
| 4 | b c d | 60 80 10 |
| 5 | b c d e | 60 40 20 5 |
| 6 | a b c e g | 10 20 45 10 25 |
Note: Duplicate items must not exist in a transaction.
Each row in a transactional database must contain only the items of a transaction, followed by a colon and their respective weights. The frequent pattern mining algorithms in PAMI implicitly take the row number of a transaction as its transaction-identifier to reduce storage and processing costs. A sample transactional database, say sampleInputFile.txt, is provided below.
a b f g:20 15 20 20
a c f g:5 30 20 10
d f g:30 20 15
b c d:60 80 10
b c d e:60 40 20 5
a b c e g:10 20 45 10 25
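To make the file format and the weighted sum concrete, here is a minimal sketch that parses the sample rows above and computes the weighted sum of a pattern. It assumes the weighted sum of a pattern is the sum of its items' weights over every transaction that contains the whole pattern. The helper names `parse_weighted_db` and `weighted_sum` are hypothetical and are not part of PAMI.

```python
# Sample database from above: one "items:weights" row per transaction.
SAMPLE_DB = """\
a b f g:20 15 20 20
a c f g:5 30 20 10
d f g:30 20 15
b c d:60 80 10
b c d e:60 40 20 5
a b c e g:10 20 45 10 25
"""


def parse_weighted_db(text, sep=" "):
    """Return a list of {item: weight} dicts, one per transaction (hypothetical helper)."""
    transactions = []
    for line in text.strip().splitlines():
        items_part, weights_part = line.split(":")
        items = items_part.strip().split(sep)
        weights = [float(w) for w in weights_part.strip().split(sep)]
        transactions.append(dict(zip(items, weights)))
    return transactions


def weighted_sum(pattern, transactions):
    """Assumed definition: add the weights of the pattern's items in every
    transaction that contains all items of the pattern."""
    total = 0.0
    for tx in transactions:
        if all(item in tx for item in pattern):
            total += sum(tx[item] for item in pattern)
    return total


db = parse_weighted_db(SAMPLE_DB)
print(weighted_sum({"a"}, db))
print(weighted_sum({"b", "c"}, db))
```

Under this assumed definition, a pattern would be reported only when its weighted sum is at least minWS and its items are neighbours of one another.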
A neighborhood database contains items and their neighbors. An item x is said to be a neighbor of y if the distance between x and y is no more than the user-specified maximum distance threshold value.
A hypothetical neighborhood database containing the items a, b, c, d, e, f and g is shown below.
| Items | neighbours |
|---|---|
| a | b, c, e |
| b | a, d, e |
| c | a, d, e |
| d | b, c, e, f |
| e | a, b, c, d |
| f | d, g |
| g | f |
The methodology to create a neighborhood database file from a given geo-referenced database is described in the manual creatingNeighborhoodFile.pdf.
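As a rough illustration of the idea only (this is not the procedure from the manual), the sketch below derives neighbourhood rows from item coordinates and a maxDist value. The coordinates, the Euclidean distance measure, and the helper `build_neighbourhood_rows` are all assumptions made for this example.

```python
import math

# Hypothetical item locations (x, y); these values are not from the document.
locations = {"p": (0.0, 0.0), "q": (3.0, 4.0), "r": (10.0, 0.0)}


def build_neighbourhood_rows(locations, max_dist, sep=" "):
    """For each item, list every other item within max_dist (Euclidean distance).

    Each returned row starts with the key item, followed by its neighbours.
    """
    rows = []
    for item, point in locations.items():
        neighbours = [
            other
            for other, other_point in locations.items()
            if other != item and math.dist(point, other_point) <= max_dist
        ]
        rows.append(sep.join([item] + neighbours))
    return rows


rows = build_neighbourhood_rows(locations, max_dist=5.0)
print("\n".join(rows))
```

An item with no neighbour within maxDist, such as `r` here, ends up as a row that contains only the key item.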
The format of the neighborhood database is similar to that of a transactional database. That is, each transaction must contain a set of items. In a transaction, the first item represents the key item, while the remaining items represent the neighbors of the first item.
A sample neighborhood file, say sampleNeighbourFile.txt, is provided below:
a b c e
b a d e
c a d e
d b c e f
e a b c d
f d g
g f
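The sample neighbourhood file above can be loaded into a dictionary and used to test whether the items of a pattern are mutual neighbours. This is a minimal sketch, assuming a pattern qualifies only when every pair of its items are neighbours. The helpers `parse_neighbourhood` and `is_neighbourhood_pattern` are hypothetical names, not PAMI functions.

```python
from itertools import combinations

# Sample neighbourhood file from above: key item first, then its neighbours.
SAMPLE_NEIGHBOURS = """\
a b c e
b a d e
c a d e
d b c e f
e a b c d
f d g
g f
"""


def parse_neighbourhood(text, sep=" "):
    """Map each key item to the set of its neighbours (hypothetical helper)."""
    neighbours = {}
    for line in text.strip().splitlines():
        key, *rest = line.strip().split(sep)
        neighbours[key] = set(rest)
    return neighbours


def is_neighbourhood_pattern(pattern, neighbours):
    """Assumed rule: every pair of items in the pattern must be neighbours."""
    return all(
        y in neighbours.get(x, set())
        for x, y in combinations(sorted(pattern), 2)
    )


nbrs = parse_neighbourhood(SAMPLE_NEIGHBOURS)
print(is_neighbourhood_pattern({"b", "d"}, nbrs))
print(is_neighbourhood_pattern({"a", "b", "c"}, nbrs))
```

In the sample file, b and c are not neighbours, so any pattern containing both would be rejected under this rule regardless of its weighted sum.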
The sample code below prints the statistical details of the transactional database.
import PAMI.extras.dbStats.TransactionalDatabase as stats
obj = stats.TransactionalDatabase('sampleInputFile.txt', ' ')
obj.run()
obj.printStats()
Algorithms to mine weighted frequent neighbourhood patterns require a transactional database with weights, a neighborhood database, a user-specified minSup constraint, and a user-specified minWeight constraint. Please note that the maxDist constraint has already been used to create the neighborhood database file.
The transactional database can be provided as:
- String : E.g., ‘transactionalDatabase.txt’
- URL : E.g., https://u-aizu.ac.jp/~udayrage/datasets/transactionalDatabases/transactional_T10I4D100K.csv
- DataFrame with the header titled ‘Transactions’

The neighbourhood database can be provided as:
- String : E.g., ‘NeighbourhoodDatabase.txt’
- URL : E.g., https://u-aizu.ac.jp/~udayrage/datasets/transactionalDatabases/transactional_T10I4D100K.csv
- DataFrame with the headers titled ‘item’ and ‘Neighbours’

The weight database can be provided as:
- String : E.g., ‘WeightDatabase.txt’
- URL : E.g., https://u-aizu.ac.jp/~udayrage/datasets/transactionalDatabases/transactional_T10I4D100K.csv
- DataFrame with the headers titled ‘items’ and ‘weights’
The minSup can be specified as:
- count (between 0 and the length of the database) or
- [0, 1]

The minWeight can be specified as:
- count (between 0 and the length of the database) or
- [0, 1]
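The two ways of writing a threshold listed above (an absolute count, or a value in [0, 1]) can be reconciled with a small conversion step. The sketch below is one plausible interpretation, assuming an integer is an absolute count and a float is a fraction of the database length. The function `to_count` is hypothetical and does not reproduce PAMI's internal conversion.

```python
def to_count(threshold, db_length):
    """Convert a user-specified threshold to an absolute count.

    Assumption: an int is already a count in [0, db_length], while a float
    in [0, 1] is a fraction of the number of transactions in the database.
    """
    if isinstance(threshold, bool):
        raise TypeError("threshold must be an int or a float")
    if isinstance(threshold, int):
        if not 0 <= threshold <= db_length:
            raise ValueError("count must lie between 0 and the database length")
        return threshold
    if isinstance(threshold, float):
        if not 0.0 <= threshold <= 1.0:
            raise ValueError("fraction must lie in [0, 1]")
        return threshold * db_length
    raise TypeError("threshold must be an int or a float")


print(to_count(3, 6))    # absolute count, returned unchanged
print(to_count(0.5, 6))  # fraction of a six-transaction database
```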
The patterns discovered by a weighted frequent neighbourhood pattern mining algorithm can be saved into a file or a data frame.
Syntax: python3 algorithmName.py <path to the input file> <path to the output file> <path to the neighbourhood file> <path to the weight file> <minSup> <minWeight> <separator>
Example: python3 WFIM.py inputFile.txt outputFile.txt neighbourSample.txt weightSample.txt 3 2 5 ' '
import PAMI.weightedFrequentNeighbourhoodPattern.basic.SWFPGrowth as alg
iFile = 'SWFPWeightSample.txt'  # input transactional database (items with weights)
nFile = 'SWFPNeighbourSample.txt'  # input neighbourhood database
minWS = 150  # minimum weighted sum (minWS) value
separator = ' '  # separator used in the files; the default separator is a tab
oFile = 'weightedSpatialPatterns.txt'  # output file name
obj = alg.SWFPGrowth(iFile, nFile, minWS, separator)  # initialize the algorithm
obj.mine()  # start the mining process
obj.save(oFile)  # store the patterns in a file
df = obj.getPatternsAsDataFrame()  # get the discovered patterns as a dataframe
obj.printResults()  # print the statistics of the mining process
Weighted Frequent patterns were generated successfully using SWFPGrowth algorithm
Total number of Weighted Spatial Frequent Patterns: 8
Total Memory in USS: 114229248
Total Memory in RSS 153247744
Total ExecutionTime in ms: 0.0018541812896728516
The weightedSpatialPatterns.txt file contains the discovered patterns (format: pattern:support):
!cat weightedSpatialPatterns.txt
The dataframe containing the patterns is shown below:
df
| | Patterns | Support |
|---|---|---|
| 0 | a | 170 |
| 1 | e | 225 |
| 2 | e b | 215 |
| 3 | b | 245 |
| 4 | b d | 270 |
| 5 | c | 270 |
| 6 | c d | 150 |
| 7 | d | 325 |
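For post-processing, a saved file in the pattern:support format mentioned above can be read back into a dictionary. The sketch below is only illustrative: the sample text is assembled from a few rows of the data frame above, and the space separator inside a pattern is an assumption (the actual file may use a different separator, such as a tab). The helper `parse_patterns` is hypothetical.

```python
# Sample lines in "pattern:support" form, assembled from the data frame above.
SAMPLE_OUTPUT = """\
a:170
e b:215
b d:270
d:325
"""


def parse_patterns(text, sep=" "):
    """Map each pattern (a tuple of items) to its support value."""
    patterns = {}
    for line in text.strip().splitlines():
        # Split on the last colon so the pattern part is kept intact.
        pattern_part, support_part = line.rsplit(":", 1)
        items = tuple(pattern_part.strip().split(sep))
        patterns[items] = float(support_part)
    return patterns


patterns = parse_patterns(SAMPLE_OUTPUT)
# Keep only the patterns that contain more than one item.
pairs = {p: s for p, s in patterns.items() if len(p) > 1}
print(pairs)
```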