PAMI is a Python library containing 100+ algorithms to discover useful patterns in various databases across multiple computing platforms. (Active)
A geo-referenced transactional database represents the data generated by a set of stationary sensors (or objects) observing a particular phenomenon over a time period. Useful information that can help users achieve socio-economic development lies hidden in this data. Past works focused on finding frequently occurring spatial patterns (i.e., patterns in which objects are neighbors of one another) in a binary geo-referenced transactional database. A key limitation of these studies is that they fail to discover interesting spatial regularities that may exist in a quantitative (or non-binary) geo-referenced transactional database. Fuzzy frequent spatial pattern mining was introduced to tackle this problem.
Fuzzy frequent spatial pattern mining converts the given quantitative geo-referenced transactional database into a fuzzy geo-referenced transactional database using a set of user-defined fuzzy functions, and then mines this database to discover all patterns that satisfy the user-specified minimum support (minSup) and maximum distance (maxDist) constraints. minSup controls the minimum number of transactions in which a pattern must appear in the database. maxDist controls the maximum distance allowed between any two objects in a pattern.
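To make the conversion step more concrete, the following is a minimal sketch of how a quantitative reading might be mapped to fuzzy items with triangular membership functions. The function names (`triangular`, `fuzzify`), the 0 to 100 value scale, and the breakpoints in `REGIONS` are assumptions chosen for illustration; they are not taken from PAMI or from the referenced paper.

```python
def triangular(x, left, peak, right):
    """Membership degree of x in a triangular fuzzy set (left, peak, right)."""
    if x == peak:
        return 1.0
    if x <= left or x >= right:
        return 0.0
    if x < peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical fuzzy regions for readings on a 0-100 scale (assumption).
REGIONS = {
    "L": (0, 0, 50),      # Low
    "M": (0, 50, 100),    # Medium
    "H": (50, 100, 100),  # High
}


def fuzzify(item, value):
    """Return {'item.label': degree} for every label with a non-zero degree."""
    result = {}
    for label, (left, peak, right) in REGIONS.items():
        degree = triangular(value, left, peak, right)
        if degree > 0:
            result[f"{item}.{label}"] = degree
    return result


print(fuzzify("a", 25))  # a reading of 25 is partly Low and partly Medium
```

Other membership shapes (trapezoidal, Gaussian) would slot into the same structure by swapping out `triangular`.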
Reference: P. Veena, B. S. Chithra, R. U. Kiran, S. Agarwal and K. Zettsu, “Discovering Fuzzy Frequent Spatial Patterns in Large Quantitative Spatiotemporal databases,” 2021 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), 2021, pp. 1-8, doi: 10.1109/FUZZ45933.2021.9494594.
A fuzzy transactional database is a collection of transactions, where each transaction contains a set of fuzzy items and their respective fuzzy (or probability) values. Please note that the fuzzy value of a fuzzy item always lies in the range (0, 1), or equivalently (0%, 100%).
Given a quantitative transactional database containing the items a, b, c, d, e, f and g, and a set of fuzzy membership labels, Low (L), Medium (M), and High (H), a hypothetical fuzzy database generated from it is shown below. Each entry pairs a fuzzy item (an item with its label, such as a.L) with its fuzzy value.
| Transactions |
|---|
| (a.L,0.2) (b.M,0.3) (c.H,0.1) (g.M,0.1) |
| (b.M,0.3) (c.H,0.2) (d.L,0.3) (e.H,0.2) |
| (a.L,0.2) (b.M,0.1) (c.H,0.3) (d.L,0.4) |
| (a.L,0.3) (c.H,0.2) (d.L,0.1) (f.M,0.2) |
| (a.L,0.3) (b.M,0.1) (c.H,0.2) (d.L,0.1) (g.M,0.2) |
| (c.H,0.2) (d.L,0.2) (e.H,0.3) (f.M,0.1) |
| (a.L,0.2) (b.M,0.1) (c.H,0.1) (d.L,0.2) |
| (a.L,0.1) (e.H,0.2) (f.M,0.2) |
| (a.L,0.2) (b.M,0.2) (c.H,0.4) (d.L,0.2) |
| (b.M,0.3) (c.H,0.2) (d.L,0.2) (e.H,0.2) |
Note: Duplicate items must not exist in a transaction.
Each row in a fuzzy transactional database must contain the list of fuzzy items, a colon as the separator, and the corresponding list of fuzzy values.
A sample fuzzy transactional database file, say fuzzyTransactionalDatabase.txt, is provided below:
a.L b.M c.H g.M:0.2 0.3 0.1 0.1
b.M c.H d.L e.H:0.13 0.2 0.3 0.2
a.L b.M c.H d.L:0.2 0.1 0.3 0.4
a.L c.H d.L f.M:0.3 0.2 0.1 0.2
a.L b.M c.H d.L g.M:0.3 0.1 0.2 0.1 0.2
c.H d.L e.H f.M:0.2 0.2 0.3 0.1
a.L b.M c.H d.L:0.2 0.1 0.1 0.2
a.L e.H f.M:0.1 0.2 0.2
a.L b.M c.H d.H:0.2 0.2 0.4 0.2
b.M c.H d.L e.H:0.3 0.2 0.2 0.2
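As an illustration of the row format described above, the sketch below parses one row into a mapping from fuzzy item to fuzzy value and rejects rows that break the stated rules (mismatched list lengths, duplicate items). The helper name `parse_fuzzy_transaction` is hypothetical and is not part of PAMI.

```python
def parse_fuzzy_transaction(line, sep=" "):
    """Parse 'item1 item2:val1 val2' into {item: value}."""
    items_part, values_part = line.strip().split(":")
    items = items_part.split(sep)
    values = [float(v) for v in values_part.split(sep)]
    if len(items) != len(values):
        raise ValueError("number of items and fuzzy values differ")
    if len(set(items)) != len(items):
        raise ValueError("duplicate items are not allowed in a transaction")
    return dict(zip(items, values))


row = parse_fuzzy_transaction("a.L b.M c.H g.M:0.2 0.3 0.1 0.1")
print(row)
```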
For more information on how to create a fuzzy transactional database from a quantitative (or utility) transactional database, please refer to the manual utility2FuzzyDB.pdf
A neighborhood database contains items and their neighbors. An item x is said to be a neighbor of y if the distance between x and y is no more than the user-specified maximum distance threshold value.
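The neighbor definition above can be sketched as follows, assuming each item has a known location and that distance is Euclidean. The coordinates, the `build_neighbors` helper, and the choice of distance measure are all assumptions for illustration; a real dataset may call for a different measure (for example, geodesic distance).

```python
from math import dist


def build_neighbors(coords, max_dist):
    """Map each item to the items located within max_dist of it."""
    neighbors = {item: [] for item in coords}
    for x, point_x in coords.items():
        for y, point_y in coords.items():
            if x != y and dist(point_x, point_y) <= max_dist:
                neighbors[x].append(y)
    return neighbors


# Hypothetical sensor locations (assumption, not from the sample database).
coords = {"a": (0, 0), "b": (3, 0), "c": (0, 4), "d": (10, 10)}

for item, near in build_neighbors(coords, max_dist=4).items():
    print(item, *near)  # key item followed by its neighbors
```

Because the distance test is symmetric, x is listed as a neighbor of y exactly when y is listed as a neighbor of x.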
A hypothetical neighborhood database containing the items a, b, c, d, e, f and g is shown below.
| Item | Neighbors |
|---|---|
| a | b, c, d |
| b | a, e, g |
| c | a, d |
| d | a, c |
| e | b, f |
| f | e, g |
| g | b, f |
The format of the neighborhood database is similar to that of a transactional database. That is, each transaction must contain a set of items. In a transaction, the first item represents the key item, while the remaining items represent the neighbors of the first item.
A sample neighborhood file, say sampleNeighbourFile.txt, is provided below:
a b c d
b a e g
c a d
d a c
e b f
f e g
g b f
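A short sketch of reading this format into a dictionary is given below, together with a sanity check that the neighbor relation is symmetric. Both helper names are hypothetical and not part of PAMI.

```python
SAMPLE = """\
a b c d
b a e g
c a d
d a c
e b f
f e g
g b f
"""


def parse_neighbor_lines(text, sep=" "):
    """First token of each row is the key item; the rest are its neighbors."""
    neighbors = {}
    for line in text.splitlines():
        parts = line.strip().split(sep)
        if parts == [""]:
            continue  # skip blank lines
        key, *rest = parts
        neighbors[key] = set(rest)
    return neighbors


def is_symmetric(neighbors):
    """True if y listed under x always implies x listed under y."""
    return all(
        x in neighbors.get(y, set())
        for x, ys in neighbors.items()
        for y in ys
    )


neighbors = parse_neighbor_lines(SAMPLE)
print(sorted(neighbors["a"]), is_symmetric(neighbors))
```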
For more information on how to create a neighborhood file for a given dataset, please refer to the manual on creating a neighborhood file.
The sample code below prints the statistical details of a fuzzy transactional database.
import PAMI.extras.dbStats.FuzzyDatabase as stats
obj = stats.FuzzyDatabase('fuzzyTransactionalDatabase.txt', ' ') # input file and its separator
obj.run() # compute the statistics
obj.printStats() # print the statistics
The input parameters to a fuzzy frequent spatial pattern mining algorithm are:
- Fuzzy database, specified as one of:
  - String : E.g., ‘fuzzyDatabase.txt’
  - URL : E.g., https://u-aizu.ac.jp/~udayrage/datasets/fuzzyDatabases/fuzzy_T10I4D100K.csv
  - DataFrame with the headers titled ‘Transactions’ and ‘fuzzyValues’
- Neighborhood database, specified as one of:
  - String : E.g., ‘spatialDatabase.txt’
  - URL : E.g., https://u-aizu.ac.jp/~udayrage/datasets/fuzzyDatabases/neighbour_T10I4D100K.csv
  - DataFrame with the headers titled ‘item’ and ‘Neighbours’
- minSup, specified as one of:
  - a count (between 0 and the length of the database)
  - a value in the range [0,1]
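The last two entries in the list above (a count, or a value in [0,1]) read as two ways of specifying minSup. One plausible way to reconcile them is sketched below, assuming that an integer is taken as an absolute threshold and a float in [0,1] as a fraction of the database size. The helper `resolve_min_sup` is hypothetical, and PAMI's own conversion may differ.

```python
def resolve_min_sup(min_sup, db_size):
    """Turn a count (int) or a fraction in [0, 1] (float) into a threshold."""
    if isinstance(min_sup, int):
        if not 0 <= min_sup <= db_size:
            raise ValueError("count must lie between 0 and the database size")
        return float(min_sup)
    if isinstance(min_sup, float):
        if not 0.0 <= min_sup <= 1.0:
            raise ValueError("fraction must lie in [0, 1]")
        return min_sup * db_size
    raise TypeError("minSup must be an int or a float")


print(resolve_min_sup(1, 10))    # count: used as given
print(resolve_min_sup(0.1, 10))  # fraction: scaled by the database size
```

Under this assumption `1` and `1.0` mean different things (one transaction versus the whole database), so the type of the value passed matters.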
The patterns discovered by a fuzzy frequent spatial pattern mining algorithm can be saved into a file or a data frame.
foo@bar: cd PAMI/fuzzyGeoreferencedFrequentPattern/basic
foo@bar: python3 algorithmName.py inputFile outputFile neighbourFile minSup separator
Example: python3 FFSPMiner.py inputFile.txt outputFile.txt neighbourFile.txt 5 ' '
import PAMI.fuzzyGeoreferencedFrequentPattern.basic.FFSPMiner as alg
iFile = 'fuzzyTransactionalDatabase.txt' # specify the input fuzzy database
minSup = 1 # specify the minSup value
separator = ' ' # specify the separator used in the input files
oFile = 'fuzzySpatialPatterns.txt' # specify the output file name
nFile = 'sampleNeighbourFile.txt' # specify the neighborhood file of the database
obj = alg.FFSPMiner(iFile, nFile, minSup, separator) # initialize the algorithm
obj.mine() # start the mining process
obj.save(oFile) # store the patterns in a file
df = obj.getPatternsAsDataFrame() # get the discovered patterns as a dataframe
obj.printResults() # print the statistics of the mining process
Total number of Spatial Fuzzy Frequent Patterns: 6
Total Memory in USS: 81223680
Total Memory in RSS 119037952
Total ExecutionTime in seconds: 0.0006165504455566406
!cat fuzzySpatialPatterns.txt
#format: fuzzyGeoreferencedFrequentPattern:support
b.M : 1.23
d.L : 1.4999999999999998
d.L c.H : 1.2
a.L : 1.5
a.L c.H : 1.0
c.H : 1.9000000000000001
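For downstream processing, the saved file can be read back with a few lines of plain Python. The sketch below assumes each row has the form `items : support` as listed above and that lines starting with `#` are comments; `parse_patterns` is a hypothetical helper, not a PAMI function.

```python
OUTPUT_TEXT = """\
b.M : 1.23
d.L : 1.4999999999999998
d.L c.H : 1.2
a.L : 1.5
a.L c.H : 1.0
c.H : 1.9000000000000001
"""


def parse_patterns(text):
    """Map each pattern (a tuple of fuzzy items) to its support."""
    patterns = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        items_part, support_part = line.rsplit(":", 1)
        patterns[tuple(items_part.split())] = float(support_part)
    return patterns


patterns = parse_patterns(OUTPUT_TEXT)
# Rank the patterns from highest to lowest support.
ranked = sorted(patterns.items(), key=lambda kv: kv[1], reverse=True)
for pattern, support in ranked:
    print(" ".join(pattern), round(support, 2))
```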
df # the dataframe contains the following information
| | Patterns | Support |
|---|---|---|
| 0 | b.M | 1.23 |
| 1 | d.L | 1.4999999999999998 |
| 2 | d.L c.H | 1.2 |
| 3 | a.L | 1.5 |
| 4 | a.L c.H | 1.0 |
| 5 | c.H | 1.9000000000000001 |
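The support values above can be cross-checked with a brute-force sketch. It rests on two assumptions that are not stated on this page: that the support of a pattern is the sum, over the transactions containing all of its items, of the smallest fuzzy value among those items, and that neighbor lookups use the item name without its fuzzy label (so `a.L` is looked up as `a`). Under these assumptions, the sample files and minSup = 1 give the same six patterns. This is an illustrative check, not the FFSPMiner algorithm, and exhaustive enumeration is only practical for tiny databases.

```python
from itertools import combinations

# Contents of the sample fuzzy database file shown earlier on this page.
DB_TEXT = """\
a.L b.M c.H g.M:0.2 0.3 0.1 0.1
b.M c.H d.L e.H:0.13 0.2 0.3 0.2
a.L b.M c.H d.L:0.2 0.1 0.3 0.4
a.L c.H d.L f.M:0.3 0.2 0.1 0.2
a.L b.M c.H d.L g.M:0.3 0.1 0.2 0.1 0.2
c.H d.L e.H f.M:0.2 0.2 0.3 0.1
a.L b.M c.H d.L:0.2 0.1 0.1 0.2
a.L e.H f.M:0.1 0.2 0.2
a.L b.M c.H d.H:0.2 0.2 0.4 0.2
b.M c.H d.L e.H:0.3 0.2 0.2 0.2
"""

# Contents of the sample neighborhood file shown earlier on this page.
NEIGHBORS = {
    "a": {"b", "c", "d"}, "b": {"a", "e", "g"}, "c": {"a", "d"},
    "d": {"a", "c"}, "e": {"b", "f"}, "f": {"e", "g"}, "g": {"b", "f"},
}


def load(text):
    """Parse 'items:values' rows into a list of {item: value} dicts."""
    db = []
    for line in text.splitlines():
        items, values = line.split(":")
        db.append(dict(zip(items.split(), map(float, values.split()))))
    return db


def support(pattern, db):
    """Sum of the minimum fuzzy value over transactions holding all items."""
    total = sum(
        min(t[i] for i in pattern) for t in db if all(i in t for i in pattern)
    )
    return round(total, 6)  # rounding hides floating-point noise


def is_spatial(pattern):
    """Every pair of items must be neighbors (labels stripped: assumption)."""
    return all(
        y.split(".")[0] in NEIGHBORS[x.split(".")[0]]
        for x, y in combinations(pattern, 2)
    )


def mine(db, min_sup):
    """Enumerate every itemset and keep the frequent, spatial ones."""
    items = sorted({i for t in db for i in t})
    found = {}
    for size in range(1, len(items) + 1):
        for pattern in combinations(items, size):
            if is_spatial(pattern):
                s = support(pattern, db)
                if s >= min_sup:
                    found[pattern] = s
    return found


patterns = mine(load(DB_TEXT), min_sup=1)
for pattern, s in patterns.items():
    print(" ".join(pattern), ":", s)
```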