Mining Fuzzy Frequent Spatial Patterns in Geo-referenced Fuzzy Transactional Databases

PAMI is a Python library containing 100+ algorithms to discover useful patterns in various databases across multiple computing platforms.

1. What is fuzzy frequent spatial pattern mining?

A geo-referenced transactional database represents the data generated by a set of stationary sensors (or objects) observing a particular phenomenon over a time period. Useful information that can help users achieve socio-economic development lies hidden in this data. Past works focused on finding frequently occurring spatial patterns (i.e., patterns in which objects are neighbors to one another) in a binary geo-referenced transactional database. A key limitation of these studies is that they fail to discover interesting spatial regularities that may exist in a quantitative (or non-binary) geo-referenced transactional database. Fuzzy frequent spatial pattern mining was introduced to tackle this problem.

Fuzzy frequent spatial pattern mining converts the given quantitative geo-referenced transactional database into a fuzzy geo-referenced transactional database using a set of user-defined fuzzy functions, and mines this database to discover all patterns that satisfy the user-specified minimum support (minSup) and maximum distance (maxDist) constraints. The minSup constraint controls the minimum number of transactions in the database in which a pattern must appear. The maxDist constraint controls the maximum distance between any two objects in a pattern.
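The two constraints can be illustrated with a minimal sketch. It is not PAMI's implementation. It assumes the sum-of-minimum convention for fuzzy support (the support of a pattern is the sum, over the transactions containing it, of the smallest fuzzy value among its items), which is consistent with the supports listed in the sample output in Section 10. It also assumes that a fuzzy item such as a.L refers to the object a. The function names and the toy data are hypothetical.

```python
from itertools import combinations

# Toy fuzzy transactions (hypothetical): fuzzy item -> fuzzy value.
transactions = [
    {"a.L": 0.2, "c.H": 0.1},
    {"a.L": 0.2, "c.H": 0.3, "d.L": 0.4},
    {"c.H": 0.2, "d.L": 0.3},
]

# Toy neighborhood (hypothetical): objects lying within maxDist of each object.
neighbors = {"a": {"c", "d"}, "c": {"a", "d"}, "d": {"a", "c"}}


def fuzzy_support(pattern, db):
    """Sum of the smallest fuzzy value of the pattern in each transaction containing it."""
    return sum(
        min(t[item] for item in pattern)
        for t in db
        if all(item in t for item in pattern)
    )


def is_spatial(pattern, nbrs):
    """True if every pair of objects in the pattern are neighbors (maxDist constraint)."""
    # Assumption: the text before the first '.' names the object, e.g. 'a.L' -> 'a'.
    objects = [item.split(".")[0] for item in pattern]
    return all(y in nbrs.get(x, set()) for x, y in combinations(objects, 2))


def is_fuzzy_frequent_spatial(pattern, db, nbrs, min_sup):
    """Check the maxDist constraint first, then the minSup constraint."""
    return is_spatial(pattern, nbrs) and fuzzy_support(pattern, db) >= min_sup


print(fuzzy_support(("c.H", "d.L"), transactions))
print(is_fuzzy_frequent_spatial(("c.H", "d.L"), transactions, neighbors, 0.4))
```

Checking the neighborhood condition before computing the support avoids scanning the database for patterns that can never qualify.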

Reference: P. Veena, B. S. Chithra, R. U. Kiran, S. Agarwal and K. Zettsu, “Discovering Fuzzy Frequent Spatial Patterns in Large Quantitative Spatiotemporal databases,” 2021 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), 2021, pp. 1-8, doi: 10.1109/FUZZ45933.2021.9494594.

2. What is a fuzzy transactional database?

A fuzzy transactional database is a collection of transactions, where each transaction contains a set of fuzzy items and their respective fuzzy (or probability) values. Please note that the fuzzy value of a fuzzy item always lies between 0 and 1 (equivalently, between 0% and 100%).

Given a quantitative transactional database containing the items a, b, c, d, e, f and g, and a set of fuzzy membership labels, Low (L), Medium (M), and High (H), a hypothetical fuzzy database generated from it is shown below. Each transaction in a fuzzy database contains a transaction identifier, a set of fuzzy items, and their respective fuzzy values.

Transactions
(a.L,0.2) (b.M,0.3) (c.H,0.1) (g.M,0.1)
(b.M,0.3) (c.H,0.2) (d.L,0.3) (e.H,0.2)
(a.L,0.2) (b.M,0.1) (c.H,0.3) (d.L,0.4)
(a.L,0.3) (c.H,0.2) (d.L,0.1) (f.M,0.2)
(a.L,0.3) (b.M,0.1) (c.H,0.2) (d.L,0.1) (g.M,0.2)
(c.H,0.2) (d.L,0.2) (e.H,0.3) (f.M,0.1)
(a.L,0.2) (b.M,0.1) (c.H,0.1) (d.L,0.2)
(a.L,0.1) (e.H,0.2) (f.M,0.2)
(a.L,0.2) (b.M,0.2) (c.H,0.4) (d.L,0.2)
(b.M,0.3) (c.H,0.2) (d.L,0.2) (e.H,0.2)

Note: Duplicate items must not exist in a transaction.
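To make the role of the fuzzy membership labels concrete, the following sketch maps a quantitative value to fuzzy items of the form item.label. The three membership functions, the 0 to 100 value range, and the function names are assumptions chosen for illustration. They are not the functions used by PAMI or by the manual referenced in the next section.

```python
def low(x):
    """Membership in Low: 1 at or below 0, falling linearly to 0 at 50."""
    return max(0.0, min(1.0, (50 - x) / 50))


def medium(x):
    """Membership in Medium: 0 at 0 and at 100, peaking at 1 when x is 50."""
    return max(0.0, 1 - abs(x - 50) / 50)


def high(x):
    """Membership in High: 0 at or below 50, rising linearly to 1 at 100."""
    return max(0.0, min(1.0, (x - 50) / 50))


def fuzzify(item, value):
    """Return the fuzzy items (item.label -> fuzzy value) with non-zero membership."""
    memberships = {"L": low(value), "M": medium(value), "H": high(value)}
    return {
        f"{item}.{label}": round(m, 2)
        for label, m in memberships.items()
        if m > 0
    }


print(fuzzify("a", 20))
```

In this sketch a single quantitative value can yield more than one fuzzy item (for example, both a.L and a.M). How a real conversion tool selects among them is outside what this page specifies.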

3. What is the acceptable format of a fuzzy database in PAMI?

Each row in a fuzzy transactional database must contain a list of fuzzy items, a colon as the separator, and the list of their fuzzy values, in that order.

A sample fuzzy transactional database file, say fuzzyTransactionalDatabase.txt, is provided below:

a.L b.M c.H g.M:0.2 0.3 0.1 0.1
b.M c.H d.L e.H:0.13 0.2 0.3 0.2
a.L b.M c.H d.L:0.2 0.1 0.3 0.4
a.L c.H d.L f.M:0.3 0.2 0.1 0.2
a.L b.M c.H d.L g.M:0.3 0.1 0.2 0.1 0.2
c.H d.L e.H f.M:0.2 0.2 0.3 0.1
a.L b.M c.H d.L:0.2 0.1 0.1 0.2
a.L e.H f.M:0.1 0.2 0.2
a.L b.M c.H d.H:0.2 0.2 0.4 0.2
b.M c.H d.L e.H:0.3 0.2 0.2 0.2

For more information on how to create a fuzzy transactional database from a quantitative (or utility) transactional database, please refer to the manual utility2FuzzyDB.pdf
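The row format described in this section can be parsed with a few lines of Python. This is a minimal sketch for inspecting or validating a file before mining, not the reader used inside PAMI. The function name parse_fuzzy_row is hypothetical.

```python
def parse_fuzzy_row(row, sep=" "):
    """Parse 'item1 item2:val1 val2' into a dict mapping fuzzy item -> fuzzy value."""
    # A row with no colon, or with more than one, fails here with ValueError.
    items_part, values_part = row.strip().split(":")
    items = items_part.split(sep)
    values = [float(v) for v in values_part.split(sep)]
    if len(items) != len(values):
        raise ValueError("number of items and number of values differ")
    if len(set(items)) != len(items):
        raise ValueError("duplicate items in a transaction")
    return dict(zip(items, values))


print(parse_fuzzy_row("a.L b.M c.H g.M:0.2 0.3 0.1 0.1"))
```

The duplicate check enforces the note in Section 2 that a transaction must not contain duplicate items.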

4. What is a neighborhood database?

A neighborhood database contains items and their neighbors. An item x is said to be a neighbor of y if the distance between x and y is no more than the user-specified maximum distance threshold value.

A hypothetical neighborhood database containing the items a, b, c, d, e, f and g is shown below.

Item Neighbors
a b, c, d
b a, e, g
c a, d
d a, c
e b, f
f e, g
g b, f
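The neighbor definition above can be sketched as follows. The coordinates, the use of Euclidean distance, and the function name build_neighbors are assumptions made for illustration. PAMI's own tooling for creating neighborhood files may work differently.

```python
from math import dist

# Hypothetical planar coordinates for four items.
coords = {"a": (0, 0), "b": (0, 3), "c": (4, 0), "d": (10, 10)}


def build_neighbors(coords, max_dist):
    """Map each item to the sorted list of other items within max_dist of it."""
    return {
        x: sorted(
            y for y in coords
            if y != x and dist(coords[x], coords[y]) <= max_dist
        )
        for x in coords
    }


print(build_neighbors(coords, 4))
```

Because distance is symmetric, whenever x is listed as a neighbor of y, y is also listed as a neighbor of x.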

5. What is the acceptable format of a neighborhood database in PAMI?

The format of the neighborhood database is similar to that of a transactional database. That is, each transaction must contain a set of items. In a transaction, the first item represents the key item, while the remaining items represent the neighbors of the first item.

A sample neighborhood file, say sampleNeighbourFile.txt, is provided below:

a b c d
b a e g
c a d
d a c
e b f
f e g
g b f

For more information on how to create a neighborhood file for a given dataset, please refer to the manual of creating neighborhood file.
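Before mining, it can be useful to load a neighborhood file in the format described in this section and check that it is symmetric, since a distance-based neighborhood should list x under y whenever it lists y under x. The sketch below works on a list of rows. The function names are hypothetical and are not part of PAMI.

```python
def parse_neighbor_rows(rows, sep=" "):
    """Map the first item of each row (the key item) to the set of its neighbors."""
    nbrs = {}
    for row in rows:
        row = row.strip()
        if not row:
            continue  # skip blank lines
        key, *rest = row.split(sep)
        nbrs[key] = set(rest)
    return nbrs


def asymmetric_pairs(nbrs):
    """Return the (x, y) pairs where y is a neighbor of x but x is not a neighbor of y."""
    return [
        (x, y)
        for x, ys in nbrs.items()
        for y in ys
        if x not in nbrs.get(y, set())
    ]


sample_rows = ["a b c d", "b a e g", "c a d", "d a c", "e b f", "f e g", "g b f"]
nbrs = parse_neighbor_rows(sample_rows)
print(asymmetric_pairs(nbrs))
```

To check a file on disk, the rows can be obtained with open('sampleNeighbourFile.txt').read().splitlines().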

6. Why is it useful to understand the statistics of a fuzzy transactional database?

The statistics give an overview of the database to be mined. The sample code below prints the statistical details of a fuzzy transactional database.

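import PAMI.extras.dbStats.FuzzyDatabase as stats

# arguments: the input file and the separator used within each row
obj = stats.FuzzyDatabase('fuzzyTransactionalDatabase.txt', ' ')
obj.run()  # compute the statistics
obj.printStats()  # print the statistics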

7. What are the input parameters to be specified for a fuzzy frequent spatial pattern mining algorithm?

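The input parameters to a fuzzy frequent spatial pattern mining algorithm are the fuzzy transactional database (input file), the neighborhood file, the minimum support (minSup), and the separator used in the input file. In the examples on this page, the maxDist constraint is not given as a separate parameter; it is reflected in the neighborhood file, which lists the items lying within the maximum distance of each item.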

8. How to store the output of a fuzzy frequent spatial pattern mining algorithm?

The patterns discovered by a fuzzy frequent spatial pattern mining algorithm can be saved into a file or a data frame.

9. How to run a fuzzy frequent spatial pattern mining algorithm in a terminal?

foo@bar: cd PAMI/fuzzyGeoreferencedFrequentPattern/basic
foo@bar: python3 algorithmName.py inputFile outputFile neighbourFile minSup separator

Example: python3 FFSPMiner.py inputFile.txt outputFile.txt neighbourFile.txt 5 ' '

10. How to execute a fuzzy frequent spatial pattern mining algorithm in a Jupyter Notebook?

import PAMI.fuzzyGeoreferencedFrequentPattern.basic.FFSPMiner as alg

iFile = 'fuzzyTransactionalDatabase.txt'  # specify the input fuzzy database
minSup = 1  # specify the minSup value
separator = ' '  # specify the separator of the input file
oFile = 'fuzzySpatialPatterns.txt'  # specify the output file name
nFile = 'sampleNeighbourFile.txt'  # specify the neighbour file of the database

obj = alg.FFSPMiner(iFile, nFile, minSup, separator)  # initialize the algorithm
obj.mine()  # start the mining process
obj.save(oFile)  # store the patterns in a file
df = obj.getPatternsAsDataFrame()  # get the discovered patterns as a dataframe
obj.printResults()  # print the stats of the mining process

Total number of Spatial Fuzzy Frequent Patterns: 6
Total Memory in USS: 81223680
Total Memory in RSS 119037952
Total ExecutionTime in seconds: 0.0006165504455566406

!cat fuzzySpatialPatterns.txt

#format: fuzzyGeoreferencedFrequentPattern:support
b.M : 1.23
d.L : 1.4999999999999998
d.L	c.H : 1.2
a.L : 1.5
a.L	c.H : 1.0
c.H : 1.9000000000000001

df  # the dataframe contains the following information
Patterns Support
0 b.M 1.23
1 d.L 1.4999999999999998
2 d.L c.H 1.2
3 a.L 1.5
4 a.L c.H 1.0
5 c.H 1.9000000000000001
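The saved pattern file shown above can be read back for further processing. The sketch below parses lines in the pattern : support layout of that listing. It assumes that item names contain no colon and that the items of a pattern are separated by whitespace (a tab in the listing). The function names are hypothetical and are not part of PAMI.

```python
def parse_pattern_line(line):
    """Split 'item1<TAB>item2 : support' into a tuple of items and a float support."""
    pattern_part, support_part = line.rsplit(":", 1)
    return tuple(pattern_part.split()), float(support_part)


def read_patterns(lines):
    """Parse all pattern lines, skipping blank lines and the '#format' header."""
    return [
        parse_pattern_line(line)
        for line in lines
        if line.strip() and not line.startswith("#")
    ]


sample_output = [
    "#format: fuzzyGeoreferencedFrequentPattern:support",
    "b.M : 1.23 ",
    "d.L\tc.H : 1.2 ",
    "a.L\tc.H : 1.0 ",
]
patterns = read_patterns(sample_output)
for items, support in patterns:
    print(items, support)
```

To read the file written by obj.save(oFile), the lines can be obtained with open('fuzzySpatialPatterns.txt').read().splitlines().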