Mining Frequent Spatial Patterns in Geo-referenced Transactional Databases

1. What is frequent spatial pattern mining?

A geo-referenced time series database represents the data generated by a set of fixed locations (or spatial items) observing a particular phenomenon over time. This data hides valuable information that can help users progress in their social and economic lives. Frequent spatial pattern mining (FSPM) aims to discover all of those interesting sets of items that were not only frequent, but also close to one another in a coordinate system.

Given a transactional database and a spatial (or neighborhood) file, FSPM aims to discover all of those patterns that satisfy the user-specified minimum support (minSup) and maximum distance (maxDist) constraints. The minSup controls the minimum number of transactions that a pattern must appear in a database. The maxDist controls the maximum distance that can exist between any two items in a pattern.

Reference: R. Uday Kiran, Sourabh Shrivastava, Philippe Fournier-Viger, Koji Zettsu, Masashi Toyoda, and Masaru Kitsuregawa. 2020. Discovering Frequent Spatial Patterns in Very Large Spatiotemporal Databases. In Proceedings of the 28th International Conference on Advances in Geographic Information Systems (SIGSPATIAL ‘20). Association for Computing Machinery, New York, NY, USA, 445–448. link

2. What is a transactional database?

A transactional database is a collection of transactions, where each transaction contains a transaction-identifier and a set of items.
A hypothetical transactional database containing the items a, b, c, d, e, f, and g is shown below

tid	Transactions
1	a b c g
2	b c d e
3	a b c d
4	a c d f
5	a b c d g
6	c d e f
7	a b c d
8	a e f
9	a b c d
10	b c d e

Note: Duplicate items must not exist in a transaction.

3. What is the acceptable format of a transactional database in PAMI?

Each row in a transactional database must contain only items. The frequent pattern mining algorithms in PAMI implicitly assume the row number of a transaction as its transactional-identifier to reduce storage and processing costs.

A sample transactional database, say sampleTransactionalDatabase.txt, is provided below.

a b c g
b c d e
a b c d
a c d f
a b c d g
c d e f
a b c d
a e f
a b c d
b c d e

4. What is a neighborhood database?

A neighborhood database contains items and their neighbors. An item x is said to be a neighbor of y if the distance between x and y is no more than the user-specified maximum distance threshold value.

A hypothetical neighborhood database containing the items a, b, c, d, e, f and g is shown below.

Items	neighbours
a	b, c, d
b	a, e, g
c	a, d
d	a, c
e	b, f
f	e, g
g	b, f

The methodology to create a neighborhood database file from a given geo-referenced database has been described in the manual creatingNeighborhoodFile.html

5. What is the acceptable format of a neighborhood database?

The format of the neighborhood database is similar to that of a transactional database. That is, each transaction must contain a set of items. In a transaction, the first item represents the key item, while the remaining items represent the neighbors of the first item.

A sample neighborhood file, say sampleNeighbourFile.txt, is provided below:

a b c d
b a e g
c a d
d a c
e b f
f e g
g b f

6. What is the need for understanding the statistics of a transactional database?

The performance of a pattern mining algorithm primarily depends on the satistical nature of a database. Thus, it is important to know the following details of a database:

Total number of transactions (Database size)
Total number of unique items in database
Minimum lenth of transaction that existed in database
Average length of all transactions that exists in database
Maximum length of transaction that existed in database
Standard deviation of transaction length
Variance in transaction length
Sparsity of database

The below sample code prints the statistical details of a database.

import PAMI.extras.dbStats.TransactionalDatabase as stats 
obj = stats.TransactionalDatabase('sampleTransactionalDatabase.txt', ' ') 
obj.run() 
obj.printStats() 

7. What are the input parameters to a frequent spatial pattern mining algorithm?

Algorithms to mine the frequent spatial patterns require transactional database, neighborhood database, and a user-specified minSup constraint. Please note that maxDist constraint has been used in prior to create a neighborhood database file.

Transactional database can be provided in following formats:
- String : E.g., ‘transactionalDatabase.txt’
- URL : E.g., https://u-aizu.ac.jp/~udayrage/datasets/transactionalDatabases/spatial_T10I4D100K.csv
- DataFrame. Please note that dataframe must contain the header titled ‘Transactions’
Neighbourhood database can be provided in following formats:
- String : E.g., ‘spatialDatabase.txt’
- URL : E.g., https://u-aizu.ac.jp/~udayrage/datasets/transactionalDatabases/neighbour_T10I4D100K.csv
- DataFrame. Please note that dataframe must contain the header titled ‘Transactions’
minSup
specified in
- count (beween 0 to length of a database) or
- [0, 1]
seperator
default seperator is ‘\t’ (tab space)

8. How to store the output of a frequent spatial pattern mining algorithm?

The patterns discovered by a frequent spatial pattern mining algorithm can be saved into a file or a data frame.

9. How to execute a frequent spatial pattern mining algorithm in a terminal?

Download the PAMI source code from Github.
Unzip the PAMI source code folder.
Enter into frequentSpatialPattern folder.

foo@bar: cd PAMI/frequentSpatialPattern/basic

Execute the python program on their terminal using the following syntax:

foo@bar:  python3 algorithmName.py inputFile outputFile neighbourFile minSup seperator

Example: python3 FSPGrowth.py inputFile.txt outputFile.txt neighbourFile.txt 3 ' '

10. How to execute a frequent spatial pattern mining algorithm in a Jupyter Notebook?

Install the PAMI package from the PYPI repository by executing the following command: pip3 install PAMI
Run the below sample code by making necessary changes

import PAMI.frequentSpatialPattern.basic.FSPGrowth as alg

iFile = 'sampleTransactionalDatabase.txt'  # specify the input transactional database 
nFile = 'sampleNeighbourFile.txt'  # specify the neighbour file name
minSup = 5  # specify the minSup value
seperator = ' '  # specify the seperator. Default seperator is tab space. 
oFile = 'frequentSpatialPatterns.txt'  # specify the output file name

obj = alg.FSPGrowth(iFile, nFile, minSup, seperator)  # initialize the algorithm 
obj.mine()  # start the mining process 
df = obj.getPatternsAsDataFrame()  # Get the patterns discovered into a dataframe 
obj.save(oFile)  # store the patterns in file 
obj.printResults()  # Print the stats of mining process

10 7
Frequent Spatial Patterns successfully generated using FSPGrowth
Total number of Spatial Frequent Patterns: 9
Total Memory in USS: 81281024
Total Memory in RSS 119287808
Total ExecutionTime in ms: 0.0004096031188964844

!cat frequentSpatialPatterns.txt
#The format of the file is pattern:support

a:7 
b:7 
c:9 
d:8 
d	c:8 
b	a:5 
a	c:6 
a	d:5 
a	d	c:5 

df #The dataframe containing the patterns is shown below.

	Patterns	Support
0	a	7
1	b	7
2	c	9
3	d	8
4	d c	8
5	b a	5
6	a c	6
7	a d	5
8	a d c	5