PAMI is a Python library containing 100+ algorithms to discover useful patterns in various databases across multiple computing platforms. (Active)
A geo-referenced time series database represents the data generated by a set of fixed locations (or spatial items) observing a particular phenomenon over time. This data hides valuable information that can help users progress in their social and economic lives. Frequent spatial pattern mining (FSPM) aims to discover all of those interesting sets of items that were not only frequent, but also close to one another in a coordinate system.
Given a transactional database and a spatial (or neighborhood) file, FSPM aims to discover all of those patterns that satisfy the user-specified minimum support (minSup) and maximum distance (maxDist) constraints. The minSup controls the minimum number of transactions that a pattern must appear in a database. The maxDist controls the maximum distance that can exist between any two items in a pattern.
Reference: R. Uday Kiran, Sourabh Shrivastava, Philippe Fournier-Viger, Koji Zettsu, Masashi Toyoda, and Masaru Kitsuregawa. 2020. Discovering Frequent Spatial Patterns in Very Large Spatiotemporal Databases. In Proceedings of the 28th International Conference on Advances in Geographic Information Systems (SIGSPATIAL ‘20). Association for Computing Machinery, New York, NY, USA, 445–448. link
A transactional database is a collection of transactions, where each transaction contains a transaction-identifier and a set of items.
A hypothetical transactional database containing the items a, b, c, d, e, f, and g is shown below
tid | Transactions |
---|---|
1 | a b c g |
2 | b c d e |
3 | a b c d |
4 | a c d f |
5 | a b c d g |
6 | c d e f |
7 | a b c d |
8 | a e f |
9 | a b c d |
10 | b c d e |
Note: Duplicate items must not exist in a transaction.
Each row in a transactional database must contain only items. The frequent pattern mining algorithms in PAMI implicitly assume the row number of a transaction as its transactional-identifier to reduce storage and processing costs.
A sample transactional database, say sampleTransactionalDatabase.txt, is provided below.
a b c g
b c d e
a b c d
a c d f
a b c d g
c d e f
a b c d
a e f
a b c d
b c d e
A neighborhood database contains items and their neighbors. An item x is said to be a neighbor of y if the distance between x and y is no more than the user-specified maximum distance threshold value.
A hypothetical neighborhood database containing the items a, b, c, d, e, f and g is shown below.
Items | neighbours |
---|---|
a | b, c, d |
b | a, e, g |
c | a, d |
d | a, c |
e | b, f |
f | e, g |
g | b, f |
The methodology to create a neighborhood database file from a given geo-referenced database has been described in the manual creatingNeighborhoodFile.html
The format of the neighborhood database is similar to that of a transactional database. That is, each transaction must contain a set of items. In a transaction, the first item represents the key item, while the remaining items represent the neighbors of the first item.
A sample neighborhood file, say sampleNeighbourFile.txt, is provided below:
a b c d
b a e g
c a d
d a c
e b f
f e g
g b f
The performance of a pattern mining algorithm primarily depends on the satistical nature of a database. Thus, it is important to know the following details of a database:
The below sample code prints the statistical details of a database.
import PAMI.extras.dbStats.TransactionalDatabase as stats
obj = stats.TransactionalDatabase('sampleTransactionalDatabase.txt', ' ')
obj.run()
obj.printStats()
Algorithms to mine the frequent spatial patterns require transactional database, neighborhood database, and a user-specified minSup constraint. Please note that maxDist constraint has been used in prior to create a neighborhood database file.
- String : E.g., ‘transactionalDatabase.txt’
- URL : E.g., https://u-aizu.ac.jp/~udayrage/datasets/transactionalDatabases/spatial_T10I4D100K.csv
- DataFrame. Please note that dataframe must contain the header titled ‘Transactions’
- String : E.g., ‘spatialDatabase.txt’
- URL : E.g., https://u-aizu.ac.jp/~udayrage/datasets/transactionalDatabases/neighbour_T10I4D100K.csv
- DataFrame. Please note that dataframe must contain the header titled ‘Transactions’
- count (beween 0 to length of a database) or
- [0, 1]
The patterns discovered by a frequent spatial pattern mining algorithm can be saved into a file or a data frame.
foo@bar: cd PAMI/frequentSpatialPattern/basic
foo@bar: python3 algorithmName.py inputFile outputFile neighbourFile minSup seperator
Example: python3 FSPGrowth.py
inputFile.txt
outputFile.txt
neighbourFile.txt
3 ' '
import PAMI.frequentSpatialPattern.basic.FSPGrowth as alg
iFile = 'sampleTransactionalDatabase.txt' # specify the input transactional database
nFile = 'sampleNeighbourFile.txt' # specify the neighbour file name
minSup = 5 # specify the minSup value
seperator = ' ' # specify the seperator. Default seperator is tab space.
oFile = 'frequentSpatialPatterns.txt' # specify the output file name
obj = alg.FSPGrowth(iFile, nFile, minSup, seperator) # initialize the algorithm
obj.mine() # start the mining process
df = obj.getPatternsAsDataFrame() # Get the patterns discovered into a dataframe
obj.save(oFile) # store the patterns in file
obj.printResults() # Print the stats of mining process
10 7
Frequent Spatial Patterns successfully generated using FSPGrowth
Total number of Spatial Frequent Patterns: 9
Total Memory in USS: 81281024
Total Memory in RSS 119287808
Total ExecutionTime in ms: 0.0004096031188964844
!cat frequentSpatialPatterns.txt
#The format of the file is pattern:support
a:7
b:7
c:9
d:8
d c:8
b a:5
a c:6
a d:5
a d c:5
df #The dataframe containing the patterns is shown below.
Patterns | Support | |
---|---|---|
0 | a | 7 |
1 | b | 7 |
2 | c | 9 |
3 | d | 8 |
4 | d c | 8 |
5 | b a | 5 |
6 | a c | 6 |
7 | a d | 5 |
8 | a d c | 5 |