Introduction
One thing I usually tell my team is that knowledge of databases, programming
and math significantly enhances one’s GIS skills. There is a lot of use-cases
where we apply these disciplines in our everyday use of GIS.
In this post, I will give a use-case that requires database knowledge – SQL (or
relational) database to be specific. This is to do with Joins (both Attribute and Spatial).
If you are knowledgeable in SQL databases, then there is no doubt that you will feel at home with the JOIN concept. In simple terms, JOIN is used to retrieve data from
two or more related tables in a relational database. The concept of primary key,
foreign key, inner join, left join, right join, full join, cross join, etc becomes
immediately obvious here.
In the same way as we use JOIN in a relational database, we also use the same
in GIS to retrieve data from two or more related tables. In GIS, the relationship
can be defined using either Attribute or Spatial Join, depending on the
requirements.
Justification for knowing Join
Often times, we find ourselves in situations where a table in GIS has a column
that contains repeated (or redundant data) data, and in effect this can lead to
inconsistent data. In SQL databases, we take care of data integrity by using constraints – including foreign key constraint, which has a direct bearing
on JOIN.
Enforcement of data integrity using constraints is an all-too-important part of
the toolkit for any data engineer of software engineer. There is no reason why
a GIS professional should care less about data quality – or by extension about JOIN.
Just to appreciate this argument from a GIS perspective, JOIN is an important spatial processing task, just like the other
common ones like buffering, clipping, dissolving, difference, intersection,
union, etc.
Example Use-case
Take a look at the following GIS table representing parishes (an administrative unit) in Kampala City.
Please disregard any design issues with the table. That is a subject for
another day. We shall use this data to articulate both Attribute and Spatial Joins immediately after.
Attribute Join
In the above example, a Parish (PARID) belongs to a Division (DIVID). Someone could have decided to use descriptive names for parish name and division name directly as substitute columns for DIVID and PARID respectively in this layer. However, this would make the data vulnerable to data integrity issues, especially when data on Division is updated.
We can resolve the data integrity challenge using Attribute Join.
Note: The Join layer is the subsidiary layer that we want to join to the Target (main) layer.
Attribute Join involves matching the rows from the Join Layer to the Target (main) Layer based on the values of the join attribute.
In this case, we want to join our main table with two separate standalone tables for divisions and parishes, like below:
|
Divisions |
|
|
ID |
DivName |
|
1 |
Kawempe |
|
2 |
Nakawa |
|
… |
|
|
63 |
Central |
|
64 |
Rubaga |
|
… |
|
and
|
Parishes |
|
|
ID |
ParName |
|
1 |
Banda |
|
2 |
Bugolobi |
|
… |
|
|
233 |
Makerere |
|
234 |
Mulago |
|
… |
|
Depending on the GIS software that you use, you can easily find out how to do Attribute Join. It is always easy! It will match all rows with the same Parish
ID or Division ID and create new column in the main (target) layer.
Spatial Join
A Spatial Join appends attributes from one layer to another based on a spatial
relationship. Features from both the target (main) layer and those from the join layer are written to the output feature class. Understanding the Spatial Join concept is similar to Attribute Join.
Note: The Join layer is the subsidiary layer that we want to join to the Target (main) layer.
A spatial join involves
matching the rows (features) from the Join Layer to the Target Layer based on their relative spatial
locations.
The process of Spatial Join depends on the GIS software in use, but follows the
same concept. Using the example of parishes above, supposing we have another geometry
layer consisting of centroids of parishes.
If the centroids layer has an attribute for population, then we can join the
two layers with the parishes layer being the target layer, while the centroids
layer is the join layer. This way, we shall be able to append the join
attributes to the target layer based on spatial association and create a new
output feature class.
Spatial Joins can be classified in two ways:
-
Join Operation (Relationship type): one-to-one (join a single feature
from the join features into the target features) or one-to-many
relationship (join all matching features from the join features into the
target features
- creating multiple overlapping records).
- Match type: Before you run a spatial join, you have
to select a spatial match type, i.e. the proximity that the spatial join will
search for e.g. intersect, within distance, completely within, identical,
closest. Depending on the spatial join type, it can affix one or several
features into the target layer. In this case, you have to pick a “Join”
operation that includes one-to-one or one-to-many.
Conclusion
Both Attribute and Spatial Joins require database-type knowledge for easy understanding. For someone
who is already knowledgeable in SQL database, it is pretty easy to grasp these
Join processing tasks.
My experience based on interaction with GIS professionals who do not have database-type
knowledge is that understanding Joins can be challenging to them, just like automation and
scripting of GIS is.
No comments:
Post a Comment