Geographic Information System (GIS): Attribute vs Spatial Joins

Introduction

One thing I usually tell my team is that knowledge of databases, programming and math significantly enhances one’s GIS skills. There is a lot of use-cases where we apply these disciplines in our everyday use of GIS.

In this post, I will give a use-case that requires database knowledge – SQL (or relational) database to be specific. This is to do with Joins (both Attribute and Spatial).

If you are knowledgeable in SQL databases, then there is no doubt that you will feel at home with the JOIN concept. In simple terms, JOIN is used to retrieve data from two or more related tables in a relational database. The concept of primary key, foreign key, inner join, left join, right join, full join, cross join, etc becomes immediately obvious here.

In the same way as we use JOIN in a relational database, we also use the same in GIS to retrieve data from two or more related tables. In GIS, the relationship can be defined using either Attribute or Spatial Join, depending on the requirements.

Justification for knowing Join

Often times, we find ourselves in situations where a table in GIS has a column that contains repeated (or redundant data) data, and in effect this can lead to inconsistent data. In SQL databases, we take care of data integrity by using constraints – including foreign key constraint, which has a direct bearing on JOIN.

Enforcement of data integrity using constraints is an all-too-important part of the toolkit for any data engineer of software engineer. There is no reason why a GIS professional should care less about data quality – or by extension about JOIN.

Just to appreciate this argument from a GIS perspective, JOIN is an important spatial processing task, just like the other common ones like buffering, clipping, dissolving, difference, intersection, union, etc.

Example Use-case

Take a look at the following GIS table representing parishes (an administrative unit) in Kampala City. Please disregard any design issues with the table. That is a subject for another day. We shall use this data to articulate both Attribute and Spatial Joins immediately after.

Attribute Join

In the above example, a Parish (PARID) belongs to a Division (DIVID). Someone could have decided to use descriptive names for parish name and division name directly as substitute columns for DIVID and PARID respectively in this layer. However, this would make the data vulnerable to data integrity issues, especially when data on Division is updated.

We can resolve the data integrity challenge using Attribute Join.

Note: The Join layer is the subsidiary layer that we want to join to the Target (main) layer.

Attribute Join involves matching the rows from the Join Layer to the Target (main) Layer based on the values of the join attribute.

In this case, we want to join our main table with two separate standalone tables for divisions and parishes, like below:

Divisions
ID	DivName
1	Kawempe
2	Nakawa
…
63	Central
64	Rubaga
…

and

Parishes
ID	ParName
1	Banda
2	Bugolobi
…
233	Makerere
234	Mulago
…

Depending on the GIS software that you use, you can easily find out how to do Attribute Join. It is always easy! It will match all rows with the same Parish ID or Division ID and create new column in the main (target) layer.

Spatial Join

A Spatial Join appends attributes from one layer to another based on a spatial relationship. Features from both the target (main) layer and those from the join layer are written to the output feature class. Understanding the Spatial Join concept is similar to Attribute Join.

Note: The Join layer is the subsidiary layer that we want to join to the Target (main) layer.

A spatial join involves matching the rows (features) from the Join Layer to the Target Layer based on their relative spatial locations.

The process of Spatial Join depends on the GIS software in use, but follows the same concept. Using the example of parishes above, supposing we have another geometry layer consisting of centroids of parishes.

If the centroids layer has an attribute for population, then we can join the two layers with the parishes layer being the target layer, while the centroids layer is the join layer. This way, we shall be able to append the join attributes to the target layer based on spatial association and create a new output feature class.

Spatial Joins can be classified in two ways:

Join Operation (Relationship type): one-to-one (join a single feature from the join features into the target features) or one-to-many relationship (join all matching features from the join features into the target features - creating multiple overlapping records).
Match type: Before you run a spatial join, you have to select a spatial match type, i.e. the proximity that the spatial join will search for e.g. intersect, within distance, completely within, identical, closest. Depending on the spatial join type, it can affix one or several features into the target layer. In this case, you have to pick a “Join” operation that includes one-to-one or one-to-many.

Conclusion

Both Attribute and Spatial Joins require database-type knowledge for easy understanding. For someone who is already knowledgeable in SQL database, it is pretty easy to grasp these Join processing tasks.

My experience based on interaction with GIS professionals who do not have database-type knowledge is that understanding Joins can be challenging to them, just like automation and scripting of GIS is.

Orama's Data Blog

sub-title

Sunday, 3 July 2022

Geographic Information System (GIS): Attribute vs Spatial Joins

No comments:

Post a Comment