

One of the reasons I started learning Python a year and a half ago is that I wanted to contribute to open-source GIS projects. My motivation for this plugin came from GIS seminars at my university, where we would often download data from the Eurostat platform to use in some sort of analysis. Manually downloading, cleaning, and joining the data with a vector layer was a tedious process that cost us a lot of time. The Eurostat downloader plugin aims to be a fast and reliable way of joining data from the Eurostat database with a vector layer inside QGIS. If you know what you are searching for, I don't think it takes more than 30 seconds to find the dataset, filter it, and perform a join. Let's have a look at how it works.


Downloading the plugin

Right now, the plugin is available as an experimental plugin inside QGIS. Make sure to check the "Show also experimental plugins" option in the plugin manager settings so that you can find it in the plugin repository.


Handling Python dependencies

This plugin relies on the eurostat Python package.

Windows and Linux

Users of these two platforms do not need to do anything. The external dependencies are already bundled inside the plugin folder. If you run into issues with the plugin dependencies, please contact me and let me know.

macOS

macOS users need to install the eurostat Python package themselves. Check tutorials online on how to install external Python packages for the Python environment that QGIS uses.
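If you are not sure whether the package is already there, a quick check like the one below can help. This is only a minimal sketch, meant to be run from the QGIS Python console; the helper name is_package_available is my own and not part of the plugin.

```python
import importlib.util


def is_package_available(name: str) -> bool:
    """Return True if the current interpreter can locate a top-level package."""
    # find_spec returns None when the package cannot be found
    return importlib.util.find_spec(name) is not None


if is_package_available("eurostat"):
    print("eurostat is available")
else:
    print("eurostat is missing; install it into the Python environment that QGIS uses")
```

Running the check inside QGIS matters because QGIS may use a different Python interpreter than the one in your terminal.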


How to use the plugin

First, we will need a vector layer that we can join data to. We can download one from the Eurostat platform. I will download this one and add it to QGIS.

Loading the table of contents and the dataset

Now open the Eurostat downloader plugin. In order to load the table of contents, you have to type something in the search bar (this may take a while depending on your internet speed). Now search for any dataset that you would like to use. I will choose CENS_HNCTZ: Population by sex, age and citizenship. You can search either by the code (CENS_HNCTZ in this case) or by the title of the dataset. After you click on the dataset, you have to wait for the table to be fetched from the Eurostat API, just like with the table of contents. Once the table is loaded, we are ready to proceed.
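To give an idea of what searching by code or by title means, here is a small sketch in plain Python. It is not the plugin's actual code: the search_toc function and the second entry in the list are made up for illustration, and the real table of contents comes from the Eurostat API.

```python
def search_toc(toc, query):
    """Return the entries whose code or title contains the query (case-insensitive)."""
    needle = query.strip().lower()
    if not needle:
        # Nothing typed yet, so nothing to show
        return []
    return [
        entry
        for entry in toc
        if needle in entry["code"].lower() or needle in entry["title"].lower()
    ]


# Tiny stand-in for the table of contents (second entry is hypothetical)
toc = [
    {"code": "CENS_HNCTZ", "title": "Population by sex, age and citizenship"},
    {"code": "EXAMPLE_A", "title": "Hypothetical dataset about land use"},
]

print(search_toc(toc, "cens_"))        # match on the code
print(search_toc(toc, "citizenship"))  # match on the title
```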

Applying filters to table columns

The displayed data is in long form (more on long form data here: https://towardsdatascience.com/long-and-wide-formats-in-data-explained-e48d7c9a06cb). We can apply filters to each of the columns. For example, I would like the table to only display the total population. Right now it is displaying the female, male and total population. To do so, we can right-click on the name of the column and select T (which stands for total). We can also click on F and M to see how the table changes dynamically.
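Conceptually, a column filter just keeps the rows whose value in that column is one of the selected options. The sketch below shows the idea in plain Python; the apply_filters function, the column names, and the numbers are all invented for illustration and are not taken from the plugin or from Eurostat.

```python
def apply_filters(rows, filters):
    """Keep rows whose value in every filtered column is among the selected values."""
    return [
        row
        for row in rows
        if all(row[column] in selected for column, selected in filters.items())
    ]


# Long form: one row per combination of region and sex (made-up values)
rows = [
    {"geo": "AT", "sex": "F", "value": 51},
    {"geo": "AT", "sex": "M", "value": 49},
    {"geo": "AT", "sex": "T", "value": 100},
    {"geo": "BE", "sex": "T", "value": 200},
]

# Selecting only T in the sex column leaves one row per region
totals = apply_filters(rows, {"sex": {"T"}})
print(totals)
```

Filters on several columns combine naturally: a row survives only if it passes all of them.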

Instead of having abbreviations like F, M and T, we could make use of a translation of some kind. In order to have one, select a flag from the bottom left, then click on the citizen column.

This is more explicit and makes working with the data easier. If you selected a language, it might take a bit longer for the Edit section window to pop up because the Eurostat API is being called. We will once again select TOTAL here. We will select TOTAL for the age column too.

We can see that for the year 1991 we have a bunch of missing values and we do not want that column to appear in our vector layer. In order to perform a filter in this case, we will click on any of the time period columns. We will select 2001 as a start time to remove the 1991 column.
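The time filter can be pictured as dropping every time period column that comes before the chosen start. Below is a rough sketch of that idea, assuming the time columns are named after the year; the keep_periods_from function and the values are hypothetical and only serve as an example.

```python
def keep_periods_from(row, start_year, id_columns):
    """Drop the time period columns that come before start_year."""
    return {
        name: value
        for name, value in row.items()
        # Identifier columns are always kept; year columns are compared as numbers
        if name in id_columns or int(name) >= start_year
    }


# One row with a missing value for 1991 (made-up numbers)
row = {"geo": "AT", "1991": None, "2001": 100, "2011": 110}

filtered = keep_periods_from(row, 2001, id_columns={"geo"})
print(filtered)
```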

Joining the data

We are now ready to join the data (learn more about joins here if you are not familiar with them). The only thing left is to define the join fields. By default, the plugin tries to infer both of the join fields, but it may fail to do so, so double-check that the correct join fields are selected. After the join fields have been selected, click on the join data button. We will see that a temporary table has been added to our QGIS instance.
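To make the idea of join fields more concrete, here is a sketch in plain Python. It guesses the pair of fields whose values overlap the most and then attaches the table values to the matching features. This is an assumption about how such a guess could work, not the plugin's actual heuristic, and the function names, field names (NUTS_ID, NAME, geo), and values are only illustrative.

```python
def infer_join_fields(layer_rows, table_rows):
    """Guess the (layer_field, table_field) pair with the most shared values."""
    if not layer_rows or not table_rows:
        return None
    best, best_overlap = None, 0
    for layer_field in layer_rows[0]:
        layer_values = {row[layer_field] for row in layer_rows}
        for table_field in table_rows[0]:
            table_values = {row[table_field] for row in table_rows}
            overlap = len(layer_values & table_values)
            if overlap > best_overlap:
                best, best_overlap = (layer_field, table_field), overlap
    return best


def left_join(layer_rows, table_rows, layer_field, table_field):
    """Attach table columns to each feature; features without a match stay unchanged."""
    lookup = {row[table_field]: row for row in table_rows}
    joined = []
    for feature in layer_rows:
        match = lookup.get(feature[layer_field], {})
        extra = {k: v for k, v in match.items() if k != table_field}
        joined.append({**feature, **extra})
    return joined


# Made-up attribute rows of a vector layer and a filtered table
layer = [
    {"NUTS_ID": "AT", "NAME": "Austria"},
    {"NUTS_ID": "BE", "NAME": "Belgium"},
    {"NUTS_ID": "XX", "NAME": "Nowhere"},
]
table = [{"geo": "AT", "2001": 100}, {"geo": "BE", "2001": 200}]

fields = infer_join_fields(layer, table)
joined = left_join(layer, table, *fields)
print(fields)
print(joined)
```

A guess based on overlapping values can pick the wrong pair when two columns happen to share values, which is one reason to always check the join fields yourself.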

If we look at the properties of our vector layer, we will see that the join has been defined.

We are now free to use the data for any analysis that we wish to do.

I will let you explore the plugin further. You can come up with many workflows that can speed up your analysis. For example, you could join multiple datasets to the vector layer, compute some sort of index, and then delete the temporary tables. This will remove the unnecessary fields while keeping your computed index.

It's important to remember that the created table is TEMPORARY. To make sure that you do not lose any data, you could export the vector layer after you perform the join. There are also other ways of getting around this issue.

Useful links

Plugin details

Code repository

Browse and report bugs
