One of the strengths of R is its vast ecosystem of libraries. This includes numerous sophisticated visualization libraries, some of which, such as the excellent ggplot2, are capable of producing publication-quality charts.
I’m not a huge fan of charts (e.g. scatterplots, bar charts) as communication devices. I believe that conclusions can usually be presented more succinctly.
However, R has many visualization libraries that are not chart-based—and these can really add punch to a presentation. One such library is rworldmap—which allows data to be presented as a heat map of countries.
In this article, we’ll look at how to use rworldmap to visualize World Bank data.
The World Bank maintains a comprehensive repository of country-specific data. Major categories of data contained in the repository are:
As I said—comprehensive.
Let’s use the data to calculate the rate of population growth, in each country, over the past five years.
First, download the data, in CSV format, from the World Bank’s data repository.
This will download a zip archive. Unzip it. The population data is contained in sp.pop.totl_Indicator_en_csv_v2.csv. Infuriatingly, this file has four lines of preamble that need to be stripped from the data.
Load this file into R (using the path where the file is contained on your system).
population_records <- readLines("C:\\sp.pop.totl_Indicator_en_csv_v2.csv")[-(1:4)]
population_data <- read.csv(text=population_records, header = TRUE)
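As an alternative sketch, `read.csv` can skip the preamble itself through its `skip` argument, which avoids the separate `readLines` step. The example below builds a small stand-in file (the country names, codes and figures are invented for illustration, and the real file has many more columns) and checks that both approaches give the same data frame. It assumes the preamble is exactly four physical lines.

```r
# Build a small stand-in for the World Bank file: four preamble lines,
# then a header row and data. All names and numbers here are made up.
demo_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Data Source,World Development Indicators",
  "Last Updated Date,placeholder",
  "Preamble line 3",
  "Preamble line 4",
  "Country Name,Country Code,2009,2014",
  "Exampleland,EXL,100,110",
  "Samplestan,SMP,200,190"
), demo_path)

# Approach used in this article: drop the first four lines, then parse.
via_readlines <- read.csv(text = readLines(demo_path)[-(1:4)], header = TRUE)

# Alternative: let read.csv skip the preamble directly.
via_skip <- read.csv(demo_path, skip = 4, header = TRUE)

# read.csv makes the headers syntactically valid names,
# e.g. "Country Name" becomes Country.Name and "2009" becomes X2009.
print(names(via_skip))
print(all.equal(via_readlines, via_skip))
```

Either form works; `skip` is simply a little shorter when the number of preamble lines is known in advance.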
Check the data.
str(population_data)
This should show that the data contains, among other columns, Country.Name, Country.Code and population counts for 1960-2014 (X1960–X2014).
To calculate the rate of population growth for each country between 2009 and 2014 we can use:
# Change from 2009 to 2014, expressed as a percentage of the 2014 population
population_data$Growth.5.Year <-
  ((population_data$X2014 - population_data$X2009) / population_data$X2014) * 100
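To make the arithmetic concrete, here is a small worked sketch on invented figures (the two countries and their populations are hypothetical). It computes the percentage with the formula above and, for comparison, the same change measured against the 2009 starting population, which is the more usual definition of a growth rate.

```r
# Hypothetical populations for two made-up countries.
toy <- data.frame(
  Country.Code = c("EXL", "SMP"),
  X2009 = c(100, 200),
  X2014 = c(110, 190)
)

# Change as a percentage of the 2014 population (the formula used above).
toy$Growth.5.Year <- ((toy$X2014 - toy$X2009) / toy$X2014) * 100

# Same change as a percentage of the 2009 population (the starting year).
toy$Growth.From.2009 <- ((toy$X2014 - toy$X2009) / toy$X2009) * 100

print(round(toy[, c("Growth.5.Year", "Growth.From.2009")], 2))
```

For the first country the two figures are about 9.09% and 10%; for the second, about -5.26% and -5%. The two definitions drift further apart as the change gets larger, so it is worth stating which one a map is showing.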
Let’s now install and load the rworldmap visualization library.
install.packages("rworldmap")
library(rworldmap)
The first thing we have to do is join our data to the map. This involves specifying the country code column in your data that will be used to match the country identifier used in the rworldmap library. The World Bank data uses ISO 3166 three-letter codes.
To join our data to the map, we use:
mapped_data <- joinCountryData2Map(population_data, joinCode = "ISO3",
nameJoinColumn = "Country.Code")
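joinCode = "ISO3" tells rworldmap to join the data using ISO 3166 three-letter codes. joinCountryData2Map will report that it was unable to map some codes (many of them are World Bank aggregates, such as regions and income groups, rather than individual countries), but this doesn’t matter for our purposes.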
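To see which codes failed to match, one option is the `verbose` argument of `joinCountryData2Map`. The sketch below uses a small invented data frame: the growth figures are placeholders, and `WLD` stands in for a non-country aggregate row of the kind the World Bank file contains. It assumes rworldmap is installed.

```r
library(rworldmap)

# Placeholder data: three country codes plus one aggregate code ("WLD")
# that is not a country. The growth values are invented.
toy <- data.frame(
  Country.Code = c("FRA", "NGA", "CHN", "WLD"),
  Growth.5.Year = c(2.5, 12.0, 2.4, 5.6)
)

# verbose = TRUE asks rworldmap to list the codes it could not match.
toy_map <- joinCountryData2Map(toy, joinCode = "ISO3",
                               nameJoinColumn = "Country.Code",
                               verbose = TRUE)

# Count how many map polygons received a value from our data.
matched_count <- sum(!is.na(toy_map$Growth.5.Year))
print(matched_count)
```

Checking the failed codes once is a cheap way to confirm that only aggregates and very small territories are being dropped, not a country you care about.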
We can now display the mapped data.
par(mai=c(0,0,0.2,0),xaxs="i",yaxs="i")
mapCountryData(mapped_data, nameColumnToPlot = "Growth.5.Year")
The par command only needs to be run once per session; it just makes sure that all the available space in the window is used to display the map.
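`mapCountryData` takes further arguments that control the title, the region shown and how values are binned. The following is a sketch on invented growth figures for a handful of countries, assuming rworldmap is installed; the particular choices (`catMethod = "pretty"`, `mapRegion = "africa"`, the title text) are illustrative and are not the settings used for the image in this article. It also saves the previous `par` settings so they can be restored afterwards.

```r
library(rworldmap)

# Invented growth figures for a few African countries (placeholders only).
toy <- data.frame(
  Country.Code = c("NGA", "EGY", "ZAF", "KEN", "ETH"),
  Growth.5.Year = c(12.0, 10.5, 7.0, 12.5, 12.0)
)
toy_map <- joinCountryData2Map(toy, joinCode = "ISO3",
                               nameJoinColumn = "Country.Code")

# par() returns the previous settings, so they can be restored later.
old_par <- par(mai = c(0, 0, 0.2, 0), xaxs = "i", yaxs = "i")

# Zoom to one region, use rounded class breaks and set a custom title.
map_params <- mapCountryData(toy_map,
                             nameColumnToPlot = "Growth.5.Year",
                             mapTitle = "Population change (toy data)",
                             mapRegion = "africa",
                             catMethod = "pretty",
                             colourPalette = "heat")

par(old_par)  # restore the earlier graphics settings
```

Rounded breaks tend to give a legend that is easier to read aloud in a presentation than the default quantile bins.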
This results in the following image:
The legend shows that growth ranges between -8% and 32%.
Europe, China and Russia had low population growth in the last five years. In contrast, much of Africa and some of the Middle East experienced high growth.
This article has demonstrated how easy it can be to produce compelling visualizations using R.
If you are interested in learning how to use R productively, Learning Tree runs a couple of courses that will be of interest to you.