Facet wrapping multivariate data: reshape and ggplot

A common problem when visualising data is that the attributes you want to compare are stored across multiple variables rather than in a single one. The proportion of employment by type is an example: there is one column per employment type. This practical solves the problem using reshape2, ggplot2 and maptools.


Download the dataset from here and unzip it to a suitable directory.

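library(maptools)   # readShapePoly()
library(ggplot2)    # fortify(), ggplot()
library(reshape2)   # melt()
londonData <- read.csv("ward_atlas_2011_employment.csv")
londonShape <- readShapePoly("london_sport.shp")
londonShape@data$id <- rownames(londonShape@data)  # polygon IDs, used later as a join key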

As you are probably now aware, the data for London is at ward level. For this visualisation, we want to reduce it to boroughs. The first step is to work out the borough codes, which are the first four characters of the ward code (the Codes column). For example, for the ward 00ABGB the borough code is 00AB. This is achieved with the substr() function in base R. The result goes into a new column called ward, which, despite its name, holds the borough code.

londonData$ward <- substr(londonData$Codes,1,4)
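A quick way to check the substring rule is to run it on a few codes. The sketch below uses only the ward codes shown in the sample output of this post. Note that the City of London code, 00AA, is already four characters long, so substr() returns it unchanged.

```r
# Ward codes taken from the sample rows shown in this post
codes <- c("00AA", "00ABFX", "00ABFY", "00ABFZ", "00ABGA", "00ABGB")

# The first four characters are the borough code
borough <- substr(codes, 1, 4)
print(borough)

# Number of ward rows per borough code
print(table(borough))
```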

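With every record now holding a borough code, we can use the melt() function from the reshape2 package to convert the table from wide to long format. melt() creates one row for each combination of an id record and a measured variable, so each ward gets one row per employment type. The id variables are given in the id.vars argument; every other column is treated as a measured variable.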

LondonDataMelt <- melt(londonData, id.vars=c("Codes", "Names", "ward"))
The first rows of LondonDataMelt look like this:

   Codes          Names ward                          variable value
1   00AA City of London 00AA X1_2_Agriculture_mining_utilities  0.00
2 00ABFX          Abbey 00AB X1_2_Agriculture_mining_utilities  0.00
3 00ABFY         Alibon 00AB X1_2_Agriculture_mining_utilities  0.00
4 00ABFZ      Becontree 00AB X1_2_Agriculture_mining_utilities  2.00
5 00ABGA Chadwell Heath 00AB X1_2_Agriculture_mining_utilities  0.30
6 00ABGB      Eastbrook 00AB X1_2_Agriculture_mining_utilities  0.00
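Note that the steps as written never actually collapse wards to boroughs. The later merge attaches every ward's rows to its borough polygon, so each borough ends up with several candidate values per employment type and only one of them determines the fill. The self-contained sketch below shows what melt() produces on a small made-up table and then adds one extra aggregate() step that averages the wards within each borough. Assumptions: the values and the Other column are invented, and the unweighted mean treats every ward equally. For proportions, that is only an approximation; a population-weighted mean would be more faithful if ward populations were available.

```r
library(reshape2)

# Hypothetical wide table in the same layout as the employment CSV:
# one row per ward, one column per employment type. Values are invented,
# and "Other" is a made-up column name.
wide <- data.frame(
  Codes = c("00AA", "00ABFX", "00ABFY"),
  Names = c("City of London", "Abbey", "Alibon"),
  X1_2_Agriculture_mining_utilities = c(1.0, 0.0, 0.4),
  Other = c(3, 5, 7),
  stringsAsFactors = FALSE
)
wide$ward <- substr(wide$Codes, 1, 4)   # borough code, as in the tutorial

# Wide -> long: one row per ward and employment type (3 wards x 2 types = 6 rows)
long <- melt(wide, id.vars = c("Codes", "Names", "ward"))

# Assumed extra step: collapse wards to boroughs with an unweighted mean
borough_long <- aggregate(value ~ ward + variable, data = long, FUN = mean)
print(borough_long)
```

Applied to the real data, the same aggregate() call on LondonDataMelt, run before the shapefile merge, would leave one value per borough and employment type. The output keeps the column name ward, so the later by.y = "ward" join still matches.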

With most of the data processing complete, we now need to prepare the shapefile for plotting. fortify(), from the ggplot2 package, converts the polygons into a data frame that ggplot2 can draw, with one row per vertex and columns such as long, lat, order, group and id. The _geom suffix in london_geom indicates that it holds the shapefile geometry.

london_geom <- fortify(londonShape, region="id")

With the shapefile fortified, we can do the joins that bring our data back together. First, we join the melted data to the shapefile's attribute table, which is accessed through the @data slot, matching the borough code (ons_label) to the ward column. The second join merges the fortified geometry, london_geom, with these attributes using their shared id column.

londonAttr <- merge(londonShape@data, LondonDataMelt, by.x="ons_label", by.y="ward")
london_geom <- merge(london_geom, londonAttr, by="id")
london_geom <- london_geom[order(london_geom$order), ]  # merge() sorts by id; restore vertex drawing order
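Why re-sort after a merge? fortify() returns vertices in drawing order, recorded in its order column, but merge() sorts its output by the join key by default (sort = TRUE), so geom_polygon() can end up joining vertices in the wrong sequence. The toy tables below are invented and only borrow fortify()-style column names; they show the effect and the fix.

```r
# Hypothetical fortify()-style vertex table: two triangles, drawn in 'order'
geom <- data.frame(
  id    = c("1", "1", "1", "0", "0", "0"),
  order = 1:6,
  long  = c(2, 3, 2, 0, 1, 0),
  lat   = c(0, 0, 1, 0, 0, 1),
  stringsAsFactors = FALSE
)
# Hypothetical attribute table keyed on the same id
attrs <- data.frame(id = c("0", "1"), ons_label = c("00AA", "00AB"),
                    stringsAsFactors = FALSE)

merged <- merge(geom, attrs, by = "id")
before <- merged$order   # rows for id "0" now come first, so 'order' is scrambled
print(before)

# Restore the original vertex sequence before plotting with geom_polygon()
merged <- merged[order(merged$order), ]
print(merged$order)
```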

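A small addition before creating the plot is a set of borough labels. Their positions are found by taking the mean long and lat of each borough's vertices. This is not a true centroid, because densely digitised edges pull the mean towards them, but it is quick to compute. An alternative is coordinates() in the sp package, which returns a label point for each polygon.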

burNames <- aggregate(cbind(long, lat) ~ ons_label, data=london_geom, FUN=mean)
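The label calculation is a grouped mean. On a hypothetical vertex table with two boroughs (coordinates invented), aggregate() with a cbind() left-hand side returns one row per ons_label holding the mean long and lat.

```r
# Hypothetical vertex table for two boroughs
geom <- data.frame(
  ons_label = c("00AA", "00AA", "00AA", "00AA", "00AB", "00AB", "00AB"),
  long      = c(0, 2, 2, 0, 10, 14, 12),
  lat       = c(0, 0, 2, 2, 0, 0, 3),
  stringsAsFactors = FALSE
)

# One row per borough, with the mean vertex position as a label anchor
labels <- aggregate(cbind(long, lat) ~ ons_label, data = geom, FUN = mean)
print(labels)   # 00AA -> (1, 1); 00AB -> (12, 1)
```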

Finally, we can construct the plot. To suppress the labels, insert a # at the start of the geom_text() line.

ggplot(london_geom, aes(x = long, y = lat)) +
  geom_polygon(aes(group = group, fill = value)) +
  geom_text(data = burNames, aes(long, lat, label = ons_label), colour = "white", size = 3) +
  facet_wrap(~ variable) +  # one map per employment type
  coord_equal() +
  scale_x_continuous("", breaks = NULL) + scale_y_continuous("", breaks = NULL)
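The facetting itself depends only on the data being in long format, with one row per vertex and employment type. This minimal sketch uses two made-up square "boroughs" and two hypothetical employment types, Type1 and Type2, to show facet_wrap(~ variable) producing one panel per type. Only the values and names are invented; the ggplot2 calls are the same ones used above.

```r
library(ggplot2)

# Hypothetical square polygon with four vertices in drawing order
square <- function(id, x0) {
  data.frame(id = id, order = 1:4,
             long = x0 + c(0, 1, 1, 0), lat = c(0, 0, 1, 1),
             stringsAsFactors = FALSE)
}
shapes <- rbind(square("A", 0), square("B", 1.5))

# Hypothetical long-format values: one row per polygon and employment type
values <- data.frame(id = rep(c("A", "B"), 2),
                     variable = rep(c("Type1", "Type2"), each = 2),
                     value = c(10, 20, 30, 5),
                     stringsAsFactors = FALSE)

toy <- merge(shapes, values, by = "id")
toy <- toy[order(toy$variable, toy$id, toy$order), ]  # keep vertex order within each panel

p <- ggplot(toy, aes(x = long, y = lat)) +
  geom_polygon(aes(group = id, fill = value)) +
  facet_wrap(~ variable) +   # one panel per employment type
  coord_equal()
print(p)
```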

The output should be something like this:

[Figure: Maps of London Employment]

This entry was posted in r-tutorial.