{"id":1365,"date":"2017-01-16T21:11:31","date_gmt":"2017-01-17T02:11:31","guid":{"rendered":"https:\/\/mathewkiang.com\/?p=1365"},"modified":"2020-11-21T17:33:38","modified_gmt":"2020-11-21T22:33:38","slug":"using-histogram-legend-choropleths","status":"publish","type":"post","link":"https:\/\/mathewkiangcom.local\/2017\/01\/16\/using-histogram-legend-choropleths\/","title":{"rendered":"Using a histogram as a legend in choropleths"},"content":{"rendered":"

\"\"D<\/span>espite well known drawbacks,1<\/a><\/sup> plotting parameters onto maps provides a convenient way of seeing context, patterns, and outliers. However, one of\u00a0the many problems with choropleths is that the area of the regions tend to distort our perception of the value of the region. For example, in the United States, huge (in terms of land mass) counties will tend to have a greater visual impact than small counties (despite often having similar or even smaller population sizes).<\/p>\n

One way to address this is to use a histogram as a legend on your map. The histogram then provides you with a way of showing raw counts of equal weights while the map allows you to provide the spatial context of the values.<\/p>\n

If you know how to make a choropleth already and just want the to know how to add the histogram, scroll to the bottom. Otherwise, we’re going to make the map above from scratch.<\/p>\n

1. Get, load, and prep the data<\/h3>\n

You can download the drug deaths\u00a0data from the CDC blog post here<\/a>. We will need two shape files — the 2013 county CB shape file<\/a> and the 2013 state CB shape file<\/a>.<\/p>\n

## Download the drug death data from:\r\n## https:\/\/blogs.cdc.gov\/nchs-data-visualization\/drug-poisoning-mortality\/\r\n## \r\n## Download the 2013 CB shapefiles (500k will do) for counties and states:\r\n## https:\/\/www.census.gov\/geo\/maps-data\/data\/cbf\/cbf_counties.html\r\n## https:\/\/www.census.gov\/geo\/maps-data\/data\/cbf\/cbf_state.html\r\n\r\n## Imports ----\r\nlibrary(tidyverse)\r\nlibrary(broom)\r\nlibrary(rgdal)\r\nlibrary(viridis)\r\nlibrary(grid)\r\n\r\n## Lower 48 states ----\r\nlower_48 <-  c("09", "23", "25", "33", "44", "50", "34", "36", "42", "18", \r\n               "17", "26", "39", "55", "19", "20", "27", "29", "31", "38", \r\n               "46", "10", "11", "12", "13", "24", "37", "45", "51", "54", \r\n               "01", "21", "28", "47", "05", "22", "40", "48", "04", "08", \r\n               "16", "35", "30", "49", "32", "56", "06", "41", "53")\r\n\r\n\r\n## Load drug poisoning data ----\r\ncounty_df <- read_csv(paste0('.\/..\/original_data\/NCHS_-_Drug_' , \r\n                             'Poisoning_Mortality__County_Trends', \r\n                             '__United_States__1999_2014.csv'), \r\n                      col_names = c('fipschar', 'year', 'state', 'st', \r\n                                    'stfips', 'county', 'pop', 'smr'), \r\n                      skip = 1)\r\ncounty_df$smr <- factor(county_df$smr, \r\n                        levels = unique(county_df$smr)[c(10, 1:9, 11)], \r\n                        ordered = TRUE)\r\n\r\n\r\n##  Import US states and counties from 2013 CB shapefile ----\r\n##  We could use map_data() -- but want this to be generalizable to all shp.\r\nallcounties <- readOGR(dsn = ".\/..\/shapefiles\/", \r\n                       layer = "cb_2013_us_county_500k")\r\nallstates   <- readOGR(dsn = ".\/..\/shapefiles\/", \r\n                       layer = "cb_2013_us_state_500k")\r\n\r\n\r\n## A little munging and subsetting for maps ----\r\nallcounties@data$fipschar <- as.character(allcounties@data$GEOID)\r\nallcounties@data$state <- as.character(allcounties@data$STATEFP)\r\nallstates@data$state <- as.character(allstates@data$STATEFP)\r\n\r\n## Only use lower 48 states\r\nsubcounties <- subset(allcounties, allcounties@data$state %in% lower_48)\r\nsubstates <- subset(allstates, allstates@data$state %in% lower_48)\r\n\r\n## Fortify into dataframes\r\nsubcounties_df <- tidy(subcounties, region = "GEOID")\r\nsubstates_df <- tidy(substates, region = "GEOID")<\/pre>\n

The code is pretty straightforward. We are just loading up the data and doing some minor tweaks like changing the levels of the death rates so they are in order. The tidy()<\/code>\u00a0command is just the new version of fortify()<\/code>\u00a0which changes spatial data into a dataframe that ggplot2<\/code>\u00a0can read and use.2<\/a><\/sup><\/p>\n

2. Make a basic map<\/h3>\n

Let’s subset to a single year and make a map. Back in my day you had to do a double merge to get the data in order and for counties (or even worse, census tracts), it could take forever. Now,\u00a0we can use geom_map()<\/code> though it still appears to take forever.<\/p>\n

## Let's focus on one year and do a practice run\r\ndf_2014 <- filter(county_df, year == 2014)\r\n\r\nmap_2014 <- ggplot(data = df_2014) + \r\n    geom_map(aes(map_id = fipschar, fill = smr), map = subcounties_df)  + \r\n    expand_limits(x = subcounties_df$long, y = subcounties_df$lat)\r\nggsave(map_2014, file = '.\/first_pass.jpg', width = 5, height = 3.5)<\/pre>\n

You end up with something like this:<\/p>\n

\"\"<\/a><\/p>\n

It’s… well it’s a map and, quite frankly, not the ugliest map I’ve ever made.<\/p>\n

2b. Better colors, less empty space, and better base theme.<\/h3>\n

I prefer theme_classic()<\/code>\u00a0as my base since it gets rid of most of the things I don’t want anyways. And I often waffle between ColorBrewer<\/a> and Viridis<\/a>, but let’s go with viridis<\/code> here.<\/p>\n

map_2014 <- map_2014 + \r\n    theme_classic() + \r\n    scale_fill_viridis(option = "inferno", discrete = TRUE, direction = -1) + \r\n    scale_x_continuous(expand = c(0, 0)) + \r\n    scale_y_continuous(expand = c(0, 0)) \r\nggsave(map_2014, file = '.\/better_colors.jpg', width = 5, height = 3.5)<\/pre>\n

\"\"<\/a><\/p>\n

2c. Remove lines we don’t need. Add state lines. Put in some useful text.<\/h3>\n

Now we’ll use the state CB file (you didn’t forget about it, did you?) to add in state lines. Helpful for providing more context in the US. I also think the title font is too small by default so we’ll make it bigger and we’ll source the original data. Also, I find a 1.35<\/code>\u00a0ratio to be the most pleasing for the US maps in ggplot2<\/code>\u00a0so I set the coord_fixed()<\/code>\u00a0to that, but not everybody likes it. Lastly, because of the larger font, we’ll need a bigger scaling factor on the ggsave()<\/code>.<\/p>\n

map_2014 <- map_2014 + \r\n    geom_path(data = substates_df, aes(long, lat, group = group), \r\n          color = "white", size = .7, alpha = .75) + \r\n    theme(plot.title = element_text(size = 30, face = "bold"), \r\n          axis.line = element_blank(),\r\n          axis.text.x = element_blank(),\r\n          axis.text.y = element_blank(),\r\n          axis.ticks = element_blank(),\r\n          axis.title.x = element_blank(),\r\n          axis.title.y = element_blank(), \r\n          legend.position = "none") + \r\n    labs(title = "Drug poisoning deaths (2014)", \r\n         caption = paste0('Source: https:\/\/blogs.cdc.gov\/', \r\n                          'nchs-data-visualization\/', \r\n                          'drug-poisoning-mortality\/')) + \r\n    coord_fixed(ratio = 1.35)\r\nggsave(map_2014, file = '.\/getting_closer.jpg', width = 5, height = 3.5, \r\n       scale = 2)<\/pre>\n

\"\"<\/h3>\n

3. First pass on the legend<\/h3>\n

Now, let’s work on the legend. Instead of using geom_hist()<\/code>, we’re going to use geom_bar()<\/code>\u00a0so we need to aggreagate the data and get the number of counties for each level of drug death rate. Then, just call it like any normal bar plot, but with stat = 'identity'<\/code>\u00a0and fill in the color scheme we’ve already decided on to match the map.<\/p>\n

##  First, aggregate the data\r\nagg_2014 <- df_2014 %>% \r\n    group_by(smr) %>%\r\n    summarize(n_county = n())\r\n\r\n## Legend, first pass ---- \r\nlegend_plot <- ggplot(data = agg_2014, \r\n                      aes(y = n_county, x = smr, fill = smr)) + \r\n    geom_bar(stat = 'identity') + \r\n    scale_fill_viridis(option = "inferno", discrete = TRUE, direction = -1) \r\nggsave(legend_plot, file = '.\/legend_first.jpg', \r\n       height = 2, width = 1.75, scale = 3)<\/pre>\n

 <\/p>\n

\"\"<\/a><\/p>\n

Ok. So the legend can use some work. The first thing is that we need to fix the y-axis so that it is consistent across all years. Just looking for the maximum value shows us [0, 900] is a good range. Next, we need to flip the coordinates so the bars are horizontal \u2014 however, if we do that, the black bar will be on top and the yellow on the bottom so we also need to flip the order of the factors so that they match the legend.<\/p>\n

3b. Let’s reverse the order of the factors and then flip coordinates.<\/h3>\n
legend_plot <- legend_plot + \r\n    scale_y_continuous("", expand = c(0, 0), breaks = seq(0, 900, 300), \r\n                       limits = c(0, 900)) + \r\n    scale_x_discrete(limits = rev(levels(df_2014$smr))) + \r\n    coord_flip()\r\nggsave(legend_plot, file = '.\/proper_orientation.jpg', \r\n       height = 2, width = 1.75, scale = 3)<\/pre>\n

\"\"<\/a><\/p>\n

Much closer. Now, let’s add some vertical lines, remove the ink we don’t need, and add some useful text.<\/p>\n

legend_plot <- legend_plot + \r\n    theme(plot.title = element_text(size = 16, face = "bold"), \r\n          plot.subtitle = element_text(size = 15), \r\n          axis.text.y = element_text(size = 14), \r\n          axis.text.x = element_text(size = 12),\r\n          axis.line = element_blank(),\r\n          axis.ticks = element_blank(),\r\n          axis.title.y = element_blank(), \r\n          legend.position = "none",\r\n          panel.background = element_rect(fill = "transparent",colour = NA),\r\n          plot.background = element_rect(fill = "transparent",colour = NA)) + \r\n    geom_hline(yintercept = seq(0, 900, 150), color = "white") + \r\n    labs(title = "Legend", subtitle = " Deaths per 100,000")\r\nggsave(legend_plot, file = '.\/final_legend.jpg', \r\n       height = 2, width = 1.75, scale = 3)<\/pre>\n

\"\"<\/a><\/p>\n

4. Now put the legend on the map.<\/h3>\n

Use ggplotGrob()<\/code>\u00a0to render the legend and then place the legend on the map using annotation_custom()<\/code>. You’ll have to play with the alignment manually since it changes depending on the size and scale of your final product. Here, I aligned things to the lower right by setting ymin<\/code>\u00a0and xmax<\/code>\u00a0to the lower right corner and offset everything from there. The size of this viewport dictates the size of your\u00a0legend so play around.<\/p>\n

legend_grob = ggplotGrob(legend_plot)\r\n\r\nfinal_plot <- map_2014 + annotation_custom(grob = legend_grob, \r\n                                    xmin = max(subcounties_df$long) - 11, \r\n                                    xmax = max(subcounties_df$long) - 1, \r\n                                    ymin = min(subcounties_df$lat), \r\n                                    ymax = min(subcounties_df$lat) + 9)\r\n\r\nggsave(final_plot, width = 8, height = 5, scale = 2, limitsize = FALSE, \r\n       file = '.\/jpeg_drug_deaths_2014.jpg')<\/pre>\n

The final plot is the one you see at the top of the post.<\/p>\n

I also created a loop and looped through all the years. A slideshow is here (you may have to use the slow domain<\/a> or use the standalone viewer<\/a>):<\/p>\n

<\/div>\n

As always, full code can be found here<\/a>.<\/p>\n

\n
<\/div>\n

Show 2 footnotes<\/span><\/a><\/p>\n

    \n
  1. E.g., Gelman and Price 1999<\/a> or How to Lie with Maps<\/em> by Mark Monmonier ↩<\/a><\/span><\/li>\n
  2. Though a glance at the source code seems to suggest it is identical. ↩<\/a><\/span><\/li>\n<\/ol>\n<\/div>\n","protected":false},"excerpt":{"rendered":"

    espite well known drawbacks,1 plotting parameters onto maps provides a convenient way of seeing context, patterns, and outliers. However, one of\u00a0the many problems with choropleths is that the area of the regions tend to distort our perception of the value of the region. For example, in the United States, huge (in terms of land mass) counties will tend to have a greater visual impact than small counties (despite often having similar or even smaller population sizes). One way to address this is to use a histogram as a legend on your map. The histogram then provides you with a way …<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[132,116],"tags":[],"_links":{"self":[{"href":"https:\/\/mathewkiangcom.local\/wp-json\/wp\/v2\/posts\/1365"}],"collection":[{"href":"https:\/\/mathewkiangcom.local\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mathewkiangcom.local\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mathewkiangcom.local\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mathewkiangcom.local\/wp-json\/wp\/v2\/comments?post=1365"}],"version-history":[{"count":6,"href":"https:\/\/mathewkiangcom.local\/wp-json\/wp\/v2\/posts\/1365\/revisions"}],"predecessor-version":[{"id":2127,"href":"https:\/\/mathewkiangcom.local\/wp-json\/wp\/v2\/posts\/1365\/revisions\/2127"}],"wp:attachment":[{"href":"https:\/\/mathewkiangcom.local\/wp-json\/wp\/v2\/media?parent=1365"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mathewkiangcom.local\/wp-json\/wp\/v2\/categories?post=1365"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mathewkiangcom.local\/wp-json\/wp\/v2\/tags?post=1365"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}