Using a histogram as a legend in choropleths

Despite well-known drawbacks,[1] plotting parameters onto maps provides a convenient way of seeing context, patterns, and outliers. However, one of the many problems with choropleths is that the area of a region tends to distort our perception of its value. For example, in the United States, counties with a huge land mass tend to have a greater visual impact than small counties (despite often having similar or even smaller population sizes).

One way to address this is to use a histogram as a legend on your map. The histogram shows raw counts with every region weighted equally, while the map provides the spatial context of the values.

  1. E.g., Gelman and Price 1999 or How to Lie with Maps by Mark Monmonier

Getting SSL certificates on GoDaddy Shared Hosting plans

Since Google’s announcement that they will start publicly shaming unsecured websites in January 2017, everybody has been rushing to try to get their https tags. I’ve also been getting relentless phone calls from GoDaddy salespeople asking me to buy SSL certificates for about $5 per month. I’m stereotypically Asian so $5 per month on a personal blog just seems excessive. I’m not here trying to sell things or get your credit card information — SSL is a nice-to-have-but-not-$5-per-month-nice-to-have item.

Turns out, creating free SSL certificates is not that hard thanks to the good people at EFF. There’s a very helpful blog post by Isabel Castillo that outlines how to do it. Some issues I ran into and their solutions:


Use bash to concatenate files in R

Often, I find I need to loop through directories full of csv files, sometimes tens of thousands of them, in order to combine them into a single analytical dataset. When it’s only a few dozen, using fread(), read_csv(), or the like can be fine, but nothing is quite as fast as using awk or cat.

Here’s a snippet of code that allows one to use bash in R to concatenate csv files in a directory. People in the lab have found it helpful so maybe others will as well.
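
The snippet itself is not reproduced in this excerpt, so what follows is only a rough sketch of the shell side of the idea, not the post's actual code. The function name `concat_csv` and its two arguments are made up for illustration, and it assumes every csv file in the directory shares the same header row.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original post): concatenate every .csv
# file in a directory into one file, keeping a single header row.
# Assumption: all files share the same header.
concat_csv() {
  local in_dir="$1"    # directory holding the csv files
  local out_file="$2"  # combined output; keep it outside in_dir

  # Take the header from the first file (sorted by name).
  local first
  first=$(find "$in_dir" -maxdepth 1 -type f -name '*.csv' | sort | head -n 1)
  [ -n "$first" ] || { echo "no csv files in $in_dir" >&2; return 1; }
  head -n 1 "$first" > "$out_file"

  # Append the data rows of every file, skipping each file's first line.
  # find -exec ... + passes the file names in batches, so this still works
  # when there are too many files for a single shell glob.
  # Row order follows find's traversal order, which is not sorted.
  find "$in_dir" -maxdepth 1 -type f -name '*.csv' \
    -exec awk 'FNR > 1' {} + >> "$out_file"
}
```

From R, a function like this could be saved in a script and called with system(), after which the single combined file can be read in one go with fread() or similar.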


A visual tour of my publications

I recently came across this paper by Michal Brzezinski about (the lack of) power laws in citation distributions. It made me a little curious about the citations of my own articles so I threw together a little script using James Keirstead’s Scholar package for R. In the plot above, every line represents a single article with time on the x-axis and (cumulative) number of citations on the y-axis.

It’s not super informative, so we can break it down a few ways to graphically explore the data.
