Use bash to concatenate files in R

I often need to loop through directories full of csv files, sometimes tens of thousands of them, to combine them into a single analytical dataset. When there are only a few dozen, fread(), read_csv(), or the like can be fine, but in my experience nothing is quite as fast as awk or cat.

Here’s a snippet of code that allows one to use bash in R to concatenate csv files in a directory. People in the lab have found it helpful so maybe others will as well.
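As a rough illustration of the idea, here is a minimal bash sketch under a few assumptions: every csv in the directory shares the same one-line header, GNU-style find, sort -z, and xargs -0 are available, and the directory, file names, and sample rows are made up for the demo. It writes the header once, then appends every file's rows with the header line skipped.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical demo directory holding two small csv files with the same header.
demo_dir="$(mktemp -d)"
printf 'id,value\n1,a\n2,b\n' > "$demo_dir/part1.csv"
printf 'id,value\n3,c\n' > "$demo_dir/part2.csv"

# Write the combined file outside demo_dir so find does not pick it up as input.
out_file="$(mktemp)"

# Copy the header line from the first csv (alphabetically) into the output.
first_file="$(find "$demo_dir" -maxdepth 1 -name '*.csv' | sort | head -n 1)"
head -n 1 "$first_file" > "$out_file"

# Append the data rows of every csv. FNR restarts at 1 for each file, so
# 'FNR > 1' drops each file's header. find + xargs avoids the "argument list
# too long" error that a plain *.csv glob can hit with tens of thousands of files.
find "$demo_dir" -maxdepth 1 -name '*.csv' -print0 \
  | sort -z \
  | xargs -0 awk 'FNR > 1' >> "$out_file"

cat "$out_file"
```

From R, commands like these could be run with system(), and the combined file then read in a single call such as data.table::fread(). If the files have no header at all, plain cat is enough.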

It can easily be modified for any type of file. I haven't benchmarked this formally, but in a typical use case I see at least a 100-fold speedup compared with fread() or read_csv() loops. Reading one big file is almost always faster than reading thousands of little ones and reallocating memory as you go.
