One such journal did not: the New England Journal of Medicine (Figure 1).
At the end of their statement, NEJM seeks to comfort authors by assuring them that, “Most medical journals have similar [no pre-print] rules in place”.
Using R, Wikipedia’s List of Medical Journals, and the SHERPA/RoMEO database, we can empirically show this statement to be false.
Defining “most”
Merriam-Webster defines the word “most” as:
2 :the majority of
Defining the problem
Using the M-W definition, we’re going to show that “the majority of” medical journals do not have the same strict no-pre-print policy. That is, given a comprehensive list of medical journals, the majority of them will have more lenient pre-print policies than NEJM.
We will operationalize this with the SHERPA/RoMEO categorization:
- Green. Can archive pre-print and post-print or publisher’s version/PDF.
- Blue. Can archive post-print (i.e., final draft post-refereeing) or publisher’s version/PDF.
- Yellow. Can archive pre-print (i.e., pre-refereeing).
- White. Archiving not formally supported.
- Gray. Unknown.
NEJM is RoMEO white. We will show that more than 50% of journals are RoMEO green or yellow.
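The operationalization above can be written as a small helper. This is only a minimal sketch: the function name `allows_preprint` is our own label for illustration, not part of SHERPA/RoMEO, and it deliberately treats a missing color as "no pre-prints", which is the conservative choice.

```r
# Hypothetical helper (the name is an assumption for illustration):
# TRUE when a RoMEO color permits archiving a pre-print.
allows_preprint <- function(colour) {
  # %in% returns FALSE for NA, so journals without a color count against us
  tolower(colour) %in% c("green", "yellow")
}

example_colours <- c("green", "blue", "yellow", "white", "gray", NA)
allows_preprint(example_colours)
# TRUE FALSE TRUE FALSE FALSE FALSE
```

Because the helper returns a logical vector, taking its mean gives the share of journals that allow pre-prints.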
A list of medical journals
Using the rvest package in R, we can quickly scrape the Wikipedia page of medical journals and extract the relevant table:
    library(rvest)
    library(tidyverse)

    list_url <- "https://en.wikipedia.org/wiki/List_of_medical_journals"

    list_df <- list_url %>%
      read_html() %>%
      html_nodes("table") %>%
      html_table(fill = TRUE) %>%
      .[[1]]
This returns the following output:
    > glimpse(list_df)
    Observations: 308
    Variables: 5
    $ Name                <chr> "Academic Medicine", "ACIMED", "Acta Anaesthesiolo...
    $ Specialty           <chr> "Academic medicine", "Medical informatics", "Anaes...
    $ Publisher           <chr> "Association of American Medical Colleges", "Natio...
    $ English             <chr> "English", "Spanish", "English", "Portuguese", "En...
    $ `Publication Dates` <chr> "1926-present", "1993-present", "1957-present", "1...
The table has the name of the journal, the medical specialty, the publisher, the language, and the journal dates.
Getting pre-print data
Unlike NEJM, SHERPA/RoMEO is completely free, open, and provides a useful API. Using this API, we will loop through all of our journals from the list above and find the RoMEO color for each one.
    library(xml2)  # read_xml(), xml_find_first(), xml_text()

    ## Make new columns (store ISSN for verification in future)
    romeo_df <- list_df %>%
      mutate(romeo = NA, issn = NA, api_outcome = NA)

    api_url <- "http://www.sherpa.ac.uk/romeo/api29.php?jtitle="

    for (i in seq_along(romeo_df$Name)) {
      print(romeo_df$Name[i])

      ## Build the request URL, encoding spaces in the journal title
      api_req <- gsub(" ", "%20", sprintf("%s%s", api_url, romeo_df$Name[i]))
      request <- read_xml(api_req)

      ## Take the first matching element; its text is NA when it is absent
      temp_issn  <- request %>% xml_find_first(".//issn") %>% xml_text()
      temp_color <- request %>% xml_find_first(".//romeocolour") %>% xml_text()

      ## Fall back to NA if nothing was returned
      romeo_df$issn[i]  <- ifelse(length(temp_issn) > 0, temp_issn, NA)
      romeo_df$romeo[i] <- ifelse(length(temp_color) > 0, temp_color, NA)

      ## Store the API outcome for this journal only
      romeo_df$api_outcome[i] <- request %>%
        xml_find_first(".//outcome") %>%
        xml_text()
    }
Now we have a new dataframe with RoMEO color (and a couple other variables):
    > glimpse(romeo_df, width = 79)
    Observations: 308
    Variables: 8
    $ Name                <chr> "Academic Medicine", "ACIMED", "Acta Anaesthes...
    $ Specialty           <chr> "Academic medicine", "Medical informatics", "A...
    $ Publisher           <chr> "Association of American Medical Colleges", "N...
    $ English             <chr> "English", "Spanish", "English", "Portuguese",...
    $ `Publication Dates` <chr> "1926-present", "1993-present", "1957-present"...
    $ romeo               <chr> "yellow", NA, "yellow", "green", "yellow", "gr...
    $ issn                <chr> "1040-2446", NA, "0001-5172", "0870-399X", "00...
    $ api_outcome         <chr> "singleJournal", "singleJournal", "singleJourn...
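One detail of the request loop is worth a closer look: replacing only spaces leaves other special characters in a journal title, such as an ampersand, unencoded, and that may be one reason a title lookup comes back empty. A minimal sketch of a stricter approach using base R's `URLencode()` follows. The helper name `build_romeo_url` and the second example title are our own inventions for illustration; the base URL is the one used in the loop.

```r
# Hypothetical helper (name assumed for illustration): build a title query
# URL with full percent-encoding of the journal title.
build_romeo_url <- function(title,
                            base = "http://www.sherpa.ac.uk/romeo/api29.php?jtitle=") {
  # reserved = TRUE also encodes characters such as "&", which would
  # otherwise be read as a query-string separator
  paste0(base, utils::URLencode(title, reserved = TRUE))
}

build_romeo_url("Academic Medicine")
build_romeo_url("Journal A & B")  # made-up title, only to show "&" handling
```

Whether this recovers any of the missing journals is untested here; it only removes one possible source of failed lookups.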
Results
Of the 308 medical journals in our list, do more than half have RoMEO colors green or yellow?
    > mean(romeo_df$romeo %in% c("green", "yellow")) > .5
    [1] TRUE
Yes.
The full table is here:
    > table(romeo_df$romeo, useNA = "always")

      blue   gray  green  white yellow   <NA>
        17     16    119     28     61     67
Thus, NEJM’s pre-print statement is empirically false. It is possible to get a more accurate estimate of the number of medical journals that allow pre-prints; however, even taking the most conservative approach and assuming that all unknown (5.2%; N=16) and missing (21.8%; N=67) journals have the same pre-print policy as NEJM, green and yellow journals still account for 58.4% (180 of 308) of the list, so their statement is factually incorrect.
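The arithmetic behind the conservative estimate can be checked directly from the table counts. This is a small sketch that hard-codes the counts shown above instead of re-querying the API; the variable names are our own.

```r
# Counts copied from the table() output above ("missing" stands for <NA>)
counts <- c(blue = 17, gray = 16, green = 119, white = 28,
            yellow = 61, missing = 67)

total <- sum(counts)                               # 308 journals
preprint_ok <- sum(counts[c("green", "yellow")])   # 180 journals

# Conservative share: gray and missing journals stay in the denominator,
# i.e. they are all assumed to forbid pre-prints
conservative_share <- preprint_ok / total
round(conservative_share, 3)   # 0.584

# Share among journals with a known policy (gray and missing dropped)
known <- total - sum(counts[c("gray", "missing")])  # 225 journals
round(preprint_ok / known, 3)  # 0.8
```

The first figure is the lower bound used in the argument; the second shows how much higher the share is once journals without a recorded policy are set aside.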
Conclusion
While NEJM is obviously entitled to hold whatever pre-publication policy they want, suggesting that “most medical journals have similar policies” is verifiably incorrect. Even when taking the most conservative possible estimate (i.e., all NA values are treated as if they matched NEJM’s policy), NEJM’s statement is unjustified. Further, other high-impact journals have adopted lenient, common-sense pre-print policies. One can only dream that they will update their editorial response or perhaps update their discussion (last revision was in 1991). At the very least, NEJM can update their pre-publication policy page to remove their factually incorrect statement.