Should I (Bio)Rxiv?

“So what is your view on BioRxiv?” Donny Licatalosi asked me last week, on one of those long nights over beers at the CSH RNA Processing meeting (which, btw, is a *terrific* meeting I highly recommend). Donny is one of the bright new PIs in the RNA field (see here), and I believe his question represents many other young PIs at a similar stage, trying to decide whether to jump on the archive wagon. After laying out my views on the Rxiv pros and cons (with the help of more beer, of course…) we agreed I should probably do what I said I’d do a long time ago and actually write about this in my blog, so here goes….

For the record, I should state that the last four papers from my lab have all been posted on BioRxiv. That already tells you I am generally in favor, so I’ll start with the benefits, at the risk of stating the obvious:

  1. There are ideological points behind it which are worth supporting:
    1. Make science progress faster by increasing communication.
    2. Open access science. Give everyone equal access to cutting-edge science, whether you’re a curious student in the rural countryside or a tenure-track professor at a large research institution.
    3. Help change the current landscape of publishing. This system is bound to change, and BioRxiv can be seen as a catalyst of that change. Think of taxis: we needed those to get around because cars were expensive, unreliable, and required expert knowledge (driving and navigating). Now we have cheap, reliable cars with GPS, cars/driving are common, and we are all connected via smartphones – so now we have Uber/Lyft. The publishing landscape is just as archaic, with us scientists doing all the editorial work for free, feeding private companies billions of dollars of taxpayer money to do jobs which are for the most part no longer needed (publishing/distribution).
  2. There are practical reasons/cases where it can benefit you:
    1. Spread the word about your advances and get faster recognition, i.e. the personal version of points 1.1/1.2. This may be particularly useful when you are a young PI: starting your lab takes a long time, and you carry little weight with editors. This is, of course, a hypothetical, totally fictional situation. Any resemblance to real-life characters is totally coincidental.
    2. Get a citable point of reference in a grant/related paper. While reviewers of your grant can ignore it, they may also have read it or at least acknowledge it as proof of concept/progress (sure beats writing “manuscript in preparation”…). It may also help/not hurt when you are submitting your tenure package (another totally fictional scenario).
    3. Can help you lay claim to new results, avoid scooping, and possibly avoid IP problems – this may be particularly relevant when you know someone else is trying to beat you to it. By posting it on BioRxiv you clearly establish it as prior art (not patentable by others).
    4. You could get valuable feedback from the community (I think this is still in its infancy, but there is potential).
    5. You could possibly get some points for being a good citizen, and really why not? Which brings me to the next part….

If you ask my milieu of young PIs, such as the color Caseys (the Brown and the Greene), the answer would be a resounding “you should do it!”. It’s definitely common practice in the Genomics and Machine Learning fields – two communities I belong to. But it’s not common practice in Biology, such as in the RNA field, of which I’m also a (proud) member (with notable exceptions such as Brent Graveley, who is also very “Genomics”). So, for me, the more interesting part is maybe the arguments against preprints:

  1. The big supporters of archive papers like to raise claims in the spirit of: “The only difference between a published paper and an archived one is that 3 more people read it”. I’m sorry, but that’s just not true. In a properly functioning system (I know, I know…) those 3 are not random people but independent experts in your field. That’s huge. And because you know that, you put much more effort into it. You are also held accountable for the content (in a proper system, I know, I know…). So yes, the current peer review system is problematic and definitely requires serious amendments, but I don’t like the claim that it’s just the same. We are getting to a point where some even exploit this claim. One researcher told me his postdoc did not bother to publish a method paper and moved on to other projects because the method had already been used for X (X being the important project they needed it for), and “people are using it already, so why bother”. Well, for one, if the method was not validated/tested in the original archive paper, there is no way to vet it (or evaluate the consequent results/claims). I am definitely sympathetic to constraints such as the timing of projects etc., but the result of this approach, even when not intentional, is problematic for our field. Which leads me to the next point.
  2. Researchers, whether intentionally or not, sometimes abuse the archive system. I have seen multiple cases where a paper is submitted/published while relying completely on an unpublished method only available as an archive paper. In theory, this is still legit if reviewers get all the info needed. Indeed, sometimes it is of little relevance – e.g. you use a method to find something and then you validate it experimentally. In such cases, the original detection method is almost irrelevant as the result holds. But in some cases, the main results rely on a separate preprint, which can be problematic. First, this puts an unfair load on the reviewers who, in order to do a proper job, are now required to review two papers (and that second one may not even fit their expertise). Consequently, the entire premise of the paper being reviewed may be wrong. The archived methods paper may lack validations, proper evaluation of FDR etc., because after all, it’s only an archive paper (see point 1 above….). And editors are to blame for this as well because they allow this to happen and play along. Again, I totally understand that project/paper timing is an issue, but things need to be done properly in order for us to trust papers, and this new practice is not making things better. Finally, another version of abuse is a form of “salami publishing,” where a whole salami (a complete scientific story) posted on BioRxiv is sliced into pieces that the authors then try to publish separately, without proper acknowledgment, even though these are highly interdependent.
  3. While the above two points are basically me ranting about possible preprint abuse, I think the following is a more interesting point to consider personally: biomedical research papers are inherently different from those of CS/Math. In CS, if you have a good idea you write it down and can put it out there quickly via arXiv, soon to be followed by a matching (quick, short) conference paper. In Biology, timescales are usually longer and the paper can change dramatically between the first submission and the final version, as the story is driven first and foremost by the results (as opposed to by the math/model, as in typical Math/CS papers). A senior PI told me recently she does not like the archive craze because she does not want to be known by those initial versions. Think: how many people who read your initial paper bother to read the final one when it comes out??
  4. For a Bioinformatics method developer, the combination of making things publicly available as quickly as possible while still wanting a traditional publication (for grants, tenure, etc.) may be lethal: you post your paper + method, but by the time your paper is actually reviewed, someone else has already shown their method is much better. Now go publish that as a significant contribution…..
  5. You might get scooped: I wrote this as the last point because I think this may be an overrated concern for biologists weighing bioRxiv, but it still exists. In general, I think this applies if you have something very cool in a hot topic, where simply posting the finding may lead specific people whom you do not trust to quickly replicate & submit while your own paper is being delayed (intentionally or not). Notice there are a lot of conditions in the previous sentence; you need to consider whether (a) they hold and (b) these people would actually do that, given that the rest of the world (and their lab) may know about your archive paper. Still, this is a clear case where Biology and CS differ, as the former is (a) much more discovery/finding oriented, and its (b) development and (c) review times are generally longer and more variable.

The mistrust in archive papers has already garnered some attention in blog posts and twitter feeds – see for example the discussions here and here, the latter leading to the provocatively titled “Boycott BioRxiv” (to be clear, if you actually read it, it does not really call for a boycott). Suffice to say, there is clear evidence supporting a healthy dose of mistrust, regardless of who the author is and where the paper finally ends up. So one clear take-home message is that we should teach ourselves and our students exactly that.

But beyond mistrust, I think there is something to be said for well-thought-out reviews and paper write-ups, and for taking the time for your work to mature. I definitely see the great value of archive papers and have used them myself (see opening paragraph), but somehow they also represent for me something of the current times, where people are in constant search of instant gratification as the number one priority/value. Or maybe I’m just getting old. In any case, I think that at least considering the perils listed above and avoiding abuse of the archive system is worthwhile for all of us as we try to advance Science and make our environment a better place.

I’d like to finish with two points. First, I think that preprints are part of a more general question about the future of Scientific publishing which I hope to cover a bit in my next blog post.

Second, there are also interesting questions regarding how preprints should be treated. For example, there has been a lot of discussion about whether these should be included in NIH grant applications, and how (e.g. here). Also, how should you as an author/reviewer/editor treat them? Are you expected to know about them? Compare to/cite them? What should an editor do if she knows about a similar paper already on BioRxiv? At NIPS, for example, we had to declare as reviewers whether we had already seen the paper on arXiv (NIPS is double blind). Given the timelines and emphasis on discovery in biomedical research (see above), it’s likely these fields require different approaches than CS. I would love to see people address these questions in the comments below, and maybe they should form another blog post too.

In summary, as I told Donny that night at CSH, I see BioRxiv as a tool in our toolset – be aware of your options and consider, given the above, whether this is the right tool for that specific work/paper. Good luck with it and may the force be with you!


Update 9/9/2017:

  1. Here is a paper that does a nice job comparing the effect of enforced, voluntary, and no submission to the archive on subsequent paper citations. This puts numbers behind positive point 2.1 I listed above. As it happens, a different paper with a much catchier message (a 500% increase in citations) but no such control for confounding factors appeared as an archived paper and created the consequent buzz – nicely illustrating some of the other points I was trying to make….
  2. Brent Graveley responded to my blog (here), describing all the positive effects posting on BioRxiv had for him (including reversing an editor’s decision!) and stating that no negatives occurred. This is, of course, a case of proof by example, but still, a nice illustration of the benefits to be had.