So, today I want to write about a topic I feel strongly about: how we raise the next generation of computational biologists.
To start with, I think that in many ways we have made great progress compared to the state of affairs when I started my graduate studies: there is a much better understanding of what students should actually know, and there are dedicated courses, books, online material, etc.
I also want to emphasize that I’m not advocating that computational students should not train/work in biomedical environments. Unless all you really want is to do CS/Math, you may miss out on *a lot* in terms of real life data/problems (domain specific data science if you will), how biologists think about problems (quite different I tell you, and there is a lot to learn there!), or thinking about the next set of problems/challenges to tackle. Not to mention that the cutting edge biomedical research you get exposed to can be absolutely fascinating even when no computational problems are involved!
But I’m not here to discuss all that. I’m here to discuss the not uncommon situation where computationally oriented students are basically used as in-house bioinformaticians to solve the bioinformatics needs of a data generating lab. And sure, I understand it’s not black and white: there is great value in getting your hands dirty with real data, and it’s important to help each other, be a good citizen, etc. That’s not what I’m talking about. I’m talking about students with computational aspirations who end up doing all the bioinformatics work in the lab because (a) it’s really needed, (b) they can, and (c) they are much cheaper and easier to get than a bioinformatician. Sure, these students may end up on great papers representing great science from great labs. But I argue that’s not enough, and that it cannot be an excuse. Why? Because they come to *train* and it’s our responsibility to train them. And if you think that just by making them solve your bioinformatics problems you are giving them proper computational training, you are *wrong*: they will not necessarily develop the technical skills in algorithms, proper coding, data analysis, computational modeling, and the many other things they should be getting. And don’t tell me that the fact they are coming out to a market that will snatch them up right now is enough. If they have the proper training they can easily grow, do something else entirely, etc. But if they don’t, they are much more likely to get stuck at a lower level and not mature into the independent compbio researchers that are sought after in Academia/Industry.
I should also mention a “lighter form” of negligence: when a PI gets a highly computational student but does not necessarily know how to guide her. With all the good intentions, this results in “go explore and tell me what you may want to do.” It sounds great in theory, but the problem is that (a) these students commonly lack a strong biomedical base and (b) even if they are computationally savvy, they don’t know how to actually translate something they hear/read about into a computationally framed problem. They often don’t even know what questions to ask.
Naturally, I meet many researchers during my work, and some PIs acknowledge the problem. I talked to one such senior PI in a meeting last summer who told me: “you are right. They are desperately needed in the labs, we try to make the best of it, but I know it’s not always good for them”. But not all are like that. I had a quite different exchange with another senior PI. During a social event, the conversation drifted to this, and I said it’s a problem we need to deal with. She said it’s totally fine (using the argument above about having job offers). I reiterated our obligation to train them properly computationally and that otherwise we are not doing it right. At which point she said, jokingly, “Well, you are lucky I’m not on your tenure committee.” I could not agree more, and joke or not, I don’t like that kind of humor. Regardless, I see that “everything is fine” answer as representative of an all too common approach in biomedical research labs.
So what should we do? There are several things I can think of:
- As an institution/graduate program: Make the effort to have computational students advised properly. So if the PI is not up to it/interested, get a co-advisor and make sure computational skills development is on the student’s todo list.
- As a student:
  - Same as the first point above regarding skill development and/or co-advising.
  - Before that, think carefully about which institute/program/lab you want to spend your time in. Think about what it is you actually want, ask questions, shop around. Maybe do research in a lab for a year to get the hang of it and see for yourself before you commit for 5 or so years.
  - Be proactive – do not just count on your program/mentor/whatever to take good care of you/your interests. Maybe your interests are not theirs, or are not high enough on their priority list, or they are just too busy or do not know any better. We are brought up in a system where we follow what the teachers tell us, get good grades, and constantly look for their approval. Ph.D. students are in a period where they are still training but also transitioning towards independence, the job market, etc. You should still focus on doing a great job, but don’t blindly follow everything else.
The above points also relate to some of my previous posts about finding yourself a good mentor (or Sensei…). At the very least if we all become more aware of this issue I think there is a good chance of improving the upbringing of our future generation of scientists.
So, it seems this post got a lot of views but was also misinterpreted by some who got back to me with legitimate concerns and criticism. Specifically, a senior PI wrote me that they read this as “data generation labs are exploiting the students”. That was never my intention. Let me clarify, and I’ll use Penn’s GCB graduate group to make the point. GCB stands for “Genomics and Computational Biology”. I think the creators of GCB were wise to define it as such. It means GCB caters to a wide range of students who want to get exposed to “real life data/problems”. Some are more into methods development to derive hypotheses (hence “Computational Biology”), others are more into actually generating the data and analyzing it themselves (hence “Genomics”). These are crude distinctions of course, but the point is that not every student is interested in methods development, and not every student requires co-advising. And sometimes a student may need co-advising/collaboration for a specific project/problem and that’s all. As the PI rightfully wrote me, “there is no one size fits all”. Indeed. And students who are becoming experts in a certain field while using/producing genomic data are not “exploited.” As that PI wrote me: “I’d be better off hiring a good bioinformatician than taking on an untrained grad student who typically needs close supervision and mentorship.” That’s a fair point. My worry, and what sparked this post in the first place, is with students who want to do more “methods development” at some level and do not get to do that because (a) they haven’t realized that’s what they actually want to do, (b) they did not articulate it (see my suggestions above), or (c) the system/lab they are in does not support it.
This reminded me of a joke my father always liked to tell when I was little: Two guys pass each other on the street. The big guy suddenly slaps the little guy out of nowhere. The little guy looks at him intensely and says: “What was that? Was that a joke or something?” To which the big guy replies: “No, I was serious.” “Oh, you’re lucky then,” says the little guy, “because I really don’t like that kind of humor.”
 Co-advising is a solution used in GCB [Genomics and Computational Biology] here at Penn. I was fortunate to be co-mentored through some of my PhD and it was instrumental during my postdoc years.