The Art in Science – Part III: “What problem should I choose to work on?”

In part three of what has become my mini series of musings about the “Art in Science” I want to get back to the more general question I mentioned in Part I: “What problem should I choose to work on?”

First, a humble disclaimer is in order given my junior position. Take everything below with a grain of salt. Obviously YMMV, and my perspective is that of a computational biologist who grew up in a computer science/machine learning environment. With that said, here are a few observations I made during my years in and outside Academia:

  1. The answer to the question really depends on the stage of your career: If you are a PhD student then you are very much in a *training* phase. Thus, I find the exact topic you work on during your PhD is less crucial. More important is getting a good base in terms of computational know-how and research approach. You want to get good papers that showcase your capabilities and you want your advisor to be respected and connected to help you with your next step, whether that’s in Academia or Industry.
  2. When you are a postdoc, good capabilities are just not enough. As many have noted, in today’s environment you simply cannot get past the initial screening of recruiting committees without that high impact paper (or papers). So how do you do that? Tuuli Lappalainen recently wrote a nice commentary about transitioning to tenure-track positions [1]. Her advice is to “Try to figure out what is the next big thing within your broader field, and get into a pioneering lab that is doing it right now”. Indeed, that approach can help you get that high impact paper, but moreover it can help position you as an attractive faculty candidate, an expert in a hot new field. After all, science is driven by people and hence bound to have its fashions as well. In Tuuli’s experience, the high impact field was functional population genomics. My experience involved computational modeling of RNA processing, but I admit I did not think in terms of optimizing for the next big thing. There are several things I would note about this: First, to figure out what is the next big thing you should shop around, ask people whose advice you value, and keep an open mind in the process. Second, the above statement can be erroneously interpreted as finding “an” area. My experience has been that, especially as a computationally skilled person, there are actually many interesting things you can work on. Thus, finding an environment that will make you flourish is just as important, if not more so. In fact, if you join such an environment there are much higher chances that you will land that major paper or develop a completely new area (and “own” it) even if it’s not exactly what you originally set out to do. Science in that sense is not unlike startups. NYT reporter Randall Stross, who studied the famous startup accelerator Y Combinator (think Dropbox, Airbnb), claims that one of its distinct characteristics is its focus on people, letting them explore as in (yes...) grad school, instead of closely watching them or telling them what to do. Randall claims that in Y Combinator the initial ideas are considered less crucial, as the original idea is frequently abandoned. Instead, it’s the people that matter and the iterative process of evaluating and refining their ideas. Getting back to scientific research – what environment would make you flourish naturally depends on your character and interests, but at least keep this in mind instead of simply focusing on finding “a” topic. Finally, there is a point to be made about serendipity, scientific curiosity, and basic research. Who could have anticipated how we would get CRISPR technology and its effect on current research? For science to progress we need to hedge our bets. If we all focus on “the next big thing” we are more likely to actually miss it. Besides, many people may actually not respond well to working on a hot topic with intense competition. Thus the optimal setting is left for each individual to figure out for themselves.
  3. As a young PI, figuring out what you want to work on is just as important, if not more so. So is that nurturing environment. This is especially true in the area of computational biology, which tends to be highly collaborative – tackling a cool topic/question can be so much harder without good people to work with. As a young PI, you are also likely to get into the related problem “What problem should I NOT work on?” – this happens when you have too many ideas and too many suggestions for collaborations. Even if they are all great you still have limited resources – funds, time, energy, computing power, people. So you need to prioritize and learn to say no (how to say no is probably another form of art…). Think of it as rounds in your magazine. You only have a few, so think carefully about what you aim for and make those bullets count.

In summary, the above points can be seen as general guidelines, but to us scientists they offer no exact, deterministic formula that we can apply to solve the problem. That’s why choosing what to work on can be seen as part of the “Art in Science”. When transitioning to tenure track, Tuuli writes that success is “a mixture of hard work, support, luck, strategy, persistence, talent and personality.” I liked her list. Indeed, anyone who successfully deals with “What to work on?” should be humble enough to admit that there is an element of luck involved. But you do not control luck, so focus instead on what you do control. I really like Pasteur’s assertion: “Luck favors the prepared mind.” And Richard Hamming (I highly recommend reading [2]) added: “The particular thing you do is luck, but that you do something is not. The prepared mind sooner or later finds something important and does it”. Now, all we have to do is simply implement this… 😉

[1] From trainee to tenure-track: ten tips, Lappalainen Tuuli, Genome Biology 2015
[2] You and Your Research, Richard W. Hamming, Transcription of the Bell Communications Research Colloquium Seminar, 1986

The Art in Science Part II: How to build models for real life problems?

In my previous post I explored some art aspects of scientific work that have to do with esthetics, creativity, and self expression. Another “art” aspect of ML is what kind of model/algorithm we should build for a given real life problem. There is no specific formula/recipe for that and, like many things in life, getting good at modeling takes time and practice, making it more of an “art” as referred to in Neil Lawrence’s post. Nonetheless, just as in Martial Arts (yes!) there are some basic principles/guidelines we should follow. Some of those I can think of include:

  1. Do not skip steps. When we want to computationally solve a real life problem there are basic steps we have to go through. These can be defined as:
    (a) Thinking how to formulate the problem. This includes the basic entities, the relations between them, the feature space, what we may be able to generalize from etc.
    (b) Deciding what kind of function we should optimize.
    (c) Deciding how to go about performing the optimization (i.e. the learning algorithm).
    (d) Thinking how we can evaluate success/accuracy and what a “good” model would give us.
    True, the above “steps” are highly connected. Still, way too often we jump to (b) and (c), the steps that are generally more technical and the focus of most course work. While we cannot do without those, I find that in many real life problems step (a) can be 90% of the fight. This is similar to jumping ahead to code something before you have fully thought it through (a “sin” we have all committed, and likely more than once…). And just like in coding (or, for that matter, elbow escaping from a mount in BJJ) jumping ahead may work but is more likely to yield subpar results.
  2. Let the data guide you. Eyeballing the data to see where the main issues arise and to get clues for model preferences is priceless. But it’s more than that. Michelangelo is known to have claimed a stone has a statue inside it and the sculptor’s role is simply to discover it. Of course, if we take a rational scientific approach to this statement it makes little sense. But I see it as an insightful comment about the process: By taking this approach we are more likely to shed our pre assumptions, biases, and egos so as to see more clearly what is in front of us and what is required. So let the data (or stone) tell you its story. It’s the data’s story, not yours.
  3. Keep it simple (unless you have a good reason not to). This is basically restating Occam’s razor. As ML researchers, we commonly want to build fancy models with all the bells and whistles we just learned about or thought about. But as researchers in Computational Biology who handle real life problems we should curb that urge. Besides practicality, there is ML theory behind keeping it simple (cf. chapter 2 in the great Kearns & Vazirani [1]), and even beauty. And, again, simplicity and minimalism are common themes related to esthetics and beauty in the (Martial) Arts.
  4. Consider different lines of attack. Our first solution is not necessarily the optimal one. Especially if things do not work well try to open your mind, take a step back, and think what other approaches may work better.
  5. Iterate. This is also related to the above point. That book you love may look like a beautiful result of ingenuity and creativity, but is nonetheless the end product of many iterations, small insights and little victories. So will the model you develop.
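To make steps (a) through (d) from the first principle a bit more concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made up for illustration: the toy problem (one numeric feature, a binary label), the function names, the learning rate, and the number of steps. It is not a recipe, just one way the four steps might look once written down, with a trivial baseline included so that step (d) has something to compare against.

```python
import math


# (a) Formulation (hypothetical toy problem): each entity is one example with a
#     single numeric feature x in [-1, 1] and a binary label y (1 if x > 0).
def make_data():
    xs = [(i - 50) / 50 for i in range(101)]
    ys = [1 if x > 0 else 0 for x in xs]
    pairs = list(zip(xs, ys))
    train = [p for i, p in enumerate(pairs) if i % 2 == 0]  # even indices
    test = [p for i, p in enumerate(pairs) if i % 2 == 1]   # odd indices
    return train, test


def predict_proba(w, b, x):
    """Logistic model: probability that the label is 1 given feature x."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))


# (b) Function to optimize: mean log loss of the logistic model.
def log_loss(w, b, data):
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for x, y in data:
        p = predict_proba(w, b, x)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)


# (c) Optimization: plain batch gradient descent on the loss from (b).
def fit(data, lr=0.5, steps=200):
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        grad_w = sum((predict_proba(w, b, x) - y) * x for x, y in data) / n
        grad_b = sum(predict_proba(w, b, x) - y for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# (d) Evaluation: held-out accuracy, compared against a trivial baseline
#     that always predicts the majority class seen in training.
def accuracy(w, b, data):
    hits = sum((predict_proba(w, b, x) > 0.5) == (y == 1) for x, y in data)
    return hits / len(data)


def majority_baseline(train, test):
    majority = 1 if 2 * sum(y for _, y in train) > len(train) else 0
    return sum(y == majority for _, y in test) / len(test)


def run_pipeline():
    train, test = make_data()
    initial_loss = log_loss(0.0, 0.0, train)
    w, b = fit(train)
    return {
        "initial_loss": initial_loss,
        "final_loss": log_loss(w, b, train),
        "model_accuracy": accuracy(w, b, test),
        "baseline_accuracy": majority_baseline(train, test),
    }


if __name__ == "__main__":
    for name, value in run_pipeline().items():
        print(f"{name}: {value:.3f}")
```

Note how little of the sketch is about (b) and (c): once the entities, features, and success criterion are pinned down, the objective and the optimizer are almost mechanical. In a real problem, most of the effort (and most of the iterations) would go into `make_data` and into deciding what the baseline in (d) should be.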

One issue with the above principles is that they are rarely articulated to students. Worse, our current educational system is not geared to teach those (more on that in a future post). Their “artistic nature” also means they are easier to grasp and master through personal instruction and closely watching someone who “has come before us”. That, btw, is the literal translation of the Japanese word “Sensei”. So if you are a student, go find yourself a good Sensei. Oops, sorry – I meant advisor 😉

[1] An Introduction to Computational Learning Theory – Kearns & Vazirani, 1994.

The Art in Science – Part I

I was recently reading Neil Lawrence’s excellent post on how computer science degrees should be adapted given today’s challenges. Neil nicely points out that “Teaching programming alone is like teaching someone how to write without giving them something to say”, and discusses the need to understand diverse systems – unstructured documents, speech, vision, Bioinformatics etc. Then, one implicit point in Neil’s post caught my attention. Neil states:

Sitting at the core of each of these areas is machine learning: the art of processing and assimilating a range of unstructured data sources into a single model.

I found the choice of words quite interesting: A highly accomplished scientist makes claims about artistic elements in science. Is that really so? Is there Art in Science??

Art involves “the expression or application of human creative skill and imagination”. It also relates to a notion of beauty and esthetics. Indeed, after spending some time in the field of ML you start seeing the beauty and creativity in elegant formulations for a specific real life problem as well as the distinct personal signatures of those formulating the solutions. Examples I recall include learning about Shannon’s information theory for the first time, the generalization of EM by Neal & Hinton [1], and the “magic” of boosting followed by its probabilistic interpretation by Friedman, Hastie and Tibshirani with subsequent discussions [2]. So, perhaps surprisingly similar to (yes!) Martial Arts, ML requires high technical skills but skills alone are not enough: you need to be creative in order to really push the boundary of what can be achieved and at a certain level you make the techniques your own, expressing your character.

This brings me to another important aspect of “the art in ML” which may have been alluded to in Neil’s post: What kind of models should you build? And more generally – what kind of questions should you be asking as a scientist? I’ll discuss this in my next post. In the meantime, anyone who has a nice personal example about where she/he found beauty and personal expression in ML papers is welcome to leave it as a comment – it could make for an interesting reading list…

[1] A View of the EM Algorithm that Justifies Incremental, Sparse, and other Variants – Neal & Hinton, 1998
[2] Additive logistic regression: A statistical view of boosting – Friedman, Hastie and Tibshirani, 2000

Note Taking and Squirrelly Software

In the past year or so I have been growing discontent with my method of note taking. I started using Evernote during my postdoc years, though admittedly with some reservation: putting all my notes in some new company’s proprietary database that can only be accessed via dedicated software seemed problematic. Still, it helped me (and apparently many others) get better organized. And so my list of notes grew longer through the years, as did Evernote’s feature list. Recently though Evernote seems to have shifted its focus to business solutions: integrated group chat, collaborative note editing, etc. As a result, the software seems to have bloated, becoming slower and buggier. Moreover, many features you get for free from your OS, like offline note taking or searching in PDFs, are only available for premium customers. So I started wondering: if I am interested simply in note taking, is Evernote the way to go? The last straw was when I was writing a post for this blog and Evernote managed to sync it out of existence, with no hope of recovery. Writing is painful enough without having my notes deleted, thank you very much.

So what went wrong with Evernote for me? Besides the basic reservations about a proprietary database etc., it seems to have gone from a sharp tool for a specific task (note taking) to a dull one that does not excel at anything. Sound familiar? Yes, it can be seen as another example of the squirrelly approach to Budo, interdisciplinary research or, in this case, software development. Not surprisingly, I was not the only one feeling discomfort. My grief with Evernote was crystallized in Alex Payne’s excellent post where he calls Evernote and similar software an “Everything Bucket”. One of his rules for achieving computing bliss is to “not use software that does many things poorly”, i.e. “Squirrelly Software”.

What did I end up doing to solve my quandary about note taking? I followed Adam Pash’s recommendation for SimpleNote combined with nvALT. I get fast and reliable note syncing, in a format that is also searchable directly on my local disk, with matching apps on all OS and mobile platforms. You can easily hook nvALT to your favorite text editor (Emacs with markdown extension? VI?) or directly start notes in the synced directory with your editor of choice. So far I am a happy camper. Good luck with your note taking, and beware of squirrelly software!

The squirrelly approach to Budo, interdisciplinary science, and software development

In his book Moving Toward Stillness [1], Dave Lowry discusses the squirrelly approach to Budo*, citing ancient writings by Hsun Tsu**:

The squirrel can do five things: He can climb a tree, swim, dig a hole, jump, and run. All these are within its capacity, yet he does none well.

The analogy is to people who try to train in many different Martial Arts but end up not excelling at any, with a superficial understanding of all.

I find that in interdisciplinary fields like Computational Biology we, and more worryingly our students, may end up like the squirrel. Admittedly, I find quite a few papers in the Bioinformatics field to be like that: Yet another method which is not particularly interesting computationally, accompanied by a shallow understanding of the underlying biology. Such papers end up not really advancing our methods, tools, or our biological understanding. Many are well intended I’m sure, but the end result is not great. So what are we to do in our own scientific practice and when raising the next generation of scientists?

Dave’s advice is to concentrate on a single discipline in which you gain significant expertise and deep understanding. In Martial Arts, that can take a good ten years or so. However, Martial Arts tend to have many shared principles (more on those in later posts) and so by identifying and internalizing those one can later more easily learn from other Martial Arts, bringing more insights and depth to his/her original practice. Practicing hard and earnestly also teaches you *how* to learn, an ability that serves you well when you later expand to other disciplines.

The analogy in Science is to have a good foundation in some area, then add to it. If not, we run the risk of creating Bioinformaticians (including ourselves) that will have a hard time pushing the boundary of current knowledge.

Now, with all that said, to be perfectly honest squirrels do seem to excel at something (a point Hsun Tsu may not have realized or chose to ignore): They are very good at being squirrels. In fact, squirrels are one of the few mammalian families endemic to Eurasia, Africa, North America and South America, starting some 36 million years ago in North America [2]. So, while they may not have excelled at Hsun Tsu’s five tasks, they certainly have been around far longer than us, having their place in the grand scheme of things. And they probably don’t care much if some philosophers think highly of them or not.

P.S: Wait, didn’t I promise a connection to software development as well?
Well, this post has grown long already, so this will have to wait for the next time.

*Bu – Martial, Do – way; a Japanese term referring to the Martial Arts.
**An influential Chinese Confucian philosopher from the third century BC

[1] Moving Toward Stillness: Lessons in Daily Life from the Martial Ways of Japan, Dave Lowry, Tuttle Publishing, 1999
[2] The effects of Cenozoic global change on squirrel phylogeny, J.M. Mercer & V.L. Roth, Science, 2003

What is this blog about?

About a year ago I was riding the train back home from work and bumped into my colleague and friend, Arjun Raj. Arjun writes a popular blog about everything science related that I highly recommend [1]. He described how liberating it was to write a blog and how he realized it actually reached people. I found this quite interesting and opposite to my experience: I struggle with writing (let’s face it, this is not even my native language) and for as long as I can remember I did not like to say things in public unless I was absolutely sure I got it right. On the other hand, I found from conversations I had with students and colleagues that they found some of the observations I made quite useful. And so, I decided to step out of my comfort zone and start putting my musings about the world out there. Like so many things in Science, it just took me another year.

So why “Martial Arts Life Science and everything in between”?
Research, specifically in Life and Computer Science, is what I do and love doing. Martial Arts is another key component in my life experience. Besides the obvious physical/defense aspects, I see it as a way to learn about ourselves and the world, I like the philosophical aspects of it, and I like to find how I can bring insights from my Martial Arts practice to my everyday life to make me a better researcher, mentor, father, and a person. So here we go.