Ever read a paper in a highly revered journal only to go “How the hell did this paper get in?”. There is only one conceivable explanation – the Force is really strong with these researchers:
There you have it. And may the force be with you.
In part three of what has become my mini-series of musings about the “Art in Science” I wanted to get back to the more general question I mentioned in Part I: “What problem should I choose to work on?”.
First, a humble disclaimer is in order given my junior position. Take everything below with a grain of salt. Obviously YMMV, and my perspective is that of a computational biologist growing up in a computer science/machine learning environment. With that said, here are a few observations I made during my years in and outside Academia:
In summary, the above points can be seen as general guidelines, but they offer us scientists no exact, deterministic formula that we can apply to solve the problem. That’s why choosing what to work on can be seen as part of the “Art in Science”. Writing about the transition to tenure track, Tuuli notes that success is “a mixture of hard work, support, luck, strategy, persistence, talent and personality.” I liked her list. Indeed, anyone who successfully deals with “What to work on?” should be humble enough to admit that there is an element of luck involved. But you do not control luck, so focus instead on what you do control. I really like Pasteur’s assertion: “Luck favors the prepared mind.” And Richard Hamming (I highly recommend reading his talk “You and Your Research”) added: “The particular thing you do is luck, but that you do something is not. The prepared mind sooner or later finds something important and does it”. Now, all we have to do is simply implement this… 😉
 From trainee to tenure-track: ten tips, Tuuli Lappalainen, Genome Biology, 2015
 You and Your Research, Richard W. Hamming, Transcription of the Bell Communications Research Colloquium Seminar, 1986
In my previous post I explored some art aspects of scientific work that have to do with esthetics, creativity, and self expression. Another “art” aspect of ML is deciding what kind of model/algorithm we should build for a given real life problem. There is no specific formula/recipe for that, and like many things in life, getting good at modeling takes time and practice, making it more of an “art” as referred to in Neil Lawrence’s post. Nonetheless, just as in Martial Arts (yes!) there are some basic principles/guidelines we should follow. Some of those I can think of include:
One issue with the above principles is that they are rarely articulated to students. Worse, our current educational system is not geared to teach them (more on that in a future post). Their “artistic nature” also means they are easier to grasp and master through personal instruction and by closely watching someone who “has come before us”. That, btw, is the literal translation of the Japanese word “Sensei”. So if you are a student, go find yourself a good Sensei. Oops, sorry – I meant advisor 😉
 An Introduction to Computational Learning Theory – Kearns & Vazirani, 1994.
I was recently reading Neil Lawrence’s excellent post on how computer science degrees should be adapted given today’s challenges. Neil nicely points out that “Teaching programming alone is like teaching someone how to write without giving them something to say”, and discusses the need to understand diverse systems – unstructured documents, speech, vision, Bioinformatics etc. Then, one implicit point in Neil’s post caught my attention. Neil states:
Sitting at the core of each of these areas is machine learning: the art of processing and assimilating a range of unstructured data sources into a single model.
I found the choice of words quite interesting: a highly accomplished scientist lays claim to artistic elements in science. Is that really so? Is there Art in Science?
Art involves “the expression or application of human creative skill and imagination”. It also relates to a notion of beauty and esthetics. Indeed, after spending some time in the field of ML you start seeing the beauty and creativity in elegant formulations for a specific real life problem, as well as the distinct personal signatures of those formulating the solutions. Examples I recall include learning about Shannon’s information theory for the first time, the generalization of EM by Neal & Hinton, and the “magic” of boosting followed by its probabilistic interpretation by Friedman, Hastie, and Tibshirani with subsequent discussions. So, perhaps surprisingly similar to (yes!) Martial Arts, ML requires high technical skills, but skills alone are not enough: you need to be creative in order to really push the boundary of what can be achieved, and at a certain level you make the techniques your own, expressing your character.
This brings me to another important aspect of “the art in ML” which may have been alluded to in Neil’s post: What kind of models should you build? And more generally – what kind of questions should you be asking as a scientist? I’ll discuss this in my next post. In the meantime, anyone who has a nice personal example of where she/he found beauty and personal expression in ML papers is welcome to leave it as a comment – it could make for an interesting reading list…
 A View of the EM Algorithm that Justifies Incremental, Sparse, and other Variants – Neal & Hinton, 1998
 Additive logistic regression: A statistical view of boosting – Friedman, Hastie, and Tibshirani, 2000
Snapped this during a recent trip to Ireland following our IRB-SIG. Almost an oxymoron, but not…
In the past year or so I have been growing discontent with my method of note taking. I started using Evernote during my postdoc years, though admittedly with some reservation: putting all my notes in some new company’s proprietary database that can only be accessed via dedicated software seemed problematic. Still, it helped me (and apparently many others) get better organized. And so my list of notes grew longer through the years, as did Evernote’s feature list. Recently, though, Evernote seems to have shifted its focus to business solutions: integrated group chat, collaborative note editing, etc. As a result, the software seems to have bloated, becoming slower and buggier. Moreover, many features you get for free from your OS, like offline note taking or searching in PDFs, are only available to premium customers. So I started wondering: if I am interested simply in note taking, is Evernote the way to go? The last straw was when I was writing a post for this blog and Evernote managed to sync it out of existence, with no hope of recovery. Writing is painful enough without having my notes deleted, thank you very much.
So what went wrong with Evernote for me? Besides the basic reservations about a proprietary database etc., it seems to have gone from a sharp tool for a specific task (note taking) to a dull one that does not excel at anything. Sound familiar? Yes, it can be seen as another example of the squirrelly approach to Budo, interdisciplinary research or, in this case, software development. Not surprisingly, I was not the only one feeling discomfort. My grievance with Evernote was crystallized in Alex Payne’s excellent post where he calls Evernote and similar software an “Everything Bucket”. One of his rules for achieving computing bliss is to “not use software that does many things poorly”, i.e. “Squirrelly Software”.
What did I end up doing to solve my quandary about note taking? I followed Adam Pash’s recommendation for SimpleNote combined with nvALT. I get fast and reliable note syncing, in a format that is also searchable directly on my local disk, with matching apps on all OS and mobile platforms. You can easily hook nvALT to your favorite text editor (Emacs with markdown extension? VI?) or directly start notes in the synced directory with your editor of choice. So far I am a happy camper. Good luck with your note taking, and beware of squirrelly software!
In his book Moving Toward Stillness, Dave Lowry discusses the squirrelly approach to Budo*, citing ancient writings by Hsun Tsu**:
The squirrel can do five things: He can climb a tree, swim, dig a hole, jump, and run. All these are within its capacity, yet he does none well.
The analogy is to people who try to train in many different Martial Arts but end up not excelling at any, with a superficial understanding of all.
I find that in interdisciplinary fields like Computational Biology we, and more worryingly our students, may end up like the squirrel. Admittedly, I find quite a few papers in the Bioinformatics field to be like that: yet another method which is not particularly interesting computationally, accompanied by a shallow understanding of the underlying biology. Such papers end up not really advancing our methods, tools, or our biological understanding. Many are well intended I’m sure, but the end result is not great. So what are we to do in our own scientific practice and when raising the next generation of scientists?
Dave’s advice is to concentrate on a single discipline in which you gain significant expertise and deep understanding. In Martial Arts, that can take a good ten years or so. However, Martial Arts tend to have many shared principles (more on those in later posts), and so by identifying and internalizing those one can later more easily learn from other Martial Arts, bringing more insights and depth to his/her original practice. Practicing hard and earnestly also teaches you *how* to learn, an ability that serves you well when you later expand to other disciplines.
The analogy in Science is to have a good foundation in some area, then add to it. If not, we run the risk of creating Bioinformaticians (including ourselves) who will have a hard time pushing the boundary of current knowledge.
Now, with all that said, to be perfectly honest squirrels do seem to excel at something (a point Hsun Tsu may not have realized or chose to ignore): they are very good at being squirrels. In fact, squirrels are one of the few mammalian families endemic to Eurasia, Africa, North America and South America, starting some 36 million years ago in North America. So, while they may not have excelled at Hsun Tsu’s five tasks, they certainly have been around far longer than us, having their place in the grand scheme of things. And they probably don’t care much if some philosophers think highly of them or not.
P.S: Wait, didn’t I promise a connection to software development as well?
Well, this post has grown long already, so this will have to wait for the next time.
*Bu – Martial, Do – way; a Japanese term referring to the Martial Arts.
**An influential Chinese Confucian philosopher from the third century BC
 Moving Toward Stillness: Lessons in Daily Life from the Martial Ways of Japan, Dave Lowry, Tuttle Publishing, 1999
 The effects of Cenozoic global change on squirrel phylogeny, J.M. Mercer & V.L. Roth, Science, 2003
About a year ago I was riding the train back home from work and bumped into my colleague and friend, Arjun Raj. Arjun writes a popular blog about everything science related that I highly recommend. He described how liberating it was to write a blog and how he realized it actually reached people. I found this quite interesting and opposite to my experience: I struggle with writing (let’s face it, this is not even my native language) and for as long as I can remember I did not like to say things in public unless I was absolutely sure I got it right. On the other hand, I found from conversations I had with students and colleagues that they found some of the observations I made quite useful. And so, I decided to step out of my comfort zone and start putting my musings about the world out there. Like so many things in Science, it just took me another year.
So why “Martial Arts Life Science and everything in between”?
Research, specifically in Life and Computer Science, is what I do and love doing. Martial Arts is another key component in my life experience. Besides the obvious physical/defense aspects, I see it as a way to learn about ourselves and the world. I like the philosophical aspects of it, and I like to find how I can bring insights from my Martial Arts practice to my everyday life to make me a better researcher, mentor, father, and person. So here we go.