Brazilian Jiu Jitsu (BJJ) is a grappling martial art focused on submitting your opponent with chokes and joint locks (arm bars, knee bars, etc.). Deep Neural Networks (DNNs) are a class of machine learning models that you train on data for specific tasks (e.g., object recognition in images). Is there any connection between the two??
In today’s post I want to get back to one of my favorite themes: drawing connections between Martial Arts and research or everyday life. I usually identify some principle/idea in my Martial Arts, then find a parallel in everyday life/research (see for example here). But this time it started with our current efforts to train deep neural networks. Taming the DNN beasts is somewhat of an art, but that’s not the point I want to make today. Instead, I want to focus on what we optimize, how we optimize, and for what. Sure, you can get technical at this point and talk about the details of your SGD, the learning rate, momentum, dropout, Adam and whatnot. But I want to look at it more, well, philosophically: what you really want to achieve is *generalization*, “mastery of the domain” if you like. And how do you go about it? You try to optimize some sort of temporal cost function. I call it temporal because in typical DNN training the sample set is huge, and each SGD step is based only on your model’s current experience (the sample/mini-batch) and its current state (the model parameters), with the hope that when the model sees something new in the future (a test sample) it will be able to handle it well. The reaction to the current sample is determined by the function you set out to optimize, i.e., the cost function. While in some cases the cost function arises naturally, it is generally something we *make up*. Sometimes it’s a reasonable approximation of what we are truly after (think of the number of mistakes vs., say, cross-entropy or hinge loss); sometimes it’s rather crude (e.g., the maximum likelihood of a simplistic probabilistic model for a biological process). Another point is that in typical DNN training we don’t even try to reach the magical “global optimum”, settling instead for a “good enough” reduction in test error. And if you try to “rush” your learning by going too fast for your temporal loss function (e.g., with too high a learning rate), you do not do as well.
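For readers who want the DNN side made concrete, here is a minimal sketch, not taken from any real training code: a one-feature logistic model trained by mini-batch SGD on cross-entropy (the made-up surrogate we optimize) while we track the 0-1 error (the number of mistakes, which is what we actually care about). The toy data set and all function names (sgd, cross_entropy, zero_one_error, gd_on_quadratic) are hypothetical, chosen only for illustration. The last function illustrates the “rushing” point on the simplest possible loss, f(w) = w**2 / 2: each gradient step multiplies w by (1 - lr), so the iterate only shrinks toward the minimum when 0 < lr < 2.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def cross_entropy(w, b, data):
    # The surrogate we actually optimize: mean negative log-likelihood.
    total = 0.0
    for x, y in data:
        p = sigmoid(w * x + b)
        p = min(max(p, 1e-12), 1 - 1e-12)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)

def zero_one_error(w, b, data):
    # What we really care about: the fraction of mistakes.
    mistakes = sum(1 for x, y in data if (w * x + b > 0) != (y == 1))
    return mistakes / len(data)

def sgd(data, lr, epochs, batch_size, seed=0):
    # Hypothetical toy trainer: each update only sees the current
    # mini-batch and the current parameters (w, b).
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    order = list(data)
    for _ in range(epochs):
        rng.shuffle(order)
        for i in range(0, len(order), batch_size):
            batch = order[i:i + batch_size]
            gw = sum((sigmoid(w * x + b) - y) * x for x, y in batch) / len(batch)
            gb = sum(sigmoid(w * x + b) - y for x, y in batch) / len(batch)
            w -= lr * gw
            b -= lr * gb
    return w, b

def gd_on_quadratic(lr, steps, w0=1.0):
    # Gradient descent on f(w) = w**2 / 2, whose gradient is w.
    # Each step multiplies w by (1 - lr): it converges only for 0 < lr < 2.
    w = w0
    for _ in range(steps):
        w -= lr * w
    return w
```

With data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)], calling sgd(data, lr=0.5, epochs=100, batch_size=4) ends with zero training mistakes even though it never looks at the mistake count directly, while gd_on_quadratic(2.5, 50) blows up instead of approaching the minimum at 0.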
And what happens when you train BJJ? Your temporal/local optimization function is winning/losing when you spar with someone. But that’s not your *real* goal. Just like your DNN, you want to achieve mastery of your domain (BJJ). And you may have other goals as well: good health/shape, self-defense, fun, etc. But if you focus too much on the local function (winning), you are going to miss out. How? For one, if all you care about is winning/losing, you will not push yourself into hard situations (which may cost you the fight), limiting your exploration and therefore slowing your progress toward mastering the vast space of positions/states within the art. If you think of every time you get submitted/choked as merely a (negative) loss, you will likely miss out not just on good lessons but also on much of the fun. But too much focus on winning/losing can have more subtle effects. Instead of emphasizing good technique and accepting a loss when you fail to execute, you will insist on muscling yourself out of bad situations. This eventually leads to injuries, which again slow you down or stop you completely. And the funny thing is that it does not need to involve a heroic move/submission or an opponent who goes crazy and breaks your arm. It can be as simple as exploding out of a bad position for a split second when your body is already tired after, say, two hours of training with a bunch of opponents who are all better than you. So you (partially) tear your knee’s MCL, have to recover for 3 months, and have to go to your conference (ISMB 2015, Dublin) on crutches. Now you could argue that’s a good way to build name recognition (“I remember you, you are the guy with the crutches from last year” – I got this at ISMB 2016 in Orlando this year…) but I would strongly advise against it*.
In summary, it turns out that good practices in training DNNs translate to good practices in training BJJ, which can in turn have a great impact on your mood, happiness, and health… Who knew? So in your own DNN/BJJ/whatever training, keep in mind not to let your local optimization function make you lose sight of your true goals… and good luck!
*When on crutches at a conference, try, say, making it to a talk on time, or squeezing down a row to an empty seat when you finally arrive late. After the talk, you can stand in line for coffee, get “here is your coffee, sir”, then realize you can’t do anything with it and have to rely on the kind help of others. The good news is that this can be temporary and will hopefully pass. Being on crutches is a humbling experience, but that’s a topic for another post…