What Can Ancient Japanese Warrior Philosophy Teach Us About Data Science?

In Data Science by Alon K.

Sometime toward the end of the 16th century, a brilliant swordsman entered the Japanese world: Shinmen Takezō, more popularly known as Miyamoto Musashi. 16th-century Japan was marked by civil war and unrest between the various lords and clans of the time, until the Tokugawa family came to dominate the region and unified the lords under a single rule at the beginning of the 17th century.

That Musashi was a sword prodigy can be seen from a single statistic: it is agreed that he was undefeated in 61 duels (to the death) against other sword-masters, while the next best such number was 33. This feat was so extravagant that tall tales about his inhuman speed and skill were spun to commemorate him. Beyond battle, Musashi was also a philosopher, writer and master tactician. He was a ronin, a samurai who didn’t serve any particular lord, and his core philosophy was the path of self-reliance. In modern terms, we might refer to him as a battle consultant.

So what can Musashi offer us in the “Way” of Data Science? Firstly, let’s look at the world of Data Science and its striking similarity to the period in which Musashi grew up. Today, we see a huge demand for “Data Scientists” because of a huge increase in computing power, the availability of data, and a revolution in methodologies. There are many big (and small) corporations fighting to gain an edge, and even control, in the domain. And so Data Science is a hot and romantic career choice for many aspiring science and engineering students, but also for more tempered individuals wanting to make a career change and grab some of the easy money thrown around at any mention of the keywords AI, ML, Deep Learning, and so forth. This is 16th century Japan but in the world of Data Science – a lot of hype, a new methodology every week that is “novel” and “inspired” (or more likely just bull), and utter chaos, confusion and chicanery.

Give it a couple of years, and like the samurai of ancient Japan, so too will this nonsense go extinct. But how can we survive even after 61 duels? Well, unlike the tall tales spun about Musashi, which would seem more appropriate for a Japanese anime, the man had a philosophy that kept him alive, and it was far from a reliance on inhuman speed and skill.

Musashi divided his philosophy into five pillars that, when followed, lead to an eventual supreme mastery. He actually applied a core technique of Zen Buddhism to describe how one can achieve mastery as a practitioner of battle. But he explained it in both practical and abstract ways so that we, today, can apply this technique to any field we wish to master. These pillars are titled Earth, Water, Fire, Wind, and Emptiness, and they are available in The Book of Five Rings, most recently translated into English by William Scott Wilson. In fact, Musashi’s philosophy has become a modern-day classic guide to business and management, or simply a guide to overcoming challenges.

In this article, I will explore each of the five rings and how I see them relating to the life-path of a practitioner of data science.

In the first chapter, Earth, Musashi lays the foundation for mastery of swordsmanship. Attention to detail and meticulous practice with the implements of battle – the weapons and armor – are some of the major themes of this chapter. Musashi, in one part, tells us how ridiculous it is that many samurai wear two swords at their waist (the long and short swords) simply because it is customary to wear two swords, without any intention of using both. Truly, how wasteful is it that a swordsman should die because they had to lug around a tool that they never intended to use or knew how to use? That gets right to the heart of the Earth book.

As data scientists, we have at our disposal many tools, more than we could ever want to use, and even more than we should try to use. Let us, for example, take Time2Vec – a technique for modeling time-series data (it is in no way special so as to deserve a spotlight in this article; it is merely one of the more recent examples that I want to draw upon). Whether the technique covered in this paper is novel is not for me to judge (I’ll leave the judging to this place). But put simply, the technique involves using a sine basis function rather than a sigmoid (or nothing at all, for that matter) to embed a time-domain signal into a latent space (the frequency domain in this case) before pushing it into an LSTM network. Now, in their introduction the authors do compare Time2Vec to the Fourier transform, and they offer Time2Vec as simply an alternative that can learn specific frequencies as opposed to using fixed ones – that statement alone already tells me that the authors are a bit out of their element. But if they already brought up the Fourier transform, then I would expect to see some benchmark test involving an FFT (fast Fourier transform) feeding into an LSTM structure. But I didn’t find anything of this sort! Instead, they simply showed that when your data has cyclical behavior, using a sine activation function will model the data better. In all, I don’t know what the whole point of the paper was, but had they done an LSTM+FFT test, they would have discovered that such an approach gives better results – I tested it myself because it seems that among 10 authors, not one could be bothered with the 2 hours of work that it took to do this test (this TensorFlow tutorial does a great job of teaching you how to work with time-series data to get you started).
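For readers who have not seen the technique, here is a minimal NumPy sketch of a Time2Vec-style embedding, assuming the commonly cited form: one linear component plus several sine components, each with its own frequency and phase. In the real method these frequencies and phases are learned together with the network; here omega and phi are fixed by hand, and the function name time2vec is only illustrative.

```python
import numpy as np

def time2vec(tau, omega, phi):
    """Embed scalar times tau (shape [n]) into k + 1 features.

    omega and phi have shape [k + 1]. They would normally be trainable
    parameters; here they are plain inputs (an assumption for this sketch).
    """
    tau = np.asarray(tau, dtype=float)[:, None]      # shape [n, 1]
    omega = np.asarray(omega, dtype=float)
    phi = np.asarray(phi, dtype=float)
    linear = omega * tau + phi                        # shape [n, k + 1]
    embedding = np.sin(linear)                        # periodic components
    embedding[:, 0] = linear[:, 0]                    # first component stays linear
    return embedding
```

Calling time2vec(np.arange(4), [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]) gives a 4-by-3 array whose first column is the raw time and whose remaining columns are sines at the chosen frequencies.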

Therefore, and in the words of Musashi, let us think deeply on this. We have many tools at our disposal and we should know and understand them before we try to complicate matters and get too fancy. As such, data scientists should, firstly, have a strong grasp of the necessary mathematical domains needed for their chosen field. If you want to study time embeddings, then gain a deep knowledge of Fourier, Laplace and Z transforms; maybe look at Markov Chains and Hidden State models before tackling something like an LSTM. Fully understand why and how. Then you can show us how these different tools can be applied innovatively as an embedding to an LSTM, or you can innovate a new technique that is better suited for the particular application. And secondly, data scientists should know how to put these thoughts into practice or experiment using programming. My own experience suggests that the two most useful programming languages to a data scientist are C++ and Python, but each person should come to that conclusion in their own way – at least that is what Musashi would recommend.

Finally, we should also understand what it is that data scientists should strive to do. Should they focus on making Hot-Dog Not-Hot-Dog apps that identify whether there is a hot-dog in a picture? If any task shows true mastery, it is the ability to design a machine that integrates itself as a useful tool or device in society. Let us use self-driving cars as a template. It is a complex task that not only involves many sub-tasks, but carries with it a heavy price for failure – the life or death kind.

Next we come to Water. As Bruce Lee once said, “Be water, my friend!” This quote finds its origin in a Buddhist concept that you should adapt yourself to any situation to achieve a state of equilibrium – like water settling to the shape of its container. Here, Musashi focuses on the method of training, where the aim should never be to favor any particular stance, positioning or mindset. Rather, the swordsman should be comfortable in all positions.

From a strategic point of view, this type of training increases the odds of success when facing an opponent. If you stick to a handful of strategies because that is your comfort zone, then it becomes quite easy for your opponent to predict your next move and thereby put you into an uncomfortable position and gain the advantage. Likewise, if you make yourself unpredictable by fluidly moving between stances and techniques, then you will increase your chance of eventually gaining the advantage over your opponent.

As for data scientists, most of the Water chapter involves the whiteboard. Consider designing the self-driving car. There must be an endless number of ways of attacking this task and all associated sub-tasks, but the more time you spend in the planning phase, the less likely you are to corner yourself into a design that doesn’t work and will require a complete overhaul of the entire project.

Suppose, for example, that one team works on building the car’s sensors, another team works on converting the sensor inputs into features, and a third team works on converting these features into actions. All the teams meet and decide on the best sensor to use, and given this sensor, the best model to extract the features requested by the actions team, and then the best execution procedure given the features. Next, suppose that the sensor team is having difficulty providing the resolution needed by the features team, or that the features team can’t extract the needed data, or that the actions team finds out that they need additional features to achieve a threshold certainty with particular actions. Actually, suppose that a combination of the above happens. What happens next? Do you keep on trying to squeeze the most out of a no-win scenario simply because that is what you have? Do you trash the whole system and start from scratch? The correct answer is neither.

Instead, the correct approach starts at the design stage and builds some risk management into it. For example, at the sensor design level, most problems arise from a need to compress raw sensor data so as to make it easier to transfer and to process at subsequent steps. For instance, a 720p video takes less time to process than a 1080p video; an H.264 compression is slower but of higher quality than MPEG, and both are slower but of lesser quality than a raw feed. All possibilities should be considered, including building a custom compression scheme that is optimal for the task at hand. What about the feature step? Suppose there is quality degradation; should you then keep changing the model for each level of quality? Maybe you can design a data preprocessing model that removes anomalies, and even corrects them, before extracting the relevant features. Finally, the action model should consider the possibility of various failures anywhere along the data delivery pipeline, and the action design should behave accordingly – because, unlike training a car to run around a track in a simulation, failures can happen in real-time in a production environment.
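As one possible illustration of a preprocessing step that removes and corrects anomalies, the sketch below flags samples that sit far from a rolling median and replaces them with that median. The function name repair_anomalies, the window size and the threshold are all hypothetical choices for this example, not a recommendation for any real sensor.

```python
import numpy as np

def repair_anomalies(signal, window=5, n_mads=5.0):
    """Replace outliers in a 1-D signal with the local rolling median.

    window must be odd. A sample counts as an anomaly when its distance
    from the rolling median exceeds n_mads times the median absolute
    deviation of the whole signal.
    """
    x = np.asarray(signal, dtype=float)
    half = window // 2
    padded = np.pad(x, half, mode="edge")             # repeat edge values
    frames = np.lib.stride_tricks.sliding_window_view(padded, window)
    rolling_median = np.median(frames, axis=1)        # same length as x
    residual = np.abs(x - rolling_median)
    scale = max(np.median(residual), 1e-9)            # avoid a zero threshold
    is_anomaly = residual > n_mads * scale
    repaired = np.where(is_anomaly, rolling_median, x)
    return repaired, is_anomaly
```

Returning the anomaly mask alongside the repaired signal lets the downstream action step know how much of its input was reconstructed rather than measured.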

It is not expected that this knowledge come from any particular book or lesson. Water is what separates the academics from the (successful) practitioners. It is gained from years of experience and self-reflection on your own work. It means that you gain expertise in a particular domain, but you also keep your mind open and inclusive of new ideas and methodologies (and this does not contradict being critical of methodologies that don’t give us any advantage). Therefore it is a fitting second step toward data science mastery.

You may already guess what the Fire book is all about if you already detected the pattern of giving each chapter an allegorical characteristic. Simply put, it is action in the moment. But in the moment goes far beyond the actual moment and includes apt preparation, as well.

Musashi begins this chapter by reminding us that in battle we do not have time to consider insignificant things like which weapon to use, how far to extend the wrist for a strike and at what distance to stay from the opponent. These are all matters that should have been drilled down beforehand. Instead, we need to focus on establishing the rhythm of the battle by assessing things like our surroundings and our opponent, and executing a plan of action with perfect timing and conviction.

Going deeper into Musashi’s battle tactics is not necessary to see that what we, as data scientists, call fire is the real-time management of our machine (the self-driving car). At this point, we have an Alpha or Beta model ready to be tested on the real road. What we need now is the ability to assess performance in real-time.

The way we do this in the world of programming is by creating a log file and assessing particular parts of our program. This can include inputs/outputs of any particular model, summarized data, well-placed try/catch statements, direct camera feeds or LIDAR visualizations. Generally, any real-time data collection and visualization tools that will alert you to special circumstances are what you want. This is how we prepare to act in the moment.
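A minimal sketch of this idea in Python, using the standard logging module: each pipeline step is wrapped so that its inputs, outputs and any exception end up in the log. The helper name run_step, the logger name "pipeline" and the fallback argument are hypothetical, chosen only for this example.

```python
import logging

logger = logging.getLogger("pipeline")  # hypothetical logger name

def run_step(name, fn, inputs, fallback=None):
    """Run one pipeline step, logging its inputs, outputs and failures."""
    logger.info("step=%s inputs=%r", name, inputs)
    try:
        outputs = fn(inputs)
    except Exception:
        # logger.exception records the message together with the traceback
        logger.exception("step=%s failed, returning fallback", name)
        return fallback
    logger.info("step=%s outputs=%r", name, outputs)
    return outputs
```

To send these records to a log file, one could call logging.basicConfig(filename="run.log", level=logging.INFO) once at start-up (the file name here is just a placeholder).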

Similarly, at this point we should not expect to make serious changes to the system. This is something that should have been done previously through rigorous back-testing (the Water chapter). At this point we find bugs and add small patches as needed to ensure that our current version works as promised. At this point we need to survive the battle! Once the delivery cycle is over – and the client will either be satisfied or not with the current version – we reflect and reassess where we can improve, which can possibly include redesigning the whole system as needed (again, this is Water because we aren’t in the moment).

At this point, something may have occurred to you. In our field of data science, if we fail at a delivery cycle then we don’t die like a warrior might die in a duel – although, if you keep failing, you should take this as a hint that something is not right. So what kept Musashi undefeated? The secret ingredient finally comes to us in the Wind chapter.

In Wind, Musashi studies the various philosophies and techniques of other warrior schools for two purposes. Firstly, he wants to show the superiority of his own school. But more importantly, he wants to convey that in studying another school’s technique, he gains deeper insight into how to find, and perhaps even correct, the shortcomings of such techniques in order to improve his own. One of my favorite quotes of Musashi comes from this chapter: “… the other schools get along with [their Way of battle] as a performance art, as a method of making a living, as a colorful decoration, or as a means of forcing flowers to bloom. Yet, can it be the true Way [of battle] if it has been made into a saleable item?”

Recall my critique of Time2Vec? The method of critique illustrates this point. Like wind that can penetrate deeply into every hole of a structure, we need to look at the core of what it is that we are being sold. For example, I was curious as to why the authors didn’t bother checking FFT+LSTM, and then I tried it myself. I found that because the FFT embeds the time domain into fixed points, our frequency data is hidden in the “fixed” frequencies and we leave it to the LSTM to sort it all out. However, if we leave the embedding to additional variables from Time2Vec, then we have to train Time2Vec+LSTM and we run the risk of dealing with a singular model (a fancy way of saying an over-parameterized model). So my conclusion was that FFT+LSTM is superior because it is easier to train, less likely to be singular, and, as a result, provides better backtest results.
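The exact setup of that FFT+LSTM test is not described here, so the following is only one plausible way to prepare such an input: slice the series into windows and take the magnitude of the real FFT of each window, which yields a sequence of fixed-frequency feature vectors with no trainable parameters. The function name fft_sequence and the window and step defaults are assumptions made for this sketch.

```python
import numpy as np

def fft_sequence(series, window=32, step=32):
    """Turn a 1-D series into a sequence of FFT-magnitude feature vectors.

    Returns an array of shape [num_windows, window // 2 + 1]. The series
    must contain at least `window` samples.
    """
    x = np.asarray(series, dtype=float)
    starts = range(0, len(x) - window + 1, step)
    frames = np.stack([x[s:s + window] for s in starts])
    # Real FFT along each window; dividing by the window length keeps the
    # magnitudes on a scale that does not depend on the window size.
    return np.abs(np.fft.rfft(frames, axis=1)) / window
```

Each row could then be fed to a recurrent layer as one timestep, so the only parameters left to train are those of the LSTM itself.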

How can we apply this to our own path? Perhaps I can summarize it with the following thought. Building a Hot-Dog Not-Hot-Dog app (yes, Silicon Valley left a lasting impression on me) can be a fun and possibly educational performance art. However, we should understand that its purpose is merely that – art. It is the type of work that is, in Musashi’s own words, derivative and not deep or novel along with perhaps 90% of the work you can find on arXiv.org, towardsdatascience.com, or any other such non-peer-reviewed catalogues. Meaning that all these sources, while useful in sharing knowledge, per se, will not help you gain a deeper understanding without you putting in some work to validate them, check them, and take them further yourself.

So how did Musashi remain undefeated? Well, he made sure to stay one step ahead of the competition. He set the right expectations of himself and his opponent. And that is exactly how we should behave as data scientists. We need to know what technology is available in the market, how to use it, and how to set proper expectations with our customers on how to improve it. By far, this is the one crucial difference that separates a win and a loss – the frame of reference, the benchmark. So do not set yourself up to fail from the start.

And finally, we reach the Emptiness. This is, perhaps, one of the most difficult concepts to convey, which is probably why the fewest words are devoted to it in the Five Rings. It is the abstract concept of mind controlling matter. If you are familiar with the practice of mindfulness meditation, then making the association between the two will bring you halfway there.

This Emptiness is a type of spirituality, akin to Bob Marley’s lyric “and now we see the light.” When the mental click happens, then the path becomes clear and you can act with conviction, knowing that you have mastered that which you are undertaking. Well, that is almost it.

At this point I would recommend that besides reading The Book of Five Rings, you should probably start with The Unfettered Mind by Takuan Sōhō as translated by William Scott Wilson. Takuan devotes a whole three chapters to what I have tried to convey in a paragraph, and Musashi tried to convey in a single page.

And so, good luck on your Way of the Data Scientist!