
More data usually beats better algorithms [Updated 2022]

Every so often I read something that subtly, but fundamentally, changes my perspective. Anand Rajaraman's post "More data usually beats better algorithms" is one such piece.

It contains two simple ideas:

  1. Firstly, the main thesis is that adding new data to an analysis often beats devising a cleverer algorithm. He suggests, for example, that by also using data on which pages link to which other pages, Google was able to beat earlier web search algorithms that focused exclusively on the words on the pages themselves.

  2. Secondly, there is the related 'Rule of Representation', which says that you should 'Fold knowledge into data, so program logic can be stupid and robust.' This second idea was actually raised by a commenter on the original blog entry, but it is still quite pertinent. The rule comes from computer programming, but it has more general application.
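To make the Rule of Representation concrete, here is a minimal, hypothetical Python sketch. The business knowledge (a set of order-value discount tiers) lives in a data table, and the function that uses it is a simple, generic lookup. The tier values and the `DISCOUNT_TIERS` and `discount_for` names are invented purely for illustration.

```python
from bisect import bisect_right

# Hypothetical knowledge, folded into data: (minimum order value, discount rate).
# Changing the policy means editing this table, not the logic below.
DISCOUNT_TIERS = [
    (0, 0.00),
    (100, 0.05),
    (500, 0.10),
    (1000, 0.15),
]

# Precompute the thresholds once so each lookup is a single binary search.
_THRESHOLDS = [threshold for threshold, _ in DISCOUNT_TIERS]


def discount_for(order_value: float) -> float:
    """Return the discount rate for an order; the logic is deliberately 'stupid'."""
    if order_value < 0:
        raise ValueError("order_value must be non-negative")
    # bisect_right counts the thresholds that are <= order_value;
    # the applicable tier is the last of those.
    index = bisect_right(_THRESHOLDS, order_value) - 1
    return DISCOUNT_TIERS[index][1]


if __name__ == "__main__":
    for value in (50, 100, 750, 2500):
        print(value, discount_for(value))
```

A chain of `if`/`elif` branches with the thresholds hard-coded would also work, but every policy change would then mean editing and retesting control flow rather than changing one row of data.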

I read this just as I was starting to formulate the conclusions of a review I was doing of a client's strategy. Perhaps it struck a chord because the crux of my review had been to use additional data about the market that the original authors of the strategy had not considered. This led me to some new insights. But, within the confines of the assignment, I had also been unable to find some further data which I feel would have enhanced the review significantly.

Of course, it matters greatly what data you add to the analysis.

More data of the same kind probably won't make that much difference. For example, measuring the output of a fairly standardised process over 1,000 iterations might yield some interesting results. Adding the same metrics over another 1,000 iterations probably won't tell you much more.
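The diminishing return from more of the same data can be made concrete. If the measurements are roughly independent draws from a stable process, the uncertainty in their average (the standard error of the mean) shrinks only with the square root of the number of iterations. The Python sketch below assumes independent, identically distributed measurements with a hypothetical mean of 100 and standard deviation of 10; real process data may not behave this neatly.

```python
import math
import random
import statistics

SIGMA = 10.0  # hypothetical standard deviation of a single measurement
MEAN = 100.0  # hypothetical true process mean


def standard_error(sigma: float, n: int) -> float:
    """Standard error of the mean of n independent measurements."""
    return sigma / math.sqrt(n)


def simulated_mean(n: int, rng: random.Random) -> float:
    """Average of n simulated measurements from a stable process."""
    return statistics.fmean(rng.gauss(MEAN, SIGMA) for _ in range(n))


if __name__ == "__main__":
    se_1000 = standard_error(SIGMA, 1000)
    se_2000 = standard_error(SIGMA, 2000)
    print(f"SE with 1,000 iterations: {se_1000:.3f}")
    print(f"SE with 2,000 iterations: {se_2000:.3f}")
    print(f"Reduction from doubling the data: {1 - se_2000 / se_1000:.1%}")

    # Empirical check: spread of many simulated averages at each sample size.
    rng = random.Random(42)
    for n in (1000, 2000):
        means = [simulated_mean(n, rng) for _ in range(300)]
        print(n, round(statistics.stdev(means), 3))
```

Under these assumptions, doubling the iterations cuts the standard error by only about 29% (a factor of 1/√2), and halving it would take four times as many iterations.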

To add value, the additional data must provide a different perspective and/or come from a different source than the data already included. This suggests that a key skill for the strategist is a keen sense of what data is available and where to source it, rather than simply relying on the data that is immediately to hand.

In strategy work, it is often the case that:

  1. the data to hand relates to the organisation's own operations and performance, but
  2. the data that is more crucially missing relates to the competitive environment.

The LSE Business Review endorses this view in its post "More data or better data? Using statistical decision theory to guide data collection".

But, somewhere towards the end of the strategic thinking process, you have to look beyond the data and apply a little creative intuition. That intuition is driven by your experience: the sum of the data you've collected along the way, but which you cannot tabulate in a spreadsheet. In the final analysis, it is that data that may be the most important addition. It is what distinguishes also-ran, me-too strategies and analysis from truly differentiated and competitive ones.
