Sunday, February 10, 2013
Big Data - ally or enemy?
There's a huge buzz around Big Data at the moment in circles of academia, technology and politics. People are excited by the prospects for analytical breakthroughs which may answer some challenging unresolved questions.
For a fairly simple explanation of what Big Data is (and you do need to know), pop along here.
A recent article on the BBC website gave an interesting bit of insight into the kind of fervour it is stirring up. The article came with the headline "Will Big Data herald a new era in medicine?", the sort of typical headline that accompanies some of the bolder claims about what Big Data may, or may not, deliver.
For people interested in politics like you (how else did you get here?!?) and me, there are prospects about Big Data that intrigue. For example, we have long known that history will lead to the inexorable rise of the left - that is to say, of the workers (ahem - just for the sake of argument...) - so perhaps Big Data can shine a light on the behaviour of voters worldwide over the past fifty years, looking at age ranges and gender, then looking too at whether this has led to more leftwing parties gaining power, or more leftwing policies being implemented.
Just shove in the 'right' metrics (nations, turnout, vote system, vote result, gender/age/ethnicity breakdown etc etc) and.... voila!!! out pops the answer... Except it doesn't work like that.
As one of the big daddies of statistics, Nassim Nicholas Taleb, author of The Black Swan (no, not THAT Black Swan, THIS Black Swan - one of a series of best-selling books on statistics), has recently written in Wired, Big Data is fundamentally limited.
The reason for the Big Data limitation is fairly straightforward.
Answers to complex, challenging questions are hard to find. Having more powerful analysis, i.e. more educated, nuanced, well-developed approaches to examining issues, can provide answers. Having more information - i.e. Big Data - simply means that the haystack which contains the needle is much bigger.
In the field of politics, this has interesting implications. Taleb proposes that the availability of Big Data makes it easier to manipulate or select from that wealth of information to prove a hypothesis that has already been developed; i.e. it delivers foregone conclusions in academics' - or others' - areas of research.
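To make the haystack point concrete, here is a minimal sketch in Python. It is my own illustration, not Taleb's method: the function names, the sample size of 30 and the 0.4 correlation threshold are all assumptions chosen for the example. It generates pure random noise, then counts how many noise variables happen to correlate with an equally random "target". None of the relationships are real, yet the number of apparently meaningful ones grows as more variables are thrown in.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def count_spurious(n_variables, n_observations=30, threshold=0.4, seed=0):
    """Count pure-noise variables that 'correlate' with a pure-noise target.

    Every series here is random, so any correlation found is spurious.
    The threshold and sample size are illustrative assumptions.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_observations)]
    hits = 0
    for _ in range(n_variables):
        noise = [rng.gauss(0, 1) for _ in range(n_observations)]
        if abs(pearson(target, noise)) >= threshold:
            hits += 1
    return hits


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, "variables ->", count_spurious(n), "spurious 'findings'")
```

Anyone with a hypothesis to defend only needs to report the hits and stay quiet about the rest of the haystack.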
At the moment, in the field of politics there is one area of huge debate - is austerity worsening the depression, or is it fixing the mess? This is classic Keynes v Friedman territory (though I must stress - capital and infrastructure are the key spending initiatives Keynes would favour, not just spending more money on everything, such as welfare, for instance).
And this huge area of debate is being hotly contested using the tried tools of the trade - selective evidence to support a hypothesis. Take a trundle over to Telegraph blogs, and you'll see what I mean.
Of late, for instance, many of the bloggers on the right of the spectrum are enjoying the recovery of Latvia, because it went on something of an austerity drive and has recovered strongly in recent quarters. This is amazing 'selectivitis'. One small nation, which few know much about, is having a recovery for reasons even fewer know about. It is very easy to use this tiny sample, out of the vast amount of data on other nations, to justify austerity, and that is precisely what is happening.
What we are beginning to see is how easy it is for those on the wrong side of the argument to defend themselves, and obfuscate, with a minute amount of data on their side. The Latvian example is debatable, but even if it weren't, is its sample comparable, or sizable enough, to warrant drawing conclusions about the wider use of macroeconomics elsewhere, in larger regions such as the whole of Europe? I am far from convinced.
So for me, Taleb's warning about the misuse of information to demonstrate a hypothesis is timely, and we should watch for these hypothesis-proving examples on both sides of the spectrum, and argue for honest skepticism about the usability of small samples in either case.
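A rough sketch of why a single hand-picked case proves so little, again in Python. The numbers are simulated, not real economic data: the 30 hypothetical countries, the 2-point spread and the function names are all assumptions for illustration. Every country's growth is drawn from the same distribution, so by construction no policy makes any difference, yet the best single case can still look like a success story.

```python
import random
import statistics


def simulate_growth(n_countries=30, seed=1):
    """Hypothetical growth rates (%), all drawn from one distribution.

    There is no policy effect here at all: differences are pure chance.
    """
    rng = random.Random(seed)
    return [rng.gauss(0.0, 2.0) for _ in range(n_countries)]


def cherry_pick(growth_rates):
    """Return the single best case alongside the average of all cases."""
    return max(growth_rates), statistics.fmean(growth_rates)


if __name__ == "__main__":
    best, average = cherry_pick(simulate_growth())
    print(f"best single case: {best:+.1f}%")
    print(f"average of all cases: {average:+.1f}%")
```

Quoting only the best case tells you about the person choosing it, not about the policy.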
Big Data is going to be a huge area of focus over the coming years, especially with the growth of the internet of things, but we should remain mindful of the possibilities, as well as being cautious and healthily skeptical of the users of such data, and their methods of reasoning.