Big Data Problems

Writing about an ongoing conversation that you haven’t quite got your head around yet is a difficult task. It does help you to think the problems through, but it leaves you with a mess of a blog post.

Mark and I have been having discussions around big data analytics; in particular we have been talking about a blog post that Mark has written worrying about the outputs from many Big Data projects, such as the pretty node/edge pictures and tables of key phrases that frequent my Twitter stream. Having been a student of Mark’s for some time and now having him as a PhD supervisor, I think I know Mark quite well, and when he asks you to read a blog post of his it is usually an intervention to make you think. At the core of the blog post, Mark likens the creation of these outputs to a magician pulling a rabbit from a big hat: it looks cool and we all gasp, but he says it’s magic, not science, produced by algorithms that only a handful of people understand. His worry, in his own words, is that the outputs are produced by “vast calculations that lead to the dulling of critical thought” and that ‘Science is underpinned by ethics and politics; magic isn’t’.

Mark has nudged me to write a response to this post, and at this moment, while I sit staring at my screen, I find it difficult to know how to start my reply. Not because I disagree with Mark, but because I find myself with more questions, which I suspect was his aim as my PhD supervisor.

When trying to get the idea of big data over to somebody new to the term, Mark and I often try to find a practical example of successful big data analytics in the wild. The example we often cite is the mining of opinions during the 2012 U.S. election by the electoral teams, and how this mining affected the candidates’ use of social networks. This is an interesting example to cite because earlier in the year the technical adviser to the Democratic electoral team, Harper Reed, rallied critics of big data with his famous “Big data is bullshit” talk. He was saying that ‘Big’ is just a buzzword to make you buy into storage technologies, that data analytics is just analytics, and that if ‘big data’ is in fact a thing we should forget it as a buzzword and concentrate on ‘big answers’. Harper tells us that this sort of analytics is the sort of thing we do in Excel or with a database query, and that the ‘Big’ is all business bullshit hype.

I take this as a suggestion that the jump from plain data analytics to big data analytics is simply about the size of the dataset, that the techniques are the same, and that we should all just be doing stuff to data and thinking about it, not being put off by everything having to be so damn big. While Harper may be right that the techniques are the same, I don’t think the situation is the same. What people are calling Big Data Analytics are the things that lead to Big Answers; these big answers lead to Big Innovations, which end up as Big Fuck Ups. When I think of big data, a couple of big interventions spring to mind: Tempora and PRISM.

I think Mark and Harper are right about different things. Harper is probably right that being Big makes us think we are doing something extra special and clever at the macro level, when the techniques are just the same but scaled up. Mark is probably right when he says many of the techniques are akin to pulling rabbits out of a hat.

Thinking about my own experiences of playing with data and running some analytical technique over it, I don’t think of myself as a scientist or a magician; I’m just playing. I don’t consider myself a journalist when putting data into the social networks, and I don’t consider myself a scientist when I extract it and run an analysis over it. In fact, I don’t initially expect anything much more than a laugh. Something recognisable does come out of the analysis though, and I kick this output around the same networks we are analysing. At this micro level of analysis I always feel there is a message saying ‘this is play’. At this level I find it a little unfair to tell people playing with analytical techniques and making graphs that they should be scientists. Play is important!

Even at this low level, while playing with my own ‘little’ data, there are hands trying to influence how I play. The data collection services have an interest in what I put into their services, and the algorithm writers are defining what comes out. They have an interest in how I play, which makes the analysis of the data very hard, but this is exactly why playing is important: to me, play is part of a process of working out who is trying to influence what. This isn’t an easy thing to do, as both the services and the algorithms are well aware of my playing, want me to be the hero in the story, and will display the data in the way they know I want to see it. It feels kind of relaxing when Twitter shows me tweets relevant to my life, or the network graph makes me the biggest node.

So when I see the message that these playful techniques can easily be scaled up to big data analytics and eventually big answers, I do worry. The resulting big innovations have real effects on lives and, unlike the scenario where I play with spreadsheets to explore who is manipulating me using what data, I have no control over what data the NSA is collecting or what algorithms they are using.

I don’t just see magicians and scientists, I see people playing too. While play is all fun and games, there are hands influencing just how we play, and perhaps it isn’t so playful when scaled up to big intervention level. The whole scenario gives me something of a split personality. On the one hand, play may be the way to find out how these data services and algorithms are guiding the information that goes in and how it is analysed; on the other hand, those exact same data services and algorithms are guiding how we play so that they get the results they want. When scaled up, these skewed results and their resulting big interventions have major effects on society. The transition to ‘Big’ anything is where we must be scientists.
