Around this time every year all my feeds are full of fantasy football teams. I wanted to join in, but I can’t be bothered with football. Instead I’ve created a fantasy team of data exploration tools. Looking back at my team, these probably aren’t the best tools; they are just the tools that I really enjoy using because I know them well or because they are easy. Don’t think of this as Manchester United, but as the pub team that always loses but has fun.

So here we go:


Leet diagram skills, eat it Match of the Day


A quick run through of my team.

Goalkeeper: RAM Upgrade. The goalie is really the backbone of my team. When I’m messing with a dataset I seem to have loads of applications open, all crunching stuff. RAM is cheap, and upgrading my team to 16GB of RAM has stopped them getting slow and tired at half time.


My defence is a bunch of places where I go when I’m stuck or need inspiration. While Twitter is a good place to shamelessly plug your stuff, everybody else is doing the same thing, and unless I actually know the person I end up wasting my time. I gave Twitter away on a free transfer.

Left Back: Stack Overflow. I really don’t know what I’d do without Stack Overflow when I’m stuck. Ask a question and the chances are somebody will help you. Better than that, somebody has most likely already asked the question and you can borrow the answers from them.

Center Back: Reddit. There are tons of subreddits to give you inspiration. I subscribe to lots, and it’s best to find what suits you. Still, my favourites are r/dataisbeautiful, r/datamining, r/rstats and r/rstudio.

Center Back: GitHub. Not only is it a good place to bung your code, GitHub gists are so handy it’s unbelievable. I recommend Ben Marwick’s and Adam Cooper’s GitHub accounts if you are interested in doing things with text mining.

Right Back: Cafe Bar. Actually talking to somebody outside the office is a great way to come up with ideas. Seriously, don’t just Skype them, try it.


I have a bunch of tools in midfield that I am comfortable with. When I say comfortable I mean I can mess around with them and pretty pictures come out. I don’t really understand the algorithm somebody wrote for their PhD that sits behind the button that produces the nice picture. It’s like a midfield full of Luis Suarez: they get results, but I’ll be damned if I know what’s going on in their heads.

Left Wing: MAMP. MySQL, PHP and Apache still seem to come in useful for loads of things. MAMP is a Mac application that installs them all but keeps them self-contained.

Center midfield: Gephi. Gephi’s ease of use has let it pretty much dominate graph visualisation. I’m pretty sure my hamster uses it. The hardest bit is getting the data into a shape that Gephi will import, which is why I have paired it up with…

Center midfield: Google Refine. Takes the messy data, cleans it up. Passes it to Gephi.
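For anyone curious what that pass from messy data to Gephi looks like, here is a minimal sketch in Python rather than Refine. It is not how Refine does it, just an illustration of the idea. It assumes the messy data is a list of (source, target) name pairs, tidies whitespace and case, counts repeats as a weight, and writes a CSV with the Source, Target and Weight headers that Gephi's spreadsheet import recognises for edge tables. The function names and the sample names are made up for the example.

```python
import csv
import io
from collections import Counter


def build_edge_rows(raw_pairs):
    """Tidy messy (source, target) pairs and count repeats as edge weights."""
    counts = Counter()
    for source, target in raw_pairs:
        # Collapse stray whitespace and ignore case differences.
        s = " ".join(source.split()).lower()
        t = " ".join(target.split()).lower()
        # Skip pairs where either end is empty after cleaning.
        if s and t:
            counts[(s, t)] += 1
    # Sort so the output is stable from run to run.
    return [(s, t, w) for (s, t), w in sorted(counts.items())]


def write_edge_csv(rows, handle):
    """Write rows as an edge table with Source, Target and Weight columns."""
    writer = csv.writer(handle)
    writer.writerow(["Source", "Target", "Weight"])
    writer.writerows(rows)


if __name__ == "__main__":
    # Hypothetical messy input: inconsistent case and stray spaces.
    raw = [("Alice ", "bob"), ("alice", "Bob"), ("Carol", "  alice")]
    buffer = io.StringIO()
    write_edge_csv(build_edge_rows(raw), buffer)
    print(buffer.getvalue())
```

Swap the `io.StringIO` buffer for a real file opened with `newline=""` and the result should load as an edge table in Gephi. Anything fancier, like merging near-duplicate names, is where Refine earns its place in the team.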

Right Wing: Mallet. A new purchase for me this season. Mallet is a bunch of tools for data mining. It lets me do Topic Modelling with the press of a button. The problem is that it does it so efficiently and so easily that I’m not sure 100% of the time what it’s done. I just think it’s clever. He’s the young Brazilian forward who is a whizz with the ball but doesn’t speak a word of English.


My forwards are brothers. They are the Swiss Army knife of messing with data, but if one is injured I’m in serious trouble.

Forward: R. Not really sure I have much to say about R. He can do anything and picks up new techniques fast. He’s just difficult to communicate with and manage. Which is why I need his brother:

Forward: RStudio. R’s brother organises him and makes sure he turns up to the match on time. If I have some new tactics I want to try out, I tell them to RStudio, who is the only one I know who can control his brother easily.

So anybody have any changes I should make for next match?

