Problems with quick Wikipedia heatmaps of birth locations

This doesn’t tell you how many people of that profession there are; it tells you how many people of that profession are in Wikipedia with structured data on both the person’s birthplace and profession. It might tell us as much about how Wikipedia is being used as how many people from that profession are actually…

Testing creation of intensity maps from Wikipedia

I wanted a quick, reusable method for asking Wikipedia a question, getting some numbers back and creating a heat map. I still haven’t come up with that quick process, and when I mess with data I tend to stick with what I know for a ‘first…
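
The counting step behind an intensity map can be sketched as binning birthplace coordinates into a grid and counting the points in each cell. This is a minimal sketch that assumes the Wikipedia query has already returned (latitude, longitude) pairs. The function name `grid_counts`, the 10-degree cell size and the sample coordinates are illustrative assumptions, not part of the original post.

```python
from collections import Counter
from math import floor


def grid_counts(coords, cell_deg=10.0):
    """Count (lat, lon) points per grid cell of cell_deg x cell_deg degrees.

    Keys are (row, col) indices of each cell's south-west corner divided by
    cell_deg; the values are the intensities you would colour on the map.
    """
    counts = Counter()
    for lat, lon in coords:
        counts[(floor(lat / cell_deg), floor(lon / cell_deg))] += 1
    return counts


# Illustrative coordinates only (roughly London, Manchester, Paris, New York).
points = [(51.5, -0.1), (53.4, -2.2), (48.9, 2.4), (40.7, -74.0)]
counts = grid_counts(points)
print(counts)
```

Smaller cells give a finer map but sparser counts. The raw counts still carry the coverage caveat from the previous post: they count people who are in Wikipedia with structured data, not people in general.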

New LAK dataset

Davide Taibi has let me know that the LAK dataset has been updated. The update includes some paper text that I reported on, but also has lots more data. As described by Davide: This version includes papers from: – EDM conferences (2008-2014) – LAK conferences (2011-2014, 2014 only abstract since we are waiting for ACM…

Scraping an HTML table into an R dataframe

For some work I plan on doing for the LACE tech focus blog, I wanted to get some information off a webpage and into an R dataframe. It turns out this is a three-line solution (one line if you pass your URL straight into the function parameters and have the XML package already…
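
The post does this in R with the XML package. For comparison, here is a rough Python standard-library sketch of the same idea: walk the table's rows, collect cell text, and treat the first row as column names. This is not the post's code. The class and function names are hypothetical, and the sketch assumes a simple table without `rowspan`/`colspan` or nested tables.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows, each a list of cell strings
        self._row = None  # row currently being built
        self._cell = None  # text fragments of the cell currently being built

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:  # ignore text outside cells
            self._cell.append(data)


def html_table_to_records(html):
    """Parse the first table-like structure; return one dict per data row."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


html = """
<table>
  <tr><th>Conference</th><th>First year</th></tr>
  <tr><td>LAK</td><td>2011</td></tr>
  <tr><td>EDM</td><td>2008</td></tr>
</table>
"""
records = html_table_to_records(html)
print(records)
```

Every value comes back as a string. As with a dataframe built from scraped HTML, numeric columns need converting before you do arithmetic on them.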

ggplot notes: Line Graph

Quite often I have a dataframe in R with lots of columns, one of which is the year. I then want to pick another column, average it for each year and plot the averages to show how things have changed over time. I know this is really easy to do, but I end…
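
The aggregation behind that plot is a group-by-year mean. This rough Python sketch computes the (year, average) pairs that a line graph would then draw. It is not the post's ggplot code. The function name `yearly_means` and the column names `year` and `score` are hypothetical, and plotting is left out.

```python
from collections import defaultdict


def yearly_means(rows, year_key, value_key):
    """Average value_key within each distinct year_key; return sorted (year, mean) pairs."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        value = row.get(value_key)
        if value is None:  # skip missing values, much like na.rm = TRUE in R's mean()
            continue
        totals[row[year_key]] += value
        counts[row[year_key]] += 1
    return sorted((year, totals[year] / counts[year]) for year in totals)


rows = [
    {"year": 2012, "score": 3},
    {"year": 2012, "score": 5},
    {"year": 2013, "score": 4},
    {"year": 2013, "score": None},
]
means = yearly_means(rows, "year", "score")
print(means)
```

Sorting by year matters because a line graph joins points in x order. Missing values are dropped rather than treated as zero, so they do not drag the averages down.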

Topic Models to explore and compare communities

Recently I’ve been playing with an R wrapper for a machine learning library called Mallet to generate lists of topics from a series of text documents. The technique is called topic modelling, and I got to grips with it through Ben Marwick’s readings of archaeology papers, which include some excellent reusable code. A topic…
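
Mallet’s topic modelling is based on Latent Dirichlet Allocation (LDA). A standard way to fit LDA is collapsed Gibbs sampling. The tiny pure-Python sampler below is only meant to show the mechanics: each word token repeatedly gets re-assigned to a topic in proportion to how much its document uses that topic and how much that topic uses the word. This is not Mallet’s implementation or the R wrapper’s API, and it is far slower. The `alpha` and `beta` values, the function names and the toy documents are assumptions chosen for illustration.

```python
import random
from collections import Counter


def lda_gibbs(docs, n_topics, n_iters=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA to docs (lists of word strings) by collapsed Gibbs sampling.

    Returns (topic_word, doc_topic): a Counter of word counts per topic and
    per-document topic counts, both taken from the final sweep.
    """
    rng = random.Random(seed)
    V = len({w for doc in docs for w in doc})  # vocabulary size
    topics = range(n_topics)
    doc_topic = [[0] * n_topics for _ in docs]  # n[d][k]
    topic_word = [Counter() for _ in topics]    # n[k][w]
    topic_total = [0] * n_topics                # n[k]
    z = []                                      # current topic of every token

    def update(d, w, k, delta):
        doc_topic[d][k] += delta
        topic_word[k][w] += delta
        topic_total[k] += delta

    # Start every token in a random topic.
    for d, doc in enumerate(docs):
        z.append([rng.randrange(n_topics) for _ in doc])
        for w, k in zip(doc, z[d]):
            update(d, w, k, +1)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, w, z[d][i], -1)  # take this token out of the counts
                # p(k | rest) is proportional to
                # (n[d][k] + alpha) * (n[k][w] + beta) / (n[k] + V * beta)
                weights = [
                    (doc_topic[d][k] + alpha) * (topic_word[k][w] + beta)
                    / (topic_total[k] + V * beta)
                    for k in topics
                ]
                z[d][i] = rng.choices(topics, weights=weights)[0]
                update(d, w, z[d][i], +1)
    return topic_word, doc_topic


def top_words(topic_word, n=3):
    """The n most frequent words currently assigned to each topic."""
    return [[w for w, c in tw.most_common(n) if c > 0] for tw in topic_word]


docs = [
    ["excavation", "pottery", "site", "pottery"],
    ["site", "trench", "pottery", "excavation"],
    ["learning", "analytics", "students", "data"],
    ["data", "learning", "students", "analytics"],
]
topic_word, doc_topic = lda_gibbs(docs, n_topics=2, n_iters=50)
print(top_words(topic_word))
```

Each topic is a weighted word list and each document is a mixture of topics. That mixture is what lets you compare communities by the topics their documents lean on. The sampler is stochastic, so a different `seed` can produce different topics.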

Scraping Reddit comments

Something I had been meaning to do for a long time was write a quick script to scrape Reddit comments. A chap has beaten me to it, and you can find the code here: https://github.com/ctaggart878/redditscraper. During lunch today I had a little play with the script (and I mean quick!). A two-line script imported…
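
Reddit comment threads are nested, so one part of any comment scraper is flattening the reply tree into rows. This minimal Python sketch is not the redditscraper code. It assumes the thread JSON has already been fetched and that comments use Reddit's nested "Listing" layout: children of kind `t1` whose `replies` field is either another Listing or an empty string. That layout is an assumption here, so check it against a real response. The function name and sample data are hypothetical, and network access is left out.

```python
def flatten_comments(listing, depth=0):
    """Walk a Reddit-style comment Listing and yield (depth, author, body) tuples.

    Children whose kind is not "t1" (for example "more" placeholders for
    collapsed replies) are skipped rather than fetched.
    """
    for child in listing.get("data", {}).get("children", []):
        if child.get("kind") != "t1":
            continue
        data = child["data"]
        yield depth, data.get("author"), data.get("body")
        replies = data.get("replies")
        if isinstance(replies, dict):  # an empty string means no replies
            yield from flatten_comments(replies, depth + 1)


sample = {
    "kind": "Listing",
    "data": {"children": [
        {"kind": "t1", "data": {
            "author": "alice", "body": "Top comment",
            "replies": {"kind": "Listing", "data": {"children": [
                {"kind": "t1", "data": {"author": "bob", "body": "A reply", "replies": ""}},
            ]}},
        }},
        {"kind": "t1", "data": {"author": "carol", "body": "Another comment", "replies": ""}},
        {"kind": "more", "data": {"count": 3}},
    ]},
}
comments = list(flatten_comments(sample))
print(comments)
```

Keeping the depth alongside each comment preserves the thread structure after flattening, so the rows can still be re-nested or filtered to top-level comments only.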