Quite often I have a dataframe in R with lots of columns, one of which is the year. I then want to pick another column, calculate averages for each year and plot them to show how things have changed over time. I know this is really easy to do, but I end…
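As a rough illustration of the task this teaser describes, here is a minimal base R sketch. The data frame `df` and its `year` and `value` columns are invented for the example and are not taken from the post.

```r
# Hypothetical example data: one row per observation, with a year column.
df <- data.frame(
  year  = c(2010, 2010, 2011, 2011, 2012, 2012),
  value = c(1, 3, 4, 6, 10, 12)
)

# Mean of `value` within each year; the result has one row per year.
yearly <- aggregate(value ~ year, data = df, FUN = mean)

# Line-and-point plot of the yearly averages over time.
plot(yearly$year, yearly$value, type = "b",
     xlab = "Year", ylab = "Mean value")
```

`aggregate()` is only one of several ways to do this; it is used here because it needs no extra packages.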
Tag: rbloggers
Topic Models to explore and compare communities
Recently I’ve been playing with an R wrapper for a machine learning library called MALLET to generate lists of topics from a series of text documents. The technique is called Topic Modelling, and I have got to grips with it thanks to Ben Marwick’s readings of archaeology papers, which include some excellent reusable code. A topic…
Data Hacking Fantasy Team
Around this time every year all my feeds are full of fantasy football teams. I wanted to join in, but I can’t be bothered with football. Instead I’ve created a fantasy team of data exploration tools. Looking back at my team, these probably aren’t the best tools; they are just the tools that I really…
Scraping Reddit comments
Something I had been meaning to do for a long time was write a quick script to scrape Reddit comments. A chap has beaten me to it and you can find the code here: https://github.com/ctaggart878/redditscraper. During lunch today I had a little play with the script (and I mean quick!). A two-line script imported…
Comparing word usage in documents using R
Previously I’ve mentioned how valuable Stack Overflow is as a resource when getting into a new language such as R, but yet again I’ve been blown away by just how generous people are with their time. Last week I was incredibly stuck using R. I had been stuck on the same problem for two days. You…
Exploring Telenovela with DBpedia, R and Gephi
Today I discovered Telenovelas. Telenovelas are short, limited-run programs similar to soap operas; they are popular in Spanish-language countries and they are serious business. I stumbled across a clip on YouTube and was instantly hooked. I headed to Wikipedia to find out more, only to find that Telenovelas is a…
Starting to Text Mine with R. Let’s find out what’s happening in Coronation Street: Stage 1
There is a great text mining package in R called ‘tm’. This is a short introduction to tm and how it can be used to create what is called a Document-Term Matrix, which is a matrix showing the frequency of terms over a collection of documents. While this is quite basic it’s hopefully going to…
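To make the Document-Term Matrix idea concrete, here is a small sketch using tm. It assumes the tm package is installed, and the three documents are made up for illustration rather than taken from the post.

```r
library(tm)

# Hypothetical mini-collection of three short documents.
docs <- c(
  "coronation street is on tonight",
  "the street is quiet tonight",
  "nothing happened on the street"
)

# Build a corpus from the character vector, one document per element.
corpus <- VCorpus(VectorSource(docs))

# Rows are documents, columns are terms, cells are term frequencies.
dtm <- DocumentTermMatrix(corpus)

# Print a summary and a sample of the matrix.
inspect(dtm)

# Convert to an ordinary matrix for further work.
m <- as.matrix(dtm)
```

With tm's default settings very short words are dropped, so the columns of `m` will not include every word in the documents.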