Visual breakdown of categories on WordPress blogs using R

This is a very simple recipe: just a few lines of R to get an indication of the tags and categories being used on a WordPress site. The idea is to use R to read the blog's RSS feed, pick out the tags and categories, and display a pie chart of how often each one is used. Since tags and categories in WordPress are set by the user, the chart gives you an indication of which subjects the authors think their posts fit into. It might be a useful script for keyword planning and SEO, but I just use it to see how my interests are changing.

1. Pick RSS Feed

I used the RSS feed of my personal website, http://davidsherlock.co.uk/feed. It is worth noting that by default a WordPress RSS feed only shows the last 10 posts. If I wanted more I could use yourwebsite.com/feed/?paged=2 to get the next 10, and I could write a loop to collect the categories for as many posts as I wanted, but I like the idea of being able to compare tags at different times of my blogging life.
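If I did want that loop, it might look roughly like the sketch below. The helper names feed_page_urls and collect_categories are made up for illustration, and I am assuming page 1 is just the plain feed while later pages use the ?paged=N form mentioned above. The second helper is only defined, not called, because it needs the XML package and a network connection, and it will stop with an error if a requested page does not exist.

```r
# Hypothetical helper: build the feed URL for each page of posts.
# Assumption: page 1 is the plain feed, later pages use ?paged=N.
feed_page_urls <- function(feed, pages) {
  ifelse(pages == 1, feed, paste0(feed, "/?paged=", pages))
}

# Hypothetical helper: fetch every page and return all category names
# as one character vector. Requires the XML package and network access.
collect_categories <- function(urls) {
  unlist(lapply(urls, function(u) {
    doc <- XML::xmlParse(u)
    XML::xpathSApply(doc, "//category", XML::xmlValue)
  }))
}

# Build the URLs for the first three pages (about 30 posts by default)
urls <- feed_page_urls("http://davidsherlock.co.uk/feed", 1:3)
print(urls)
```

Passing urls to collect_categories would then give one long vector of tags that can go straight into table() and pie().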

2. Read RSS feed

We need the XML package, so we load it, read the RSS feed and find all the tags. First we load the library, parse the XML feed into an R structure and use XPath to search for the category elements:

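library("XML")
# Parse the feed into an R-level XML tree
doc <- xmlTreeParse("http://davidsherlock.co.uk/feed")
# Select every <category> element anywhere in the document
src <- xpathApply(xmlRoot(doc), "//category")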
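To see what that XPath query actually matches without hitting a live site, here is a small sketch that runs it on a made-up feed fragment. The fragment is invented for illustration and only shaped like a WordPress feed, and I have used xmlParse with asText = TRUE here (rather than xmlTreeParse on a URL) so the example runs offline.

```r
library("XML")

# Made-up feed fragment shaped like a WordPress RSS feed (not real data)
feed <- '<rss version="2.0"><channel>
  <item><title>Post one</title>
    <category><![CDATA[R]]></category>
    <category><![CDATA[Data]]></category></item>
  <item><title>Post two</title>
    <category><![CDATA[R]]></category></item>
</channel></rss>'

# Parse the string instead of a URL
doc <- xmlParse(feed, asText = TRUE)

# "//category" matches every <category> element, whichever post it is in
src <- xpathApply(doc, "//category")

print(length(src))            # how many category nodes were matched
print(sapply(src, xmlName))   # the element name of each match
print(sapply(src, xmlValue))  # the text inside each match
```

Each post contributes one node per tag or category, so a tag used on several posts shows up several times, which is exactly what the pie chart counts.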

3. Pull out tags and chart:

I then loop through all the categories, put them in a data frame and create the pie chart:

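# Start with an empty data frame, then append one row per <category> node
tags <- data.frame(tag = character(0))
for (i in seq_along(src)) {
  tags <- rbind(tags, data.frame(tag = xmlSApply(src[[i]], xmlValue)))
}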

pie(table(tags$tag))

There are quite possibly easier ways to do this, but it’s only a few lines and it works. Here is the pie chart produced by this script:

[Figure: pie chart of categories]
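One of those possibly easier ways, as a rough sketch rather than a tested replacement: xpathSApply can apply xmlValue to every match and hand back a plain character vector, so the loop and the data frame are not needed. The feed fragment below is made up so the example runs offline; with a real blog you would pass the feed URL to xmlParse instead.

```r
library("XML")

# Made-up feed fragment (not real data), standing in for a real feed URL
feed <- '<rss version="2.0"><channel>
  <item><category>R</category><category>Data</category></item>
  <item><category>R</category></item>
</channel></rss>'

doc <- xmlParse(feed, asText = TRUE)

# Apply xmlValue to every <category> node; the result is a character vector
tag_names <- xpathSApply(doc, "//category", xmlValue)

# Count how often each tag appears, most frequent first
counts <- sort(table(tag_names), decreasing = TRUE)
print(counts)

# Same kind of chart as before, drawn from the counts
pie(counts)
```

Printing the counts alongside the chart is handy when two slices look about the same size.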

Perhaps I’m getting fed up of my blog and want to be more like Perez Hilton. To compare, just change the feed to ‘http://perezhilton.com/feed’ and run the script again. Turns out I need more gifs.

[Figure: Perez Hilton topics]

