
Category: data visualisation

Finding text matches in Reddit comments

I’m currently trying to write Google Apps Script code that pokes around with the Reddit API and saves results in Google Spreadsheets. I’m finding it hard to learn both the Reddit API and Apps Script at the same time (despite much-appreciated help from @mhawksey and @brucemcpherson). In an attempt to keep my sanity I am routinely returning to Python/Raspberry Pi and other things I am more familiar with. Eventually one thing I would like my Google scripts to do is mine comments to gauge what the conversation is about. As a test I wanted to write a quick script that could take a text file full of words or phrases and then check how many times each of these words was used. So, for sanity reasons, I am first writing this in Python to be run on my Raspberry Pi; later I would like to create a…
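The counting script itself isn’t shown in the excerpt, but the idea is simple enough to sketch. A minimal Python version might look like the following — the phrase list and comment text are hardcoded here for illustration, where the real script would read them from a text file and the Reddit API:

```python
import re
from collections import Counter

def count_matches(phrases, text):
    """Count how many times each word/phrase occurs in text (case-insensitive)."""
    text = text.lower()
    counts = Counter()
    for phrase in phrases:
        # \b keeps 'cat' from also matching inside 'category'
        pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
        counts[phrase] = len(re.findall(pattern, text))
    return counts

# Stand-in for a batch of fetched comments
comments = "I love Python. Python and R are great; R is great for stats."
print(count_matches(["python", "r", "great"], comments))
```

The same function works for multi-word phrases, since the escaped phrase can contain spaces.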

Visualising Blog comments

Yesterday I had a chat with old Cetis friend Sheila, and we got onto the subject of visualising text, something I am really interested in but also really struggle with. Sheila had been playing with Textexture, a tool to visualise any text as a network. I was intrigued by Sheila’s post and decided to do some experiments of my own, trying different ways to visualise blog comments. Here are some of my quick experiments and thoughts on them while I was playing with Textexture and trying to create something similar in R/Gephi. Just a note that I’m not sure how useful any of these methods are yet; it’s very much a ‘try it and see’ at this stage. Grabbing the data: I’m using my own blog comments, grabbed as a CSV through a quick SQL query. You should have changed your prefix during installation, but the query might look…
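Textexture’s actual algorithm isn’t covered in the excerpt, but a rough, hypothetical approximation of ‘text as a network’ can be sketched in Python: link any two words that appear within a small sliding window of each other, and emit the kind of Source,Target,Weight edge list Gephi can import:

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(text, window=4,
                       stopwords=frozenset({"the", "a", "and", "of", "to"})):
    """Build weighted co-occurrence edges: two words are linked if they
    appear within `window` tokens of each other."""
    tokens = [w for w in text.lower().split()
              if w.isalpha() and w not in stopwords]
    edges = Counter()
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + window, len(tokens))):
            if tokens[i] != tokens[j]:
                edges[tuple(sorted((tokens[i], tokens[j])))] += 1
    return edges

edges = cooccurrence_edges("comments about visualising comments as a network of comments")
# Print in the simple Source,Target,Weight CSV form Gephi can import
for (a, b), w in sorted(edges.items()):
    print(f"{a},{b},{w}")
```

The window size and stopword list are arbitrary choices here; tuning both changes the shape of the network considerably.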

Creating Twitter Graphs of Friends using twecoll and Gephi

Exploring your social media networks with graphing tools such as Gephi is quite easy; the hardest part is working out what you want to explore and how you will get the data. As a starting point I’ve collected a few simple recipes for network graphs. I’ve done all of these on a ’nix-based system, but it should be roughly the same on Windows (leave a comment if it isn’t and I’ll work out why). Graph the connections between your Twitter friends: this can actually be quite a difficult thing to do due to the rate limits on the Twitter API, and if you want to do it yourself it is easier the fewer friends you have. If you have a large number of friends then you might be best looking for a service to do it for you, unless you can wait a long time to grab…
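The long wait comes from those rate limits: with only a handful of API calls allowed per window, a large friend list takes hours to walk. twecoll handles the pacing for you, but the general idea can be sketched in Python — note the fetch function and the limit numbers below are placeholders, not Twitter’s actual values:

```python
import time

def fetch_all(items, fetch, calls_per_window=15, window_seconds=900):
    """Call fetch() for each item while spacing requests so we never
    exceed calls_per_window calls per window_seconds (e.g. 15 per 15 min)."""
    delay = window_seconds / calls_per_window
    results = []
    for n, item in enumerate(items):
        results.append(fetch(item))
        if n < len(items) - 1:  # no need to sleep after the final call
            time.sleep(delay)
    return results

# Placeholder fetch: a real one would hit the Twitter friends endpoint
print(fetch_all(["alice", "bob"], fetch=len, window_seconds=0))  # → [5, 3]
```

With the defaults above, 300 friends means 300 calls spaced a minute apart — five hours — which is why a pre-built service starts to look attractive.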

Sharing ideas in a distributed organisation

A good thing about working in JISC CETIS is being surrounded by the wide array of interests and ideas of its staff. A bad thing about working for JISC CETIS is that, with its distributed nature (and the fact everybody is always so busy!), it is not always possible to sit down and have a good natter about these interests. Sheila recently blogged about social analytics and the way people share things. I enjoyed the post as I find resource sharing online a really interesting area. I increasingly find myself getting anxious about how I share things online and which online persona ideas and resources are attached to. I find myself carving out an online identity made up of different levels of obscurity, where I push my outputs up the levels as and when I feel more comfortable with them. I find it interesting that Christopher Poole’s latest social network allows you…

What are we writing about? Using CETIS Publications RSS in R

I have been poking around Adam Cooper’s text-mining weak signals R code and, being too lazy to collect data in CSV format, wondered if I could come up with something similar that used RSS feeds. I discovered it was really easy to read and start to mine RSS feeds in R, but there didn’t seem to be much help available on the web so I thought I’d share my findings. My test case was the new CETIS publications site; Phil has blogged about how the underlying technology behind the site is WordPress, which means it has an easy-to-find feed. I wrote a very small script to test things out that looks something like this:

library("XML")
library("tm")
doc <- xmlTreeParse("http://publications.cetis.ac.uk/feed")
src <- xpathApply(xmlRoot(doc), "//category")
tags <- NULL
for (i in 1:length(src)) {
  tags <- rbind(tags, data.frame(tag = xmlSApply(src[[i]], xmlValue)))
}

This simply grabs the feed and puts all the category tags into a data frame. I then…

SNA session at CETIS 12

I attended the SNA session at the CETIS conference hosted by Lorna, Sheila, Tony and Martin. Before the session I had blogged about some of the questions I had on SNA, and although I think I have more new questions than answers, I feel like things are much clearer now. My mind is still going over the conversations that were had at the session, but these are the main themes and some early thoughts that I came away with. What are the quick wins? At the start of the session Sheila asked the question ‘What are the quick wins?’. While Tony and Martin’s presentations were excellent, I think it is hard for people who don’t have their head regularly in this space to replicate the techniques quickly. Lorna said that although she understood what was happening in the SNA examples, there was some ‘secret magic’ that she couldn’t replicate…

Standards used in JISC programmes and projects over time

Today I took part in an introduction-to-R workshop held at The University of Manchester. R is a software environment for statistics, and while it does all sorts of interesting things that are beyond my ability, one thing that I can grasp and enjoy is exploring all the packages that are available for R; these packages extend R’s capabilities and let you do all sorts of cool things in a couple of lines of code. The target I set myself was to use JISC CETIS Project Directory data and find a way of visualising standards used in JISC-funded projects and programmes over time. I found a Google Visualisation package, and using this I was surprised at how easy it was to generate an output, the hardest bits being manipulating the data (and thinking about how to structure it). Although my output from the day is incomplete, I thought I’d write up…
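The workshop code used R’s Google Visualisation package and isn’t reproduced in the excerpt, but the data-manipulation step (the part I found hardest) can at least be sketched. Here it is in Python with made-up project rows, pivoting into per-year counts ready for a timeline chart:

```python
from collections import defaultdict

def standards_by_year(rows):
    """Pivot (project, standard, year) rows into {year: {standard: count}}."""
    usage = defaultdict(lambda: defaultdict(int))
    for _project, standard, year in rows:
        usage[year][standard] += 1
    return usage

# Hypothetical rows -- the real data comes from the CETIS Project Directory
rows = [
    ("proj-a", "XCRI", 2009),
    ("proj-b", "XCRI", 2009),
    ("proj-c", "IMS LTI", 2010),
    ("proj-d", "XCRI", 2010),
]
for year, counts in sorted(standards_by_year(rows).items()):
    print(year, dict(counts))
```

Once the data is in this shape, each year becomes one row of the chart’s data table, with one column per standard.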

Getting data out of PROD and its triplestore

For a while I have been wondering about the best way to create a how-to guide on getting data out of the JISC CETIS project directory, and in particular out of its linked data triplestore. A few weeks ago Martin Hawksey posted some great examples of work he’s been doing, including maps using data generated by PROD. I think these examples are great and thought they would be a good starting point for a how-to guide. Don’t be put off by scary terms, as I think these things are relatively easy to do and I’ve left out as much technobabble as possible; the difficulty really lies in knowing both the location of various resources and some useful tricks. I’ve split the instructions into three steps: (1) getting data out of PROD into a Google Spreadsheet; (2) getting institution, longitude and latitude data out of PROD; (3) mapping with Google Maps. The steps…
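The core trick behind the first two steps is simply issuing a query to a SPARQL endpoint over HTTP and asking for a spreadsheet-friendly format. A minimal Python sketch of building such a request URL is below — both the endpoint URL and the predicate are illustrative placeholders, not PROD’s real schema:

```python
import urllib.parse

def sparql_get_url(endpoint, query, fmt="csv"):
    """Build the GET URL for querying a SPARQL endpoint.
    (The endpoint location and result-format parameter vary by triplestore.)"""
    return endpoint + "?" + urllib.parse.urlencode({"query": query, "format": fmt})

# Hypothetical query: list projects and their lead institutions
query = (
    "SELECT ?project ?institution WHERE { "
    "?project <http://example.org/prod/leadInstitution> ?institution . "
    "} LIMIT 10"
)
print(sparql_get_url("http://example.org/sparql", query))
```

A URL built this way can be dropped straight into a Google Spreadsheet import function, which is what makes the spreadsheet step so convenient.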
