If you want to grab comments from one post, the Graph API Explorer might be easier.
I’ve been looking for ways to poke the Facebook Graph that are easy for people who find working with the API directly difficult. Currently I am using a tool called Facepager to extract data. After a few minutes with the program, it seems easy enough to get it to poke the Facebook Graph for comments and store them in a SQLite database. I’ll be looking to work with the results in R… but one thing at a time!
1) After downloading and extracting Facepager for your system, you should see a screen like the one in the image. Click the new database button and give it a filename. This will be your SQLite database. I’ll be using it in R in other posts, but for now you can just reload it in Facepager to do stuff.
2) Then click add node. The node name is the name of the Facebook page you want to explore. For example, if you want to get all the comments from posts on the Minecraft page at https://www.facebook.com/minecraft then the node name is minecraft.
3) Select what you are after in the ‘Resources’ tab. I went with <page>/posts.
4) You need an access token to get any data out of Facebook, so press the login to Facebook button and log in.
5) Press fetch data.
Easy. Now you have all the results viewable in Facepager. Each post only shows the first 25 comments, though, and I want them all. To get them all, I clicked each individual node and changed the resource to post/comments. There is a video of how I did it.
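Since everything Facepager fetches lands in the SQLite file created in step 1, you can peek at that file from R before doing anything clever with it. This is only a minimal sketch, assuming the DBI and RSQLite packages are installed. The file name facepager.db and the helper list_tables are made up for illustration, and I have not assumed any particular table names, so the code just lists whatever tables it finds.

```r
library(DBI)

# Hypothetical file name: use whatever you typed in step 1.
db_path <- "facepager.db"

# Hypothetical helper: return the table names stored in a SQLite file,
# or NULL if the file does not exist (so we never create an empty database by accident).
list_tables <- function(path) {
  if (!file.exists(path)) {
    message("No database found at ", path)
    return(NULL)
  }
  con <- dbConnect(RSQLite::SQLite(), dbname = path)
  on.exit(dbDisconnect(con))  # always close the connection, even on error
  dbListTables(con)
}

print(list_tables(db_path))
```

Once you know the table names, dbReadTable(con, "<table name>") will pull a whole table into a data frame.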
I have been asked in the comments to share my R code. I’m not sure I still have the script in the state it was in during the video, but I did find the finished version. This script takes all the comments I scraped from Facebook (from political party pages) and finds repeated phrases. If you would like to see what I did with it, you can read the post about it here. Hope it helps.
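library(tm)     # VCorpus, tm_map, TermDocumentMatrix, findFreqTerms
library(RWeka)  # NGramTokenizer, Weka_control
library(slam)   # rollup
# read the semicolon-separated export; keep the comment text as character, not factor
mydata <- read.csv("5partycomments", sep = ";", stringsAsFactors = FALSE)
# create corpus object; VCorpus so that newer tm versions honour the custom tokenizer used further down
corpus <- VCorpus(VectorSource(mydata$message))
# convert all text to lower case first, so capitalised stop words ("The", "And") are caught below
# content_transformer() (tm 0.6 or later) keeps the documents as plain text documents,
# so the old tm_map(corpus, PlainTextDocument) repair step is not needed
# mc.cores = 1 was a workaround for older tm releases and is left out here
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removeWords, stopwords("english"))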
#make the term document matrix
tdm <- TermDocumentMatrix(corpus)
#find the frequent terms
findFreqTerms(tdm, lowfreq = 500)
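#tokenizer for tdm with n-grams (min = max = 6, so these are six-word phrases, not bigrams)
SixgramTokenizer <- function(x) NGramTokenizer(x, Weka_control(min = 6, max = 6))
tdm <- TermDocumentMatrix(corpus, control = list(tokenize = SixgramTokenizer))
#list the six-word phrases that occur at least 15 times
findFreqTerms(tdm, lowfreq = 15)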
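#create data frame and order by most used
#sum each phrase across all documents (a new name, so slam's rollup() function is not masked)
ngram_totals <- rollup(tdm, 2, na.rm = TRUE, FUN = sum)
#as.matrix() keeps every row; inspect() is meant for printing a preview
mydata.df <- as.data.frame(as.matrix(ngram_totals))
colnames(mydata.df) <- c("count")
mydata.df$ngram <- rownames(mydata.df)
#order(-count) fails because count is a column, not a variable, so refer to it through the data frame
newdata <- mydata.df[order(mydata.df$count, decreasing = TRUE), ]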
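If you want to see what the repeated-phrase step is doing without installing tm or RWeka, here is a minimal base R sketch on a few made-up comments. The helper count_ngrams is hypothetical and not part of my original script. It also skips stop-word removal, so treat it as an illustration of the idea rather than a replacement for the script above.

```r
# Hypothetical helper: count every run of n consecutive words across a vector of comments.
count_ngrams <- function(comments, n) {
  comments <- comments[!is.na(comments)]
  ngrams <- unlist(lapply(comments, function(comment) {
    # lower-case, replace anything that is not a letter or space, then split on spaces
    clean <- gsub("[^a-z ]", " ", tolower(comment))
    words <- strsplit(trimws(clean), " +")[[1]]
    words <- words[nzchar(words)]
    if (length(words) < n) return(character(0))
    starts <- seq_len(length(words) - n + 1)
    vapply(starts, function(i) paste(words[i:(i + n - 1)], collapse = " "), character(1))
  }))
  # no comment was long enough: return an empty result instead of failing in table()
  if (length(ngrams) == 0) {
    return(data.frame(ngram = character(0), count = integer(0), stringsAsFactors = FALSE))
  }
  counts <- as.data.frame(table(ngram = ngrams), responseName = "count",
                          stringsAsFactors = FALSE)
  # most repeated phrases first
  counts[order(counts$count, decreasing = TRUE), ]
}

# Made-up example comments, not real data
comments <- c(
  "We need change now, vote for change!",
  "Vote for change. We need change now!",
  "I just like Minecraft."
)
print(head(count_ngrams(comments, n = 3), 3))
```

With n = 6 this mirrors the six-word window used in the script. On real comments, the phrases that float to the top tend to be the copy-and-pasted ones.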