Word-clouds-in-r


Last edit: Victoria Burton (13 Dec 2016 14:35) | Revisions: 1 | Created by: Victoria Burton | Rating: 0


Creating a word cloud using R

Install the packages tm and wordcloud.
tm provides tools for text mining; wordcloud creates the figure.
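
One way to do the installation step is sketched below. It assumes a CRAN mirror is reachable from your session; the variable names pkgs and missing_pkgs are only illustrative.

```r
# Install whichever of the two packages is not already available,
# then attach both for the current session.
pkgs <- c("tm", "wordcloud")
missing_pkgs <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
if (length(missing_pkgs) > 0) install.packages(missing_pkgs)

library(tm)         # text mining tools: Corpus(), tm_map(), stopwords()
library(wordcloud)  # wordcloud() draws the figure
```

Installing is only needed once per machine; the two library() calls are needed in every new R session.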

Load your spreadsheet or text document of words in your preferred way, with stringsAsFactors = FALSE so that the text is read in as character strings rather than factors.
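
As a minimal sketch of the loading step, the code below reads a CSV file with read.csv. The file, the column name response and the example sentences are all invented for illustration, and it assumes that dat (the object used in the next step) is meant to be a character vector with one piece of text per element.

```r
# Write a tiny example file so the sketch is self-contained;
# in practice you would point read.csv() at your own file.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("response",
             "I enjoy citizen science projects",
             "Recording wildlife in my garden"), csv_path)

# stringsAsFactors = FALSE keeps the text as character strings
responses <- read.csv(csv_path, stringsAsFactors = FALSE)

# dat is the character vector of text used in the steps that follow
dat <- responses$response
```

For a plain text document, readLines() on the file path would give a similar character vector with one element per line.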

Create a corpus. This is the collection of text documents that tm works on, and it supplies the words used in the word cloud:

corpus <- Corpus(VectorSource(dat))
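
To check that the corpus was built as expected, you can look at how many documents it holds and what the first one contains. This is a small sketch with invented example text, assuming dat is a character vector.

```r
library(tm)

# Invented example text; each element of dat becomes one document
dat <- c("I enjoy citizen science projects",
         "Recording wildlife in my garden")

corpus <- Corpus(VectorSource(dat))

n_docs <- length(corpus)                 # number of documents in the corpus
first_doc <- as.character(corpus[[1]])   # text of the first document
print(n_docs)
print(first_doc)
```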

Collapse runs of white space into single spaces by using the following command:

corpus <- tm_map(corpus, stripWhitespace)

You can remove punctuation, stop words (commonly used words such as "the" and "and"), numbers and any other words you specify using the following commands.

To remove punctuation:

corpus <- tm_map(corpus, removePunctuation)

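To remove English stop words (the list is in lower case, so capitalised words such as "The" are only removed if the text has been converted to lower case first):

corpus <- tm_map(corpus, removeWords, stopwords('english'))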

To remove words of your choice:

corpus <- tm_map(corpus, removeWords, c('word1','word2'))

To remove numbers:

corpus <- tm_map(corpus, removeNumbers)
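
The cleaning commands above can be chained in one go. The sketch below is one possible ordering, not the only one: it adds a lower-casing step (an addition to the commands on this page) so that the lower-case stop word list matches capitalised words, and it runs stripWhitespace last because removing words and numbers leaves gaps behind. The example text is invented, and depending on your tm version tm_map may print a warning about dropped documents, which can usually be ignored here.

```r
library(tm)

# Invented example text for illustration
dat <- c("The 3 BEST things about Citizen Science!",
         "We counted 12 birds,   and the   weather was great.")
corpus <- Corpus(VectorSource(dat))

# tolower is a base R function, so it is wrapped in content_transformer()
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
# Collapse the gaps left behind by the removals
corpus <- tm_map(corpus, stripWhitespace)

# Pull the cleaned text back out to inspect it
cleaned <- vapply(seq_along(corpus),
                  function(i) as.character(corpus[[i]]),
                  character(1))
print(cleaned)
```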

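You can also keep words together if you do not want them split apart. This code uses gsub to replace a phrase with a single underscore-joined term, wrapped in content_transformer so that each document stays a valid tm document. Run it after removing punctuation, because removePunctuation also strips underscores.

join_words <- content_transformer(function(x) gsub("citizen science", "citizen_science", x))
corpus <- tm_map(corpus, join_words)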
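
Once the corpus is cleaned, the remaining step is to count the words and draw the cloud. The sketch below is one common way to do this, using TermDocumentMatrix from tm and wordcloud from the wordcloud package. The example text is invented and written as if the cleaning steps above had already been applied; min.freq = 1 and random.order = FALSE are illustrative choices rather than recommended settings.

```r
library(tm)
library(wordcloud)

# Invented example text, already lower case and cleaned
dat <- c("citizen_science fun",
         "science everyone",
         "citizen_science projects wildlife",
         "wildlife science garden")
corpus <- Corpus(VectorSource(dat))

# Rows of the matrix are terms, columns are documents
tdm <- TermDocumentMatrix(corpus)

# Total count of each term across all documents, most frequent first
freq <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)

set.seed(1)  # word placement is random; a seed makes the layout repeatable
wordcloud(words = names(freq), freq = freq,
          min.freq = 1,          # include words that occur only once
          random.order = FALSE)  # plot the most frequent words first
```

By default TermDocumentMatrix ignores terms shorter than three characters, so very short words will not appear in the cloud.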

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License