clean_text: Clean text


Description

Pre-processes raw text: it removes punctuation, optionally removes stop words, and creates sentence markers.

Usage

clean_text(rawText, removeStopwords = FALSE)

Arguments

rawText

A vector of strings (tokens).

removeStopwords

A logical value: TRUE removes stop words; FALSE (the default) retains them.

Details

A convenience function that removes unwanted information from a vector of text. At present, the only option exposed to the user is removeStopwords, which controls whether stop words are removed.
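
To make the cleaning steps concrete, here is a minimal base-R sketch of the same kind of processing: lower-casing, stripping punctuation, and optionally dropping stop words. It is an illustration only and is not the implementation used by clean_text. The name sketch_clean, its arguments, and the tiny stop-word list are all hypothetical, and the sketch does not attempt to create sentence markers.

```r
# Hypothetical helper, NOT part of crqanlp: a rough base-R approximation of
# the cleaning steps described above. The default stop-word list is a tiny
# illustrative sample, not the list the package uses.
sketch_clean <- function(raw_text, remove_stopwords = FALSE,
                         stopwords = c("the", "a", "an", "and", "of", "to", "in")) {
  # Join all elements into one lower-case string
  text <- tolower(paste(raw_text, collapse = " "))
  # Replace every run of punctuation with a single space
  text <- gsub("[[:punct:]]+", " ", text)
  # Split on whitespace to obtain tokens
  tokens <- strsplit(trimws(text), "[[:space:]]+")[[1]]
  # Optionally drop tokens that appear in the stop-word list
  if (remove_stopwords) {
    tokens <- tokens[!tokens %in% stopwords]
  }
  paste(tokens, collapse = " ")
}

sketch_clean("Alice was beginning to get very tired, of sitting!",
             remove_stopwords = TRUE)
# "alice was beginning get very tired sitting"
```

With remove_stopwords = FALSE the same call keeps "to" and "of", which mirrors the choice the removeStopwords argument offers.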

Value

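Returns the text in lower case, stripped of punctuation and, when removeStopwords is TRUE, of stop words. In the example below, the cleaned text is read from the content element of the returned object.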

Author(s)

Rick Dale (rdale@ucla.edu)

Examples

library(crqanlp)
library(gutenbergr)
## Get Alice's Adventures in Wonderland by Lewis Carroll (Project Gutenberg ID 11)
# gutenberg_works(author == "Carroll, Lewis") ## lists the author's works and their IDs
rawText <- gutenberg_download(11)          ## download the text
rawText <- as.vector(rawText$text)         ## extract the lines as a character vector
rawText <- paste(rawText, collapse = " ")  ## collapse the lines into one string

cleanText <- clean_text(rawText, removeStopwords = TRUE)
text      <- cleanText$content             ## the cleaned text

crqanlp documentation built on May 2, 2019, 1:09 p.m.