stem.corpus: Stem corpus with annotation.


View source: R/stempp.R

Description

Given a tm-package VCorpus of original text, returns a VCorpus of stemmed text with '+' appended to all stemmed words.

Usage

stem.corpus(corpus, verbose = TRUE)

Arguments

corpus

A tm VCorpus of the original (unstemmed) text.

verbose

If TRUE, print a text progress bar so you can monitor progress.

Details

This code is not optimized and is expensive to run. It works in three steps. First, the stemmer chops each word. Second, the method passes through the text, appends a "+" to every chopped word, and builds a list of the resulting stems. Finally, it passes through again and appends a "+" to every word that matches a stem but had no suffix to chop.

For example, both "goblins" and "goblin" are transformed to "goblin+".
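The marking passes described above can be sketched in base R. This is a minimal illustration, not the package's implementation: it works on a plain character vector instead of a VCorpus, splits words on single spaces, and uses a hypothetical toy_stem() (strip a trailing "ing", then a trailing "s") in place of the Snowball stemmer. It also assumes the list of stems is collected across the whole corpus rather than per document.

```r
# Hypothetical toy stemmer: strips a trailing "ing", then a trailing "s".
# Stand-in for the Snowball stemmer; NOT the real stemming algorithm.
toy_stem <- function(words) {
  words <- sub("ing$", "", words)
  sub("s$", "", words)
}

# Hypothetical helper illustrating the "+" annotation passes.
mark_stems <- function(texts) {
  tokens  <- strsplit(texts, " ", fixed = TRUE)
  stemmed <- lapply(tokens, toy_stem)

  # Collect every stem produced by actually chopping a word (corpus-wide).
  stems <- unique(unlist(Map(function(w, s) s[s != w], tokens, stemmed)))

  # Append "+" to chopped words and to unchopped words that equal a known stem.
  out <- Map(function(w, s) {
    mark <- s != w | s %in% stems
    s[mark] <- paste0(s[mark], "+")
    paste(s, collapse = " ")
  }, tokens, stemmed)
  unlist(out, use.names = FALSE)
}
```

With the texts from the Examples section, this sketch turns "texting goblins the dagger" into "text+ goblin+ the dagger+". Note that "dagger" is marked only because "daggers" in another document was chopped to the same stem, which is why the stem list is built before the final pass. Output from the real Snowball stemmer may differ.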

Adding the '+' makes stemmed text more readable.

Based on code from Kevin Wu's UC Berkeley undergraduate thesis (2014).

Requires the SnowballC package, which is used through the tm package.

Warning: Do not use this on a textreg.corpus object. Apply it to the text before building the textreg.corpus object.

Examples

library(tm)
library(textreg)   # provides stem.corpus()
texts <- c("texting goblins the dagger", "text these goblins",
           "texting 3 goblins appl daggers goblining gobble")
corpus <- VCorpus(VectorSource(texts))
# Stem the corpus and mark stemmed words with "+"
stemmed_corpus <- stem.corpus(corpus, verbose = FALSE)
inspect(stemmed_corpus[[2]])

textreg documentation built on May 29, 2017, 12:25 p.m.
