R/dropStop.R

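#' dropStop removes stop words (or any other specified words) from text data
#' @param x character vector of strings to be cleaned
#' @param drop character vector of words to remove; defaults to \code{ontoProc::stopWords} when missing.
#'   Matching is exact, so with \code{lower=TRUE} these words should be lower case
#' @param lower logical; if TRUE, \code{x} is converted with \code{\link{tolower}} before splitting
#' @param splitby character, passed to \code{\link{strsplit}} (and so read as a regular expression) to tokenize \code{x}
#' @return a list with one character vector per input string, holding the tokens obtained by splitting on
#'   \code{splitby} with elements of \code{drop} removed. Because \code{\link{setdiff}} is used, repeated
#'   tokens within a string are also reduced to a single occurrence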
#' @examples
#' data(minicorpus)
#' minicorpus[1:3]
#' dropStop(minicorpus)[1:3]
#' @export
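dropStop = function(x, drop, lower=TRUE, splitby=" ") {
  # fall back to the stop-word list shipped with ontoProc
  if (missing(drop)) {
    drop = ontoProc::stopWords
  }
  # optionally case-fold before tokenizing; `drop` itself is not case-folded
  if (lower) x = tolower(x)
  spl = strsplit(x, splitby)
  # setdiff() removes tokens found in `drop` and also de-duplicates the rest
  lapply(spl, setdiff, drop)
}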
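A minimal sketch of what the final `lapply(spl, setdiff, drop)` step does, assuming a small hypothetical stop list (`stop_words`) in place of `ontoProc::stopWords` so that it runs with base R alone. It also defines a hypothetical variant, `dropStopKeepDups` (not part of ontoProc), which subsets with `%in%` instead of calling `setdiff()` and therefore keeps repeated tokens in their original order.

```r
# Hypothetical stand-in for ontoProc::stopWords so the sketch runs without ontoProc
stop_words = c("the", "of", "and", "a")

docs = c("The cell of the heart", "a heart and a lung and a heart")

# Same pipeline as dropStop(): case-fold, split on " ", then setdiff() per string
tokens = strsplit(tolower(docs), " ")
via_setdiff = lapply(tokens, setdiff, stop_words)

# Hypothetical variant (not in ontoProc) that keeps repeated tokens
dropStopKeepDups = function(x, drop, lower = TRUE, splitby = " ") {
  if (lower) x = tolower(x)
  spl = strsplit(x, splitby)
  # subset with %in% instead of setdiff(), so duplicates and order survive
  lapply(spl, function(tok) tok[!(tok %in% drop)])
}
via_subset = dropStopKeepDups(docs, stop_words)

print(via_setdiff)  # [[2]] is c("heart", "lung"): the second "heart" is gone
print(via_subset)   # [[2]] is c("heart", "lung", "heart")
```

The `setdiff()` form is shorter and suits bag-of-words uses where only the set of terms matters; the `%in%` form is the one to reach for when token counts or positions are needed downstream.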

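Because `splitby` is handed to `strsplit()`, it is read as a regular expression. The sketch below uses made-up strings to show how the default `" "` can leave an empty token `""` behind (which `setdiff()` keeps unless `""` is in `drop`) and misses tabs, while a whitespace class such as `"[[:space:]]+"` treats any run of whitespace as one separator. That pattern could be passed to `dropStop()` as `splitby` in the same way.

```r
docs = c("heart  and lung", "heart\tand lung")

# Splitting on a single space: a double space gives "", and the tab is not split
by_space = strsplit(docs, " ")

# Splitting on any run of whitespace characters
by_ws = strsplit(docs, "[[:space:]]+")

print(by_space)
print(by_ws)
```

A leading space still produces one empty first token under either pattern, so applying `trimws()` to the input first may be worthwhile.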

ontoProc documentation built on Nov. 8, 2020, 4:49 p.m.