iclean: Clean document texts

View source: R/scripts.R

Description

Clean document texts

Usage

iclean(text, lowercase = TRUE, replacechars = c(j = "i", v = "u"),
  removeangbracketed = TRUE, removeparensed = TRUE,
  onlycharacter = TRUE, removepunctuation = FALSE,
  removestopwords = FALSE, language = "lat", customstopwords = NULL,
  removenumbers = FALSE, removeextraspaces = TRUE)

Arguments

text

vector of character strings

lowercase

logical: whether to convert to lower case

replacechars

named vector of character substitutions used to normalize spelling

removeangbracketed

logical: whether to remove text between angle brackets

removeparensed

logical: whether to remove text in parentheses

onlycharacter

logical: whether to strip all but alphanumeric characters

removepunctuation

logical: whether to remove punctuation (matters only if onlycharacter=FALSE)

removestopwords

logical: whether to remove stopwords. CURRENTLY HAS NO EFFECT.

language

character string: which language's stopwords to use. CURRENTLY HAS NO EFFECT.

customstopwords

a vector of stopwords to remove.

removenumbers

logical: whether to remove numeric characters.

removeextraspaces

logical: whether to trim extra spaces.

Value

a vector of character strings, the clean version of text.
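
Example

The following is a minimal base-R sketch of the kind of cleaning the arguments above describe. It is not the parseR implementation: clean_sketch is a hypothetical stand-in, the regular expressions and the order of the steps are assumptions, and the stopword and number options are left out. It also assumes every entry of replacechars is a single character, since it relies on chartr().

```r
# Hypothetical illustration only; NOT the source of iclean().
# Argument names mirror iclean(), but the behaviour is an assumption.
clean_sketch <- function(text,
                         lowercase = TRUE,
                         replacechars = c(j = "i", v = "u"),
                         removeangbracketed = TRUE,
                         removeparensed = TRUE,
                         onlycharacter = TRUE,
                         removeextraspaces = TRUE) {
  # Drop <...> and (...) spans, leaving a space where each one was
  if (removeangbracketed) text <- gsub("<[^>]*>", " ", text)
  if (removeparensed) text <- gsub("\\([^)]*\\)", " ", text)

  if (lowercase) text <- tolower(text)

  # Map each name of the vector to its value, character by character
  if (length(replacechars) > 0) {
    text <- chartr(paste(names(replacechars), collapse = ""),
                   paste(replacechars, collapse = ""),
                   text)
  }

  # Replace everything that is neither alphanumeric nor whitespace
  if (onlycharacter) text <- gsub("[^[:alnum:][:space:]]", " ", text)

  # Collapse runs of whitespace, then trim both ends
  if (removeextraspaces) text <- trimws(gsub("[[:space:]]+", " ", text))

  text
}

clean_sketch(c("Julius <sic> venit (anno 44), VIDI!", "  Iam   uixit  "))
# [1] "iulius uenit uidi" "iam uixit"
```

As with iclean, the input is a character vector and the result is a character vector of the same length, one cleaned string per input element.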


rushkin/parseR documentation built on May 17, 2019, 12:52 p.m.