cleansing_corpus: Cleansing Corpus


View source: R/cleansing_corpus.R

Description

The function cleanses text by removing escape characters, non-alphanumeric characters, long words, and excess whitespace, and by converting all letters to lower case.

Usage

cleansing_corpus(
  text,
  escape_chars = TRUE,
  nonalphanum = TRUE,
  longwords = TRUE,
  whitespace = TRUE,
  tolower = TRUE
)

Arguments

text

Character vector of free text to be cleansed.

escape_chars

If TRUE, removes the escape characters \n, \r and \t.

nonalphanum

If TRUE, removes non-alphanumeric characters.

longwords

If TRUE, removes words with more than 35 characters.

whitespace

If TRUE, removes excess whitespace.

tolower

If TRUE, converts all letters to lower case.

Value

A character vector of the cleansed text.

Examples

txt <- "It has roots in a piece of classical Latin literature from 45 BC"
cleansing_corpus(txt)
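
The cleansing steps described under Arguments can be approximated in base R. The sketch below is an illustration only, not the labourR source: the name cleanse_sketch and the exact regular expressions are assumptions, and it replaces matches with a space before collapsing whitespace, which may differ from what the package does internally.

```r
# Hypothetical stand-in for illustration; NOT the labourR implementation.
cleanse_sketch <- function(text) {
  # Escape characters (\n, \r, \t) become a space so adjacent words stay apart.
  text <- gsub("[\r\n\t]", " ", text)
  # Anything that is neither alphanumeric nor a space becomes a space.
  text <- gsub("[^[:alnum:] ]", " ", text)
  # Drop runs of 36 or more alphanumerics, i.e. words longer than 35 characters.
  text <- gsub("[[:alnum:]]{36,}", " ", text)
  # Collapse repeated whitespace, then trim both ends.
  text <- trimws(gsub("\\s+", " ", text))
  # Convert to lower case.
  tolower(text)
}

cleanse_sketch("It has roots\tin a piece of classical Latin literature, from 45 BC!\n")
# "it has roots in a piece of classical latin literature from 45 bc"
```

Because gsub, trimws and tolower are vectorised, the sketch accepts a character vector and returns one of the same length, in line with the Value section.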

labourR documentation built on July 18, 2020, 5:06 p.m.