cleansing_corpus: Cleansing Corpus

View source: R/cleansing_corpus.R

Description

The function cleanses text by removing escape characters, non-alphanumeric characters, long words, and excess whitespace, and by converting all letters to lower case.

Usage

cleansing_corpus(
  text,
  escape_chars = TRUE,
  nonalphanum = TRUE,
  longwords = TRUE,
  whitespace = TRUE,
  tolower = TRUE
)

Arguments

text

Character vector of free text to be cleansed.

escape_chars

If TRUE, removes the escape characters \n (newline), \r (carriage return) and \t (tab).

nonalphanum

If TRUE, removes non-alphanumeric characters.

longwords

If TRUE, removes words with more than 35 characters.

whitespace

If TRUE, removes excess whitespace.

tolower

If TRUE, converts all letters to lower case.

Value

A character vector of the cleansed text.

Examples

txt <- "It has roots in a piece of classical Latin literature from 45 BC"
cleansing_corpus(txt)
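
To illustrate what each argument controls, the cleansing steps can be approximated in base R. This is only a sketch under assumptions: `cleanse_sketch` is a hypothetical name, the regular expressions are guesses at the documented behaviour, and replacing removed characters with a space is an illustrative choice. It is not the labourR source, and its output may differ from `cleansing_corpus()` in edge cases.

```r
# Hypothetical stand-in for illustration; not the package implementation.
cleanse_sketch <- function(text,
                           escape_chars = TRUE,
                           nonalphanum = TRUE,
                           longwords = TRUE,
                           whitespace = TRUE,
                           tolower = TRUE) {
  stopifnot(is.character(text))
  # Replace newline, carriage return and tab characters with a space.
  if (escape_chars) text <- gsub("[\n\r\t]", " ", text)
  # Replace anything that is not a letter, digit or space with a space.
  if (nonalphanum) text <- gsub("[^[:alnum:] ]", " ", text)
  # Drop words longer than 35 characters (assumed threshold from the docs).
  if (longwords) text <- gsub("\\b\\w{36,}\\b", "", text, perl = TRUE)
  # Collapse runs of whitespace and trim both ends.
  if (whitespace) text <- trimws(gsub("\\s+", " ", text))
  # Convert to lower case; base:: avoids confusion with the argument name.
  if (tolower) text <- base::tolower(text)
  text
}

cleanse_sketch("Hello,\tWorld!\n  Foo")
cleanse_sketch("Hello,\tWorld!", tolower = FALSE)
```

Switching an argument to FALSE skips the corresponding step, so the second call keeps the original capitalisation.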

eworx-org/labourR documentation built on Feb. 10, 2022, 12:35 a.m.