cleaner: Clean textual data


View source: R/cleaner.r

Description

Cleans a text vector and returns the cleaned text.

Usage

cleaner(
  rawtext,
  stopwords = "italian",
  delete_url = TRUE,
  lowercase = TRUE,
  simbols = TRUE,
  numbers = TRUE,
  smallwords = TRUE,
  spaces = TRUE
)

Arguments

rawtext

text vector containing the texts to clean.

stopwords

stopwords language. Defaults to "italian". Supported languages: danish, dutch, english, finnish, french, german, hungarian, italian, norwegian, portuguese, russian, spanish, and swedish.

delete_url

delete URLs in the texts (default TRUE).

lowercase

convert words to lowercase (default TRUE).

simbols

keep just alphanumeric characters (default TRUE).

numbers

remove numbers (default TRUE).

smallwords

remove words composed of up to 2 characters (default TRUE).

spaces

remove extra white spaces (default TRUE).
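
The arguments above each describe one cleaning step. As a rough illustration of what such steps can look like, here is a minimal base R sketch. It is not the textools implementation: the function name sketch_clean, the stop_list argument (a plain character vector standing in for a language-specific stopword list), the regular expressions, and the order of the steps are all assumptions made for this example.

```r
# Hypothetical helper for illustration only; NOT the code behind cleaner().
# stop_list is an assumed stand-in for a language-specific stopword list.
sketch_clean <- function(rawtext, stop_list = character(0)) {
  x <- as.character(rawtext)
  # delete_url: drop tokens that start with http://, https:// or www.
  x <- gsub("(https?://|www\\.)\\S+", " ", x, perl = TRUE)
  # lowercase: convert everything to lower case
  x <- tolower(x)
  # simbols: replace anything that is not alphanumeric or whitespace
  x <- gsub("[^[:alnum:][:space:]]", " ", x)
  # numbers: remove runs of digits
  x <- gsub("[[:digit:]]+", " ", x)
  # stopwords: remove listed words when they appear as whole words
  if (length(stop_list) > 0) {
    pattern <- paste0("\\b(", paste(stop_list, collapse = "|"), ")\\b")
    x <- gsub(pattern, " ", x, perl = TRUE)
  }
  # smallwords: remove words of up to 2 characters
  x <- gsub("\\b[[:alnum:]]{1,2}\\b", " ", x, perl = TRUE)
  # spaces: collapse repeated whitespace and trim both ends
  trimws(gsub("\\s+", " ", x))
}

# Illustrative input, not taken from the package
rawtext <- c("Visita https://example.com per 3 offerte!",
             "Un   TESTO di prova, con www.example.org")
sketch_clean(rawtext, stop_list = c("per", "questo"))
# returns c("visita offerte", "testo prova con")
```

In this sketch URLs are removed before symbols are stripped, because stripping punctuation first would break a URL into ordinary-looking words that the URL pattern could no longer match.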

Value

a text vector containing the cleaned texts.

Examples

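## Not run: 
# rawtext: illustrative input, not taken from the package
rawtext <- c("Visita https://example.com per 3 offerte!", "Un   testo di prova")
clean_text <- cleaner(rawtext)
## End(Not run)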

nicolarighetti/textools documentation built on Oct. 16, 2021, 11:20 p.m.