Pretreatment: Pretreatment of textual documents for NLP.
In LilRhino: For Implementation of Feed Reduction, Learning Examples, NLP and Code Management

Pretreatment

R Documentation

Pretreatment of textual documents for NLP.

Description

This function applies a series of pretreatment steps to prepare text for vectorization. The steps standardize the data so that there are fewer outliers when training NLP models. The following effects are applied:
1. Non-alphanumeric characters are removed.
2. Numbers are separated from letters.
3. Numbers are replaced with their word equivalents.
4. Words are stemmed (optional).
5. Words are lowercased (optional).
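
The steps listed above can be sketched in base R as follows. This is only an illustration under stated assumptions, not LilRhino's implementation: `sketch_pretreat` and `digit_words` are hypothetical names, removed characters are assumed to be replaced by spaces, numbers are spelled out digit by digit rather than as full number words, and stemming is left out because base R has no stemmer.

```r
# Hypothetical lookup table: word for each digit 0-9.
digit_words <- c("zero", "one", "two", "three", "four",
                 "five", "six", "seven", "eight", "nine")

# Hypothetical helper illustrating steps 1, 2, 3 and 5 (no stemming).
sketch_pretreat <- function(docs, lower = TRUE) {
  lapply(docs, function(doc) {
    # Step 1: replace anything that is not an ASCII letter, digit, or space.
    doc <- gsub("[^A-Za-z0-9 ]", " ", doc)
    # Step 2: put a space between adjacent letters and digits.
    doc <- gsub("([A-Za-z])([0-9])", "\\1 \\2", doc)
    doc <- gsub("([0-9])([A-Za-z])", "\\1 \\2", doc)
    # Step 3 (simplified): spell out each digit on its own.
    for (d in 0:9) {
      doc <- gsub(as.character(d), paste0(" ", digit_words[d + 1], " "),
                  doc, fixed = TRUE)
    }
    # Step 5: optionally lowercase.
    if (lower) doc <- tolower(doc)
    # Collapse repeated spaces and trim both ends.
    trimws(gsub(" +", " ", doc))
  })
}

res <- sketch_pretreat(c("This is a test", "Ahoy!", "my battle-ship is on... b6!"))
print(res)
```

With this sketch, the third document becomes "my battle ship is on b six". The output of the real Pretreatment function may differ, in particular for multi-digit numbers and when stemming is enabled.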

Usage

Pretreatment(title_vec, stem = TRUE, lower = TRUE, parallel = FALSE)

Arguments

`title_vec`	Vector of documents to be pretreated.
`stem`	Boolean; whether to stem words.
`lower`	Boolean; whether to lowercase words.
`parallel`	Boolean; whether to run the function in parallel.

Details

This function returns a list. It should accept any input that the function lapply would accept. Parallelization is done with the function mclapply from the package 'parallel' and only works on systems that support forking, so it is not available on Windows. Future updates will add socket-based parallelization.
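
The fork-only behaviour described above can be illustrated with a small sketch. The wrapper `apply_docs` and its `cores` argument are hypothetical and are not part of LilRhino; the sketch only shows how `mclapply` can be used where forking is available, with a fall back to `lapply` elsewhere.

```r
library(parallel)

# Hypothetical wrapper: apply `fun` to each document, forking only
# when requested and when the platform supports it.
apply_docs <- function(docs, fun, parallel = FALSE, cores = 2L) {
  can_fork <- .Platform$OS.type != "windows"
  if (parallel && can_fork) {
    # Forked workers; each document is processed independently.
    mclapply(docs, fun, mc.cores = cores)
  } else {
    # Sequential fallback, same return shape (a list).
    lapply(docs, fun)
  }
}

res <- apply_docs(c("Ahoy!", "B6"), tolower, parallel = TRUE)
print(res)
```

Both branches return a list with one element per document. On systems without forking, the socket-based counterpart in the same package is `parLapply` on a cluster created with `makeCluster`.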

Value

output

The list of character strings after pretreatment.

Author(s)

Travis Barton

Examples

## Not run:  # for some reason it takes longer than 5 seconds on CRAN's computers
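# Three short documents with punctuation and a letter-digit token
test_vec <- c('This is a test', 'Ahoy!', 'my battle-ship is on... b6!')
# Default settings: stem and lowercase, no parallelization
res <- Pretreatment(test_vec)
print(res)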

## End(Not run)

LilRhino documentation built on April 28, 2022, 1:06 a.m.
