nonword: Eliminate non-words
In UBESP-DCTV/costumer: COmprehensive Searches ThroUgh Machine learning for systEmatic Reviews

nonword

R Documentation

Eliminate non-words

Description

This function eliminates everything that is not an alphanumeric word/token from a corpus of documents. An option controls whether numbers are removed too. It is also possible to override both the regular-expression pattern used for matching and the replacement string (default: a single white space).

Usage

nonword(corpus, numbers = FALSE, ..., pattern = NULL, replacement = " ")

## S3 method for class 'list'
nonword(corpus, numbers = FALSE, ..., pattern = NULL,
  replacement = " ")

## S3 method for class 'VCorpus'
nonword(corpus, numbers = FALSE, ..., pattern = NULL,
  replacement = " ")

## S3 method for class 'character'
nonword(corpus, numbers = FALSE, ..., pattern = NULL,
  replacement = " ")

## Default S3 method:
nonword(corpus, numbers = FALSE, ..., pattern = NULL,
  replacement = " ")

Arguments

`corpus`	a compatible object storing documents (currently: lists, including corpus-lists of tokenized documents, character vectors, and `VCorpus` objects)
`numbers`	(lgl) if `TRUE`, numbers are removed too (default `FALSE`)
`...`	Additional options
`pattern`	(chr) an alternative regular expression; everything that matches it is removed (i.e., substituted with `replacement`). Default is `NULL`. If not `NULL`, the `numbers` option is ignored.
`replacement`	(chr) the string substituted for whatever is eliminated. Default is `' '`.
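
How the three options interact can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: `nonword_sketch` is a hypothetical helper, and the two default patterns (`[^[:alnum:]]+` and `[^[:alpha:]]+`) are guesses at what "not an alphanumeric word" might mean, so the outputs of the real `nonword()` may differ (for example in how runs of removed characters are collapsed).

```r
# Hypothetical helper for illustration only; NOT the code used by costumer.
nonword_sketch <- function(x, numbers = FALSE, pattern = NULL,
                           replacement = " ") {
  if (is.null(pattern)) {
    # Assumed defaults: drop non-letters when numbers = TRUE,
    # otherwise drop anything that is neither a letter nor a digit.
    pattern <- if (numbers) "[^[:alpha:]]+" else "[^[:alnum:]]+"
  }
  # A user-supplied pattern takes precedence, so `numbers` is ignored here.
  gsub(pattern, replacement, x)
}

nonword_sketch("hell0 w.rld")                    # "hell0 w rld"
nonword_sketch("hell0 w.rld", numbers = TRUE)    # "hell w rld"
nonword_sketch("hell0 w.rld", replacement = "*") # "hell0*w*rld"
# With an explicit pattern, numbers = TRUE has no effect:
nonword_sketch("hell0 w.rld", numbers = TRUE, pattern = "[.]")  # "hell0 w rld"
```

In this sketch, resolving the pattern once at the top keeps the precedence rule (explicit `pattern` wins over `numbers`) in a single place.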

Value

An object of the same class as the input, with documents rewritten so that only "words" are retained.
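
The "same class in, same class out" behaviour follows naturally from S3 dispatch. The sketch below shows one way it could be arranged for character vectors and lists of tokenized documents. The generic `strip_sketch` and its methods are hypothetical names introduced here, the default pattern is an assumption, and the `VCorpus` method is omitted because it would depend on the tm package.

```r
# Hypothetical generic for illustration; not part of costumer.
strip_sketch <- function(corpus, ...) UseMethod("strip_sketch")

# Character vectors: substitute directly; gsub() returns a character vector.
strip_sketch.character <- function(corpus, ...,
                                   pattern = "[^[:alnum:]]+",
                                   replacement = " ") {
  gsub(pattern, replacement, corpus)
}

# Lists: recurse into each document, so the result is again a list
# with the same length and names as the input.
strip_sketch.list <- function(corpus, ...) {
  lapply(corpus, function(doc) strip_sketch(doc, ...))
}

# Anything else is rejected with an informative error.
strip_sketch.default <- function(corpus, ...) {
  stop("unsupported class: ", class(corpus)[1])
}

docs <- list(a = c("anti-angiogenesis", "p<0.05"), b = "hell0 w.rld")
res <- strip_sketch(docs)
class(res)  # "list"
res$a       # "anti angiogenesis" "p 0 05"
res$b       # "hell0 w rld"
```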

Examples

data(liu_corpus)

nonword('hell0 w.rld')
nonword('hell0 w.rld', numbers = TRUE)                  # also remove numbers
nonword('hell0 w.rld', replacement = '*')    # use "*" instead of white space
nonword('hell0 w.rld', pattern = 'w[^\\s]+')     # anything starting with "w"

nonword(liu_corpus)$content[[1]]$content # "-" removed in "anti-angiogenesis"

UBESP-DCTV/costumer documentation built on Feb. 1, 2023, 4:52 a.m.
