filterByClass-methods: Remove word classes


Description

This method removes tokens of the specified word classes from tagged text objects.

Usage

filterByClass(txt, ...)

## S4 method for signature 'kRp.text'
filterByClass(
  txt,
  corp.rm.class = "nonpunct",
  corp.rm.tag = c(),
  as.vector = FALSE,
  update.desc = TRUE
)

Arguments

txt

An object of class kRp.text.

...

Additional options, currently unused.

corp.rm.class

A character vector of word classes to remove. The default value "nonpunct" has a special meaning: the word classes returned by kRp.POS.tags(lang, tags=c("punct","sentc"), list.classes=TRUE) are removed, that is, punctuation and sentence-ending marks. Another valid value is "stopword", which removes all detected stopwords.
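
As a rough illustration of what filtering by word class means, the following base R sketch drops rows from a small token table. The data frame, its column names, the class labels, and the helper drop_classes() are all hypothetical and only mimic the idea; they are not koRpus internals.

```r
# Hypothetical token table; a real kRp.text object holds much richer data.
tokens <- data.frame(
  token  = c("Hello", ",", "world", "!"),
  wclass = c("word", "comma", "word", "fullstop"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop every row whose word class is listed in rm.class.
# With as.vector = TRUE, only the surviving tokens are returned.
drop_classes <- function(df, rm.class, as.vector = FALSE) {
  kept <- df[!df[["wclass"]] %in% rm.class, , drop = FALSE]
  if (isTRUE(as.vector)) kept[["token"]] else kept
}

# Remove the punctuation classes, once as a table and once as a plain vector.
filtered <- drop_classes(tokens, rm.class = c("comma", "fullstop"))
words <- drop_classes(tokens, rm.class = c("comma", "fullstop"), as.vector = TRUE)
```

Passing a different set of labels as rm.class would remove other classes in the same way, which is roughly the role corp.rm.class plays for the method documented here.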

corp.rm.tag

A character vector of valid POS tags to remove.

as.vector

Logical. If TRUE, results will be returned as a character vector containing only the text parts which survived the filtering.

update.desc

Logical. If TRUE, the desc slot of the tagged object will be fully recalculated using the filtered text. If FALSE, the desc slot will be copied from the original object. Finally, if NULL, the desc slot remains empty.

Value

An object of the same class as the input. If as.vector=TRUE, a character vector is returned instead.
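
The two return types can be compared with a short sketch like the one below. It assumes that the koRpus.lang.en package is installed and that tokenize() is called with format="obj" so a character string can be used instead of a file; the sample sentence and the variable names are made up for illustration.

```r
# Only runs when the English language package can be loaded.
remaining <- NULL
if (require("koRpus.lang.en", quietly = TRUE)) {
  # Tokenize a character string directly instead of reading a file.
  tokenized.obj <- tokenize(
    txt = "This is a short sentence, with some punctuation!",
    format = "obj",
    lang = "en"
  )
  # Default filtering ("nonpunct"), returned as a character vector.
  remaining <- filterByClass(tokenized.obj, as.vector = TRUE)
}
```

If the language package is not available, remaining simply stays NULL.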

See Also

kRp.POS.tags

Examples

# code is only run when the english language package can be loaded
if(require("koRpus.lang.en", quietly = TRUE)){
  sample_file <- file.path(
    path.package("koRpus"), "examples", "corpus", "Reality_Winner.txt"
  )
  tokenized.obj <- tokenize(
    txt=sample_file,
    lang="en"
  )
  filterByClass(tokenized.obj)
} else {}

koRpus documentation built on May 18, 2021, 1:13 a.m.