Description

These methods return character vectors with all types or tokens of a given text, where the text can either be a character vector itself, a previously tokenized/tagged koRpus object, or an object of class kRp.TTR.
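For example, a minimal sketch of the difference between the two methods, assuming the koRpus.lang.en language package is installed (the sample sentence and its output are purely illustrative):

library(koRpus)
library(koRpus.lang.en)

# every running word form of the text, in text order
tokens("A rose is a rose is a rose.", lang="en")

# each distinct word form only once, sorted by decreasing frequency
types("A rose is a rose is a rose.", lang="en")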
Usage

types(txt, ...)
tokens(txt, ...)

## S4 method for signature 'kRp.TTR'
types(txt, stats = FALSE)

## S4 method for signature 'kRp.TTR'
tokens(txt)

## S4 method for signature 'kRp.text'
types(
  txt,
  case.sens = FALSE,
  lemmatize = FALSE,
  corp.rm.class = "nonpunct",
  corp.rm.tag = c(),
  stats = FALSE
)

## S4 method for signature 'kRp.text'
tokens(
  txt,
  case.sens = FALSE,
  lemmatize = FALSE,
  corp.rm.class = "nonpunct",
  corp.rm.tag = c()
)

## S4 method for signature 'character'
types(
  txt,
  case.sens = FALSE,
  lemmatize = FALSE,
  corp.rm.class = "nonpunct",
  corp.rm.tag = c(),
  stats = FALSE,
  lang = NULL
)

## S4 method for signature 'character'
tokens(
  txt,
  case.sens = FALSE,
  lemmatize = FALSE,
  corp.rm.class = "nonpunct",
  corp.rm.tag = c(),
  lang = NULL
)
Arguments

txt: An object of either class kRp.text or kRp.TTR, or a character vector.

...: Only used for the method generic.

stats: Logical, whether statistics on the length in characters and frequency of types in the text should also be returned.

case.sens: Logical, whether types should be counted case sensitively (see the sketch below this list). This option is available for tagged text and character input only.

lemmatize: Logical, whether the analysis should be carried out on the lemmatized tokens rather than on all running word forms. This option is available for tagged text and character input only.

corp.rm.class: A character vector with word classes which should be dropped. The default value "nonpunct" has special meaning and excludes punctuation and sentence ending word classes (see kRp.POS.tags). This option is available for tagged text and character input only.

corp.rm.tag: A character vector with POS tags which should be dropped. This option is available for tagged text and character input only.

lang: Set the language of the text, see the lang option of tokenize. This option is available for character input only.
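To illustrate case.sens, a small sketch (assuming koRpus.lang.en is installed; the text and object name are illustrative):

tokenized.obj <- tokenize("The cat sat on the mat. The cat slept.", format="obj", lang="en")

# "The" and "the" are merged into one type by default
types(tokenized.obj)

# counted as two distinct types when case sensitive
types(tokenized.obj, case.sens=TRUE)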
Value

A character vector. For types with stats=TRUE, a data.frame containing all types, their length (in characters) and their frequency. The types result is always sorted by frequency, with more frequent types coming first.

If the input is of class kRp.TTR, the result will only be useful if lex.div or the respective wrapper function was called with keep.tokens=TRUE. Similarly, lemmatize can only work properly if the input is a tagged text object with lemmata, or if you have properly set up the environment via set.kRp.env.
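For instance, a sketch of the stats=TRUE output and of the kRp.TTR case (again assuming koRpus.lang.en is installed; names are illustrative):

tokenized.obj <- tokenize("A rose is a rose is a rose.", format="obj", lang="en")

# data.frame with one row per type: the type itself, its length in characters, its frequency
types(tokenized.obj, stats=TRUE)

# kRp.TTR objects only carry types/tokens if they were kept during lex.div()
ttr.obj <- lex.div(tokenized.obj, keep.tokens=TRUE, char=c(), quiet=TRUE)
types(ttr.obj)
tokens(ttr.obj)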
Note

Calling these methods on kRp.TTR objects simply returns the respective part of their tt slot.
See Also

kRp.POS.tags, kRp.text, kRp.TTR, lex.div
Examples

# code is only run when the english language package can be loaded
if(require("koRpus.lang.en", quietly = TRUE)){
  sample_file <- file.path(
    path.package("koRpus"), "examples", "corpus", "Reality_Winner.txt"
  )
  tokenized.obj <- tokenize(
    txt=sample_file,
    lang="en"
  )
  types(tokenized.obj)
  tokens(tokenized.obj)
} else {}
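The lemmatize option only has an effect if the tokens carry lemma information, for example from treetag(). A hedged sketch, reusing sample_file from the example above (the TreeTagger path is a placeholder for a local installation):

tagged.obj <- treetag(
  sample_file,
  treetagger="manual",
  lang="en",
  TT.options=list(path="~/bin/treetagger", preset="en")
)

# types of the lemmatized tokens instead of all running word forms
types(tagged.obj, lemmatize=TRUE)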