read.corp.custom-methods: Import custom corpus data
In unDocUMeantIt/koRpus: Text Analysis with Emphasis on POS Tagging, Readability, and Lexical Diversity

Description Usage Arguments Details Value See Also Examples

Read data from a custom corpus into a valid object of class kRp.corp.freq.

read.corp.custom(corpus, caseSens = TRUE, log.base = 10, ...)

## S4 method for signature 'kRp.text'
read.corp.custom(
  corpus,
  caseSens = TRUE,
  log.base = 10,
  dtm = docTermMatrix(obj = corpus, case.sens = caseSens),
  as.feature = FALSE
)

`corpus`	An object of class `kRp.text`; the column `"token"` of its `tokens` slot is used.
`caseSens`	Logical. If `FALSE`, all tokens will be matched in their lower case form.
`log.base`	A numeric value defining the base of the logarithm used for inverse document frequency (idf). See `log` for details.
`...`	Additional options for methods of the generic.
`dtm`	A document term matrix of the `corpus` object as generated by `docTermMatrix`. This argument exists only so that you can re-use an already existing matrix. By default, it is created from the `corpus` object.
`as.feature`	Logical, whether the output should be just the analysis results or the input object with the results added as a feature. Use `corpusCorpFreq` to get the results from such an aggregated object.
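To make the role of `log.base` concrete, here is a minimal base R sketch of the textbook inverse document frequency, idf = log(N / df), where N is the number of documents and df the number of documents containing a term. The toy documents and the helper `toy_idf` are hypothetical and not part of koRpus; that the package computes idf in exactly this form is an assumption, not something stated on this page.

```r
# Hypothetical toy corpus: three "documents", each a character vector of tokens.
docs <- list(
  d1 = c("the", "cat", "sat"),
  d2 = c("the", "dog", "ran"),
  d3 = c("a", "cat", "ran")
)

# Textbook idf: log(N / df) with a configurable base.
# 'toy_idf' is an illustrative helper, not a koRpus function.
toy_idf <- function(docs, log.base = 10) {
  n_docs <- length(docs)
  # count each term at most once per document
  doc_freq <- table(unlist(lapply(docs, unique)))
  idf <- log(n_docs / as.vector(doc_freq), base = log.base)
  names(idf) <- names(doc_freq)
  idf
}

idf10 <- toy_idf(docs, log.base = 10)
idf2  <- toy_idf(docs, log.base = 2)
```

Changing the base only rescales all values by a constant factor, so the ranking of terms by idf stays the same.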

These methods enable you to perform a basic frequency analysis of a text corpus. That is, rather than importing the results of an existing analysis, such as LCC files, you import the corpus material itself and have the frequencies calculated from it. The resulting object is of class kRp.corp.freq, so it can be used for frequency analysis by other functions and methods of this package.
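As a rough illustration of what a basic frequency analysis involves, and of the effect of `caseSens`, the following base R sketch counts tokens with and without case folding. The token vector and the helper `toy_freq` are hypothetical. A real kRp.corp.freq object is built by the package and is not reproduced here.

```r
# Hypothetical token vector standing in for the "token" column of a tokenized text.
tokens <- c("The", "cat", "saw", "the", "other", "Cat", "and", "the", "dog")

# 'toy_freq' is an illustrative helper, not a koRpus function.
toy_freq <- function(tokens, caseSens = TRUE) {
  # with caseSens = FALSE, tokens are matched in their lower case form
  if (!caseSens) tokens <- tolower(tokens)
  counts <- table(tokens)
  data.frame(
    word = names(counts),
    freq = as.vector(counts),                   # absolute frequency
    pct  = as.vector(counts) / length(tokens),  # relative frequency
    stringsAsFactors = FALSE
  )
}

freq_cs <- toy_freq(tokens, caseSens = TRUE)   # "The" and "the" counted separately
freq_ci <- toy_freq(tokens, caseSens = FALSE)  # "The" and "the" merged
```

With case folding, variants such as "The" and "the" collapse into a single entry, which lowers the number of distinct types and raises their counts.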

Depending on as.feature, either an object of class kRp.corp.freq (the default), or the input object of class kRp.text with that kRp.corp.freq object added as the feature corp_freq.

kRp.corp.freq

# code is only run when the english language package can be loaded
if(require("koRpus.lang.en", quietly = TRUE)){
  sample_file <- file.path(
    path.package("koRpus"), "examples", "corpus", "Reality_Winner.txt"
  )
  # call read.corp.custom() on a tokenized text
  tokenized.obj <- tokenize(
    txt=sample_file,
    lang="en"
  )
  # if you call read.corp.custom() without setting as.feature
  # (it defaults to FALSE), you will get its results directly
  en_corp <- read.corp.custom(
    tokenized.obj,
    caseSens=FALSE
  )

  # alternatively, you can also store those results as a
  # feature in the object itself
  tokenized.obj <- read.corp.custom(
    tokenized.obj,
    caseSens=FALSE,
    as.feature=TRUE
  )
  # results are now part of the object
  hasFeature(tokenized.obj)
  corpusCorpFreq(tokenized.obj)
} else {}

unDocUMeantIt/koRpus documentation built on May 21, 2021, 9:26 p.m.
