readTagged-methods: Import already tagged texts
In koRpus: Text Analysis with Emphasis on POS Tagging, Readability, and Lexical Diversity


This method can be used on text files or matrices containing already tagged text material, e.g. the results of TreeTagger[1].
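For orientation, TreeTagger's usual output is plain text with one token per line and tab-separated token, tag, and lemma fields. The sketch below writes a tiny hand-made file in that layout and inspects it with base R. The sample tokens and tags are made up for illustration, and the final `readTagged()` call is only attempted if koRpus and an English language package (assumed here to be `koRpus.lang.en`) are installed.

```r
# A hand-written sample in TreeTagger's three-column layout:
# token <TAB> tag <TAB> lemma, one token per line (illustrative data only).
tagged_lines <- c(
  "This\tDT\tthis",
  "is\tVBZ\tbe",
  "a\tDT\ta",
  "test\tNN\ttest",
  ".\tSENT\t."
)
tagged_file <- tempfile(fileext = ".txt")
writeLines(tagged_lines, tagged_file)

# Inspect the raw columns with base R before importing.
raw_cols <- read.delim(
  tagged_file,
  header = FALSE,
  quote = "",
  col.names = c("token", "tag", "lemma"),
  stringsAsFactors = FALSE
)
print(raw_cols)

# Import with koRpus if it is available
# (assumption: the English language support package is koRpus.lang.en).
if (requireNamespace("koRpus", quietly = TRUE) &&
    requireNamespace("koRpus.lang.en", quietly = TRUE)) {
  library(koRpus.lang.en)
  tagged_results <- koRpus::readTagged(tagged_file, lang = "en")
}
```

Checking the raw columns first is a cheap way to catch files that were tagged without lemma output or with a different field order.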

readTagged(file, ...)

## S4 method for signature 'matrix'
readTagged(
  file,
  lang = "kRp.env",
  tagger = "TreeTagger",
  apply.sentc.end = TRUE,
  sentc.end = c(".", "!", "?", ";", ":"),
  stopwords = NULL,
  stemmer = NULL,
  rm.sgml = TRUE,
  doc_id = NA,
  add.desc = "kRp.env",
  mtx_cols = c(token = "token", tag = "tag", lemma = "lemma")
)

## S4 method for signature 'data.frame'
readTagged(
  file,
  lang = "kRp.env",
  tagger = "TreeTagger",
  apply.sentc.end = TRUE,
  sentc.end = c(".", "!", "?", ";", ":"),
  stopwords = NULL,
  stemmer = NULL,
  rm.sgml = TRUE,
  doc_id = NA,
  add.desc = "kRp.env",
  mtx_cols = c(token = "token", tag = "tag", lemma = "lemma")
)

## S4 method for signature 'kRp.connection'
readTagged(
  file,
  lang = "kRp.env",
  encoding = getOption("encoding"),
  tagger = "TreeTagger",
  apply.sentc.end = TRUE,
  sentc.end = c(".", "!", "?", ";", ":"),
  stopwords = NULL,
  stemmer = NULL,
  rm.sgml = TRUE,
  doc_id = NA,
  add.desc = "kRp.env"
)

## S4 method for signature 'character'
readTagged(
  file,
  lang = "kRp.env",
  encoding = getOption("encoding"),
  tagger = "TreeTagger",
  apply.sentc.end = TRUE,
  sentc.end = c(".", "!", "?", ";", ":"),
  stopwords = NULL,
  stemmer = NULL,
  rm.sgml = TRUE,
  doc_id = NA,
  add.desc = "kRp.env"
)

`file`	Either a matrix, a data frame, a connection or a character vector. If the latter, that must be a valid path to a file containing the previously analyzed text. If it is a matrix or data frame, it must contain three columns named "token", "tag", and "lemma" (unless `mtx_cols` is adjusted); all other columns are ignored.
`...`	Additional options, currently unused.
`lang`	A character string naming the language of the analyzed corpus. See `kRp.POS.tags` for all supported languages. If set to `"kRp.env"` this is fetched from `get.kRp.env`.
`tagger`	The software which was used to tokenize and tag the text. Currently, "TreeTagger" and "manual" are the only supported values. If "manual", you must also adjust the values of `mtx_cols` to define the columns to be imported.
`apply.sentc.end`	Logical, whether the tokens defined in `sentc.end` should be searched for and set to a sentence ending tag. You could call this a compatibility mode to make sure you get the results you would get if you called `treetag` on the original file. If set to `FALSE`, the tags will be imported as they are.
`sentc.end`	A character vector with tokens indicating a sentence ending. This adds to the given results; it doesn't replace them.
`stopwords`	A character vector to be used for stopword detection. Comparison is done in lower case. You can also simply set `stopwords=tm::stopwords("en")` to use the English stopwords provided by the `tm` package.
`stemmer`	A function or method to perform stemming. For instance, you can set `stemmer=Snowball::SnowballStemmer` if you have the `Snowball` package installed (or `SnowballC::wordStem`). As of now, you cannot provide further arguments to this function.
`rm.sgml`	Logical, whether SGML tags should be ignored and removed from output.
`doc_id`	Character string, optional identifier of the particular document. Will be added to the `desc` slot.
`add.desc`	Logical. If `TRUE`, the tag description (column `"desc"` of the data.frame) will be added directly to the resulting object. If set to `"kRp.env"` this is fetched from `get.kRp.env`. Only needed if `tag=TRUE`.
`mtx_cols`	Character vector with exactly three elements named "token", "tag", and "lemma", the values of which must match the respective column names of the matrix provided via `file`. It is possible to set `lemma=NA` if the tagged results only provide token and tag. This argument is ignored unless `tagger="manual"` and data is provided as either a matrix or data frame.
`encoding`	A character string defining the character encoding of the input file, like `"Latin1"` or `"UTF-8"`.
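To illustrate how `tagger="manual"` and `mtx_cols` fit together, here is a minimal sketch for data whose column names differ from the defaults and which has no lemma column. The data frame `other_tagger`, its column names `word` and `pos`, and the helper variable `col_map` are hypothetical. The import itself is only attempted if koRpus and an English language package (assumed to be `koRpus.lang.en`) are installed.

```r
# Hypothetical output of some other tagger: column names differ from the
# defaults and there is no lemma column.
other_tagger <- data.frame(
  word = c("This", "is", "a", "test", "."),
  pos  = c("DT", "VBZ", "DT", "NN", "SENT"),
  stringsAsFactors = FALSE
)

# Map the expected names (token, tag, lemma) to the actual column names;
# lemma = NA signals that no lemma column is available.
col_map <- c(token = "word", tag = "pos", lemma = NA)

# Sanity check: every mapped, non-missing column must exist in the data.
stopifnot(all(col_map[!is.na(col_map)] %in% colnames(other_tagger)))

# Import with koRpus if it is available
# (assumption: the English language support package is koRpus.lang.en).
if (requireNamespace("koRpus", quietly = TRUE) &&
    requireNamespace("koRpus.lang.en", quietly = TRUE)) {
  library(koRpus.lang.en)
  tagged_results <- koRpus::readTagged(
    other_tagger,
    lang = "en",
    tagger = "manual",
    mtx_cols = col_map
  )
}
```

Validating the mapping before the call gives a clearer failure than a missing-column error raised deep inside the import.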

Note that the value of lang must match a valid language supported by kRp.POS.tags. It will also get stored in the resulting object and might be used by other functions at a later point.

An object of class kRp.text. If debug=TRUE, prints internal variable settings and attempts to return the original output of the TreeTagger system call in a matrix.

Schmid, H. (1994). Probabilistic part-of-speech tagging using decision trees. In International Conference on New Methods in Language Processing, Manchester, UK, 44–49.

[1] https://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/

treetag, freq.analysis, get.kRp.env, kRp.text

## Not run: 
  # call method on a connection
  text_con <- file("~/my.data/tagged_speech.txt", "r")
  tagged_results <- readTagged(text_con, lang="en")
  close(text_con)

  # call it on the file directly
  tagged_results <- readTagged("~/my.data/tagged_speech.txt", lang="en")
  
  # import the results of RDRPOSTagger, using the "manual" tagger feature
  sample_text <- c("Dies ist ein kurzes Beispiel. Es ergibt wenig Sinn.")
  tagger <- RDRPOSTagger::rdr_model(language="German", annotation="POS")
  tagged_rdr <- RDRPOSTagger::rdr_pos(tagger, x=sample_text)
  tagged_results <- readTagged(
    tagged_rdr,
    lang="de",
    tagger="manual",
    mtx_cols=c(token="token", tag="pos", lemma=NA)
  )

## End(Not run)

koRpus documentation built on May 18, 2021, 1:13 a.m.
