Description
This method can be used on text files, connections, matrices, or data frames containing already tagged text material, e.g., the results of TreeTagger [1].
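For matrix input, the tagged material is expected one token per row, with columns named "token", "tag", and "lemma". The following is a minimal sketch of that shape in base R. The sentence is made up, and the tags are shown as they might appear in TreeTagger's English output (e.g., "SENT" for sentence-final punctuation). The actual `readTagged()` call needs koRpus plus a loaded English language package, so it is left commented out.

```r
# Hypothetical tagged sentence: one token per row, three named columns
tagged_mtx <- matrix(
  c("This", "DT",   "this",
    "is",   "VBZ",  "be",
    "a",    "DT",   "a",
    "test", "NN",   "test",
    ".",    "SENT", "."),
  ncol = 3,
  byrow = TRUE,
  dimnames = list(NULL, c("token", "tag", "lemma"))
)

print(tagged_mtx)

# With koRpus and an English language package loaded (not run here):
# tagged_results <- readTagged(tagged_mtx, lang = "en")
```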
Usage
readTagged(file, ...)
## S4 method for signature 'matrix'
readTagged(
file,
lang = "kRp.env",
tagger = "TreeTagger",
apply.sentc.end = TRUE,
sentc.end = c(".", "!", "?", ";", ":"),
stopwords = NULL,
stemmer = NULL,
rm.sgml = TRUE,
doc_id = NA,
add.desc = "kRp.env",
mtx_cols = c(token = "token", tag = "tag", lemma = "lemma")
)
## S4 method for signature 'data.frame'
readTagged(
file,
lang = "kRp.env",
tagger = "TreeTagger",
apply.sentc.end = TRUE,
sentc.end = c(".", "!", "?", ";", ":"),
stopwords = NULL,
stemmer = NULL,
rm.sgml = TRUE,
doc_id = NA,
add.desc = "kRp.env",
mtx_cols = c(token = "token", tag = "tag", lemma = "lemma")
)
## S4 method for signature 'kRp.connection'
readTagged(
file,
lang = "kRp.env",
encoding = getOption("encoding"),
tagger = "TreeTagger",
apply.sentc.end = TRUE,
sentc.end = c(".", "!", "?", ";", ":"),
stopwords = NULL,
stemmer = NULL,
rm.sgml = TRUE,
doc_id = NA,
add.desc = "kRp.env"
)
## S4 method for signature 'character'
readTagged(
file,
lang = "kRp.env",
encoding = getOption("encoding"),
tagger = "TreeTagger",
apply.sentc.end = TRUE,
sentc.end = c(".", "!", "?", ";", ":"),
stopwords = NULL,
stemmer = NULL,
rm.sgml = TRUE,
doc_id = NA,
add.desc = "kRp.env"
)
Arguments
file |
Either a matrix, a data frame, a connection, or a character vector. If the latter, it must be a valid path to a file containing the previously analyzed text. If it is a matrix or data frame, it must contain three columns named "token", "tag", and "lemma" (or the columns named via mtx_cols); all other columns are ignored. |
... |
Additional options, currently unused. |
lang |
A character string naming the language of the analyzed corpus. See |
tagger |
The software which was used to tokenize and tag the text. Currently, "TreeTagger" and "manual" are the only supported values. If "manual", you must also adjust the values of |
apply.sentc.end |
Logical, whether the tokens defined in |
sentc.end |
A character vector with tokens indicating a sentence ending. These are added to the sentence endings already marked in the tagged results rather than replacing them. |
stopwords |
A character vector to be used for stopword detection. Comparison is done in lower case. You can also simply set
|
stemmer |
A function or method to perform stemming. For instance, you can set |
rm.sgml |
Logical, whether SGML tags should be ignored and removed from output. |
doc_id |
Character string, optional identifier of the particular document. Will be added to the |
add.desc |
Logical. If |
mtx_cols |
Character vector with exactly three elements named "token", "tag", and "lemma", the values of which must match the respective column names of the matrix provided via |
encoding |
A character string defining the character encoding of the input file, like |
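The mtx_cols argument maps the three roles (token, tag, lemma) to whatever the columns are actually called. The sketch below illustrates that mapping in base R for a hypothetical data frame from some other tagger, with STTS-style German tags. The subsetting step only mirrors the documented meaning of mtx_cols; it is not koRpus's internal code, and the commented-out call assumes koRpus and a German language package are loaded.

```r
# Hypothetical tagger output whose column names differ from
# "token", "tag", and "lemma"
other_tagger <- data.frame(
  word  = c("Dies", "ist", "ein", "Beispiel", "."),
  pos   = c("PDS", "VAFIN", "ART", "NN", "$."),
  lem   = c("dies", "sein", "ein", "Beispiel", "."),
  extra = 1:5,  # further columns are ignored
  stringsAsFactors = FALSE
)

# Names are the expected roles, values are the actual column names
cols <- c(token = "word", tag = "pos", lemma = "lem")

# Illustration only: pick the mapped columns and rename them to the roles
mapped <- other_tagger[, unname(cols)]
names(mapped) <- names(cols)
print(mapped)

# With koRpus loaded (not run here):
# tagged_results <- readTagged(other_tagger, lang = "de",
#   tagger = "manual", mtx_cols = cols)
```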
Details
Note that the value of lang must match a valid language supported by kRp.POS.tags. It will also be stored in the resulting object and might be used by other functions at a later point.
Value
An object of class kRp.text. If debug=TRUE, prints internal variable settings and attempts to return the original output of the TreeTagger system call in a matrix.
References
Schmid, H. (1994). Probabilistic part-of-speech tagging using decision trees. In International Conference on New Methods in Language Processing, Manchester, UK, 44–49.
[1] https://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/
See Also
treetag, freq.analysis, get.kRp.env, kRp.text
Examples
## Not run:
# call method on a connection
text_con <- file("~/my.data/tagged_speech.txt", "r")
tagged_results <- readTagged(text_con, lang="en")
close(text_con)
# call it on the file directly
tagged_results <- readTagged("~/my.data/tagged_speech.txt", lang="en")
# import the results of RDRPOSTagger, using the "manual" tagger feature
sample_text <- c("Dies ist ein kurzes Beispiel. Es ergibt wenig Sinn.")
tagger <- RDRPOSTagger::rdr_model(language="German", annotation="POS")
tagged_rdr <- RDRPOSTagger::rdr_pos(tagger, x=sample_text)
tagged_results <- readTagged(
tagged_rdr,
lang="de",
tagger="manual",
mtx_cols=c(token="token", tag="pos", lemma=NA)
)
## End(Not run)
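The file-based calls above assume a file that is already tagged. As a hedged sketch, assuming the tab-separated layout TreeTagger produces when lemmas are included (token, tag, and lemma on each line), the following writes a small made-up file to a temporary location and reads it back with base R, stating the encoding explicitly, to check the layout before passing it to readTagged. The final call is commented out because it needs koRpus and an English language package.

```r
# Write a tiny, made-up tagged file: token<TAB>tag<TAB>lemma per line
tagged_file <- tempfile(fileext = ".txt")
writeLines(
  c("This\tDT\tthis",
    "is\tVBZ\tbe",
    "a\tDT\ta",
    "test\tNN\ttest",
    ".\tSENT\t."),
  tagged_file
)

# Read it back to check the three-column layout and the encoding
check <- read.delim(
  tagged_file,
  header = FALSE,
  quote = "",
  col.names = c("token", "tag", "lemma"),
  fileEncoding = "UTF-8",
  stringsAsFactors = FALSE
)
print(check)

# With koRpus and an English language package loaded (not run here):
# tagged_results <- readTagged(tagged_file, lang = "en", encoding = "UTF-8")
```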