These methods should be used to get or set values of tagged text objects generated by koRpus functions like treetag or tokenize.
taggedText(obj, add.desc = FALSE, doc_id = FALSE)
## S4 method for signature 'kRp.text'
taggedText(obj, add.desc = FALSE, doc_id = FALSE)
taggedText(obj) <- value
## S4 replacement method for signature 'kRp.text'
taggedText(obj) <- value
doc_id(obj, ...)
## S4 method for signature 'kRp.text'
doc_id(obj, has_id = NULL)
hasFeature(obj, feature = NULL, ...)
## S4 method for signature 'kRp.text'
hasFeature(obj, feature = NULL)
hasFeature(obj, feature) <- value
## S4 replacement method for signature 'kRp.text'
hasFeature(obj, feature) <- value
feature(obj, feature, ...)
## S4 method for signature 'kRp.text'
feature(obj, feature, doc_id = NULL)
feature(obj, feature) <- value
## S4 replacement method for signature 'kRp.text'
feature(obj, feature) <- value
corpusReadability(obj, ...)
## S4 method for signature 'kRp.text'
corpusReadability(obj, doc_id = NULL)
corpusReadability(obj) <- value
## S4 replacement method for signature 'kRp.text'
corpusReadability(obj) <- value
corpusHyphen(obj, ...)
## S4 method for signature 'kRp.text'
corpusHyphen(obj, doc_id = NULL)
corpusHyphen(obj) <- value
## S4 replacement method for signature 'kRp.text'
corpusHyphen(obj) <- value
corpusLexDiv(obj, ...)
## S4 method for signature 'kRp.text'
corpusLexDiv(obj, doc_id = NULL)
corpusLexDiv(obj) <- value
## S4 replacement method for signature 'kRp.text'
corpusLexDiv(obj) <- value
corpusFreq(obj, ...)
## S4 method for signature 'kRp.text'
corpusFreq(obj)
corpusFreq(obj) <- value
## S4 replacement method for signature 'kRp.text'
corpusFreq(obj) <- value
corpusCorpFreq(obj, ...)
## S4 method for signature 'kRp.text'
corpusCorpFreq(obj)
corpusCorpFreq(obj) <- value
## S4 replacement method for signature 'kRp.text'
corpusCorpFreq(obj) <- value
corpusStopwords(obj, ...)
## S4 method for signature 'kRp.text'
corpusStopwords(obj)
corpusStopwords(obj) <- value
## S4 replacement method for signature 'kRp.text'
corpusStopwords(obj) <- value
## S4 method for signature 'kRp.text,ANY,ANY,ANY'
x[i, j, ..., drop = TRUE]
## S4 replacement method for signature 'kRp.text,ANY,ANY,ANY'
x[i, j, ...] <- value
## S4 method for signature 'kRp.text'
x[[i, doc_id = NULL, ...]]
## S4 replacement method for signature 'kRp.text'
x[[i, doc_id = NULL, ...]] <- value
## S4 method for signature 'kRp.text'
describe(obj, doc_id = NULL, simplify = TRUE, ...)
## S4 replacement method for signature 'kRp.text'
describe(obj, doc_id = NULL, ...) <- value
## S4 method for signature 'kRp.text'
language(obj)
## S4 replacement method for signature 'kRp.text'
language(obj) <- value
diffText(obj, doc_id = NULL)
## S4 method for signature 'kRp.text'
diffText(obj, doc_id = NULL)
diffText(obj) <- value
## S4 replacement method for signature 'kRp.text'
diffText(obj) <- value
originalText(obj)
## S4 method for signature 'kRp.text'
originalText(obj)
is.taggedText(obj)
is.kRp.text(obj)
fixObject(obj, doc_id = NA)
## S4 method for signature 'kRp.text'
fixObject(obj, doc_id = NA)
tif_as_tokens_df(tokens)
## S4 method for signature 'kRp.text'
tif_as_tokens_df(tokens)
## S4 method for signature 'kRp.tagged'
fixObject(obj, doc_id = NA)
## S4 method for signature 'kRp.txt.freq'
fixObject(obj, doc_id = NA)
## S4 method for signature 'kRp.txt.trans'
fixObject(obj, doc_id = NA)
## S4 method for signature 'kRp.analysis'
fixObject(obj, doc_id = NA)
obj | An arbitrary
add.desc | Logical, determines whether the
doc_id | Logical (except for
value | The new value to replace the current with.
... | Additional arguments for the generics.
has_id | A character vector with
feature | Character string naming the feature to look for. The return value is logical if a single feature name is given. If
x | An object of class
i | Defines the row selector (
j | Defines the column selector.
drop | Logical, whether the result should be coerced to the lowest possible dimension. See
simplify | Logical, if
tokens | An object of class
taggedText() returns the tokens slot.
doc_id() returns a character vector of all doc_id values in the object.
describe() returns the desc slot.
language() returns the lang slot.
[ / [[ can be used as a shortcut to index the results of taggedText().
fixObject returns the same object upgraded to the object structure of this package version (e.g., new columns, changed names, etc.).
hasFeature() returns TRUE or FALSE, depending on whether the requested feature is present or not.
feature() returns the list entry of the feat_list slot for the requested feature.
corpusReadability() returns the list of kRp.readability objects, see readability.
corpusHyphen() returns the list of kRp.hyphen objects, see hyphen.
corpusLexDiv() returns the list of kRp.TTR objects, see lex.div.
corpusFreq() returns the frequency analysis data from the feat_list slot, see freq.analysis.
corpusCorpFreq() returns the kRp.corp.freq object of the feat_list slot, see for example read.corp.custom.
corpusStopwords() returns the number of stopwords found in each text (if analyzed) from the feat_list slot.
tif_as_tokens_df returns the tokens slot in a TIF[1] compliant format, i.e., doc_id is not a factor but a character vector.
originalText() is similar to taggedText(), but reverts any transformations back to the original text before returning the tokens slot. It only works if the object has the feature diff, see examples.
diffText() returns the diff slot, if present.
[1] Text Interchange Formats (https://github.com/ropensci/tif)
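The TIF point above (doc_id as a character vector rather than a factor) can be illustrated in base R without koRpus. This is only a sketch: the toy data frame stands in for a tokens slot, and as_tif_like() is a hypothetical helper, not a koRpus function and not how tif_as_tokens_df is implemented.

```r
# Toy stand-in for a tokens data frame; columns and values are made up
tokens_df <- data.frame(
  doc_id = factor(c("doc1", "doc1", "doc2")),
  token = c("Hello", "world", "Bye"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of koRpus): coerce a factor doc_id
# column to character and leave all other columns untouched
as_tif_like <- function(df) {
  df[["doc_id"]] <- as.character(df[["doc_id"]])
  df
}

tif_df <- as_tif_like(tokens_df)

# the input keeps its factor column, the result carries plain strings
is.factor(tokens_df[["doc_id"]])
is.character(tif_df[["doc_id"]])
```

The conversion matters mainly when the data frame is handed to other TIF-aware packages, which may expect doc_id to be a plain character column.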
# code is only run when the English language package can be loaded
if(require("koRpus.lang.en", quietly = TRUE)){
  # sample text shipped with the koRpus package
  sample_file <- file.path(
    path.package("koRpus"), "examples", "corpus", "Reality_Winner.txt"
  )
  tokenized.obj <- tokenize(
    txt=sample_file,
    lang="en"
  )

  # getters for the document ID, the desc slot, the language and the tokens slot
  doc_id(tokenized.obj)
  describe(tokenized.obj)
  language(tokenized.obj)
  taggedText(tokenized.obj)

  # [ and [[ index the result of taggedText() directly
  tokenized.obj[["token"]]
  tokenized.obj[1:3, "token"]

  tif_as_tokens_df(tokenized.obj)

  # example for originalText()
  tokenized.obj <- jumbleWords(tokenized.obj)
  # now compare the jumbled words to the original
  tokenized.obj[["token"]]
  originalText(tokenized.obj)[["token"]]
} else {}
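As a complement to the example above, the following sketch checks for the diff feature with hasFeature() before calling diffText() and originalText(), since both depend on that feature. It assumes that tokenize() accepts a character string when called with format="obj"; the sample sentence and the name txt_obj are made up for illustration.

```r
# only run when the English language package can be loaded
if (require("koRpus.lang.en", quietly = TRUE)) {
  # assumption: format="obj" lets tokenize() read from a character string
  txt_obj <- tokenize(
    txt = "This is a short sample sentence.",
    format = "obj",
    lang = "en"
  )

  # query the feature before any transformation was applied
  had_diff <- hasFeature(txt_obj, "diff")

  # jumbleWords() transforms the tokens, as in the example above
  txt_obj <- jumbleWords(txt_obj)

  # only query the diff data if the feature is actually present
  if (hasFeature(txt_obj, "diff")) {
    diffText(txt_obj)
    originalText(txt_obj)[["token"]]
  }
}
```

Guarding on hasFeature() keeps such code from calling originalText() on objects that were never transformed.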