Description Usage Arguments Value Note See Also Examples
Description

These methods return character vectors with all types or tokens of a given text, where the text can either be a character vector itself, a previously tokenized/tagged koRpus object, or an object of class kRp.TTR.
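For character input the text is tagged internally, so a language (and tagger) must be known. A minimal sketch, assuming the koRpus.lang.en package is installed and using the built-in tokenizer via set.kRp.env (the sample sentence is made up):

if(require("koRpus.lang.en", quietly = TRUE)){
  # use the internal tokenizer rather than an external tagger
  set.kRp.env(TT.cmd="tokenize", lang="en")
  types("A rose is a rose is a rose.", lang="en")
  tokens("A rose is a rose is a rose.", lang="en")
} else {}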
Usage

types(txt, ...)
tokens(txt, ...)
## S4 method for signature 'kRp.TTR'
types(txt, stats = FALSE)
## S4 method for signature 'kRp.TTR'
tokens(txt)
## S4 method for signature 'kRp.text'
types(
txt,
case.sens = FALSE,
lemmatize = FALSE,
corp.rm.class = "nonpunct",
corp.rm.tag = c(),
stats = FALSE
)
## S4 method for signature 'kRp.text'
tokens(
txt,
case.sens = FALSE,
lemmatize = FALSE,
corp.rm.class = "nonpunct",
corp.rm.tag = c()
)
## S4 method for signature 'character'
types(
txt,
case.sens = FALSE,
lemmatize = FALSE,
corp.rm.class = "nonpunct",
corp.rm.tag = c(),
stats = FALSE,
lang = NULL
)
## S4 method for signature 'character'
tokens(
txt,
case.sens = FALSE,
lemmatize = FALSE,
corp.rm.class = "nonpunct",
corp.rm.tag = c(),
lang = NULL
)
Arguments

txt: An object of either class kRp.text or kRp.TTR, or a character vector.

...: Only used for the method generic.

stats: Logical, whether statistics on the length in characters and the frequency of types in the text should also be returned.

case.sens: Logical, whether types should be counted case sensitively. This option is available for tagged text and character input only.

lemmatize: Logical, whether the analysis should be carried out on the lemmatized tokens rather than all running word forms. This option is available for tagged text and character input only.

corp.rm.class: A character vector with word classes which should be dropped. The default value "nonpunct" drops all punctuation and sentence-ending word classes (see kRp.POS.tags).

corp.rm.tag: A character vector with POS tags which should be dropped. This option is available for tagged text and character input only.

lang: Set the language of a text; only relevant for character input (see the lang argument of tokenize).
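To illustrate the filtering arguments, here is a brief sketch (the sentences are made up and the koRpus.lang.en package is assumed to be installed; lemmatize=TRUE would additionally require lemma information, e.g., from treetag()):

if(require("koRpus.lang.en", quietly = TRUE)){
  tokenized.obj <- tokenize(
    txt="The cat sat on the mat. The Cat did not care.",
    format="obj",
    lang="en"
  )
  # treat "The" and "the" as distinct types
  types(tokenized.obj, case.sens=TRUE)
  # keep punctuation and sentence endings in the token list
  tokens(tokenized.obj, corp.rm.class=c())
} else {}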
Value

A character vector. For types with stats=TRUE, a data.frame containing all types, their length (in characters) and their frequency. The types result is always sorted by frequency, with more frequent types coming first.

If the input is of class kRp.TTR, the result will only be useful if lex.div or the respective wrapper function was called with keep.tokens=TRUE. Similarly, lemmatize can only work properly if the input is a tagged text object with lemmata, or if you have properly set up the environment via set.kRp.env.
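For instance, a sketch of the stats=TRUE case (made-up sentence, again assuming koRpus.lang.en is available):

if(require("koRpus.lang.en", quietly = TRUE)){
  tokenized.obj <- tokenize(
    txt="A rose is a rose is a rose.",
    format="obj",
    lang="en"
  )
  # a data.frame with each type, its length in characters
  # and its frequency, most frequent types first
  types(tokenized.obj, stats=TRUE)
} else {}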
Note

Calling these methods on kRp.TTR objects simply returns the respective part of their tt slot.
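A sketch of the kRp.TTR case: keep.tokens=TRUE tells lex.div to store the tokens and types in its tt slot, which is what these methods then read (made-up sentence, assuming koRpus.lang.en):

if(require("koRpus.lang.en", quietly = TRUE)){
  tokenized.obj <- tokenize(
    txt="A rose is a rose is a rose.",
    format="obj",
    lang="en"
  )
  ttr.obj <- lex.div(
    tokenized.obj,
    measure="TTR",
    char=c(),
    keep.tokens=TRUE,
    quiet=TRUE
  )
  # both simply read from the tt slot of the kRp.TTR object
  types(ttr.obj)
  tokens(ttr.obj)
} else {}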
See Also

kRp.POS.tags, kRp.text, kRp.TTR, lex.div
Examples

# code is only run when the english language package can be loaded
if(require("koRpus.lang.en", quietly = TRUE)){
  sample_file <- file.path(
    path.package("koRpus"), "examples", "corpus", "Reality_Winner.txt"
  )
  tokenized.obj <- tokenize(
    txt=sample_file,
    lang="en"
  )
  types(tokenized.obj)
  tokens(tokenized.obj)
} else {}