Given a group of Chinese texts, this function extracts the words that carry the specified part-of-speech tags. For example, you may want to collect all the verbs used in your texts. Note: this function uses jiebaR::tagging to segment texts and do POS tagging. The tags assigned are not always correct, so you can instead POS-tag your texts with another method first and then pass the result to this function.
get_tag_word(
  x,
  tag = NULL,
  tag_pattern = NULL,
  mycutter = DEFAULT_cutter,
  type = "word",
  each = TRUE,
  only_unique = FALSE,
  keep_name = FALSE,
  checks = TRUE
)
x: it must be a list of character vectors, even when the list contains only one element. Each element of the list is either a length 1 character vector holding a text, or a character vector of length >= 1 that is the result of earlier tagging work. It should not contain ...
tag: one or more tags must be specified. Words with these tags will be selected. Possible tags are "v", "n", "vn", etc.
tag_pattern: should be a length 1 regular expression. You can select tags with this pattern rather than providing tag names directly. For example, you can select tag names starting with "n" by tag_pattern = "^n".
mycutter: a cutter created with the package jiebaR and supplied by the user to tag texts. If your texts have already been POS-tagged, you can set this to NULL.
type: if it is "word" (default), the words that match your tags are extracted. If it is "position", only the positions of the words are returned. Note: if it is "position", argument ...
each: if this is TRUE (default), the result is a list with one element per text; if it is FALSE, all extracted words are put into a single vector.
only_unique: if it is TRUE, only unique words are returned. The default is FALSE.
keep_name: whether to keep the tag names of the extracted words. The default is FALSE.
checks: whether to check the correctness of arguments. The default is TRUE.
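The difference between selecting by tag, selecting by tag_pattern, and asking for positions can be illustrated in base R on an already-tagged vector. This is only a sketch of the selection logic described above, not the package's implementation; the vector `tagged` and its contents are made up for illustration.

```r
# Hypothetical pre-tagged text: names are POS tags, values are words.
tagged <- c(n = "tea", v = "drink", nr = "Tom", vn = "study")

# Selecting by explicit tag names (what `tag` is described as doing).
by_tag <- unname(tagged[names(tagged) %in% c("v", "vn")])

# Selecting by a regular expression on tag names (what `tag_pattern` is described as doing).
by_pattern <- unname(tagged[grepl("^n", names(tagged))])

# Returning positions instead of words (what type = "position" is described as doing).
positions <- which(names(tagged) %in% c("v", "vn"))

print(by_tag)
print(by_pattern)
print(positions)
```

Note that "^n" matches both "n" and "nr" but not "vn", because the pattern is anchored at the start of the tag name.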
The arguments each and only_unique determine the form of the return value:
If each = TRUE and only_unique = FALSE (the default), the result is a list, each element of which contains the words extracted from one text.
If each = TRUE and only_unique = TRUE, the result is a list, each element of which contains only the unique words extracted from one text.
If each = FALSE and only_unique = FALSE, all extracted words are put into a single vector.
If each = FALSE and only_unique = TRUE, the extracted words are put into a single vector and only unique words are returned.
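The four return shapes can be mimicked in base R as follows. This is a sketch of the described behavior under the assumption that the texts are already tagged (names are tags); it does not call the package, and the data and variable names are hypothetical.

```r
# Hypothetical pre-tagged texts: names are POS tags, values are words.
t1 <- c(v = "run", n = "dog", v = "run", v = "jump")
t2 <- c(v = "run", n = "cat", v = "sleep")
texts <- list(t1, t2)

# Words tagged "v" in each text, with tag names dropped.
per_text <- lapply(texts, function(w) unname(w[names(w) == "v"]))

each_all    <- per_text                  # each = TRUE,  only_unique = FALSE
each_unique <- lapply(per_text, unique)  # each = TRUE,  only_unique = TRUE
flat_all    <- unlist(per_text)          # each = FALSE, only_unique = FALSE
flat_unique <- unique(unlist(per_text))  # each = FALSE, only_unique = TRUE

print(each_unique)
print(flat_unique)
```

With each = FALSE, duplicates are removed across all texts, so a word that appears in two texts is kept only once.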
library(chinese.misc)
# No Chinese, so use English instead.
# The texts are already tagged: names are tags, values are words.
x1 <- c(v = "drink", xdrink = "coffee", v = "drink", xdrink = "cola", v = "eat", xfood = "banana")
x2 <- c(v = "drink", xdrink = "tea", v = "buy", x = "computer")
x <- list(x1, x2)
# mycutter = NULL because no further tagging is needed.
get_tag_word(x, tag = "v", mycutter = NULL)
get_tag_word(x, tag = "v", mycutter = NULL, only_unique = TRUE)
# Select by a regular expression on tag names.
get_tag_word(x, tag_pattern = "^x", mycutter = NULL)
get_tag_word(x, tag_pattern = "^x", mycutter = NULL, keep_name = TRUE)
# Put all extracted words into a single vector.
get_tag_word(x, tag = "v", mycutter = NULL, each = FALSE)
get_tag_word(x, tag = "v", mycutter = NULL, each = FALSE, only_unique = TRUE)
# Return positions rather than words.
get_tag_word(x, tag = "v", mycutter = NULL, type = "position")
CHECKING ARGUMENTS
EXTRACTING BY TAG
DONE
[[1]]
[1] "drink" "drink" "eat"

[[2]]
[1] "drink" "buy"


CHECKING ARGUMENTS
EXTRACTING BY TAG
DONE
[[1]]
[1] "drink" "eat"

[[2]]
[1] "drink" "buy"


CHECKING ARGUMENTS
EXTRACTING BY TAG PATTERN
DONE
[[1]]
[1] "coffee" "cola" "banana"

[[2]]
[1] "tea" "computer"


CHECKING ARGUMENTS
EXTRACTING BY TAG PATTERN
DONE
[[1]]
  xdrink   xdrink    xfood
"coffee"   "cola" "banana"

[[2]]
    xdrink          x
     "tea" "computer"


CHECKING ARGUMENTS
EXTRACTING BY TAG
DONE
[1] "drink" "drink" "eat" "drink" "buy"

CHECKING ARGUMENTS
EXTRACTING BY TAG
DONE
[1] "drink" "eat" "buy"

CHECKING ARGUMENTS
EXTRACTING BY TAG
DONE
[[1]]
[1] 1 3 5

[[2]]
[1] 1 3