as.list.tokens (R Documentation)
Coercion functions to and from tokens objects, checks for whether an object is a tokens object, and functions to combine tokens objects.
## S3 method for class 'tokens'
as.list(x, ...)
## S3 method for class 'tokens'
as.character(x, use.names = FALSE, ...)
is.tokens(x)
as.tensor(x, ...)
## S3 method for class 'tokens'
as.tensor(x, length = NULL, ...)
as.tokens(x, concatenator = "_", ...)
## S3 method for class 'spacyr_parsed'
as.tokens(
  x,
  concatenator = "/",
  include_pos = c("none", "pos", "tag"),
  use_lemma = FALSE,
  ...
)
x: object to be coerced or checked

...: additional arguments used by specific methods. For c.tokens, these are the tokens objects to be concatenated.

use.names: logical; preserve the names if TRUE

length: optional integer specifying the maximum length (number of token positions) for the sparse tensor. If NULL, the length of the longest document is used.

concatenator: character; the concatenation character that will connect the tokens making up a multi-token sequence.

include_pos: character; whether and which part-of-speech tag to use: "none" to use no tag, "pos" for the coarse universal part-of-speech tag, or "tag" for the detailed part-of-speech tag.

use_lemma: logical; if TRUE, use the lemma rather than the raw token text.
The concatenator is used to automatically generate dictionary
values for multi-word expressions in tokens_lookup() and
dfm_lookup(). The underscore character is commonly used to join
elements of multi-word expressions (e.g. "piece_of_cake", "New_York"), but
other characters (e.g. whitespace " " or a hyphen "-") can also be used.
In those cases, users must specify the concatenator used in their tokens so
that the conversion treats this character as the inter-word delimiter when
reading in the elements that will become the tokens.
as.list returns a simple list of characters from a
tokens object.
as.character returns a character vector from a
tokens object.
is.tokens returns TRUE if the object is of class
tokens, FALSE otherwise.
as.tensor returns a sparse COO tensor from a tokens object,
compatible with the torch package. Each document is represented as
a row, and token positions as columns. Values are the integer token IDs.
as.tokens returns a quanteda tokens object.
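The coercion and check methods above can be illustrated with a short sketch (assumes the quanteda package is installed; the example documents are invented for illustration):

```r
library(quanteda)

toks <- tokens(c(doc1 = "Sparse methods scale well.",
                 doc2 = "Tokens are lists of characters."))

# as.list() yields a named list with one character vector per document
lis <- as.list(toks)

# as.character() flattens all documents into a single character vector;
# use.names = FALSE (the default) drops the names
chars <- as.character(toks)

is.tokens(toks)  # TRUE
is.tokens(lis)   # FALSE: a plain list is not a tokens object
```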
## Not run:
library(torch)
toks <- tokens(c(doc1 = "a b c d e f g",
                 doc2 = "a b c g",
                 doc3 = ""))
as.tensor(toks)
## End(Not run)
# create tokens object from list of characters with custom concatenator
dict <- dictionary(list(country = "United States",
                        sea = c("Atlantic Ocean", "Pacific Ocean")))
lis <- list(c("The", "United-States", "has", "the", "Atlantic-Ocean",
              "and", "the", "Pacific-Ocean", "."))
toks <- as.tokens(lis, concatenator = "-")
tokens_lookup(toks, dict)
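The spacyr_parsed method can be sketched as follows (not run: it requires the spacyr package and an installed spaCy language model, and the parsed sentence here is only illustrative):

```r
## Not run:
library(spacyr)
spacy_initialize()

parsed <- spacy_parse("The Atlantic Ocean is vast.")

# attach the universal part-of-speech tag to each token, joined by "/"
as.tokens(parsed, include_pos = "pos", concatenator = "/")

# use lemmas rather than the surface token forms
as.tokens(parsed, use_lemma = TRUE)
## End(Not run)
```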