spacy_parse | R Documentation |
The spacy_parse() function calls spaCy to both tokenize and tag the texts, and returns a data.table of the results. The function provides options on the types of tagsets (tagset_ options), either "google" or "detailed", as well as lemmatization (lemma). It also provides dependency parsing and named entity recognition as options. If "full_parse = TRUE" is provided, the function returns the most extensive list of the parsing results from spaCy.
spacy_parse(
x,
pos = TRUE,
tag = FALSE,
lemma = TRUE,
entity = TRUE,
dependency = FALSE,
nounphrase = FALSE,
multithread = TRUE,
additional_attributes = NULL,
...
)
x |
a character object, a quanteda corpus, or a TIF-compliant corpus data.frame (see https://github.com/ropenscilabs/tif) |
pos |
logical; whether to return the Universal Dependencies POS tagset (https://universaldependencies.org/u/pos/) |
tag |
logical; whether to return detailed part-of-speech tags, for the language model |
lemma |
logical; include lemmatized tokens in the output (lemmatization may not work properly for non-English models) |
entity |
logical; if |
dependency |
logical; if |
nounphrase |
logical; if |
multithread |
logical; If |
additional_attributes |
a character vector; this option is for extracting additional attributes of tokens from spaCy. When the names of attributes are supplied, the output data.frame will contain additional variables corresponding to the names of the attributes. For instance, when |
... |
not used directly |
a data.frame of tokenized, parsed, and annotated tokens
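Because the result is an ordinary data.frame with one row per token, it can be filtered and summarized with base R. The sketch below does not call spaCy: it builds a small hypothetical stand-in for the output by hand, and the column names (doc_id, sentence_id, token_id, token, lemma, pos) are assumptions about the default layout rather than something stated on this page.

```r
# Hypothetical stand-in for a spacy_parse() result. The column names are
# assumed for illustration; check names() on a real result before relying
# on them.
parsed <- data.frame(
  doc_id      = c("doc1", "doc1", "doc1", "doc2", "doc2"),
  sentence_id = c(1L, 1L, 1L, 1L, 1L),
  token_id    = c(1L, 2L, 3L, 1L, 2L),
  token       = c("Cats", "sleep", ".", "Dogs", "bark"),
  lemma       = c("cat", "sleep", ".", "dog", "bark"),
  pos         = c("NOUN", "VERB", "PUNCT", "NOUN", "VERB"),
  stringsAsFactors = FALSE
)

# Drop punctuation tokens using the (assumed) pos column
words <- parsed[parsed$pos != "PUNCT", ]

# Count the remaining tokens in each document
tokens_per_doc <- table(words$doc_id)

# Rebuild one space-separated string of lemmas per document
lemmas_by_doc <- tapply(words$lemma, words$doc_id, paste, collapse = " ")
```

With a real result, the same steps would apply to the object returned by spacy_parse(), provided the pos and lemma columns were requested.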
## Not run:
spacy_initialize()
# See Chap 5.1 of the NLTK book, http://www.nltk.org/book/ch05.html
txt <- "And now for something completely different."
spacy_parse(txt)
spacy_parse(txt, pos = TRUE, tag = TRUE)
spacy_parse(txt, dependency = TRUE)
txt2 <- c(doc1 = "The fast cat catches mice.\\nThe quick brown dog jumped.",
doc2 = "This is the second document.",
doc3 = "This is a \\\"quoted\\\" text." )
spacy_parse(txt2, entity = TRUE, dependency = TRUE)
txt3 <- "We analyzed the Supreme Court with three natural language processing tools."
spacy_parse(txt3, entity = TRUE, nounphrase = TRUE)
spacy_parse(txt3, additional_attributes = c("like_num", "is_punct"))
## End(Not run)