Extracts phrases from a list of POS-tagged documents using the "FilterFSA" method of Handler et al. (2016).
extract_phrases(POS_tagged_documents, regex = "(A|N)*N(PD*(A|N)*N)*",
  maximum_ngram_length = 8, minimum_ngram_length = 2,
  return_phrase_vectors = TRUE, return_tag_sequences = FALSE)
POS_tagged_documents
A list object of the form produced by the 'POS_tag_documents()' function, with either Penn TreeBank or Petrov/Gimpel style tags.
regex
The regular expression used to find phrases. Defaults to "(A|N)*N(PD*(A|N)*N)*", the "SimpleNP" grammar in Handler et al. (2016). A vector of regular expressions may also be provided if the user wishes to match more than one pattern.
maximum_ngram_length
The maximum length (in tokens) of the phrases returned. Defaults to 8. Increasing this number can greatly increase runtime.
minimum_ngram_length
The minimum length (in tokens) of the phrases returned. Defaults to 2. Can be increased to remove shorter phrases, or decreased to 1 to include unigrams.
return_phrase_vectors
Logical indicating whether a list of phrase vectors (with each entry containing a vector of the phrases in one document) should be returned, or whether the phrases should be combined into a single space-separated string. Defaults to TRUE.
return_tag_sequences
Logical indicating whether tag sequences should be returned along with phrases. Defaults to FALSE.
A list object.
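To make the interaction between 'regex', 'maximum_ngram_length' and 'minimum_ngram_length' more concrete, here is a minimal base-R sketch of the general filtering idea: enumerate every token span within the length bounds and keep those whose tag string is matched in full by the pattern. This is an illustration only, not the package's implementation. The helper 'filter_spans()', its argument names, the toy sentence and the hand-assigned one-letter tags (A = adjective, N = noun, P = preposition, D = determiner, O = other) are all assumptions made for the example.

```r
# Hypothetical helper, NOT part of the package: enumerates every token span
# whose length lies in [min_len, max_len] and keeps the spans whose coarse
# tag string is matched in full by the regular expression.
filter_spans <- function(tokens, coarse_tags,
                         regex = "(A|N)*N(PD*(A|N)*N)*",
                         min_len = 2, max_len = 8) {
  stopifnot(length(tokens) == length(coarse_tags), min_len <= max_len)
  anchored <- paste0("^(", regex, ")$")  # require a full-span match
  phrases <- character(0)
  n <- length(tokens)
  for (start in seq_len(n)) {
    for (len in min_len:max_len) {
      end <- start + len - 1
      if (end > n) break  # span would run past the end of the document
      tag_string <- paste(coarse_tags[start:end], collapse = "")
      if (grepl(anchored, tag_string)) {
        phrases <- c(phrases, paste(tokens[start:end], collapse = " "))
      }
    }
  }
  phrases
}

# Toy input; the one-letter tags are assigned by hand for illustration.
tokens <- c("the", "new", "tax", "policy", "of", "the", "state", "failed")
coarse_tags <- c("D", "A", "N", "N", "P", "D", "N", "O")

phrases <- filter_spans(tokens, coarse_tags)
print(phrases)
```

In this sketch, overlapping matches such as "tax policy" and "new tax policy" are all kept, and raising 'max_len' enlarges the set of candidate spans tested per document, which is consistent with the runtime warning above. Setting 'min_len = 1' would additionally keep single nouns.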
## Not run:
# make sure quanteda is installed
requireNamespace("quanteda", quietly = TRUE)
# load in U.S. presidential inaugural speeches from Quanteda example data.
documents <- quanteda::data_corpus_inaugural
# use first 10 documents for example
documents <- documents[1:10,]
# run tagger
tagged_documents <- POS_tag_documents(documents)
phrases <- extract_phrases(tagged_documents,
                           regex = "(A|N)*N(PD*(A|N)*N)*",
                           maximum_ngram_length = 8,
                           minimum_ngram_length = 1)
## End(Not run)