Extracts phrases from a set of documents using the "FilterFSA" method in Handler et al. 2016.
phrasemachine(documents, regex = "(A|N)*N(PD*(A|N)*N)*",
  maximum_ngram_length = 8, minimum_ngram_length = 2,
  return_phrase_vectors = TRUE, return_tag_sequences = FALSE,
  memory = "-Xmx512M")
documents
  A vector of strings (one per document).
regex
  The regular expression used to find phrases. Defaults to "(A|N)*N(PD*(A|N)*N)*", the "SimpleNP" grammar in Handler et al. 2016. A vector of regular expressions may also be provided to match more than one pattern.
maximum_ngram_length
  The maximum length of the phrases returned. Defaults to 8. Increasing this number can greatly increase runtime.
minimum_ngram_length
  The minimum length of the phrases returned. Defaults to 2. Can be increased to remove shorter phrases, or decreased to 1 to include unigrams.
return_phrase_vectors
  Logical indicating whether a list of phrase vectors (with each entry containing a vector of the phrases in one document) should be returned, or whether phrases should be combined into a single space-separated string. Defaults to TRUE.
return_tag_sequences
  Logical indicating whether tag sequences should be returned along with phrases. Defaults to FALSE.
memory
  The Java memory option passed to the NLP package used to POS tag documents. Defaults to "-Xmx512M" (512MB), which is often not enough for large documents and can lead to a "java.lang.OutOfMemoryError". Increase it if necessary to accommodate very large documents.
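To make the interaction between regex, minimum_ngram_length and maximum_ngram_length concrete, here is a minimal base R sketch. It is a naive enumerate-and-filter loop, not the package's FilterFSA implementation. The helper name extract_spans and the one-letter tags assigned to the example tokens (with "O" standing for any other tag) are assumptions made for illustration only.

```r
# Hypothetical helper, NOT part of phrasemachine: enumerate every token span
# and keep those whose coarse tag string fully matches the pattern.
extract_spans <- function(tokens, tags,
                          regex = "(A|N)*N(PD*(A|N)*N)*",
                          minimum_ngram_length = 2,
                          maximum_ngram_length = 8) {
  stopifnot(length(tokens) == length(tags),
            minimum_ngram_length >= 1,
            minimum_ngram_length <= maximum_ngram_length)
  anchored <- paste0("^(", regex, ")$")    # the whole span must match
  tag_string <- paste(tags, collapse = "") # one character per token
  n <- length(tokens)
  phrases <- character(0)
  for (start in seq_len(n)) {
    for (len in minimum_ngram_length:maximum_ngram_length) {
      end <- start + len - 1
      if (end > n) break                   # span would run past the document
      if (grepl(anchored, substr(tag_string, start, end))) {
        phrases <- c(phrases, paste(tokens[start:end], collapse = "_"))
      }
    }
  }
  phrases
}

tokens <- c("Hello", "there", "my", "red", "good", "cat")
tags   <- c("O", "O", "O", "A", "A", "N")  # assumed tags, one letter per token

extract_spans(tokens, tags)
extract_spans(tokens, tags, minimum_ngram_length = 1)  # also admits unigrams
```

Because every start position is paired with every allowed length, the work grows with maximum_ngram_length, which is consistent with the runtime note on that argument.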
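A list object. With return_phrase_vectors = TRUE, each entry is a character vector of the phrases found in one document, with the tokens of each phrase joined by underscores.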
phrasemachine("Hello there my red good cat.")
phrasemachine: Simple Phrase Extraction
Version 1.1.2 created on 2017-05-29.
copyright (c) 2016, Matthew J. Denny, Abram Handler, Brendan O'Connor.
Type help('phrasemachine') or
vignette('getting_started_with_phrasemachine') to get started.
Development website: https://github.com/slanglab/phrasemachine
Currently tagging document 1 of 1
OpenJDK 64-Bit Server VM warning: Can't detect initial thread stack location - find_vma failed
Extracting phrases from document 1 of 1
[[1]]
[1] "red_good_cat" "good_cat"
Warning message:
system call failed: Cannot allocate memory
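The list printed above holds one character vector of underscore-joined phrases per document. As a rough sketch of what can be done with it in base R, the following counts phrases across documents. The object phrases is a hand-written stand-in for the function's return value, and its second document is made up for illustration.

```r
# Stand-in for the value returned with return_phrase_vectors = TRUE.
# The first entry mirrors the output above; the second is invented.
phrases <- list(
  c("red_good_cat", "good_cat"),
  c("good_cat", "big_dog")
)

# Count how often each phrase occurs across all documents.
counts <- sort(table(unlist(phrases)), decreasing = TRUE)

# Tidy the counts into a data frame, with underscores turned back into spaces.
phrase_counts <- data.frame(
  phrase = gsub("_", " ", names(counts), fixed = TRUE),
  count = as.integer(counts),
  stringsAsFactors = FALSE
)
print(phrase_counts)
```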