View source: R/nlp_phrase_sequences.R
as_phrasemachine | R Documentation |
Noun phrases are of common interest in natural language processing. They can be extracted from text
by defining a sequence of Parts of Speech (POS) tags. For example, this sequence of POS tags
can be seen as a noun phrase: Adjective, Noun, Preposition, Noun.
This function recodes Universal POS tags to one of the following 1-letter tags, in order to simplify writing regular expressions
to find Parts of Speech sequences:
A: adjective
C: coordinating conjunction
D: determiner
M: modifier of verb
N: noun or proper noun
P: preposition
O: other elements
After recoding, a simple noun phrase can be identified with the regular expression (A|N)*N(P+D*(A|N)*N)*. It reads: zero or more adjectives or nouns, then a noun, optionally followed by any number of groups that each consist of one or more prepositions, zero or more determiners, zero or more adjectives or nouns, and a closing noun.
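Because every tag is a single letter, a character position in the collapsed tag string corresponds to a token position, so a regular expression match can be mapped straight back to the tokens. The following is a minimal base R sketch of that idea. The tokens and their one-letter tags are hand-assigned for illustration (they are not output of as_phrasemachine), and the use of perl = TRUE is an assumption made here for predictable greedy matching.

```r
# Illustrative data: tokens with hand-assigned one-letter tags
# (assumed for this sketch, not produced by as_phrasemachine)
tokens <- c("the", "big", "dog", "in", "the", "garden", "and", "cats")
tags   <- c("D",   "A",   "N",   "P",  "D",   "N",      "C",   "N")

# Collapse the tags into one string: one character per token
tag_string <- paste(tags, collapse = "")

# Simple noun phrase pattern from the description
pattern <- "(A|N)*N(P+D*(A|N)*N)*"

# Locate all non-overlapping matches in the tag string
hits   <- gregexpr(pattern, tag_string, perl = TRUE)[[1]]
starts <- as.integer(hits)
ends   <- starts + attr(hits, "match.length") - 1L

# Character positions equal token positions, so slice the tokens directly
phrases <- mapply(function(s, e) paste(tokens[s:e], collapse = " "),
                  starts, ends)
print(phrases)
```

The sketch assumes at least one match is found; when gregexpr finds nothing it returns -1, which would need to be handled before slicing.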
as_phrasemachine(x, type = c("upos", "penn-treebank"))
x |
a character vector of POS tags for example by using |
type |
either 'upos' or 'penn-treebank', indicating whether x contains Universal Parts of Speech tags or Penn Treebank Parts of Speech tags; in both cases the tags are recoded to the one-letter counterparts listed in the description |
For more information on extracting phrases see http://brenocon.com/handler2016phrases.pdf
the character vector x, where the respective POS tags are replaced with one-letter tags
phrases
library(udpipe)
x <- c("PROPN", "SCONJ", "ADJ", "NOUN", "VERB", "INTJ", "DET", "VERB",
       "PROPN", "AUX", "NUM", "NUM", "X", "SCONJ", "PRON", "PUNCT", "ADP",
       "X", "PUNCT", "AUX", "PROPN", "ADP", "X", "PROPN", "ADP", "DET",
       "CCONJ", "INTJ", "NOUN", "PROPN")
# Recode the Universal POS tags to one-letter tags (type defaults to "upos")
as_phrasemachine(x)