extract_ngram_filter: Extract phrase spans


Description

Takes a sequence of POS tags and a regular expression, and returns the spans of the sequence that match the regular expression.
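The idea of matching a regular expression against a tag sequence can be sketched in base R. This is a minimal illustration, not the package's implementation: `find_spans` is a hypothetical helper, and it assumes each tag has already been reduced by hand to a single-letter class (here A for adjective, N for noun, P for preposition, D for determiner, O for anything else) and that spans are 1-based and inclusive.

```r
# Hypothetical helper, not part of phrasemachine.
# coarse_tags: character vector of single-letter tag classes (an assumption).
find_spans <- function(coarse_tags, regex, max_len, min_len = 1) {
  n <- length(coarse_tags)
  # Anchor the pattern so the whole candidate substring must match.
  anchored <- paste0("^(", regex, ")$")
  spans <- list()
  for (start in seq_len(n)) {
    first_end <- start + min_len - 1
    last_end <- min(n, start + max_len - 1)
    if (first_end > last_end) next
    for (end in first_end:last_end) {
      candidate <- paste(coarse_tags[start:end], collapse = "")
      if (grepl(anchored, candidate)) {
        spans[[length(spans) + 1]] <- c(start, end)
      }
    }
  }
  # Return an empty two-column matrix when nothing matches.
  if (length(spans) == 0) return(matrix(numeric(0), ncol = 2))
  do.call(rbind, spans)
}

# Hand-mapped from VB, JJ, NN, NN (the mapping is an assumption).
coarse <- c("O", "A", "N", "N")
spans <- find_spans(coarse, "(A|N)*N(PD*(A|N)*N)*", max_len = 8)
```

Under these assumptions each row of `spans` holds the start and end position of one candidate phrase, and overlapping candidates are all kept.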

Usage

extract_ngram_filter(pos_tags, regex, maximum_ngram_length,
  minimum_ngram_length)

Arguments

pos_tags

A character vector of Penn Treebank or Petrov/Gimpel style tags.

regex

The regular expression (or vector of regular expressions) used to find phrases.

maximum_ngram_length

The maximum length (number of tags) of a returned phrase.

minimum_ngram_length

The minimum length (number of tags) of a returned phrase.

Value

A numeric matrix with two columns and one row per matched span. The first column is the span start and the second is the span end.
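A two-column span matrix of this shape can be turned back into phrases by indexing a token vector. The sketch below is illustrative only: the `tokens` vector and the `spans` matrix are written by hand rather than produced by the package, and it assumes that start and end are 1-based, inclusive positions.

```r
# Illustrative input: hand-written values, not actual package output.
tokens <- c("eat", "fresh", "apple", "pie")
spans <- rbind(c(2, 4), c(3, 4))

# Assumes column 1 is a 1-based start and column 2 an inclusive end.
phrases <- vapply(
  seq_len(nrow(spans)),
  function(i) paste(tokens[spans[i, 1]:spans[i, 2]], collapse = " "),
  character(1)
)
```

Iterating over `seq_len(nrow(spans))` keeps the result a character vector even when the matrix has no rows.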

Examples

pos_tags <- c("VB", "JJ", "NN", "NN")
spans <- extract_ngram_filter(pos_tags,
                              regex = "(A|N)*N(PD*(A|N)*N)*",
                              maximum_ngram_length = 8,
                              minimum_ngram_length = 1)

phrasemachine documentation built on May 2, 2019, 8:23 a.m.