posTokens: POS Tagger

View source: R/posTokens.R

Description

Tallies parse-dependent features (tokens tagged by part of speech and, optionally, by dependency relation) across a set of texts.

Usage

posTokens(
  texts,
  ngrams = 1,
  language = "english",
  punct = FALSE,
  stop.words = TRUE,
  overlap = 1,
  sparse = 0.99,
  dependency = FALSE,
  tag.sub = 0,
  verbose = FALSE
)

Arguments

texts

a character vector of texts.

ngrams

numeric. Vector of ngram sizes to include (maximum is 1:3).

language

character. The language of the texts being parsed.

punct

logical. Should exclamation points and question marks be included as features?

stop.words

logical. Should stop words be included? Default is TRUE.

overlap

numeric. How dissimilar (in cosine distance) must an ngram be from all (n-1)grams to be added to the feature set?

sparse

numeric. Maximum feature sparsity for inclusion (1 = include all features).

dependency

logical. Should features have dependency relations appended? Default is FALSE.

tag.sub

numeric. What fraction of features should be replaced by POS tags? Default is 0 (no features); fractions are not supported yet.

verbose

logical. Should interim steps be reported during processing?

POS

logical. Should features have part-of-speech tags appended? Default is FALSE.
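
To make the sparse threshold more concrete, here is a minimal base R sketch. It assumes that a feature's sparsity is the share of texts in which the feature never occurs, and that a feature exactly at the threshold is kept; the package may compute or apply the cutoff differently. The toy matrix and its feature names (the_DT, cat_NN, run_VB) are made up for illustration and are not output from posTokens.

```r
# Toy text-by-feature count matrix: 4 texts (rows), 3 hypothetical features (columns).
counts <- matrix(
  c(1, 0, 2,
    1, 0, 0,
    3, 0, 0,
    1, 1, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("text", 1:4), c("the_DT", "cat_NN", "run_VB"))
)

# Assumed definition: sparsity = share of texts in which the feature count is zero.
sparsity <- colMeans(counts == 0)

# Keep features whose sparsity does not exceed the threshold
# (inclusive boundary is an assumption of this sketch).
max_sparsity <- 0.5
kept <- counts[, sparsity <= max_sparsity, drop = FALSE]

print(sparsity)
print(colnames(kept))
```

Under this reading, a threshold of 1 keeps every column, which matches the "1 = include all features" note in the argument description.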

Value

a matrix of feature counts
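
A call might look like the sketch below. It uses only the arguments listed under Usage; the two example texts are invented, and the guard around the call is an assumption that the function is reachable as DTMtools::posTokens and that any parser backend it relies on has already been set up. If either is missing, the sketch reports the problem instead of failing.

```r
# Two made-up example texts (not taken from the package).
texts <- c("The quick brown fox jumps over the lazy dog.",
           "Could you please send me the report?")

# Run only when DTMtools is installed; report errors (for example, a parser
# backend that has not been set up) as a message rather than stopping.
if (requireNamespace("DTMtools", quietly = TRUE)) {
  pos_counts <- tryCatch(
    DTMtools::posTokens(texts, ngrams = 1:2, punct = TRUE, sparse = 1),
    error = function(e) {
      message("posTokens could not run: ", conditionMessage(e))
      NULL
    }
  )
  # The documented return value is a matrix of feature counts.
  if (!is.null(pos_counts)) print(dim(pos_counts))
} else {
  message("DTMtools is not installed; skipping the example call.")
}
```

Setting sparse = 1 here is meant to keep all features, which is convenient when inspecting the output for only a handful of texts.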


myeomans/DTMtools documentation built on March 2, 2020, 8:57 p.m.