DTM: Document Term Matricizer
In myeomans/DTMtools: Adjustable Feature Counting for Text Analysis



Turns a character vector of texts into a document-term matrix of feature counts.

DTM(
  texts,
  sparse = 0.99,
  wstem = "all",
  ngrams = 1,
  language = "english",
  vocabmatch = NULL,
  stop.words = TRUE,
  punct = FALSE,
  POS = FALSE,
  dependency = FALSE,
  tag.sub = 0,
  overlap = 0.8,
  group.conc = NULL,
  group.conc.cutoff = 0.8,
  TPformat = FALSE,
  verbose = FALSE,
  mc.cores = 1
)
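
A minimal sketch of a call, using only arguments from the signature above. The toy `texts` vector is made up for illustration. It assumes the package can be installed from the `myeomans/DTMtools` GitHub repository and that `DTM` is exported. The shape of the return value is not described in this section, so the sketch only inspects it with `str()`.

```r
# Hypothetical toy input; any character vector of texts should do.
texts <- c("The cat sat on the mat.",
           "The dog sat on the log!",
           "Where did the bird go?")

# Assumption: installed beforehand, e.g. with
#   remotes::install_github("myeomans/DTMtools")
if (requireNamespace("DTMtools", quietly = TRUE)) {
  dtm <- DTMtools::DTM(
    texts,
    sparse = 1,       # 1 = include all features
    ngrams = 1:2,     # unigrams and bigrams
    stop.words = TRUE,
    punct = FALSE
  )
  str(dtm)            # inspect whatever structure is returned
}
```

The `requireNamespace()` guard lets the snippet run without error when the package is not installed.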

`texts`	a character vector of texts.
`sparse`	maximum feature sparsity for inclusion (1 = include all features)
`wstem`	character what words should be stemmed?
`ngrams`	numeric vector of ngram sizes (max = 1:3)
`language`	character what language are you parsing?
`vocabmatch`	matrix used to create a new matrix with features that are identical to a previous one
`stop.words`	logical should stop words be included? default is TRUE
`punct`	logical should exclamation points and question marks be included as features?
`POS`	logical should features have part of speech tags appended? default is FALSE
`dependency`	logical should features have dependency relations appended? default is FALSE
`tag.sub`	numeric what fraction of features should be replaced by POS tags? default is 0 (no features); fractional values are not supported yet.
`overlap`	numeric how dissimilar (in cosine distance) must an ngram be from all (n-1)grams to be added to the feature set?
`group.conc`	character group IDs for removing group-specific words
`group.conc.cutoff`	numeric threshold for group-specificity of words, as a proportion of occurrences in the main group.
`TPformat`	logical - return in stm::textProcessor() format?
`verbose`	logical - report interim steps during processing
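
To make the `sparse` argument concrete, here is a base-R sketch of a sparsity filter on a toy count matrix. It is not the package's implementation. It assumes that a feature's sparsity is the share of documents in which the feature never appears (the convention used by `tm::removeSparseTerms`) and that the cutoff is inclusive, so that `sparse = 1` keeps every feature. The whitespace tokenizer and the helper `keep_features` are hypothetical and exist only for this illustration.

```r
# Toy corpus; splitting on whitespace is an assumption for illustration only.
texts <- c("the cat sat", "the dog sat", "the bird flew")
tokens <- strsplit(tolower(texts), "\\s+")
vocab <- sort(unique(unlist(tokens)))

# Rows are documents, columns are features, cells are counts.
counts <- t(vapply(
  tokens,
  function(tok) as.numeric(table(factor(tok, levels = vocab))),
  numeric(length(vocab))
))
colnames(counts) <- vocab

# Hypothetical helper: keep features whose sparsity (share of documents
# with a zero count) does not exceed the cutoff.
keep_features <- function(counts, sparse) {
  sparsity <- colMeans(counts == 0)
  counts[, sparsity <= sparse, drop = FALSE]
}

colnames(keep_features(counts, 0.5))  # features present in at least half the documents
ncol(keep_features(counts, 1))        # cutoff of 1 keeps all features
```

Under these assumptions, the default `sparse = 0.99` would drop only features that are absent from more than 99% of the documents.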