trim-method: Trim an object.

trim R Documentation

Trim an object.

Description

Method to trim and adjust objects by applying thresholds, minimum frequencies and similar filters. It can be applied to context, features, partition and partition_bundle objects.

Usage

trim(.Object, ...)

## S4 method for signature 'TermDocumentMatrix'
trim(
  .Object,
  terms_to_drop,
  docs_to_keep,
  min_count,
  min_doc_length,
  verbose = TRUE,
  ...
)

## S4 method for signature 'DocumentTermMatrix'
trim(
  .Object,
  terms_to_drop,
  docs_to_keep,
  min_count,
  min_doc_length,
  verbose = TRUE,
  ...
)

punctuation

Arguments

.Object

The object to be trimmed

...

Further arguments.

terms_to_drop

A character vector of terms to exclude from the matrix (terms treated as stopwords).

docs_to_keep

A character vector with documents to keep.

min_count

A numeric value, the minimum total frequency of a term across all documents. Terms below this value are excluded from the matrix as rare terms.

min_doc_length

A numeric value, the minimum document length, measured as the summed-up occurrences of tokens in a document. Documents below this value are excluded as too short. Note that the min_doc_length filter is applied before filtering by min_count and terms_to_drop, and that these later filters reduce document lengths further.

verbose

A logical value, whether to output progress messages.
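
The filter order noted for min_doc_length can be illustrated with a minimal base R sketch. It runs on a plain numeric matrix with made-up counts, not on a polmineR or tm object, and trim_sketch is a hypothetical helper, not the package's implementation. Applying min_count before terms_to_drop, and counting term frequencies only over the documents that remain, are assumptions made for illustration.

```r
# Toy document-term matrix: rows are documents, columns are terms.
# Hypothetical data; a plain base R matrix, not a polmineR/tm object.
dtm <- matrix(
  c(5, 1, 0,
    1, 0, 1,
    4, 0, 3),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("doc1", "doc2", "doc3"), c("the", "rare", "trade"))
)

# Hypothetical helper that mimics the documented filter order.
trim_sketch <- function(m, min_doc_length = 0, min_count = 0,
                        terms_to_drop = character()) {
  # 1. Drop short documents first, based on the untrimmed row sums.
  m <- m[rowSums(m) >= min_doc_length, , drop = FALSE]
  # 2. Drop rare terms (assumption: counted over the remaining documents).
  m <- m[, colSums(m) >= min_count, drop = FALSE]
  # 3. Drop stopwords (assumption: applied last).
  m[, !colnames(m) %in% terms_to_drop, drop = FALSE]
}

trimmed <- trim_sketch(dtm, min_doc_length = 5, min_count = 2,
                       terms_to_drop = "the")

# Documents that passed the length filter can end up shorter than
# min_doc_length once terms have been removed.
rowSums(trimmed)
```

In this sketch doc1 and doc3 pass the length filter, yet both fall below the threshold of 5 after the term filters have run, which is the effect the note on min_doc_length describes.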

Format

An object of class character of length 13.

Author(s)

Andreas Blaette

Examples

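library(polmineR)
# Make the REUTERS corpus from the RcppCWB package available
use("RcppCWB", corpus = "REUTERS")
# Build a document-term matrix with one document per value of "id"
dtm <- corpus("REUTERS") %>%
  split(s_attribute = "id") %>%
  as.DocumentTermMatrix(p_attribute = "word", verbose = FALSE)
# Drop documents with fewer than 100 tokens
trim(dtm, min_doc_length = 100)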

polmineR documentation built on Nov. 2, 2023, 5:52 p.m.