controls_txt: Controls for processing character data

View source: R/controls.R

controls_txt    R Documentation

Controls for processing character data

Description

Controls for processing text data in the blocking function (used when representation = "shingles"); passed to tokenize_character_shingles from the tokenizers package.

Usage

controls_txt(
  n_shingles = 2L,
  n_chunks = 10L,
  lowercase = TRUE,
  strip_non_alphanum = TRUE
)

Arguments

n_shingles

length of shingles, i.e. the number of characters in each shingle (default 2L),

n_chunks

passed to (default 10L),

lowercase

should the characters be made lower-case? (default TRUE),

strip_non_alphanum

should punctuation and white space be stripped? (default TRUE).
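
To illustrate how n_shingles, lowercase and strip_non_alphanum interact, here is a minimal base R sketch of character shingling for a single string. The helper shingle_sketch is hypothetical and is not part of blocking or tokenizers; it only approximates what a character-shingle tokenizer does, and the actual tokenizer may handle edge cases (short strings, Unicode) differently.

```r
# Hypothetical helper for illustration only; not part of blocking or tokenizers.
# Works on a single character string.
shingle_sketch <- function(x,
                           n_shingles = 2L,
                           lowercase = TRUE,
                           strip_non_alphanum = TRUE) {
  # Optionally fold to lower case
  if (lowercase) x <- tolower(x)
  # Optionally drop everything that is not a letter or a digit
  if (strip_non_alphanum) x <- gsub("[^[:alnum:]]", "", x)

  n_chars <- nchar(x)
  # Assumption of this sketch: strings shorter than one shingle yield no tokens
  if (n_chars < n_shingles) return(character(0))

  # Slide a window of width n_shingles over the string
  starts <- seq_len(n_chars - n_shingles + 1L)
  substring(x, starts, starts + n_shingles - 1L)
}

shingle_sketch("Al-Go")
# "al" "lg" "go"

shingle_sketch("Al-Go", strip_non_alphanum = FALSE)
# "al" "l-" "-g" "go"
```

Stripping punctuation before shingling means that "Al-Go" and "algo" produce the same tokens, which is usually what is wanted when matching noisy names.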

Value

Returns a list of the control parameters.
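
A short usage sketch, assuming the blocking package is installed. It uses only the arguments shown in Usage; the exact names of the elements in the returned list are not documented here, so the result is inspected with str() rather than accessed by name.

```r
library(blocking)

# Override two defaults: 3-character shingles, keep punctuation and white space
ctrl <- controls_txt(
  n_shingles = 3L,
  strip_non_alphanum = FALSE
)

# The result is a plain list; inspect its structure
str(ctrl)
```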

Author(s)

Maciej Beręsewicz


blocking documentation built on June 18, 2025, 9:16 a.m.