clean_textsdc: Extract deduplicated version from duplication objects


View source: R/duplication.R

Description

clean_textsdc extracts the deduplicated version of the input (a text vector or a dfm object) from a duplication object. get_deduplicated_version is kept for backward compatibility and passes its arguments to clean_textsdc.

Usage

get_deduplicated_version(...)

clean_textsdc(duplication, precedence = "earlier")

Arguments

...

parameters to be passed to clean_textsdc

duplication

the duplication object to be processed

precedence

a character string, one of: earlier (default), longer, shorter, random. This option controls which document to keep when duplicates exist. It is not used when input_text is a dfm object.

  • earlier: Take the document which is earlier in the input text vector.

  • longer: Take the document which is longer.

  • shorter: Take the document which is shorter.

  • random: Randomly take a document.
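The four precedence options can be illustrated with a minimal base R sketch. It is not the textsdc implementation: pick_per_group, texts, and group are hypothetical names introduced only for this example, and the sketch assumes the duplicate groups are already known, whereas the package derives them from the duplication object.

```r
# Hypothetical helper, NOT part of textsdc: keep one document per duplicate group.
# `texts` is a character vector; `group` gives an assumed duplicate-group id per document.
pick_per_group <- function(texts, group,
                           precedence = c("earlier", "longer", "shorter", "random")) {
  precedence <- match.arg(precedence)
  # Split document positions by group; positions stay in input order within each group.
  members_by_group <- split(seq_along(texts), group)
  keep <- vapply(members_by_group, function(members) {
    switch(precedence,
      earlier = members[1],                                   # first in the input vector
      longer  = members[which.max(nchar(texts[members]))],    # most characters
      shorter = members[which.min(nchar(texts[members]))],    # fewest characters
      random  = members[sample.int(length(members), 1)]       # uniformly at random
    )
  }, integer(1))
  # Return the kept documents in their original input order.
  texts[sort(unname(keep))]
}

texts <- c("the cat sat", "the cat sat on the mat", "a dog barked", "a dog")
group <- c(1, 1, 2, 2)  # assumed grouping: documents 1-2 and 3-4 are duplicates

pick_per_group(texts, group, "earlier")  # "the cat sat" "a dog barked"
pick_per_group(texts, group, "longer")   # "the cat sat on the mat" "a dog barked"
pick_per_group(texts, group, "shorter")  # "the cat sat" "a dog"
```

In this sketch, ties under longer or shorter fall back to the earlier document, because which.max and which.min return the first match. Whether textsdc breaks ties the same way is not stated in this page.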

Value

a text vector or a dfm object


chainsawriot/textsdc documentation built on Dec. 31, 2021, 9:54 a.m.