stri_duplicated: Determine Duplicated Elements
In stringi: Fast and Portable Character String Processing Facilities

stri_duplicated

R Documentation

Determine Duplicated Elements

Description

stri_duplicated() determines which strings in a character vector are duplicates of other elements.

stri_duplicated_any() determines if there are any duplicated strings in a character vector.

Usage

stri_duplicated(
  str,
  from_last = FALSE,
  fromLast = from_last,
  ...,
  opts_collator = NULL
)

stri_duplicated_any(
  str,
  from_last = FALSE,
  fromLast = from_last,
  ...,
  opts_collator = NULL
)

Arguments

`str`	a character vector
`from_last`	a single logical value; indicates whether the search should be performed from the last to the first string
`fromLast`	[DEPRECATED] alias of `from_last`
`...`	additional settings for `opts_collator`
`opts_collator`	a named list with ICU Collator's options (see `stri_opts_collator`), or `NULL` for the default collation options
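A minimal sketch of how the two collator arguments relate. It assumes that settings given via `...` are forwarded to `stri_opts_collator()`, as the `strength=1` example further below suggests, so the two calls should agree. The `locale = "en"` setting is an addition made here only to keep the result independent of the session's default locale.

```r
library(stringi)

x <- c("Gross", "GROSS", "gross")

# Collator settings passed through `...` (here: locale and strength)
via_dots <- stri_duplicated(x, locale = "en", strength = 2)

# The same settings built explicitly with stri_opts_collator()
via_list <- stri_duplicated(
  x,
  opts_collator = stri_opts_collator(locale = "en", strength = 2)
)

# At secondary strength (2), case differences are ignored,
# so the 2nd and 3rd strings count as duplicates of the 1st
print(via_dots)
print(identical(via_dots, via_list))
```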

Details

Missing values are regarded as equal.

Unlike duplicated and anyDuplicated, these functions test for canonical equivalence of strings (and not whether the strings are just bytewise equal). Such operations are locale-dependent. Hence, stri_duplicated and stri_duplicated_any are significantly slower (but much better suited for natural language processing) than their base R counterparts.
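To illustrate the difference, a minimal sketch: "\u0105" (precomposed a with ogonek) and "a\u0328" (a followed by a combining ogonek) are canonically equivalent but differ bytewise. Normalizing to NFC with stri_trans_nfc() before calling base duplicated() happens to give the same answer for this particular input; in general, collation-based equality (which also depends on the strength setting) is not the same as comparing NFC forms.

```r
library(stringi)

# Precomposed U+0105 vs. "a" + combining ogonek (U+0328), plus an unrelated string
x <- c("\u0105", "a\u0328", "b")

by_collation <- stri_duplicated(x)            # canonical equivalence via the ICU collator
by_bytes     <- duplicated(x)                 # base R: bytewise comparison
by_nfc       <- duplicated(stri_trans_nfc(x)) # base R after NFC normalization

print(by_collation)
print(by_bytes)
print(by_nfc)
```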

See also stri_unique for extracting unique elements.
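stri_unique() can be viewed as the complement of stri_duplicated(): keeping only the elements not flagged as duplicates should yield the unique values in order of first appearance. This correspondence is an assumption checked on the example below, not a documented guarantee.

```r
library(stringi)

x <- c("a", "b", "a", NA, "a", NA)

# Keep the elements not flagged as duplicates (i.e., the first occurrences)
kept <- x[!stri_duplicated(x)]

print(kept)
print(stri_unique(x))
```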

Value

stri_duplicated() returns a logical vector of the same length as str. Each of its elements indicates whether a canonically equivalent string was already found in str (scanning from the first element, or from the last one if from_last = TRUE).

stri_duplicated_any() returns a single non-negative integer. A value of 0 indicates that all the elements in str are unique. Otherwise, it gives the index of the first non-unique element.
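The relationship between the two return values can be sketched as follows, assuming (consistently with the description above) that, with the default from_last = FALSE, stri_duplicated_any() reports the position of the first TRUE in stri_duplicated(). The helper first_duplicate_index() is hypothetical and exists only for this illustration.

```r
library(stringi)

# Hypothetical helper: index of the first duplicate, or 0 if there is none
first_duplicate_index <- function(str, ...) {
  flags <- stri_duplicated(str, ...)  # never NA: missing values compare equal
  if (any(flags)) which(flags)[1] else 0L
}

x <- c("a", "b", "a", NA, "a", NA)
print(first_duplicate_index(x))
print(stri_duplicated_any(x))

# All elements unique: both approaches report 0
print(first_duplicate_index(c("a", "b")))
print(stri_duplicated_any(c("a", "b")))
```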

Author(s)

Marek Gagolewski and other contributors

References

Collation - ICU User Guide, https://unicode-org.github.io/icu/userguide/collation/

Examples

library(stringi)

# In the following examples, we have 3 duplicated values:
# 'a' - 2 times, NA - 1 time
stri_duplicated(c('a', 'b', 'a', NA, 'a', NA))
stri_duplicated(c('a', 'b', 'a', NA, 'a', NA), from_last=TRUE)
stri_duplicated_any(c('a', 'b', 'a', NA, 'a', NA))

# canonically equivalent but bytewise different strings; compare the results:
stri_duplicated(c('\u0105', stri_trans_nfkd('\u0105')))
duplicated(c('\u0105', stri_trans_nfkd('\u0105')))

# strength=1 (forwarded to stri_opts_collator) compares base letters only, ignoring case
stri_duplicated(c('gro\u00df', 'GROSS', 'Gro\u00df', 'Gross'), strength=1)
duplicated(c('gro\u00df', 'GROSS', 'Gro\u00df', 'Gross'))

stringi documentation built on May 29, 2024, 8:16 a.m.
