find_duplicates: Find and label duplicate image signatures

View source: R/find_duplicates.R

find_duplicates R Documentation

Find and label duplicate image signatures

Description

find_duplicates takes a data frame with two matchr_signature vectors, then identifies and labels duplicate signatures according to a given Hamming distance threshold.

Usage

find_duplicates(x, threshold = 80, find_all = FALSE, quiet = FALSE)

Arguments

x

A data frame with two columns containing matchr_signature vectors. If there are columns named x_sig and y_sig, these are the vectors that will be analyzed. If not, the first two columns containing matchr_signature vectors will be used. If more than two such columns are present, a warning will be issued.

threshold

A length-one integer vector (default 80). The maximum Hamming distance at which two images are considered identical. If the distance between two x images or between two y images is <= threshold, the images will be considered duplicates.
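As an illustration of the threshold rule, here is a minimal base R sketch. It assumes signatures can be treated as equal-length bit vectors, which is a simplification for the example and not necessarily how a matchr_signature is stored. The helper hamming() and the toy signatures are hypothetical and are not part of matchr.

```r
# Hypothetical helper (not part of matchr): Hamming distance between two
# equal-length vectors, i.e. the number of positions where they differ.
hamming <- function(a, b) {
  stopifnot(length(a) == length(b))
  sum(a != b)
}

# Toy bit-vector "signatures" (an assumption made for illustration only).
sig_1 <- c(1L, 0L, 1L, 1L, 0L, 0L, 1L, 0L)
sig_2 <- c(1L, 0L, 1L, 0L, 0L, 0L, 1L, 1L)  # differs from sig_1 in 2 positions
sig_3 <- c(0L, 1L, 0L, 0L, 1L, 1L, 0L, 1L)  # differs from sig_1 in all 8

# Two images count as duplicates when their distance is <= threshold.
threshold <- 2L
is_dup_12 <- hamming(sig_1, sig_2) <= threshold  # TRUE
is_dup_13 <- hamming(sig_1, sig_3) <= threshold  # FALSE
```

Because the comparison is <=, a pair whose distance equals the threshold exactly is still treated as a duplicate.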

find_all

A logical scalar (default FALSE). Should the function find all y duplicates, even for rows which do not have x duplicates? If FALSE, rows are checked for x duplicates first, and any row without an x duplicate is removed from the search for y duplicates. This can result in considerable speed gains if the goal is to find rows which have both x and y duplicates (e.g. for subsequent processing in confirm_matches).
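The difference between the two settings can be sketched in base R. This is an illustration of the described behaviour under simplifying assumptions, not matchr's implementation: integer codes stand in for signatures, exact equality stands in for the thresholded distance test, and has_dup() is a hypothetical helper.

```r
# Toy data: integer codes stand in for image signatures (an assumption).
pairs <- data.frame(
  x_sig = c(1L, 1L, 2L, 3L, 3L),
  y_sig = c(7L, 7L, 8L, 8L, 9L)
)

# Hypothetical helper: TRUE for every value that occurs more than once.
has_dup <- function(v) duplicated(v) | duplicated(v, fromLast = TRUE)

# find_all = FALSE style: check x first, then search for y duplicates
# only among the rows that have an x duplicate.
x_dup <- has_dup(pairs$x_sig)
y_dup_restricted <- rep(FALSE, nrow(pairs))
y_dup_restricted[x_dup] <- has_dup(pairs$y_sig[x_dup])

# find_all = TRUE style: search for y duplicates across every row.
y_dup_all <- has_dup(pairs$y_sig)
```

In this toy data, rows 3 and 4 share a y value, but row 3 has no x duplicate. The restricted search therefore drops row 3 and reports no y duplicate for row 4, while the full search flags both rows.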

quiet

A logical scalar. Should the function execute quietly (TRUE), or should it print status updates as it runs (FALSE, the default)?

Value

The input data frame, with additional x_id and y_id fields which identify duplicated image signatures.
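The exact form of the x_id and y_id values is not described here. As a rough sketch of what labelling duplicates could look like, the base R example below assumes integer group labels, where rows sharing a label are duplicates of one another. Character codes stand in for signatures, and exact equality stands in for the thresholded comparison.

```r
# Toy data: character codes stand in for image signatures (an assumption).
pairs <- data.frame(
  x_sig = c("a", "b", "a", "c"),
  y_sig = c("p", "p", "q", "p"),
  stringsAsFactors = FALSE
)

# Assumed labelling scheme: each distinct value gets an integer id in order
# of first appearance, so duplicated values share the same id.
pairs$x_id <- match(pairs$x_sig, unique(pairs$x_sig))
pairs$y_id <- match(pairs$y_sig, unique(pairs$y_sig))
```

Here rows 1 and 3 share an x_id, and rows 1, 2 and 4 share a y_id, while the original columns are left unchanged.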


UPGo-McGill/matchr documentation built on July 19, 2023, 1:02 p.m.