confirm_matches: Confirm image matches in a Shiny app

confirm_matches    R Documentation

Description

confirm_matches takes the image matches produced by identify_matches and displays them in an interactive Shiny app for visual inspection and confirmation. Image matches with extremely low Hamming distances can optionally be excluded from manual review, and pairwise duplicates can be detected and excluded as well.

Usage

confirm_matches(
  result,
  remove_duplicates = TRUE,
  batch_size = 100L,
  thresholds = c(Identical = 80L, Match = 100L, `Likely match` = 120L,
    `Possible match` = 150L),
  previous = TRUE,
  quiet = FALSE
)

Arguments

result

A data frame produced by identify_matches, which has fields index (list of integer vectors), x_sig (matchr_signature vector), y_sig (matchr_signature vector) and distance (numeric vector).

remove_duplicates

A logical scalar. Should x-y pairs which are identical to other x-y pairs be reduced to a single x-y pair? This step can be computationally expensive for large datasets, but can dramatically reduce the number of matches to be verified.

batch_size

An integer scalar. The number of images to display at a time in the Shiny app (default 100).

thresholds

A named integer vector. Which Hamming distances establish thresholds for an "Identical" match (default 80L), a "Match" (default 100L), a "Likely match" (default 120L), a "Possible match" (default 150L), and "No match" (remaining values)? Image pairs with a distance less than or equal to the "Identical" threshold will be considered exact duplicates and will not be shown for verification in the comparison app. (Set "Identical" to -1L to force manual verification of all image pairs.) Remaining image pairs will be grouped in the comparison app by these thresholds. Image pairs with distances less than or equal to the "Likely match" value will be given a default value of "match" in the comparison app, while others will be given a default value of "no match". If remove_duplicates is TRUE, the "Identical" threshold will also be used to identify duplicated images: if the distance between two x images or two y images is <= the "Identical" threshold value, the images will be considered duplicates. If the elements of thresholds are not named, their names will be inferred by ordering the values from smallest to largest. If the elements of thresholds are not integers, they will be silently converted to integers by truncating all digits to the right of the decimal point.
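The threshold rules described above can be sketched in base R. This is only an illustration of the documented behaviour, not the package's internal code: normalise_thresholds and classify_distance are hypothetical helper names, and the sketch assumes that named thresholds are supplied in increasing order.

```r
# Hypothetical helper (not part of matchr): mimics the documented handling
# of the `thresholds` argument.
normalise_thresholds <- function(thresholds) {
  nms <- names(thresholds)
  # as.integer() truncates toward zero, dropping digits after the decimal
  # point; it also drops names, so they are saved first
  out <- as.integer(thresholds)
  if (is.null(nms)) {
    # Unnamed values: order from smallest to largest, then assign names
    out <- sort(out)
    nms <- c("Identical", "Match", "Likely match", "Possible match")
  }
  names(out) <- nms
  out
}

# Hypothetical helper: group distances with "less than or equal to" rules.
# Assumes `thresholds` is in increasing order.
classify_distance <- function(distance, thresholds) {
  cut(distance,
      breaks = c(-Inf, thresholds, Inf),
      labels = c(names(thresholds), "No match"),
      right = TRUE)
}

distance <- c(75, 80, 101, 120, 151)
thr <- normalise_thresholds(c(150.9, 80.2, 120, 100))
groups <- classify_distance(distance, thr)

# Pairs at or under "Likely match" would start as "match" in the app
default_status <- ifelse(distance <= thr[["Likely match"]], "match", "no match")
```

Under these assumptions, a distance exactly equal to a threshold falls into that threshold's group, which matches the "less than or equal to" wording above.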

previous

A logical scalar. Should the results of previous runs of confirm_matches be incorporated into the new results (default TRUE), or should previously compared matches be compared again? If this argument is TRUE, then any rows in result with a confirmed value of TRUE will be removed from the data frame before processing (and so will not be present in the comparison interface) and then re-added unchanged to the output.
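As a rough sketch of what this argument describes, the split-and-recombine step might look like the following. The toy data frame, its id column, and the variable names are assumptions made for illustration; only the confirmed field is taken from the description above, and the real function may handle this differently.

```r
# Toy stand-in for `result`; `id` is a hypothetical row identifier
toy_result <- data.frame(
  id = 1:4,
  distance = c(50, 90, 110, 140),
  confirmed = c(TRUE, FALSE, TRUE, FALSE)
)

# Rows already confirmed in a previous run are set aside...
already_confirmed <- toy_result[toy_result$confirmed, ]
# ...and only the remaining rows would be shown in the comparison interface
to_review <- toy_result[!toy_result$confirmed, ]

# After review, the set-aside rows are re-added unchanged
combined <- rbind(already_confirmed, to_review)
combined <- combined[order(combined$id), ]
```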

quiet

A logical scalar. Should the function execute quietly, or should it print status updates as it runs (default)?

Details

The interface presents pairs of images alongside a best guess as to the match status ("Match" or "No match"). For matches which are correctly identified, no further action is necessary, while incorrect identifications can be corrected by clicking "Match" or "No match" next to the image pair. Images are presented in batches, and at any point the user can click the "Save and exit" button to close the comparison app and retrieve the results up through the last batch which was viewed. This means that even extremely large sets of potential matches can be manually verified over the course of several sessions.

Through the "Enable highlighting" button, specific matches can be highlighted for further follow-up after image comparison is finished.

The Shiny app will only launch in an interactive R session; if confirm_matches is called in a non-interactive context, it will identify identical matches according to the thresholds argument and return only those results.
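The non-interactive fallback can be pictured with a small base R sketch. The identical_only helper and the toy data are hypothetical, and the sketch assumes that "identical" means a distance less than or equal to the "Identical" threshold, as described under the thresholds argument.

```r
# Hypothetical helper: keep only pairs at or under the "Identical" threshold
identical_only <- function(result, thresholds) {
  result[result$distance <= thresholds[["Identical"]], , drop = FALSE]
}

toy_result <- data.frame(distance = c(10, 80, 95, 130))
thr <- c(Identical = 80L, Match = 100L, `Likely match` = 120L,
         `Possible match` = 150L)

auto_identified <- identical_only(toy_result, thr)

# base::interactive() reports whether R is running interactively
if (!interactive()) {
  message("Non-interactive session: only identical matches would be returned")
}
```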

Value

A data frame with the following fields: index from the original result data frame; a logical vector new_match_status, which is TRUE for confirmed matches, FALSE for confirmed non-matches, and NA for matches which were not confirmed; and a logical vector new_highlight which is TRUE for any matches which were highlighted using the in-app interface, FALSE for matches which were not highlighted, and NA for matches which were not confirmed. Confirmation is determined by how many pages into the Shiny app the user proceeded, and thus how many pairings were viewed. If all pages are viewed, then the output will have no NA values.
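A returned table of this shape could be summarised as follows. The data frame here is built by hand purely for illustration, using the field names listed above; it is not real output from confirm_matches.

```r
# Hand-built stand-in for the output described above
change_table <- data.frame(
  new_match_status = c(TRUE, FALSE, TRUE, NA),
  new_highlight = c(FALSE, TRUE, FALSE, NA)
)
# `index` is a list column of integer vectors
change_table$index <- list(1L, 2:3, 4L, 5L)

# Count confirmed matches, confirmed non-matches and unreviewed pairs
status_counts <- table(change_table$new_match_status, useNA = "ifany")
n_confirmed_matches <- sum(change_table$new_match_status, na.rm = TRUE)
n_unreviewed <- sum(is.na(change_table$new_match_status))

# Rows highlighted in the app for follow-up; which() drops NA entries
follow_up <- change_table[which(change_table$new_highlight), ]
```

Rows with NA in new_match_status correspond to pairs on pages that were never viewed, so they are candidates for a later session.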

Examples

## Not run: 
# Setup
sigs <- create_signature(test_urls)
matches <- match_signatures(sigs)
result <- identify_matches(matches)

# Assign the output of confirm_matches to retrieve results
change_table <- confirm_matches(result)

## End(Not run)

UPGo-McGill/matchr documentation built on July 19, 2023, 1:02 p.m.