library(handwriter)
library(handwriterRF)
library(handwriterApp)
library(magick)
# knitr::opts_chunk$set(fig.pos = "H", out.extra = "")
files <- list.files(file.path(tempdir(), "comparison1", "docs"), full.names = TRUE)
knitr::include_graphics(files[1])
knitr::include_graphics(files[2])
doc_paths <- list.files(file.path(params$project_dir, "docs"))
knitr::kable(
  data.frame("File" = basename(doc_paths)),
  booktabs = TRUE,
  caption = "Handwriting samples"
)
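Handwriter addresses two hypotheses: the same writer hypothesis, that the two documents were written by the same writer, and the different writers hypothesis, that the two documents were written by different writers.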
Handwriter assumes that the documents were written in the writer's natural handwriting and that the writer did not attempt to disguise their handwriting or forge someone else's handwriting.
Handwriter has been tested on handwriting examples from publicly available handwriting databases, where volunteers were asked to copy a writing prompt in their natural handwriting. Error rates on other types of handwriting samples are unknown.
Handwriter processes handwriting by converting the writing to black and white, thinning the writing to 1 pixel in width, and following a set of rules to break the writing into component shapes called graphs. Graphs capture shapes, not necessarily individual letters. Graphs might be a part of a letter or contain parts of multiple letters.
handwriter::plotNodes(params$graphs1)
handwriter::plotNodes(params$graphs2)
Handwriter uses 40 exemplar shapes called clusters. Again, these shapes are not necessarily individual letters. They might be part of a letter or contain parts of multiple letters. For more information on how these 40 clusters were created, see https://onlinelibrary.wiley.com/doi/abs/10.1002/sam.11488.
knitr::include_graphics(system.file("extdata", "images", "template.png", package = "handwriterApp"))
For each handwriting sample, handwriter assigns each graph to the cluster with the most similar shape. Then for each document, handwriter calculates the proportion of graphs assigned to each cluster. The rate at which a writer produces graphs in each cluster serves as an estimate of a writer profile.
df <- rbind(params$clusters1, params$clusters2)
counts <- handwriter::get_cluster_fill_counts(df)
rates <- handwriterRF::get_cluster_fill_rates(counts)
plot_writer_profiles(rates)
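To make the idea of a writer profile concrete, here is a minimal base R sketch of turning cluster assignments into proportions. It does not call handwriter or handwriterRF; the `assignments` vector is made-up illustrative data, and the variable names are assumptions rather than the packages' own.

```r
# Hypothetical cluster assignments for the graphs in one document.
# Each value is the cluster (1 to 40) that a graph was assigned to.
n_clusters <- 40
assignments <- c(3, 3, 17, 40, 17, 3, 8, 8, 8, 25)

# Count the graphs in each cluster; tabulate() keeps clusters with zero graphs.
cluster_counts <- tabulate(assignments, nbins = n_clusters)

# Convert counts to proportions so documents of different lengths are comparable.
cluster_rates <- cluster_counts / sum(cluster_counts)

# Show only the clusters that were actually used in this toy document.
print(cluster_rates[cluster_rates > 0])
```

Dividing by the total number of graphs is what lets a short document and a long document be compared on the same scale.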
Handwriter measures the similarity between the two writer profiles using a random forest trained on handwriting samples from the CSAFE Handwriting Database (https://data.csafe.iastate.edu/HandwritingDatabase/). The result is a similarity score between the two writer profiles. Next, handwriter calculates the likelihood of observing the similarity score if the same writer hypothesis is true and the likelihood of observing the similarity score if the different writers hypothesis is true. The score-based likelihood ratio is the ratio of these two likelihoods. For more information, see https://doi.org/10.1002/sam.11566.
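The ratio described above can be illustrated with a short base R sketch. This is not the handwriterRF implementation: the reference scores are simulated rather than taken from the CSAFE Handwriting Database, the kernel density estimate is an assumed choice for approximating each likelihood, and `density_at()` and `score_based_lr()` are hypothetical helper names.

```r
set.seed(1)

# Simulated reference similarity scores (illustrative only, not real data).
same_writer_scores <- rbeta(500, 8, 2)  # tend to be high
diff_writer_scores <- rbeta(500, 2, 8)  # tend to be low

# Hypothetical helper: kernel density estimate of `scores`, evaluated at `x`.
density_at <- function(scores, x) {
  d <- stats::density(scores, from = 0, to = 1)
  stats::approx(d$x, d$y, xout = x)$y
}

# Hypothetical helper: likelihood of the score under the same writer
# hypothesis divided by its likelihood under the different writers hypothesis.
score_based_lr <- function(score, same, diff) {
  density_at(same, score) / density_at(diff, score)
}

slr_high <- score_based_lr(0.6, same_writer_scores, diff_writer_scores)
slr_low <- score_based_lr(0.4, same_writer_scores, diff_writer_scores)
print(c(slr_high = slr_high, slr_low = slr_low))
```

In this toy setup, a ratio above 1 means the score is more likely under the same writer hypothesis, and a ratio below 1 means it is more likely under the different writers hypothesis.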
The similarity score is `r params$slr_df$score`.
The score-based likelihood ratio is `r params$slr_df$slr`.
`r handwriterRF::interpret_slr(params$slr_df)`