Quantification and evaluation of handwritten character loops

Introduction


This tutorial presents tools for the quantification and evaluation of loops contained in handwritten characters, following the work of @marquis2005 and @bozza2008.

The Fourier quantification method and the likelihood-ratio statistical evaluation are implemented in the ExtractFourier and TwoLevelLR methods, respectively.

suppressPackageStartupMessages(library(ForensicDocument))

Loop quantification

The quantification method for loops contained in handwritten characters, as described in @marquis2005, is implemented in the ExtractFourier method. In short, binary character images are skeletonized and quantified with a function $R(\theta)$, where $R(\theta)$ is the distance from the character skeleton to its barycenter at the angle $\theta$.

We give a more detailed description of this method:

  1. The algorithm used for the skeletonisation process is based on the one proposed by @stentiford1983. It has been modified for the particular case of closed loops by removing the end-point condition. This tweak avoids the need to prune the resulting skeleton (see @stentiford1983 for further details).
  2. Skeletons (or loops) are parametrised by a discrete function $R(\theta)$, representing the length of the line joining a point of the contour to the barycenter, where $\theta$ is the angle between this line and the horizontal axis, with $0 \leq \theta < 2\pi$.
  3. The function $R(\theta)$ is resampled at n.samp values of $\theta$, that is, at the values $\theta=\frac{2\pi n}{\mathrm{n.samp}+1}, \, n=0,\dots,\mathrm{n.samp}-1$.
  4. The selected Fourier parameters are extracted from the resampled signal $R(\theta)$ (using the fft method from stats).
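Steps 2 to 4 can be sketched in base R on a synthetic loop. This is only an illustration under stated assumptions, not the code used by ExtractFourier: the loop is an ellipse rather than a skeletonised image, the resampling uses linear interpolation, and the names s, theta_grid, R_grid and fourier_coef are hypothetical. The angular grid follows the formula given in step 3.

```r
# Synthetic closed loop (an ellipse) standing in for a skeletonised character loop.
s <- seq(0, 2 * pi, length.out = 501)[-501]
x <- 10 + 3 * cos(s)
y <- 5 + 2 * sin(s)

# Polar description around the barycentre: angle theta and distance R.
cx <- mean(x)
cy <- mean(y)
theta <- atan2(y - cy, x - cx) %% (2 * pi)
R <- sqrt((x - cx)^2 + (y - cy)^2)

# Resample R(theta) on the angular grid of step 3, treating the loop as periodic
# (assumption: linear interpolation between neighbouring skeleton points).
n.samp <- 128
theta_grid <- 2 * pi * (0:(n.samp - 1)) / (n.samp + 1)
ord <- order(theta)
theta_periodic <- c(theta[ord] - 2 * pi, theta[ord], theta[ord] + 2 * pi)
R_periodic <- rep(R[ord], times = 3)
R_grid <- approx(theta_periodic, R_periodic, xout = theta_grid)$y

# Keep the first n.fourier coefficients (assumption: harmonics 0 to n.fourier - 1).
n.fourier <- 7
fourier_coef <- fft(R_grid)[seq_len(n.fourier)] / n.samp
amplitude <- Mod(fourier_coef)
phase <- Arg(fourier_coef)
```

In this sketch, amplitude[1] is the mean radius of the loop, and the elongation of the ellipse shows up mainly in the second harmonic (amplitude[3]).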

Parameters

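The parameters for this method are described in ?ExtractFourier. The example below sets files, n.fourier, n.samp, verbose, output and character_pixel.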

Example

In this example, we use the binarised handwritten character o (the fig-O.png file supplied in the package). The character image is not yet skeletonized, therefore skeletonize=TRUE.

We use a sampling size of 128 (n.samp=128) with 7 Fourier harmonics (n.fourier=6+1), as in @marquis2005. We want the output as a data.frame, therefore output=NULL. The verbose option is set to FALSE. In this case, the method ExtractFourier is used as follows:

files = system.file("extdata", "fig-O.png", package = "ForensicDocument")
result = ExtractFourier(files = files, 
                        n.fourier = 7, 
                        n.samp = 128, 
                        verbose = FALSE, 
                        output = NULL,
                        character_pixel = 0)
result

The result object is a list of length length(files). In this particular case, there is only one input file, thus result is a list of length 1.

# Evaluate each check expression and print its result
checks = c("typeof(result)", "length(result) == length(files)", "names(result) == files")
for(e in checks) cat(sprintf("%s : %s\n", e, eval(parse(text = e))))

Statistical Evaluation

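In forensic science, the evidence $y$ is usually interpreted through the computation of a likelihood ratio: $$LR = \frac{f(y|H_p)}{f(y|H_d)},$$ where $f(y|H_p)$ and $f(y|H_d)$ are the probability densities of the evidence under the two competing propositions $H_p$ and $H_d$.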

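In the context of handwriting examination, suppose that: (i) an anonymous letter (i.e. the questioned document) is available for comparative analysis, and (ii) written material from a suspect is selected for comparative purposes (i.e. the reference document). For the computation of the likelihood ratio, we consider the following propositions of interest:

  * $H_p$: the author of the reference document is the author of the questioned document;
  * $H_d$: the author of the reference document is not the author of the questioned document.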
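As a toy illustration of how a likelihood ratio is read, the sketch below evaluates a single one-dimensional measurement under two assumed normal densities. The measurement and the means and standard deviations are made up for illustration. This is not the two-level model used by the package.

```r
# Hypothetical one-dimensional measurement from the questioned document.
y <- 1.2

# Assumed densities of y under each proposition (illustrative values only).
f_hp <- dnorm(y, mean = 1.0, sd = 0.5)  # density if H_p holds
f_hd <- dnorm(y, mean = 0.0, sd = 1.5)  # density if H_d holds

# Likelihood ratio and its natural logarithm.
LR <- f_hp / f_hd
LLR <- log(LR)
```

An LR above 1 (LLR above 0) means that the measurement is more probable under $H_p$ than under $H_d$. An LR below 1 means the opposite.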

In this tutorial and package, the proposed two-level likelihood ratio evaluation is based on the one developed by @bozza2008. It takes into account the within- and between-writer variability. The evidence $y$, namely the Fourier parameters, is assumed to follow a multivariate normal distribution with unknown mean vector and covariance matrix: $$ y \sim \mathcal{M}(\mu,W), $$ $$ W^{-1} \sim \mathcal{IW}(U,n_w), $$ $$ \mu \sim \mathcal{M}(\theta,B), $$ where $B$ and $U$ are, respectively, the between- and within-writer covariance matrices.

The likelihood ratio is computed using the TwoLevelLR method, and the background parameters ($\theta$, $B$ and $U$) can be computed using the TwoLevelLR_Background method.
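To give an idea of what the background parameters represent, the sketch below computes simple moment estimates (overall mean, pooled within-writer covariance, covariance of the writer means) on simulated data. The data, the variable names and the estimators are assumptions made for illustration. TwoLevelLR_Background may use different estimators.

```r
set.seed(1)

# Simulated background data: 10 hypothetical writers, 30 characters each, 2 features.
n_writers <- 10
n_per_writer <- 30
p <- 2
writer <- factor(rep(seq_len(n_writers), each = n_per_writer))
writer_means <- matrix(rnorm(n_writers * p, sd = 2), nrow = n_writers, ncol = p)
noise <- matrix(rnorm(n_writers * n_per_writer * p, sd = 0.5), ncol = p)
X <- writer_means[as.integer(writer), ] + noise

# Mean vector of each writer (rows ordered by factor level), then the overall mean.
group_means <- rowsum(X, writer) / as.vector(table(writer))
overall_mean <- colMeans(group_means)

# Pooled within-writer covariance, from the residuals around each writer's mean.
resid <- X - group_means[as.integer(writer), ]
within_cov <- crossprod(resid) / (nrow(X) - n_writers)

# Between-writer covariance, approximated by the covariance of the writer means
# (this ignores the small within-writer contribution to those means).
between_cov <- cov(group_means)
```

In this simulation the between-writer variances are much larger than the within-writer variances, which is the situation in which characters carry information about their writer.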

Parameters

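The parameters for the TwoLevelLR method are described in ?TwoLevelLR. The examples below set data1, data2, background, n.iter, n.burnin and nw.

Similarly, the parameters for TwoLevelLR_Background are described in ?TwoLevelLR_Background. The examples below pass the background data and the writer factor fac.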

Examples

In the following examples, we use the characterO dataset. It contains the Fourier parameters (n.fourier = 4) extracted from 554 handwritten character loops, written by 11 writers. It is a subset of the data collected by @marquis2006. For more information on this dataset, see ?characterO. For other applications of this methodology, see @marquis2011a and @taroni2012.

data(characterO)

In both examples, the number of iterations and the number of burn-in iterations for the MCMC chain are set to 110 and 10, respectively. The degrees of freedom of the inverse Wishart distribution are set to nw=50, as in @bozza2008.

n.iter = 110
n.burnin = 10
nw = 50

Example 1: $H_p$ true

We present the case where the questioned and reference documents are written by the same author: writer 1 (the total number of characters written by writer 1 is given by sum(characterO$info$writer == 1)).

In this example we use:

  * for the reference document (data_reference), the parameters extracted from the first 23 characters of writer 1;
  * for the questioned document (data_questioned), the parameters extracted from the last 23 characters of writer 1;
  * for the background parameters (background), the values computed using the TwoLevelLR_Background method with the remaining 10 writers.

# reference & questioned: first 23 and remaining characters of writer 1
data_reference = subset(characterO$measurements[,-1], 
                        subset = (characterO$info$writer == 1))[1:23,]
data_questioned = subset(characterO$measurements[,-1], 
                         subset = (characterO$info$writer == 1))[-(1:23),]
# background: all writers except writer 1
# (the selector is not named `subset`, to avoid masking the subset() function)
is_background = characterO$info$writer != 1
data_back = subset(characterO$measurements[,-1], 
                   subset = is_background)
background = TwoLevelLR_Background(data_back, 
                                   fac = as.factor(characterO$info$writer[is_background]))

The method TwoLevelLR is used as follows:

LLR = TwoLevelLR(data1 = data_reference,  
                data2 = data_questioned,
                background = background, 
                n.iter = n.iter, n.burnin = n.burnin,
                nw = nw)
LLR

The result object LLR is the numeric value of the log-likelihood ratio: $LLR = \log f(y|H_p) - \log f(y|H_d)$. Here the $LLR$ is positive (see the value printed above), which supports $H_p$ (i.e. the author of the reference document is the author of the questioned document).

Example 2: $H_d$ true

Here, we present the case where the questioned and reference documents are written by different authors: writer 1 and writer 2:

# reference & questioned: first 20 characters of writer 1 and of writer 2
data_reference = subset(characterO$measurements[,-1], subset = characterO$info$writer == 1)[1:20,]
data_questioned = subset(characterO$measurements[,-1], subset = characterO$info$writer == 2)[1:20,]
# background: all writers except writers 1 and 2
# (the selector is not named `subset`, to avoid masking the subset() function)
is_background = characterO$info$writer > 2
data_back = subset(characterO$measurements[,-1], subset = is_background)
background = TwoLevelLR_Background(data_back, fac = as.factor(characterO$info$writer[is_background]))

The method TwoLevelLR is used as follows:

LLR = TwoLevelLR(data1 = data_reference,  
                data2 = data_questioned,
                background = background, 
                n.iter = n.iter, n.burnin = n.burnin,
                nw = nw)
LLR

Here the $LLR$ is negative (see the value printed above), which supports $H_d$ (i.e. the author of the reference document is not the author of the questioned document).


References

References cited in this tutorial




ForensicDocument documentation built on May 2, 2019, 5 p.m.