Quantification and evaluation of handwritten characters loops
In ForensicDocument: Quantification and evaluation tools for forensic document examiners

Introduction

This tutorial presents tools for the quantification and evaluation of loops contained in handwritten characters, following the work of @marquis2005 and @bozza2008.

The Fourier quantification method and the likelihood-ratio statistical evaluation are implemented in the ExtractFourier and TwoLevelLR methods, respectively.

Loop quantification
- Parameters
- Example
Statistical evaluation
References

suppressPackageStartupMessages(library(ForensicDocument))

Loop quantification

The quantification method for loops contained in handwritten characters, as described in @marquis2005, is implemented in the ExtractFourier method. In short, binary character images are skeletonized and quantified with a function $R(\theta)$, where $R(\theta)$ is the distance from the character skeleton to its barycenter at the angle $\theta$.

We give a more detailed description of this method:

The algorithm used for the skeletonisation process is based on the one proposed by @stentiford1983. It has been modified for the particular case of closed loops by removing the end point condition; this tweak avoids having to prune the resulting skeleton (see @stentiford1983 for further details).
Skeletons (or loops) are parametrised by a discrete function $R(\theta)$, representing the length of a line joining a point of the contour to the barycenter, $\theta$ being the angle made by this line with the horizontal axis, with $0 \leq \theta < 2\pi$.
The functions $R(\theta)$ are resampled at n.samp values of $\theta$, that is, at $\theta=\frac{2\pi n}{n.samp+1}, \, n=0,\dots,n.samp-1$.
Selected Fourier parameters are extracted from the signal $R[\theta]$ (using the fft method from stats).
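The last three steps can be sketched in base R. This is only an illustration under stated assumptions, not the code used by ExtractFourier: the helper names loop_signature and fourier_coefs are hypothetical, the barycenter is taken as the mean of the contour points, the resampling grid is evenly spaced over $[0, 2\pi)$ (which may differ from the grid used by the package), and the coefficients follow the convention $R(\theta) = a_0 + \sum_n a_n\cos(n\theta) + b_n\sin(n\theta)$.

```r
# Hypothetical helper: polar signature R(theta) of a closed contour (x, y)
loop_signature <- function(x, y, n.samp = 128) {
  cx <- mean(x); cy <- mean(y)                     # barycenter (assumed: mean of points)
  theta <- atan2(y - cy, x - cx) %% (2 * pi)       # angle with the horizontal axis
  r <- sqrt((x - cx)^2 + (y - cy)^2)               # distance to the barycenter
  ord <- order(theta)
  theta <- theta[ord]; r <- r[ord]
  n <- length(theta)
  # Pad one point on each side so that interpolation wraps around the loop
  theta_ext <- c(theta[n] - 2 * pi, theta, theta[1] + 2 * pi)
  r_ext <- c(r[n], r, r[1])
  grid <- 2 * pi * (0:(n.samp - 1)) / n.samp       # assumed evenly spaced grid
  approx(theta_ext, r_ext, xout = grid)$y          # linear resampling
}

# Hypothetical helper: first n.fourier coefficients a_n, b_n (n = 0, 1, ...)
fourier_coefs <- function(r, n.fourier = 7) {
  N <- length(r)
  F <- fft(r) / N
  k <- 0:(n.fourier - 1)
  a <- 2 * Re(F[k + 1])
  b <- -2 * Im(F[k + 1])
  a[1] <- Re(F[1])                                 # a_0 is the mean radius
  b[1] <- 0
  data.frame(harmonic = k, a = a, b = b)
}

# Synthetic loop with a known signature: R(t) = 1 + 0.2 * cos(2 t)
t <- 2 * pi * (0:999) / 1000
radius <- 1 + 0.2 * cos(2 * t)
r_sampled <- loop_signature(radius * cos(t), radius * sin(t), n.samp = 128)
coefs <- fourier_coefs(r_sampled, n.fourier = 7)
print(round(coefs, 3))
```

On this synthetic loop the recovered coefficients should be close to $a_0 = 1$ and $a_2 = 0.2$, with all other terms near zero, which is a quick way to check the sign and scaling conventions.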

Parameters

The parameters for this method are:

files, a vector of strings specifying the image files to analyse. Only the png file format is supported (images are read using the png package).
n.samp, an integer specifying the loops' resampling size. The default resampling size is n.samp=128.
n.fourier, an integer specifying the number of Fourier harmonics to extract (n.fourier < n.samp).
skeletonize, a logical value (TRUE or FALSE) indicating whether character images should be skeletonized. Default is TRUE.
character_pixel, an integer value (0 or 1) indicating which pixel value belongs to the character, the other being the background.
output, a string specifying the name of the output file.
- If no output file is specified (when the argument output = NULL, or when it is not assigned), the method will return a list of tables containing the first n.fourier Fourier parameters $a_n$ and $b_n$.
- If the output file is specified, the results will be written to a csv-type file with a tab delimiter ('\t'), and the method will return a NULL value.
verbose, a logical value indicating whether progress is to be printed on the console.
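A tab-delimited file of this kind can be read back with read.delim from base R. The sketch below does not call ExtractFourier: it writes a small made-up table to a temporary file with the same delimiter and reads it back. The column names a0, a1 and b1 and the values are illustrative assumptions, not the actual header or output of the package.

```r
# Made-up table standing in for a file written by ExtractFourier;
# column names and values are assumptions for illustration only
coefs <- data.frame(a0 = c(10.2, 9.8),
                    a1 = c(0.31, 0.27),
                    b1 = c(-0.12, 0.05))

# Write it with a tab delimiter, as described for the output argument
out_file <- tempfile(fileext = ".csv")
write.table(coefs, out_file, sep = "\t", row.names = FALSE, quote = FALSE)

# read.delim defaults to sep = "\t" and header = TRUE
read_back <- read.delim(out_file)
print(read_back)
```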

Example

In this example, we use the binarised handwritten character o (the fig-O.png file supplied in the package). The character image is not skeletonized, so skeletonize is left at its default value, TRUE.

We use a sampling size of 128 (n.samp=128) with 7 Fourier harmonics (n.fourier=6+1) as in @marquis2005. We want the output returned as a data.frame rather than written to a file, therefore output=NULL. The verbose option is set to FALSE. In this case, the method ExtractFourier is used as follows:

files = system.file("extdata", "fig-O.png", package = "ForensicDocument")
result = ExtractFourier(files = files, 
                        n.fourier = 7, 
                        n.samp = 128, 
                        verbose = FALSE, 
                        output = NULL,
                        character_pixel = 0)
result

As stated above, the result object is a list of length length(files). In this particular case, there is only one input file, thus result is a list of length 1.

exp = c("typeof(result)", "length(result) == length(files)", "names(result) == files")
for(e in exp) cat(sprintf("%s : %s\n", e, eval(parse(text = e))))

Statistical Evaluation

In forensic science, the evidence $y$ is usually interpreted through the computation of a likelihood ratio: $$LR = \frac{f(y|H_p)}{f(y|H_d)}, $$ where

$H_p$ is the prosecution hypothesis;
$H_d$ is the defense hypothesis.
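As a toy illustration of this ratio (not the two-level model used by the package), suppose a single measurement $y$ follows a normal density under each hypothesis, with different means and a common standard deviation. All numerical values below are made up for the example.

```r
# Toy univariate example; the values are assumptions chosen for illustration
y     <- 1.0    # observed evidence
mu_p  <- 1.0    # mean of y under H_p
mu_d  <- 0.0    # mean of y under H_d
sigma <- 1.0    # common standard deviation

LR  <- dnorm(y, mean = mu_p, sd = sigma) / dnorm(y, mean = mu_d, sd = sigma)
LLR <- log(LR)  # natural logarithm of the likelihood ratio

LR   # exp(0.5), about 1.65: the evidence is more probable under H_p
LLR  # 0.5
```

A ratio above 1 (a positive log-likelihood ratio) means the evidence is more probable under $H_p$ than under $H_d$; a ratio below 1 means the opposite.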

In the context of handwriting examination, suppose that: (i) an anonymous letter (i.e. the questioned document) is available for comparative analysis, and (ii) written material from a suspect is selected for comparative purposes (i.e. the reference document). For the computation of the likelihood ratio, we consider the following propositions of interest:

$H_p$: the author of the reference document is the author of the questioned document;
$H_d$: the author of the reference document is not the author of the questioned document.

In this tutorial and package, the proposed two-level likelihood ratio evaluation is based on the one developed by @bozza2008. It takes into account both the within- and between-writer variability. The evidence $y$, namely the Fourier parameters, is assumed to follow a multivariate normal distribution with unknown mean vector and covariance matrix: $$ y \sim \mathcal{N}(\theta,W), $$ $$ \theta \sim \mathcal{N}(\mu,B), $$ $$ W \sim \mathcal{IW}(U,n_w), $$ where $\mu$ is the overall mean, and $B$ and $U$ are the between- and within-writer covariance matrices, respectively.

The likelihood ratio is computed using the TwoLevelLR method, and the background parameters ($\mu$, $B$ and $U$) can be computed using the TwoLevelLR_Background method.
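To make the two levels concrete, the sketch below simulates measurements for one writer from a model of this kind using base R only: a writer mean is drawn around an overall mean, a within-writer covariance is drawn from an inverse Wishart distribution, and the measurements are then drawn around the writer mean. All numerical values and the helper rmvn are assumptions made for illustration. The inverse Wishart draw uses the common parametrisation in which the inverse of the matrix follows a Wishart distribution, so the degrees of freedom nu need not correspond to the nw argument of TwoLevelLR.

```r
set.seed(1)

# Illustrative values only (assumptions, not estimates from real data)
overall_mean <- c(0, 0)           # overall mean
B  <- diag(c(1, 0.5))             # between-writer covariance
U  <- diag(c(0.2, 0.1))           # inverse Wishart scale matrix
nu <- 10                          # inverse Wishart degrees of freedom

# Hypothetical helper: n draws from a multivariate normal via the Cholesky factor
rmvn <- function(n, mean, sigma) {
  z <- matrix(rnorm(n * length(mean)), nrow = n)
  sweep(z %*% chol(sigma), 2, mean, "+")
}

# Between-writer level: mean vector of one writer
writer_mean <- drop(rmvn(1, overall_mean, B))

# Within-writer covariance: W ~ IW(U, nu) is drawn as the inverse of a
# Wishart(nu, U^{-1}) matrix
W <- solve(rWishart(1, df = nu, Sigma = solve(U))[, , 1])

# Within-writer level: 20 measurements from this writer
y <- rmvn(20, writer_mean, W)
dim(y)
```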

Parameters

The parameters for the TwoLevelLR method are:

data1, a data.frame object containing the measurements from the 'reference' material.
data2, a data.frame object containing the measurements from the 'questioned' material.
background, a list containing the background parameters (overall and group means, within- and between-group covariance matrices). This list can be obtained using the TwoLevelLR_Background method.
n.iter, an integer value giving the number of MCMC iterations. Default is 11000.
n.burnin, an integer value giving the number of burn-in iterations. Default is 1000.
nw, an integer value giving the degrees of freedom for the inverse Wishart distribution. Considering p variables in the data, nw must satisfy $nw > 2\times p+4$.

Similarly, parameters for TwoLevelLR_Background are:

data, a $n \times p$ numeric matrix, with $p \geq q\times 2$. The background data containing $n$ measurements on $p$ variables.
fac, a factor of length $n$, indicating the 'population' of each measurement. In our case, the writer.
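The kind of quantities held in the background list can be illustrated with simple estimators in base R. The sketch below is not the implementation of TwoLevelLR_Background: the function name background_sketch, the names of the list elements, and the choice of estimators (pooled within-group covariance, covariance of the group means) are assumptions made for illustration.

```r
# Hypothetical helper: simple background estimates from data and a grouping factor
background_sketch <- function(data, fac) {
  data <- as.matrix(data)
  fac  <- droplevels(as.factor(fac))
  stopifnot(nrow(data) == length(fac), nlevels(fac) > 1)

  # Group (writer) means: one row per level of fac
  group_means <- apply(data, 2, function(col) tapply(col, fac, mean))

  # Pooled within-group covariance: deviations from each group's own mean
  centered <- data - group_means[as.character(fac), , drop = FALSE]
  within <- crossprod(centered) / (nrow(data) - nlevels(fac))

  # Between-group covariance, taken here as the covariance of the group means
  between <- cov(group_means)

  list(overall.mean = colMeans(data),
       group.means  = group_means,
       within       = within,
       between      = between)
}

# Tiny made-up data set: 4 measurements on 2 variables from 2 writers
toy_data <- rbind(c(1, 2), c(3, 4), c(10, 10), c(12, 14))
toy_fac  <- factor(c("a", "a", "b", "b"))
bg <- background_sketch(toy_data, toy_fac)
bg$within
bg$between
```

Note that fac has one entry per row of the data, so each measurement is assigned to exactly one writer.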

Examples

In the following examples, we will use the characterO dataset. It contains the Fourier parameters (n.fourier = 4) extracted from 554 handwritten character loops, written by 11 writers. It is a subset of the data collected by @marquis2006. For more information on this dataset, see ?characterO. For other applications of this methodology, see @marquis2011a and @taroni2012.

data(characterO)

In both examples, the number of iterations and the number of burn-in iterations for the MCMC chain are set to 110 and 10, respectively. The inverse Wishart degrees of freedom are set to nw=50, as in @bozza2008.

n.iter = 110
n.burnin = 10
nw = 50

Example 1: $H_p$ true

We present the case where the questioned and reference documents are written by the same author: writer 1 (the total number of characters available for writer 1 is given by sum(characterO$info$writer == 1)).

In this example we use:

for the reference document (data_reference), the parameters extracted from the first 23 characters of writer 1;
for the questioned document (data_questioned), the parameters extracted from the remaining characters of writer 1 (all but the first 23);
for the background parameters (background), the values computed using the TwoLevelLR_Background method with the remaining 10 writers.

# reference & questioned
data_reference = subset(characterO$measurements[,-1], 
                        subset = (characterO$info$writer == 1))[1:23,]
data_questioned = subset(characterO$measurements[,-1], 
                         subset = (characterO$info$writer == 1))[-(1:23),]
# background
subset = characterO$info$writer != 1
data_back = subset(characterO$measurements[,-1], 
                   subset = subset)
background = TwoLevelLR_Background(data_back, 
                                   fac = as.factor(characterO$info$writer[subset]))

The method TwoLevelLR is used as follows:

LLR = TwoLevelLR(data1 = data_reference,  
                data2 = data_questioned,
                background = background, 
                n.iter = n.iter, n.burnin = n.burnin,
                nw = nw)
LLR

The result object LLR is the numeric value of the log-likelihood ratio: $LLR = \log(f(y|H_p))-\log(f(y|H_d))$. Here the $LLR$ is positive, which supports $H_p$ (i.e. the author of the reference document is the author of the questioned document).

Example 2: $H_d$ true

Here, we present the case where the questioned and reference documents are written by different authors: writer 1 and writer 2:

for the reference document (data1), the parameters extracted from writer 1 (first 20 characters);
for the questioned document (data2), the parameters extracted from writer 2 (first 20 characters);
for the background parameters (background), the values computed using the TwoLevelLR_Background method with the remaining 9 writers.

# reference & questioned
data_reference = subset(characterO$measurements[,-1], subset = characterO$info$writer == 1)[1:20,]
data_questioned = subset(characterO$measurements[,-1], subset = characterO$info$writer == 2)[1:20,]
# background
subset = characterO$info$writer > 2
data_back = subset(characterO$measurements[,-1], subset = subset)
background = TwoLevelLR_Background(data_back, fac = as.factor(characterO$info$writer[subset]))

The method TwoLevelLR is used as follows:

LLR = TwoLevelLR(data1 = data_reference,  
                data2 = data_questioned,
                background = background, 
                n.iter = n.iter, n.burnin = n.burnin,
                nw = nw)
LLR

Here the $LLR$ is negative, which supports $H_d$ (i.e. the author of the reference document is not the author of the questioned document).

References

References cited in this tutorial

ForensicDocument documentation built on May 2, 2019, 5 p.m.
