This tutorial presents toos for quantification and evaluation of loops contained in character, following work from @marquis2005 and @bozza2008.
The fourier quantification method and likelihood statistical evaluation are respectively implemented in ExtractFourier
and TwoLevelLR
methods.
suppressPackageStartupMessages(library(ForensicDocument))
The loop quantification method, for contained in handwritten characters as described in @marquis2005, is implemented in the ExtractFourier
method. In short, binary character images are skeletonized and quantified with a function $R(\theta)$, where $R(\theta)$ is the distance of the character skeleton to its barycenter at the angke $\theta$.
We give a more detailed description of this method:
n.samp
$\theta$ values. That is, for the values $\theta=\frac{2\pi n}{n.samp+1}, \, n=0,\dots,n.samp-1$. fft
method from stats
).The parameters for this method are:
files
, a vector of string specifying the images files to analyse. There is only one supported file format : the png file format, using the png
package.n.samp
, a integer specifying the loops' resampling size. The default resampling size is n.samp=128
.n.fourier
, a integer specifying the number of fourier harmonics to extract (n.fourier < n.samp).skeletonize
, a logical value (TRUE
or FALSE
) indicating if character images should be skeletonized. Default is TRUE
.character_pixel
, a integer value (0
or 1
), indicating indicating which pixel value is from the character, the other being the background.output
, a string specifying the name of the output file.output = NULL
, or when it is not assigned), the method will return a list () of table containing the n.fourier
first fourier parameters $a_n$ and $b_n$.NULL
value. verbose
a logical value, indicating if progress is to be printed on the console.In this example, we use the binarised handwritten character o (\emph{fig-O.png} file supplied in the package). The character image is not that is not sketonized skeletonize=TRUE
.
We use a sampling size of 128 (n.samp=128
) with 7 fourier harmonics (n.fourier=6+1
) as in @marquis2005. We will get the ouput as a data.fame
, therefore ouput=NULL
. The verbose option is set to FALSE
.
In this case, the method ExtractFourier
is used as follows:
files = system.file("extdata", "fig-O.png", package = "ForensicDocument") result = ExtractFourier(files = files, n.fourier = 7, n.samp = 128, verbose = FALSE, output = NULL, character_pixel = 0) result
As stated above, the result
object is a list
of length length(files)
. In this particular case, there is only one input file, thus result
is a list of length 1.
exp = c("typeof(result)", "length(result) == length(files)", "names(result) == files") for(e in exp) cat(sprintf("%s : %s\n", e, eval(parse(text = e))))
In forensic science, the evidence $y$ is usually interpreted through the computation of a likelihood ratio: $$LR = \frac{f(y|H_p)}{f(y|H_d)} $$, Where
In the context of handwritten expertise suppose that: (i) an anonymous letter (i.e. the questioned document) is available for comparative analysis, and (ii) written material from a suspect is selected for comparative purposes (i.e. the reference document. For the compuation of the likelihood ratio, we consider the following propositions of interest:
In this tutorial and package, the proposed evaluation two-level likelihood ratio is based on the one developed by @bozza2008. It allows to take into account the within- and between-writer variability. The evidence $y$, namely the fourier parameters is supposed to follow a multivariate normal density with unknown mean vector and covariance matrix: $$ y \sim \mathcal{M}\mu,W) $$, $$ W^{-1} \sim \mathcal{IW}(U,n_w) $$, $$ \mu \sim \mathcal{M}(\theta,B) $$. Where $B$ and $U$ are the within and between writer covariance matrices.
The likelihood-ratio is comupted using the TwoLeveLR
method, and the background parameters ($\mu$, $B$ and $U$) can be computed using the TwoLeveLR_Background
The parameters for TwoLevelLR
method are:
data1
, as data.frame object, the measurements from the 'reference' material.data2
, as data.frame object, the measurements from the 'reference' material.background
, a list containing the background parameters (overall and group means, within- and between group covariances matrice). This list can be ontained using the TwoLevelLR_Background
method.n.iter
, a integer value giving the number of MCMC iterations. Default is 11000
.n.burnin
, a integer value giving the number of burn-in iterations. Default is 1000
.nw
, a integer value giving the degrees of freedom for the inverse Wishart distribution. Considering p variables in the data, nw must be $> 2\times p+4$.Similarly, parameters for TwoLevelLR_Background
are:
data
, a $n \times p$ numeric matrix, with $p \geq q\times 2$. The background data containing $n$ measurements on $p$ variables.fac
, a factor of length $p$, indicating the 'population' of each measurement. In our case, the writer.In the following examples, we will use the characterO
dataset. It contains the extracted Fourier (n.fourier = 4
) parameters from 554 handwritten character loops, written by 11 writers. It is a subset of the data collected by @marquis2006. For more information on this dataset, see ?characterO
.For other applications of this methodologie, see @marquis2011a and @taroni2012.
data(characterO)
In both examples, number of iterations and burn in iterations for the MCMC chain and set to 110 and 10. The inverse Wishart distribution degree of freedom nw=50
, as in @bozza2008.
n.iter = 110 n.burnin = 10 nw = 50
We present the case were the questioned and reference documents are written by the same author: writer 1 (writer 1 has a total of r sum(characterO$info$writer == 1)
characters).
In this example we use the:
for the reference document (data_reference
), the parameters extracted from the first 23 characters of writer 1;
for the questioned document (data_questioned
), the parameters extracted from the last 23 characters of writer 1;
* the background parameters (background
), are computed using the TwoLevelLR_Background
method with the remaining 10 writers.
# reference & questioned data_reference = subset(characterO$measurements[,-1], subset = (characterO$info$writer == 1))[1:23,] data_questioned = subset(characterO$measurements[,-1], subset = (characterO$info$writer == 1))[-(1:23),] # background subset = characterO$info$writer != 1 data_back = subset(characterO$measurements[,-1], subset = subset) background = TwoLevelLR_Background(data_back, fac = as.factor(characterO$info$writer[subset]))
The method TwoLevelLR
is used as follows:
LLR = TwoLevelLR(data1 = data_reference, data2 = data_questioned, background = background, n.iter = n.iter, n.burnin = n.burnin, nw = nw) LLR
The result object LLR
is the numeric value of the log-likelihood ratio: $LLR = log(f(y|H_p))-log(f(y|H_d))$. Here the $LLR$ is positive (r LLR
), suggesting that $H_p$ is true (i.e. the author of the reference document is the author of the questioned document).
Here, we present the case were the questioned and reference documents are written by different authors: writer 1 and writer 2:
data1
), the parameters extracted from writer 1 (first 20 characters);data2
), the parameters extracted from writer 2 (first 20 characters);background
), are computed using the TwoLevelLR_Background
method with the remaining 9 writers# reference & questioned data_reference = subset(characterO$measurements[,-1], subset = characterO$info$writer == 1)[1:20,] data_questioned = subset(characterO$measurements[,-1], subset = characterO$info$writer == 2)[1:20,] subset = characterO$info$writer > 2 # background data_back = subset(characterO$measurements[,-1], subset = subset) background = TwoLevelLR_Background(data_back, fac = as.factor(characterO$info$writer[subset]))
The method TwoLevelLR
is used as follows:
LLR = TwoLevelLR(data1 = data_reference, data2 = data_questioned, background = background, n.iter = n.iter, n.burnin = n.burnin, nw = nw) LLR
Here the $LLR$ is negative (r LLR
), suggesting that $H_d$ is true (i.e. the author of the reference document is not the author of the questioned document).
References cited in this tutorial
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.