lambdaG: Apply the LambdaG algorithm
In idiolect: Forensic Authorship Analysis

Description

This function calculates \lambda_G, the likelihood ratio of grammar models described in Nini et al. (under review). To reproduce the analysis in that paper, all data must first be preprocessed using `contentmask()` with the `algorithm` parameter set to "POSnoise".

Usage

lambdaG(q.data, k.data, ref.data, N = 10, r = 30, cores = NULL)

Arguments

`q.data`	The questioned or disputed data as a `quanteda` tokens object with the tokens being sentences (e.g. the output of `tokenize_sents()`).
`k.data`	The known or undisputed data as a `quanteda` tokens object with the tokens being sentences (e.g. the output of `tokenize_sents()`).
`ref.data`	The reference dataset as a `quanteda` tokens object with the tokens being sentences (e.g. the output of `tokenize_sents()`). This can be the same object as `k.data`.
`N`	The order of the model. Default is 10.
`r`	The number of iterations. Default is 30.
`cores`	The number of cores to use for parallel processing. The default, `NULL`, uses a single core.

Value

The function tests every combination of Q text and candidate author and returns a data frame containing \lambda_G, an uncalibrated log-likelihood ratio (base 10). \lambda_G can then be calibrated with `calibrate_LLR()` into a likelihood ratio that expresses the strength of the evidence. The data frame also contains a logical column called "target", which is TRUE if the candidate is the author of the Q text and FALSE otherwise.
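
Because \lambda_G is a base-10 logarithm, the returned scores can be inspected with ordinary data frame operations. The following is a minimal base R sketch, not output from `lambdaG()`: the data frame `res` is a toy stand-in, the column names `K`, `Q` and `score` are assumptions made for illustration (only "target" is named above), and all values are invented placeholders.

```r
# Toy stand-in for a lambdaG() result. Only the "target" column is documented;
# the other column names and every value here are illustrative assumptions.
res <- data.frame(
  K = c("author_A", "author_B", "author_C"),  # candidate authors (hypothetical)
  Q = c("q_text", "q_text", "q_text"),        # questioned text (hypothetical)
  target = c(TRUE, FALSE, FALSE),             # TRUE if the candidate wrote Q
  score = c(2, -1, 0)                         # placeholder lambda_G values (log10)
)

# lambda_G is a base-10 log, so 10^score gives the uncalibrated ratio
res$ratio <- 10^res$score

# Separate same-author from different-author comparisons
same_author <- res$score[res$target]
diff_author <- res$score[!res$target]
```

The back-transformed values are still uncalibrated, so they should not be read as the strength of the evidence until they have been calibrated.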

References

Nini, A., Halvani, O., Graner, L., Gherardi, V., & Ishihara, S. (2024). Authorship Verification based on the Likelihood Ratio of Grammar Models. https://arxiv.org/abs/2403.08462v1

Examples

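library(idiolect)
# Q: one questioned text; K: known texts; reference: the remaining texts
q.data <- enron.sample[1] |> quanteda::tokens("sentence")
k.data <- enron.sample[2:10] |> quanteda::tokens("sentence")
ref.data <- enron.sample[11:quanteda::ndoc(enron.sample)] |> quanteda::tokens("sentence")
# Returns a data frame of uncalibrated lambda_G scores
lambdaG(q.data, k.data, ref.data)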

idiolect documentation built on Sept. 11, 2024, 5:34 p.m.