R/enron.sample.R

#' Enron sample
#'
#' A small sample of the *Enron* corpus comprising texts by ten authors, each contributing approximately the same amount of data. For each author, one text is labelled 'unknown' and the remaining texts are labelled 'known'. The data was pre-processed using the *POSnoise* algorithm to mask content (see [contentmask()]).
#'
#' @format
#' A `quanteda` corpus object.
#'
#' @source
#' Halvani, Oren. 2021. Practice-Oriented Authorship Verification. Technical University of Darmstadt PhD Thesis.
#' https://tuprints.ulb.tu-darmstadt.de/19861/
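#'
#' @examples
#' # A minimal sketch of inspecting the sample, assuming the idiolect and
#' # quanteda packages are installed. It makes no assumption about the names
#' # of the document variables; it only prints whatever the corpus carries.
#' library(idiolect)
#'
#' # Number of texts in the sample
#' quanteda::ndoc(enron.sample)
#'
#' # Document-level variables; the author and the known/unknown labels
#' # described above are expected to appear here
#' head(quanteda::docvars(enron.sample))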
"enron.sample"

idiolect documentation built on Sept. 11, 2024, 5:34 p.m.