enron.sample: Enron sample

enron.sample    R Documentation

Enron sample

Description

A small sample of the Enron corpus comprising ten authors, each with approximately the same amount of data. For each author, one text is labelled 'unknown' and the remaining texts are labelled 'known'. The data were pre-processed with the POSnoise algorithm to mask content (see contentmask()).

Usage

enron.sample

Format

A quanteda corpus object.

Source

Halvani, Oren. 2021. Practice-Oriented Authorship Verification. Technical University of Darmstadt PhD Thesis. https://tuprints.ulb.tu-darmstadt.de/19861/
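
Examples

The following is a minimal sketch of how the corpus might be inspected and split into 'known' and 'unknown' texts. It assumes that the idiolect and quanteda packages are installed, and that the 'known'/'unknown' labels appear at the start of the document names. That second assumption is not stated on this page, so check the output of docnames() and docvars() before relying on the split.

```r
# Assumes the idiolect and quanteda packages are installed.
library(quanteda)
library(idiolect)

# Basic inspection of the corpus object.
ndoc(enron.sample)            # number of texts in the sample
head(docnames(enron.sample))  # document names
head(docvars(enron.sample))   # document-level variables

# ASSUMPTION: the 'unknown' label is a prefix of the document name.
# If the label is stored elsewhere (for example in a docvar), adapt this line.
is_unknown <- grepl("^unknown", docnames(enron.sample))

# Split the corpus into questioned ('unknown') and reference ('known') texts.
unknown_texts <- corpus_subset(enron.sample, is_unknown)
known_texts   <- corpus_subset(enron.sample, !is_unknown)

ndoc(unknown_texts)
ndoc(known_texts)
```

If the assumption holds, unknown_texts should contain one text per author, matching the Description above.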


idiolect documentation built on Sept. 11, 2024, 5:34 p.m.