trainingSetGenerator: Training Set Generator
In roinaveiro/acAra: Adversarial Classification with ARA


This function generates a training set of n emails, each either legitimate or spam, according to the probability distributions in mailGenerator.

trainingSetGenerator(n = 1000, spamPrev = 0.1, k = 5,
  Names = c("viagra", "rajoy", "icmat", "hi", "bye"),
  subsetSpam = matrix(c(1, 0, 0, 0, 0, 1, 1, 0, 0, 0), nrow = 2, byrow = T),
  probSubsetSpam = c(0.8, 0.2))

`n`	number of emails to generate. Defaults to 1000.
`spamPrev`	spam prevalence. Defaults to 0.1.
`k`	number of words in each email. Defaults to 5.
`Names`	vector with the names of the words in the email.
`subsetSpam`	matrix where each row is a configuration of words that has non-zero probability given that the email is spam. Defaults to `matrix(c(1,0,0,0,0,1,1,0,0,0),nrow = 2, byrow = T)`.
`probSubsetSpam`	probabilities of the configurations in subsetSpam given that the email is spam, one per row. The probabilities of all other configurations are set to 0. Defaults to `c(0.8,0.2)`.
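
To illustrate how `subsetSpam` and `probSubsetSpam` relate, the following base R sketch draws word configurations for spam emails using the default values above. It is only an assumption about how such sampling could be done, not the package's actual implementation; the names `nSpam`, `idx`, and `spamEmails` are hypothetical and do not come from acAra.

```r
# Hypothetical sketch in base R; not the actual acAra implementation.
set.seed(1)

Names <- c("viagra", "rajoy", "icmat", "hi", "bye")

# Each row is one word configuration that spam emails may take
subsetSpam <- matrix(c(1, 0, 0, 0, 0,
                       1, 1, 0, 0, 0), nrow = 2, byrow = TRUE)
colnames(subsetSpam) <- Names

# Probability of each row of subsetSpam, given that the email is spam
probSubsetSpam <- c(0.8, 0.2)

# Draw one configuration (row index) per spam email
nSpam <- 1000
idx <- sample(nrow(subsetSpam), size = nSpam, replace = TRUE,
              prob = probSubsetSpam)
spamEmails <- subsetSpam[idx, , drop = FALSE]

# Empirical frequency of each configuration, to compare with probSubsetSpam
print(table(idx) / nSpam)
```

With the default values, every sampled spam email contains "viagra", roughly one in five also contains "rajoy", and no spam email contains "icmat", "hi", or "bye", because configurations not listed in `subsetSpam` have probability 0.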