trainingSetGenerator: Training Set Generator


View source: R/trainingSetGenerator.R

Description

This function generates a training set of n emails, both legitimate and spam, according to the probability distributions in mailGenerator.

Usage

trainingSetGenerator(n = 1000, spamPrev = 0.1, k = 5,
  Names = c("viagra", "rajoy", "icmat", "hi", "bye"),
  subsetSpam = matrix(c(1, 0, 0, 0, 0, 1, 1, 0, 0, 0), nrow = 2, byrow = T),
  probSubsetSpam = c(0.8, 0.2))

Arguments

n

number of emails to generate. Defaults to 1000.

spamPrev

spam prevalence. Defaults to 0.1.

k

number of words in each email. Defaults to 5.

Names

vector with the names of the words in the email. Defaults to c("viagra", "rajoy", "icmat", "hi", "bye").

subsetSpam

matrix where each row is a configuration of words that has non-zero probability given that the email is spam. Defaults to matrix(c(1,0,0,0,0,1,1,0,0,0), nrow = 2, byrow = T).

probSubsetSpam

probability of each configuration in subsetSpam given that the email is spam. All other configurations have probability 0. Defaults to c(0.8, 0.2).
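
To make the last two arguments concrete, the sketch below builds the default subsetSpam matrix from the Usage section (with T written out as TRUE) and draws spam configurations from it with base R's sample(). It assumes that each column is a 0/1 indicator for the corresponding word in Names, so that the first row means "only viagra" and the second "viagra and rajoy". The helper drawSpamConfigs is hypothetical and is not part of the package; it only illustrates the distribution that the arguments describe, not how mailGenerator is implemented.

```r
# Default arguments, taken from the Usage section.
Names <- c("viagra", "rajoy", "icmat", "hi", "bye")
subsetSpam <- matrix(c(1, 0, 0, 0, 0,
                       1, 1, 0, 0, 0), nrow = 2, byrow = TRUE)
probSubsetSpam <- c(0.8, 0.2)

# Assumption: column j refers to Names[j]; 1 = word present, 0 = absent.
colnames(subsetSpam) <- Names

# Hypothetical helper (not in acAra): draw m spam emails by sampling a
# row of subsetSpam with the probabilities given in probSubsetSpam.
drawSpamConfigs <- function(m, subsetSpam, probSubsetSpam) {
  stopifnot(nrow(subsetSpam) == length(probSubsetSpam),
            isTRUE(all.equal(sum(probSubsetSpam), 1)))
  idx <- sample(nrow(subsetSpam), size = m, replace = TRUE,
                prob = probSubsetSpam)
  subsetSpam[idx, , drop = FALSE]
}

set.seed(1)
spamSample <- drawSpamConfigs(1000, subsetSpam, probSubsetSpam)

# Fraction of sampled spam emails that contain each word.
colMeans(spamSample)
```

Under these assumptions every sampled spam email contains "viagra", roughly one in five also contains "rajoy", and the remaining words never appear, because configurations not listed in subsetSpam have probability 0.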

Value

This function returns a dataframe containing the generated emails.

Examples

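A minimal usage sketch follows. It assumes that the acAra package is installed (for instance from the roinaveiro/acAra GitHub repository) and that trainingSetGenerator is exported; the call is skipped if the package is not available. The layout of the returned data frame is not documented here, so the example only inspects it with str().

```r
# Runs only if acAra is installed; otherwise the block does nothing.
if (requireNamespace("acAra", quietly = TRUE)) {
  # Fix the seed so the draw is repeatable (assuming the generator
  # relies on R's random number generator).
  set.seed(42)

  # 200 emails with a spam prevalence of 0.2; all other arguments
  # keep the defaults listed under Usage.
  train <- acAra::trainingSetGenerator(n = 200, spamPrev = 0.2)

  # Inspect the structure of the returned data frame.
  str(train)
}
```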

roinaveiro/acAra documentation built on May 27, 2019, 1:47 p.m.