outlierCorrectData: Generation of a Binary Classification Problem with Outliers


View source: R/outlierCorrectData.R

Description

Generation of a binary classification problem with outliers on the correct side of the decision boundary.

Usage

  outlierCorrectData(n, alpha = 5, beta = 5, prop = 0.05,
    prior = rep(0.5, 2))

  outlierCorrectLabels(data, alpha = 5, beta = 5,
    prop = 0.05, prior = rep(0.5, 2))

  outlierCorrectPosterior(data, alpha = 5, beta = 5,
    prop = 0.05, prior = rep(0.5, 2))

  outlierCorrectBayesClass(data, alpha = 5, beta = 5,
    prop = 0.05, prior = rep(0.5, 2))

Arguments

n

Number of observations.

alpha

Distance from class center to the outliers in the x-coordinate. Defaults to 5.

beta

Distance from class center to the outliers in the y-coordinate. Defaults to 5.

prop

Proportion of outliers. Defaults to 0.05.

prior

Vector of class prior probabilities. Defaults to equal class priors.

data

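A data.frame containing the explanatory variables at which class labels, posterior probabilities, or Bayes predictions are computed.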
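The arguments above might be combined as in the following minimal sketch. The values are illustrative assumptions rather than recommendations, and the call is skipped if the locClassData package is not installed.

```r
# Illustrative argument values (assumptions, not recommendations):
# 10% outliers, placed farther from the class centers, with unequal priors
params <- list(n = 500, alpha = 8, beta = 8, prop = 0.1, prior = c(0.3, 0.7))

# Priors are class probabilities, so they should sum to one
stopifnot(isTRUE(all.equal(sum(params$prior), 1)))

if (requireNamespace("locClassData", quietly = TRUE)) {
    d <- do.call(locClassData::outlierCorrectData, params)
    # Relative class frequencies; these should be close to the priors
    print(prop.table(table(d$y)))
}
```

Keeping the arguments in one list makes it easy to reuse the same settings for the other three functions, which accept the same alpha, beta, prop and prior arguments.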

Value

outlierCorrectData returns an object of class "locClass", a list with components:

x

(A matrix.) The explanatory variables.

y

(A factor.) The class labels.

outlierCorrectLabels returns a factor of class labels.

outlierCorrectPosterior returns a matrix of posterior probabilities.

outlierCorrectBayesClass returns a factor of Bayes predictions.
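The link between a posterior matrix and Bayes predictions can be sketched in base R. This is not the package's implementation: the matrix post, the column names "1" and "2", and the labels y_true are hypothetical values chosen for illustration.

```r
# Hypothetical posterior matrix: one row per observation, one column per class
post <- matrix(c(0.9, 0.1,
                 0.4, 0.6,
                 0.2, 0.8),
               ncol = 2, byrow = TRUE,
               dimnames = list(NULL, c("1", "2")))

# Bayes rule: in each row, pick the class with the largest posterior probability
bayes_class <- factor(colnames(post)[max.col(post, ties.method = "first")],
                      levels = colnames(post))

# Hypothetical true labels for the same three observations
y_true <- factor(c("1", "1", "2"), levels = colnames(post))

# Share of observations where the Bayes prediction differs from the label
err <- mean(bayes_class != y_true)
```

On data drawn from the true model, this share estimates the Bayes error, the lowest misclassification rate any classifier can achieve for the problem.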

Examples

# Generate a training and a test set
train <- outlierCorrectData(n = 1000)
test <- outlierCorrectData(n = 1000)

# Generate a grid of points
x.1 <- x.2 <- seq(-7,15,0.1)
grid <- expand.grid(x.1 = x.1, x.2 = x.2)

# Calculate the posterior probabilities for all grid points
gridPosterior <- outlierCorrectPosterior(grid)

# Draw contour lines of posterior probabilities and plot training observations
plot(train$x, col = train$y)
contour(x.1, x.2, matrix(gridPosterior[,1], length(x.1)), col = "gray", add = TRUE)

# Calculate Bayes error
ybayes <- outlierCorrectBayesClass(test$x)
mean(ybayes != test$y)

if (require(MASS)) {

    # Fit an LDA model and calculate misclassification rate on the test data set
    tr <- lda(y ~ ., data = as.data.frame(train))
    pred <- predict(tr, as.data.frame(test))
    print(mean(pred$class != test$y))

    # Draw decision boundary
    gridPred <- predict(tr, grid)
    contour(x.1, x.2, matrix(gridPred$posterior[,1], length(x.1)), col = "red", levels = 0.5, add = TRUE)

}

locClassData documentation built on May 2, 2019, 5:26 p.m.