Generation of Gaussian Mixture Data for Classification

Share:

Description

Generation of Gaussian mixture data for classification.

Usage

1
2
3
4
5
6
7
  mixtureData(n, prior, lambda, mu, sigma)

  mixtureLabels(data, prior, lambda, mu, sigma)

  mixturePosterior(data, prior, lambda, mu, sigma)

  mixtureBayesClass(data, prior, lambda, mu, sigma)

Arguments

n

Number of observations.

prior

Vector of class prior probabilities.

lambda

The conditional probabilities for the mixture components given the class. Either a vector (if the same number m of mixture components is desired for each class and the conditional probabilities for each class should be equal) or a list as long as the number of classes containing one vector of probabilities for every class. The length of the k-th element is the desired number of mixture components for the k-th class.

mu

The centers of the mixture components. A list containing one m_k x d matrix of centers for each class where d is the desired dimensionality of the data set.

sigma

The covariance matrices of the mixture components. Either one single matrix that is used for each mixture component or a list as long as the number of classes. List elements can be matrices (in case that for all mixture components forming one class the same covariance matrix shall be used) or lists of matrices as long as the number of mixture components in the corresponding class.

data

A data.frame.

Value

mixtureData returns an object of class "locClass", a list with components:

x

(A matrix.) The explanatory variables.

y

(A factor.) The class labels.

mixtureLabels returns a factor of class labels.

mixturePosterior returns a matrix of posterior probabilities.

mixtureBayesClass returns a factor of Bayes predictions.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
## Simplest case:
# lambda vector, sigma matrix
# Generate a training and a test set
mu <- list(matrix(c(-1,-1),1), matrix(rep(c(1,4.5,12),2),3))
train <- mixtureData(n = 1000, prior = rep(0.5,2), lambda = list(1, rep(1/3,3)), mu = mu, sigma = 3*diag(2))
test <- mixtureData(n = 1000, prior = rep(0.5,2), lambda = list(1, rep(1/3,3)), mu = mu, sigma = 3*diag(2))

# Generate a grid of points
x.1 <- x.2 <- seq(-7,15,0.1)
grid <- expand.grid(x.1 = x.1, x.2 = x.2)

# Calculate the posterior probablities for all grid points
gridPosterior <- mixturePosterior(grid, prior = rep(0.5,2), lambda = list(1, rep(1/3,3)), mu = mu, sigma = 3*diag(2))

# Draw contour lines of posterior probabilities and plot training observations
contour(x.1, x.2, matrix(gridPosterior[,1], length(x.1)), col = "gray")
points(train$x, col = train$y)

# Calculate Bayes error
ybayes <- mixtureBayesClass(test$x, prior = rep(0.5,2), lambda = list(1, rep(1/3,3)), mu = mu, sigma = 3*diag(2))
mean(ybayes != test$y)

if (require(MASS)) {

    # Fit an LDA model and calculate misclassification rate on the test data set
    tr <- lda(y ~ ., data = as.data.frame(train))
    pred <- predict(tr, as.data.frame(test))
    mean(pred$class != test$y)

    # Draw decision boundary
    gridPred <- predict(tr, grid)
    contour(x.1, x.2, matrix(gridPred$posterior[,1], length(x.1)), col = "red", levels = 0.5, add = TRUE)

}

## lambda list, sigma list of lists of matrices
mu <- list()
mu[[1]] <- matrix(c(1,5,1,5),2)
mu[[2]] <- matrix(c(8,11,8,11),2)
lambda <- list(c(0.5, 0.5), c(0.1, 0.9))
sigma <- list()
sigma[[1]] <- diag(2)
sigma[[2]] <- list(diag(2), 3*diag(2))
data <- mixtureData(n = 100, prior = c(0.3, 0.7), lambda, mu, sigma)
plot(data$x, col = data$y)