# mixtureData: Generation of Gaussian Mixture Data for Classification

In locClassData: Collection of Artificial Classification Problems

## Description

Generation of Gaussian mixture data for classification.

## Usage

```r
mixtureData(n, prior, lambda, mu, sigma)

mixtureLabels(data, prior, lambda, mu, sigma)

mixturePosterior(data, prior, lambda, mu, sigma)

mixtureBayesClass(data, prior, lambda, mu, sigma)
```

## Arguments

| Argument | Description |
|---|---|
| `n` | Number of observations. |
| `prior` | Vector of class prior probabilities. |
| `lambda` | The conditional probabilities for the mixture components given the class. Either a vector (if the same number m of mixture components is desired for each class and the conditional probabilities for each class should be equal) or a list as long as the number of classes containing one vector of probabilities for every class. The length of the k-th element is the desired number of mixture components for the k-th class. |
| `mu` | The centers of the mixture components. A list containing one m_k × d matrix of centers for each class, where d is the desired dimensionality of the data set. |
| `sigma` | The covariance matrices of the mixture components. Either a single matrix that is used for every mixture component, or a list as long as the number of classes. List elements can be matrices (if the same covariance matrix is to be used for all mixture components forming one class) or lists of matrices as long as the number of mixture components in the corresponding class. |
| `data` | A `data.frame`. |
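The accepted shapes of `lambda` and `sigma` are easier to compare side by side. The following base-R sketch builds each form for a two-class problem in two dimensions. All variable names are illustrative, and the closing checks only mirror the length rules stated in the argument descriptions; they are not validation that the package is known to perform.

```r
# Two classes in d = 2 dimensions: class 1 has 1 component, class 2 has 3.
mu <- list(matrix(c(-1, -1), 1),
           matrix(rep(c(1, 4.5, 12), 2), 3))

# lambda as a list: one probability vector per class,
# with as many entries as that class has mixture components.
lambda_list <- list(1, rep(1/3, 3))

# lambda as a vector: only meaningful when every class has the same
# number of components and shares the same component probabilities.
lambda_vec <- c(0.5, 0.5)

# sigma as a single matrix: shared by every component of every class.
sigma_single <- 3 * diag(2)

# sigma as a list of matrices: one covariance matrix per class.
sigma_per_class <- list(diag(2), 3 * diag(2))

# sigma as a list mixing both forms: a matrix for class 1 and
# one matrix per component for class 2.
sigma_per_component <- list(diag(2),
                            list(diag(2), 2 * diag(2), 3 * diag(2)))

# Consistency checks derived from the argument descriptions.
n_components <- vapply(mu, nrow, integer(1))
stopifnot(
  all(lengths(lambda_list) == n_components),
  all(abs(vapply(lambda_list, sum, numeric(1)) - 1) < 1e-12),
  length(sigma_per_component[[2]]) == n_components[2]
)
```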

## Value

`mixtureData` returns an object of class `"locClass"`, a list with components:

| Component | Description |
|---|---|
| `x` | (A matrix.) The explanatory variables. |
| `y` | (A factor.) The class labels. |
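To make the shape of this return value concrete, here is a minimal base-R sketch of how such data could be drawn: a class from `prior`, then a component from `lambda`, then a Gaussian draw around the matching row of `mu`. The function name `sample_mixture_sketch` is hypothetical, it assumes the list form of `lambda` and a single shared `sigma` matrix, and it is not the package's implementation (it also returns a plain list rather than a `"locClass"` object).

```r
# Hypothetical illustration; assumes lambda is a list and sigma one matrix.
sample_mixture_sketch <- function(n, prior, lambda, mu, sigma) {
  d <- ncol(mu[[1]])

  # Draw a class index for each observation from the prior.
  y <- sample(seq_along(prior), n, replace = TRUE, prob = prior)

  # Given the class, draw a mixture component index.
  comp <- vapply(y, function(k) {
    sample.int(length(lambda[[k]]), 1, prob = lambda[[k]])
  }, integer(1))

  # Look up the center of the chosen component (one row per observation).
  centers <- matrix(mapply(function(k, j) mu[[k]][j, ], y, comp),
                    nrow = n, byrow = TRUE)

  # Standard normal draws times chol(sigma) have covariance sigma.
  noise <- matrix(rnorm(n * d), n, d) %*% chol(sigma)

  list(x = centers + noise,
       y = factor(y, levels = seq_along(prior)))
}

set.seed(1)
sketch <- sample_mixture_sketch(
  n = 200, prior = rep(0.5, 2), lambda = list(1, rep(1/3, 3)),
  mu = list(matrix(c(-1, -1), 1), matrix(rep(c(1, 4.5, 12), 2), 3)),
  sigma = 3 * diag(2)
)
str(sketch)
```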

`mixtureLabels` returns a factor of class labels.

`mixturePosterior` returns a matrix of posterior probabilities.

`mixtureBayesClass` returns a factor of Bayes predictions.
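For a Gaussian mixture model of this kind, the posterior of class k at a point x is proportional to `prior[k]` times the `lambda`-weighted sum of the class's component densities, and the Bayes prediction is the class with the largest posterior. The sketch below works through that calculation in base R. The helper names `gauss_density` and `posterior_sketch` are hypothetical, the code assumes the list form of `lambda` and a single shared `sigma` matrix, and it is meant as an illustration of the formula rather than a description of how the package computes its results.

```r
# Multivariate normal density using only base R (stats::mahalanobis).
gauss_density <- function(x, center, cov) {
  x <- as.matrix(x)
  log_det <- as.numeric(determinant(cov, logarithm = TRUE)$modulus)
  exp(-0.5 * (mahalanobis(x, center, cov) + ncol(x) * log(2 * pi) + log_det))
}

# Hypothetical illustration; assumes lambda is a list and sigma one matrix.
posterior_sketch <- function(x, prior, lambda, mu, sigma) {
  x <- as.matrix(x)
  # One column per class: prior[k] * sum_j lambda[[k]][j] * N(x; mu[[k]][j, ], sigma)
  joint <- sapply(seq_along(prior), function(k) {
    comp <- sapply(seq_along(lambda[[k]]), function(j) {
      lambda[[k]][j] * gauss_density(x, mu[[k]][j, ], sigma)
    })
    prior[k] * rowSums(matrix(comp, nrow = nrow(x)))
  })
  joint <- matrix(joint, nrow = nrow(x))
  # Normalize each row so the class posteriors sum to one.
  joint / rowSums(joint)
}

mu <- list(matrix(c(-1, -1), 1), matrix(rep(c(1, 4.5, 12), 2), 3))
pts <- rbind(c(-1, -1), c(12, 12))
post <- posterior_sketch(pts, prior = rep(0.5, 2),
                         lambda = list(1, rep(1/3, 3)),
                         mu = mu, sigma = 3 * diag(2))

# Bayes prediction: the class with the highest posterior in each row.
bayes_class <- factor(apply(post, 1, which.max), levels = 1:2)
```

With these inputs, the first point sits on the single center of class 1 and the second on a center of class 2, so the two rows of `post` favor class 1 and class 2 respectively.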

## Examples

```r
## Simplest case:
# lambda vector, sigma matrix

# Generate a training and a test set
mu <- list(matrix(c(-1,-1),1), matrix(rep(c(1,4.5,12),2),3))
train <- mixtureData(n = 1000, prior = rep(0.5,2), lambda = list(1, rep(1/3,3)), mu = mu, sigma = 3*diag(2))
test <- mixtureData(n = 1000, prior = rep(0.5,2), lambda = list(1, rep(1/3,3)), mu = mu, sigma = 3*diag(2))

# Generate a grid of points
x.1 <- x.2 <- seq(-7,15,0.1)
grid <- expand.grid(x.1 = x.1, x.2 = x.2)

# Calculate the posterior probabilities for all grid points
gridPosterior <- mixturePosterior(grid, prior = rep(0.5,2), lambda = list(1, rep(1/3,3)), mu = mu, sigma = 3*diag(2))

# Draw contour lines of posterior probabilities and plot training observations
contour(x.1, x.2, matrix(gridPosterior[,1], length(x.1)), col = "gray")
points(train$x, col = train$y)

# Calculate Bayes error
ybayes <- mixtureBayesClass(test$x, prior = rep(0.5,2), lambda = list(1, rep(1/3,3)), mu = mu, sigma = 3*diag(2))
mean(ybayes != test$y)

if (require(MASS)) {

    # Fit an LDA model and calculate misclassification rate on the test data set
    tr <- lda(y ~ ., data = as.data.frame(train))
    pred <- predict(tr, as.data.frame(test))
    mean(pred$class != test$y)

    # Draw decision boundary
    gridPred <- predict(tr, grid)
    contour(x.1, x.2, matrix(gridPred$posterior[,1], length(x.1)), col = "red", levels = 0.5, add = TRUE)

}

## lambda list, sigma list of lists of matrices
mu <- list()
mu[[1]] <- matrix(c(1,5,1,5),2)
mu[[2]] <- matrix(c(8,11,8,11),2)
lambda <- list(c(0.5, 0.5), c(0.1, 0.9))
sigma <- list()
sigma[[1]] <- diag(2)
sigma[[2]] <- list(diag(2), 3*diag(2))
data <- mixtureData(n = 100, prior = c(0.3, 0.7), lambda, mu, sigma)
plot(data$x, col = data$y)
```

locClassData documentation built on May 31, 2017, 3:34 a.m.