# Updated Classification Method using Labeled and Unlabeled Data

### Description

This function implements the EM algorithm by alternating between the E-step and the M-step. Initial parameter values are obtained from the labeled data; both steps are then iterated over the complete data (the labeled and unlabeled data combined).

### Usage

```
upclassifymodel(Xtrain, cltrain, Xtest, cltest = NULL,
                modelName = "EEE", tol = 10^-5, iterlim = 1000,
                Aitken = TRUE, ...)
```

### Arguments

| Argument | Description |
|---|---|
| `Xtrain` | A numeric matrix of observations in which rows correspond to observations and columns correspond to variables. The group membership of each observation is known (labeled data). |
| `cltrain` | A numeric vector with distinct entries representing a classification of the corresponding observations in `Xtrain`. |
| `Xtest` | A numeric matrix of observations in which rows correspond to observations and columns correspond to variables. The group membership of each observation may not be known (unlabeled data). |
| `cltest` | A numeric vector with distinct entries representing a classification of the corresponding observations in `Xtest`, with default `NULL`. |
| `modelName` | A character string indicating the model, with default `"EEE"`. The models available for selection are described in `modelvec`. |
| `tol` | A positive number, with default `10^-5`, giving the tolerance used to assess convergence. |
| `iterlim` | A positive integer, with default `1000`, giving the maximum number of iterations. |
| `Aitken` | A logical value, with default `TRUE`, indicating whether convergence is assessed using Aitken acceleration. |
| `...` | Arguments passed to or from other methods. |
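The `Aitken` argument (default `TRUE` in the usage above) refers to Aitken acceleration, a standard way of deciding when an EM log-likelihood sequence has converged. The sketch below shows one common formulation under that assumption: from the last three log-likelihood values it estimates the limit of the sequence and stops when that estimate is within `tol` of the latest value. It is not taken from the package, whose `Aitken` function may use a different variant; `aitken_converged` is a hypothetical name.

```r
# One common Aitken-acceleration stopping rule for EM (a sketch; the exact
# criterion used by upclass may differ). `ll` holds log-likelihoods in
# iteration order; `aitken_converged` is a hypothetical helper.
aitken_converged <- function(ll, tol = 1e-5) {
  k <- length(ll)
  if (k < 3) return(FALSE)               # needs three consecutive values
  l0 <- ll[k - 2]; l1 <- ll[k - 1]; l2 <- ll[k]
  a <- (l2 - l1) / (l1 - l0)             # estimated rate of linear convergence
  if (!is.finite(a) || a >= 1) {
    return(abs(l2 - l1) < tol)           # fall back to the plain change
  }
  l_inf <- l1 + (l2 - l1) / (1 - a)      # extrapolated limit of the sequence
  abs(l_inf - l2) < tol
}
```

For a geometrically converging sequence such as `ll <- -10 - 5 * 0.5^(0:20)`, the extrapolated limit is -10 (up to rounding), so the rule returns `TRUE` for the full sequence and `FALSE` for `ll[1:10]`, where the remaining gap is still about 0.01.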

### Details

This approach updates a standard model-based classification rule using the unlabeled data. Initially, the M-step is performed on the labeled (training) data to obtain parameter estimates for the model. These estimates are used in an E-step to obtain group membership probabilities for the unlabeled (test) data. The known training labels and the estimated membership probabilities for the test data are combined to form the complete data. The M-step and E-step are then iterated over the complete data, updating the parameter estimates and the test membership probabilities at each iteration until convergence is reached. Dean, Murphy and Downey (2006) report that this can give lower misclassification rates than a rule built from the labeled data alone, particularly when only a small proportion of the total data is labeled.
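The iteration described above can be sketched in base R. This is a minimal illustration, not the upclass implementation. It assumes the `"EEE"` model (one full covariance matrix shared by all groups), assumes `cltrain` takes the values `1, ..., G` with every group present, keeps the training labels fixed as one-hot membership rows, and stops on the absolute change in the observed-data log-likelihood rather than an Aitken criterion. The helper names `log_dmvnorm`, `m_step`, `log_joint`, `row_lse` and `semi_em` are hypothetical.

```r
# Sketch of semi-supervised EM for a Gaussian mixture with a common full
# covariance matrix (the "EEE" structure). Hypothetical helpers, base R only.

# Log-density of each row of X under N(mu, Sigma), using a Cholesky factor.
log_dmvnorm <- function(X, mu, Sigma) {
  R <- chol(Sigma)                                  # Sigma = t(R) %*% R
  Z <- backsolve(R, t(X) - mu, transpose = TRUE)    # whitened residuals
  -0.5 * colSums(Z^2) - sum(log(diag(R))) - 0.5 * ncol(X) * log(2 * pi)
}

# M-step: weighted estimates of proportions, means and the pooled covariance.
m_step <- function(X, z) {
  n_k <- colSums(z)
  mu <- sweep(t(X) %*% z, 2, n_k, "/")              # d x G matrix of means
  Sigma <- matrix(0, ncol(X), ncol(X))
  for (k in seq_len(ncol(z))) {
    Xc <- sweep(X, 2, mu[, k])                      # centre on group k
    Sigma <- Sigma + crossprod(Xc * sqrt(z[, k]))   # sum_i z_ik r_i r_i'
  }
  list(pro = n_k / sum(n_k), mu = mu, Sigma = Sigma / sum(n_k))
}

# n x G matrix of log(pro_k) + log f_k(x_i).
log_joint <- function(X, par) {
  L <- vapply(seq_along(par$pro), function(k)
    log(par$pro[k]) + log_dmvnorm(X, par$mu[, k], par$Sigma),
    numeric(nrow(X)))
  matrix(L, nrow = nrow(X))
}

# Numerically stable row-wise log-sum-exp.
row_lse <- function(L) {
  m <- apply(L, 1, max)
  m + log(rowSums(exp(L - m)))
}

semi_em <- function(Xtrain, cltrain, Xtest, tol = 1e-5, iterlim = 1000) {
  G <- max(cltrain)
  ztrain <- diag(G)[cltrain, , drop = FALSE]  # known labels stay fixed
  par <- m_step(Xtrain, ztrain)               # initial M-step: labeled data only
  X <- rbind(Xtrain, Xtest)
  ll_old <- -Inf
  converged <- FALSE
  for (iter in seq_len(iterlim)) {
    Ltest <- log_joint(Xtest, par)
    ztest <- exp(Ltest - row_lse(Ltest))      # E-step: unlabeled data only
    par <- m_step(X, rbind(ztrain, ztest))    # M-step: complete data
    # Observed-data log-likelihood: labeled rows use their known group,
    # unlabeled rows are summed over groups.
    ll <- sum(log_joint(Xtrain, par)[cbind(seq_along(cltrain), cltrain)]) +
      sum(row_lse(log_joint(Xtest, par)))
    if (abs(ll - ll_old) < tol) {
      converged <- TRUE
      break
    }
    ll_old <- ll
  }
  list(parameters = par, z = ztest,
       cl = max.col(ztest, ties.method = "first"),
       ll = ll, iter = iter, converged = converged)
}
```

For example, with `X <- as.matrix(iris[, 1:4])`, `cl <- as.integer(iris$Species)` and a random set `idx` of 30 labeled rows, `semi_em(X[idx, ], cl[idx], X[-idx, ])` returns membership probabilities `z` and hard assignments `cl` for the remaining 120 rows. Note that the returned `z` comes from the last E-step, one step behind the final parameters.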

### Value

The return value is a list with the following components:

| Component | Description |
|---|---|
| `call` | The function call from `upclassifymodel`. |
| `Ntrain` | The number of observations in the training data. |
| `Ntest` | The number of observations in the test data. |
| `d` | The dimension of the data. |
| `G` | The number of groups in the data. |
| `iter` | The number of iterations required to reach convergence. If convergence was not obtained, this is equal to `iterlim`. |
| `converged` | A logical value where `TRUE` indicates that convergence was reached and `FALSE` that it was not. |
| `modelName` | A character string identifying the model (same as the input argument). |
| `parameters`: `pro` | A vector whose *k*th component is the mixing proportion for the *k*th component of the mixture model. |
| `mean` | The mean for each component. If there is more than one component, this is a matrix whose *k*th column is the mean of the *k*th component of the mixture model. |
| `variance` | A list of variance parameters for the model. The components of this list depend on the model specification. |
| `train`/`test`: `z` | A matrix whose `[i,k]`th entry is the probability that observation *i* in the training/test data belongs to the *k*th class. |
| `cl` | A numeric vector with distinct entries representing a classification of the corresponding observations in … |
| `rate` | The number of misclassified observations. |
| `Brierscore` | The Brier score measuring the accuracy of the probabilities (…). |
| `tab` | A table of actual and predicted group classifications. |
| `ll` | The log-likelihood for the data in the mixture model. |
| `bic` | The Bayesian Information Criterion for the model. |
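Several of the returned quantities have standard definitions, sketched below in base R. The conventions are assumptions rather than the package's documented formulas: BIC follows the mclust sign convention `2 * ll - npar * log(n)` (larger is better), the parameter count is for the `"EEE"` model only, and the Brier score is the multi-class form averaged over observations (ranging from 0 to 2). upclass may scale these differently, and all function names are hypothetical.

```r
# Hypothetical helpers; see the lead-in for the assumed conventions.

# Free parameters of a G-group "EEE" Gaussian mixture in d dimensions:
# G - 1 proportions, G * d means, d(d + 1)/2 shared covariance entries.
npar_EEE <- function(G, d) (G - 1) + G * d + d * (d + 1) / 2

# BIC in the mclust convention (larger values are preferred).
bic_mclust <- function(ll, npar, n) 2 * ll - npar * log(n)

# Multi-class Brier score: mean over observations of the squared distance
# between the membership probabilities z[i, ] and the one-hot true class.
brier_score <- function(z, cl) {
  onehot <- diag(ncol(z))[cl, , drop = FALSE]
  sum((z - onehot)^2) / nrow(z)
}

# Number of observations whose most probable group differs from cl.
n_misclassified <- function(z, cl) {
  sum(max.col(z, ties.method = "first") != cl)
}

# Cross-tabulation of actual versus predicted groups.
class_table <- function(z, cl) {
  table(actual = cl, predicted = max.col(z, ties.method = "first"))
}
```

For example, with `z <- rbind(c(0.9, 0.1), c(0.2, 0.8), c(0.6, 0.4))` and `cl <- c(1, 2, 2)`, `brier_score(z, cl)` is `0.82 / 3` and `n_misclassified(z, cl)` is `1`, since only the third observation is assigned to the wrong group.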

### Author(s)

Niamh Russell

### References

Fraley, C. and Raftery, A.E. (2002). Model-based clustering, discriminant analysis, and density estimation. *Journal of the American Statistical Association* 97:611-631.

Fraley, C. and Raftery, A.E. (2006). MCLUST Version 3 for R: Normal Mixture Modeling and Model-Based Clustering. Technical Report no. 504, Department of Statistics, University of Washington.

Dean, N., Murphy, T.B. and Downey, G. (2006). Using unlabelled data to update classification rules with applications in food authenticity studies. *Journal of the Royal Statistical Society: Series C* 55(1):1-14.

### See Also

`upclassify`, `Aitken`, `modelvec`

### Examples

```
# This function is not designed to be used on its own,
# but to be called by upclassify
library(upclass)
data(wine, package = "gclus")
X <- as.matrix(wine[, -1])
cl <- unclass(wine[, 1])
n <- nrow(X)                      # 178 observations
set.seed(123)                     # make the random train/test split reproducible
indtrain <- sort(sample(seq_len(n), 120))
indtest <- setdiff(seq_len(n), indtrain)
fitup <- upclassifymodel(X[indtrain, ], cl[indtrain], X[indtest, ], cl[indtest])
```
