mixmodels: Classification based on Differential Distribution utilising...


Description

Fits mixtures of normals for every feature, separately for each class.

Usage

## S4 method for signature 'matrix'
mixModelsTrain(measurements, ...)
## S4 method for signature 'DataFrame'
mixModelsTrain(measurements, classes, ..., verbose = 3)
## S4 method for signature 'MultiAssayExperiment'
mixModelsTrain(measurements, targets = names(measurements), ...)
## S4 method for signature 'MixModelsListsSet,matrix'
mixModelsPredict(models, test, ...)
## S4 method for signature 'MixModelsListsSet,DataFrame'
mixModelsPredict(models, test, weighted = c("unweighted", "weighted", "both"),
                 weight = c("height difference", "crossover distance", "both"),
                 densityXvalues = 1024, minDifference = 0,
                 returnType = c("class", "score", "both"), verbose = 3)
## S4 method for signature 'MixModelsListsSet,MultiAssayExperiment'
mixModelsPredict(models, test, targets = names(test), ...)

Arguments

measurements

Either a matrix, DataFrame or MultiAssayExperiment containing the training data. For a matrix, the rows are features, and the columns are samples.

classes

Either a factor of class labels with one element per sample in measurements or, if measurements is a DataFrame, a character vector of length 1 naming the column of measurements that contains the classes. Not used if measurements is a MultiAssayExperiment object.

test

An object of the same class as measurements, with the same number of features and no samples in common with it. If it is a DataFrame, the class column must be absent.

targets

If measurements is a MultiAssayExperiment, the names of the data tables to be used. "clinical" is also a valid value and specifies that numeric variables from the clinical data table will be used.

...

Variables not used by the matrix or MultiAssayExperiment methods, which are passed on to and used by the DataFrame method, or extra training arguments passed to mixmodCluster. The argument nbCluster is mandatory.

models

A MixModelsListsSet of models generated by the training function and training class information. There is one element for each class. Another element at the end of the list has the class sizes of the classes in the training data.

weighted

Default: "unweighted". Either "unweighted", "weighted" or "both". In weighted mode, the difference in densities is summed over all features. In unweighted mode, each feature's vote is worth the same. Both can be calculated simultaneously.

weight

Default: "both". Either "both", "height difference", or "crossover distance". The type of weight to calculate. For "height difference", the weight of each prediction is equal to the sum of the vertical distances for all of the mixture components within one class subtracted from the sum of the components of the other class, summed for each value of x. For "crossover distance", the x positions where the mixture density of the class being considered crosses another class's density are calculated first. The predicted class is the class with the highest mixture sum at the particular value of x, and the weight is the distance of x from the nearest density crossover point.

densityXvalues

Default: 1024. Only relevant when weight is "crossover distance". The number of equally-spaced locations at which to calculate y values for each mixture density.

minDifference

Default: 0. The minimum difference in sums of mixture densities between the class with the highest sum and the class with the second highest sum for a feature to be allowed to vote. Can be a vector of cutoffs. If no features for a particular sample have a difference large enough, the class predicted is simply the largest class.

returnType

Default: "class". Either "class", "score" or "both". Sets the return value from the prediction to either a vector of predicted classes, a matrix of scores with columns corresponding to classes, as determined by the factor levels of classes, or both a column of predicted classes and columns of class scores in a data.frame.

verbose

Default: 3. A number between 0 and 3 for the amount of progress messages to give. This function only prints progress messages if the value is 3.

Details

If weighted is "weighted", a sample's predicted class is the class with the largest sum of weights, each scaled for the number of samples of that class in the training data. If weighted is "unweighted", each feature has an equal vote and votes for the class with the largest weight, scaled for class sizes in the training set.
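
The voting rules above can be illustrated with a minimal base R sketch. It is not the package's implementation: the density values, the helper voteForSample, and the choice to scale densities by each class's share of the training samples are all assumptions made for illustration, and the weighted branch simply sums the scaled densities rather than computing the weight types described under Arguments.

```r
# Hypothetical input: densities[f, k] is the mixture density of class k for
# feature f, evaluated at one test sample's value of that feature.
densities <- matrix(c(0.30, 0.28,
                      0.05, 0.40,
                      0.25, 0.24),
                    ncol = 2, byrow = TRUE,
                    dimnames = list(paste("Gene", 1:3), c("Good", "Poor")))
classSizes <- c(Good = 15, Poor = 15)

# Assumption: scale each class's density by its share of the training samples.
scaled <- sweep(densities, 2, classSizes / sum(classSizes), `*`)

# Hypothetical helper, not part of ClassifyR.
voteForSample <- function(scaled, classSizes, minDifference = 0, weighted = FALSE) {
  # Gap between the best and second-best class for each feature.
  gaps <- apply(scaled, 1, function(featureRow) {
    ordered <- sort(featureRow, decreasing = TRUE)
    ordered[1] - ordered[2]
  })
  voters <- scaled[gaps >= minDifference, , drop = FALSE]
  # No feature passes the cutoff: fall back to the largest training class.
  if (nrow(voters) == 0) return(names(which.max(classSizes)))
  if (weighted) {
    totals <- colSums(voters)  # sum of scaled densities per class
  } else {
    winners <- colnames(voters)[apply(voters, 1, which.max)]
    totals <- table(factor(winners, levels = colnames(voters)))  # one vote per feature
  }
  names(totals)[which.max(totals)]
}

unweightedCall <- voteForSample(scaled, classSizes)
weightedCall <- voteForSample(scaled, classSizes, weighted = TRUE)
```

With these made-up values the two modes can disagree: two of the three features narrowly favour one class, while the remaining feature favours the other class by a wide margin.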

If weight is "crossover distance", the crossover points are computed by considering the difference between the y values of the two densities at every x value. An x value is used as a crossover point if the sign of the difference at that x differs from the sign at the closest lower value of x.
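
As a rough illustration of the sign-change rule, the following base R sketch locates crossover points between two densities on an equally spaced grid. The mixture parameters, the grid range, and the helpers mixtureDensity and crossoverWeight are hypothetical and chosen only for the example; the package's own code may differ.

```r
# Hypothetical helper: density of a normal mixture evaluated at x.
mixtureDensity <- function(x, proportions, means, sds) {
  total <- numeric(length(x))
  for (i in seq_along(proportions))
    total <- total + proportions[i] * dnorm(x, means[i], sds[i])
  total
}

# Equally spaced grid, analogous to densityXvalues = 1024.
xValues <- seq(0, 20, length.out = 1024)
poorDensity <- mixtureDensity(xValues, c(0.5, 0.5), c(5, 15), c(1, 1))
goodDensity <- mixtureDensity(xValues, 1, 9, 1)

# A crossover is an x whose difference has a different sign from the
# difference at the closest lower x.
difference <- poorDensity - goodDensity
signChanged <- which(diff(sign(difference)) != 0) + 1
crossovers <- xValues[signChanged]

# Weight of a test value: its distance to the nearest crossover point.
crossoverWeight <- function(value) min(abs(value - crossovers))
```

A finer grid places the crossover points more precisely, at the cost of more density evaluations per feature.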

Value

For mixModelsTrain, a list of trained models of class MixmodCluster. For mixModelsPredict, a vector or list of class prediction information (i.e. classes and/or scores) with one entry per sample in the test data, or a list of such objects if both weighted and unweighted voting were used or a range of minDifference values was provided.

Author(s)

Dario Strbenac

Examples

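library(ClassifyR)

# Genes 1 to 5 are drawn from a mixture of two normals (means 5 and 15) in the
# first 25 samples and from a single normal (mean 9) in the last 25 samples.
# Genes 6 to 10 are drawn from a single normal (mean 9) in all 50 samples.
genesMatrix <- sapply(1:25, function(geneColumn) c(rnorm(5, sample(c(5, 15), replace = TRUE, 5))))
genesMatrix <- cbind(genesMatrix, sapply(1:25, function(geneColumn) c(rnorm(5, 9, 1))))
genesMatrix <- rbind(genesMatrix, sapply(1:50, function(geneColumn) rnorm(5, 9, 1)))
rownames(genesMatrix) <- paste("Gene", 1:10)
colnames(genesMatrix) <- paste("Sample", 1:50)
classes <- factor(rep(c("Poor", "Good"), each = 25), levels = c("Good", "Poor"))

# Train on 15 samples from each class and test on the remaining 10 of each.
trainSamples <- c(1:15, 26:40)
testSamples <- c(16:25, 41:50)
selected <- 1:5  # Use only the genes that differ between the classes.

# nbCluster is mandatory and is passed on to mixmodCluster.
trained <- mixModelsTrain(genesMatrix[selected, trainSamples], classes[trainSamples],
                          nbCluster = 1:3)
# A vector of minDifference cutoffs gives one set of predictions per cutoff.
mixModelsPredict(trained, genesMatrix[selected, testSamples], minDifference = 0:3)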

ClassifyR documentation built on Nov. 8, 2020, 6:53 p.m.