naiveBayesKernel: Classification Using A Bayes Classifier with Kernel Density...

Description

Kernel density estimates are fitted to the training data and a naive Bayes classifier is used to classify samples in the test data.

Usage

## S4 method for signature 'matrix'
naiveBayesKernel(measurements, classes, test, ...)
## S4 method for signature 'DataFrame'
naiveBayesKernel(measurements, classes, test,
                 densityFunction = density,
                 densityParameters = list(bw = "nrd0", n = 1024,
                                          from = expression(min(featureValues)),
                                          to = expression(max(featureValues))),
                 weighted = c("unweighted", "weighted", "both"),
                 weight = c("height difference", "crossover distance", "both"),
                 minDifference = 0, returnType = c("class", "score", "both"), verbose = 3)
## S4 method for signature 'MultiAssayExperiment'
naiveBayesKernel(measurements, test, targets = names(measurements), ...)

Arguments

measurements

Either a matrix, DataFrame or MultiAssayExperiment containing the training data. For a matrix, the rows are features, and the columns are samples.

classes

Either a factor of class labels with the same length as the number of samples in measurements or, if measurements is a DataFrame, a character vector of length 1 containing the name of the column in measurements that holds the classes. Not used if measurements is a MultiAssayExperiment object.

test

An object of the same class as measurements, with no samples in common with measurements and the same number of features.

targets

If measurements is a MultiAssayExperiment, the names of the data tables to be used. "clinical" is also a valid value and specifies that integer variables from the clinical data table will be used.

...

Variables not used by the three top-level methods, which are passed to the internal method that does the classification.

densityFunction

Default: density. A function which will return a probability density, which is essentially a list with x and y coordinates.

densityParameters

A list of options for densityFunction. Default: list(bw = "nrd0", n = 1024, from = expression(min(featureValues)), to = expression(max(featureValues))).

weighted

Default: "unweighted". Either "unweighted", "weighted" or "both". In weighted mode, the difference in densities is summed over all features. In unweighted mode, each feature's vote is worth the same. Both can be calculated simultaneously.

weight

Default: "height difference". Either "height difference", "crossover distance" or "both". The type of weight to calculate. For "height difference", the weight of each prediction is the vertical distance between the highest density and the second-highest at a particular value of x. For "crossover distance", the x positions where two densities cross are calculated first. The predicted class is the class with the highest density at the particular value of x, and the weight is the distance of x from the nearest density crossover point.

minDifference

Default: 0. The minimum difference in density height between the highest density and second-highest for a feature to be allowed to vote. Can be a vector of cutoffs. If no features for a particular sample have a difference large enough, the class predicted is simply the largest class.

returnType

Default: "class". Either "class", "score" or "both". Sets the return value from the prediction to either a vector of predicted classes, a matrix of scores with columns corresponding to classes, as determined by the factor levels of classes, or both a column of predicted classes and columns of class scores in a data.frame.

verbose

Default: 3. A number between 0 and 3 controlling how many progress messages are given. This function only prints progress messages if the value is 3.
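
As a rough illustration of the densityFunction and densityParameters arguments, the base R sketch below shows the kind of object the default density function returns (a list-like object with x and y components) and one way a density height could be read off at a new value. The variable trainValues and the helper heightAt are hypothetical names introduced here, and the use of approx for interpolation is an assumption for illustration, not necessarily how the package evaluates densities.

```r
set.seed(1)
trainValues <- rnorm(20, mean = 8, sd = 2)   # made-up training values for one feature

# stats::density returns an object with x (grid) and y (estimated density) components.
fit <- density(trainValues, bw = "nrd0", n = 1024,
               from = min(trainValues), to = max(trainValues))
length(fit$x)
length(fit$y)

# Hypothetical helper: linear interpolation of the density height at a new value.
# rule = 2 uses the nearest end point for values outside the grid.
heightAt <- function(fit, value) approx(fit$x, fit$y, xout = value, rule = 2)$y
heightAt(fit, 8)
```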

Details

If weighted is "weighted", then a sample's predicted class is the class with the largest sum of weights, each scaled for the number of samples in the training data of each class. Otherwise, when weighted is "unweighted", each feature has an equal vote and votes for the class with the largest weight, scaled for class sizes in the training set.
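
The two voting schemes can be sketched in base R as follows. This is one reading of the description, not the package's internal code: the heights matrix, the class sizes and all variable names are made up for illustration, and the weight used is the "height difference" between the highest and second-highest scaled density.

```r
# Hypothetical density heights for one test sample: rows are features, columns are classes.
heights <- matrix(c(0.30, 0.10,
                    0.05, 0.40,
                    0.12, 0.10),
                  ncol = 2, byrow = TRUE,
                  dimnames = list(paste0("feature", 1:3), c("Good", "Poor")))
classSizes <- c(Good = 5, Poor = 5)   # assumed numbers of training samples per class

# Scale each class's heights by its share of the training samples.
scaled <- sweep(heights, 2, classSizes / sum(classSizes), "*")

# For each feature: the winning class and the gap to the second-highest density.
classNames <- colnames(scaled)
winner <- classNames[apply(scaled, 1, which.max)]
gap <- apply(scaled, 1, function(h) {
  sorted <- sort(unname(h), decreasing = TRUE)
  sorted[1] - sorted[2]
})

# Only features whose gap reaches minDifference are allowed to vote.
minDifference <- 0
keep <- gap >= minDifference
voters <- factor(winner[keep], levels = classNames)

# Unweighted: one vote per feature. Weighted: votes are summed height differences.
votes <- table(voters)
unweightedClass <- names(which.max(votes))

weightSums <- tapply(gap[keep], voters, sum)
weightSums[is.na(weightSums)] <- 0    # classes that received no votes
weightedClass <- names(which.max(weightSums))
```

With these made-up numbers, two of the three features vote for "Good", so the unweighted prediction is "Good", while the single large height difference in favour of "Poor" makes "Poor" the weighted prediction.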

The variable name of each feature's measurements in the iteration over all features is featureValues. This is important to know if each feature's measurements need to be referred to in the specification of densityParameters, such as for specifying the range of x values of the density function to be computed. For example, see the default value of densityParameters above.
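
A minimal sketch of how expression-valued entries of densityParameters could be resolved against featureValues is shown below. It uses only base R; the way the package itself evaluates these expressions is not stated here, so the eval and do.call steps are an assumption made for illustration.

```r
densityParameters <- list(bw = "nrd0", n = 1024,
                          from = expression(min(featureValues)),
                          to = expression(max(featureValues)))

set.seed(2)
featureValues <- rnorm(10, mean = 8, sd = 2)   # one feature's measurements (made up)

# Evaluate any expression-valued parameter with featureValues made visible to it;
# other parameters are passed through unchanged.
evaluated <- lapply(densityParameters, function(parameter) {
  if (is.expression(parameter)) {
    eval(parameter, envir = list(featureValues = featureValues))
  } else {
    parameter
  }
})

# Call the density function with the feature's values and the resolved parameters.
fit <- do.call(density, c(list(featureValues), evaluated))
range(fit$x)   # spans the observed range of featureValues
```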

If weight is "crossover distance", the crossover points are computed by considering the distance between the y values of all of the densities at every x value. The x values at which a class density crosses any other class's density are used as the crossover points for that class.
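
For the two-class case, the idea of a crossover distance can be sketched in base R as below. The data, the shared grid and the midpoint approximation of each crossover are assumptions made for illustration; the package's own calculation may differ in detail.

```r
set.seed(3)
good <- rnorm(30, mean = 8, sd = 1)    # made-up training values, class "Good"
poor <- rnorm(30, mean = 11, sd = 1)   # made-up training values, class "Poor"

# Evaluate both densities on the same grid so heights can be compared point by point.
lower <- min(good, poor)
upper <- max(good, poor)
dGood <- density(good, from = lower, to = upper, n = 1024)
dPoor <- density(poor, from = lower, to = upper, n = 1024)
x <- dGood$x

# A crossover lies between adjacent grid points where the sign of the difference flips.
difference <- dGood$y - dPoor$y
flips <- which(diff(sign(difference)) != 0)
crossovers <- (x[flips] + x[flips + 1]) / 2

# The predicted class has the higher density at the test value; the weight is the
# distance from the test value to the nearest crossover point.
testValue <- 8.5
predicted <- if (approx(x, difference, xout = testValue)$y > 0) "Good" else "Poor"
weight <- min(abs(testValue - crossovers))
```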

Value

A vector or list of class prediction information (i.e. classes and/or scores) with as many entries as there are samples in the test data, or a list of such results if both weighted and unweighted voting were used or a range of minDifference values was provided.

Author(s)

Dario Strbenac, John Ormerod

Examples

library(ClassifyR)

trainMatrix <- matrix(rnorm(1000, 8, 2), ncol = 10)
classes <- factor(rep(c("Poor", "Good"), each = 5))

# Increase the values of the first 30 genes for the Poor samples (columns 1 to 5).
trainMatrix[1:30, 1:5] <- trainMatrix[1:30, 1:5] + 5

testMatrix <- matrix(rnorm(1000, 8, 2), ncol = 10)

# Increase the values of the first 30 genes for the sixth to tenth test samples.
testMatrix[1:30, 6:10] <- testMatrix[1:30, 6:10] + 5

naiveBayesKernel(trainMatrix, classes, testMatrix)

ClassifyR documentation built on Nov. 8, 2020, 6:53 p.m.