LPStesting: LPS testing

LPStesting    R Documentation

LPS testing

Description

This function calculates LPS (Linear Predictor Score) scores for a testing data set using a given LPS training object. The selected features, their weights, and the LPS score distribution parameters of the two classes are extracted from the given LPS training object.

Usage

LPStesting(
  LPStrainObj,
  newdat,
  standardization = FALSE,
  classProbCut = 0.9,
  imputeNA = FALSE,
  byrow = TRUE,
  imputeValue = c("median", "mean")
)

Arguments

LPStrainObj

an LPS training object, which is the output of the LPStraining function

newdat

a new data matrix or data frame that is comparable to the training data set; its columns are samples and its rows are features

standardization

a logical variable indicating whether standardization is needed before the classification score calculation

classProbCut

a numeric variable within (0,1), used as the cutoff on the Empirical Bayesian probability. Commonly used values are 0.8 and 0.9; the default is 0.9. The same classProbCut is used for both groups, and samples that are not assigned to either group are labeled UNCLASS

imputeNA

a logical variable indicating whether NA imputation is needed; if TRUE, NA imputation is carried out before any other step. The default is FALSE

byrow

a logical variable indicating the direction of imputation; the default is TRUE, which uses the row data for imputation

imputeValue

a character variable indicating which value is used to replace NA; the default is "median", that is, the median taken along the direction chosen with "byrow"
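The imputation arguments above can be illustrated with a minimal base R sketch. The helper name impute_na_sketch is hypothetical and this is not the package's internal code; it only assumes that each NA is replaced by the median or mean of its own row (byrow = TRUE) or column (byrow = FALSE).

```r
# Hypothetical helper (not part of PRPS): replace each NA with the
# median or mean of its row (byrow = TRUE) or its column (byrow = FALSE).
impute_na_sketch <- function(dat, byrow = TRUE,
                             imputeValue = c("median", "mean")) {
  imputeValue <- match.arg(imputeValue)
  dat <- as.matrix(dat)
  fill_fun <- if (imputeValue == "median") stats::median else mean
  margin <- if (byrow) 1 else 2
  # One fill value per row (or per column), ignoring NAs
  fill <- apply(dat, margin, fill_fun, na.rm = TRUE)
  # Row/column positions of every NA cell
  na_pos <- which(is.na(dat), arr.ind = TRUE)
  if (nrow(na_pos) > 0) {
    dat[na_pos] <- fill[na_pos[, margin]]
  }
  dat
}

# Toy data: rows are features, columns are samples
toy <- matrix(c(1, NA, 3,
                4, 5, NA),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("geneA", "geneB"), c("s1", "s2", "s3")))
imputed <- impute_na_sketch(toy)
```

With the default settings, the NA in geneA is replaced by the median of the remaining geneA values, and likewise for geneB.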

Details

This function calculates LPS scores and makes classifications based on Empirical Bayesian probabilities for a new testing data set, which should be as comparable to the training data set as possible.

Both LPStraining and LPStesting include standardization as an optional step to minimize the difference between training and testing data sets. Whichever choice a user makes, it must be the same for the training and testing data sets; otherwise this testing function will not work.

Notice that this step only makes the distributions of the selected features comparable within the training or testing data set. Be aware that this feature-wise standardization cannot make the sample-wise distributions comparable. For example, the training data set must have two classification groups; in the testing data set, however, one group may make up a much smaller proportion than it does in the training data set, or, even worse, the testing data set may contain only one classification group. This is a common problem in classification, and feature-wise standardization cannot solve it.
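As an illustration of feature-wise standardization, here is a minimal sketch that assumes each feature (row) is centered to mean 0 and scaled to standard deviation 1. The helper name standardize_features_sketch is hypothetical, and the package may implement this step differently.

```r
# Hypothetical helper (not part of PRPS): center and scale each row,
# i.e. each feature, across the samples in the columns.
standardize_features_sketch <- function(dat) {
  dat <- as.matrix(dat)
  mu <- rowMeans(dat, na.rm = TRUE)
  sdv <- apply(dat, 1, stats::sd, na.rm = TRUE)
  # A length-nrow vector recycles down the columns, so row i uses mu[i], sdv[i]
  (dat - mu) / sdv
}

toy <- matrix(c(2, 4, 6,
                10, 10, 13),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("geneA", "geneB"), c("s1", "s2", "s3")))
z <- standardize_features_sketch(toy)
```

After this step every feature has mean 0 within the data set, whatever mix of classes the samples come from, which is why the step cannot correct for a testing data set whose class proportions differ from those of the training data set.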

To address this problem, the data should be made as comparable as possible before the classification step, for example by using the same pre-processing settings and applying suitable batch effect correction. For classification with the LPS approach, we also suggest combining the training and testing data into one full data set as the input to this LPStesting function, to avoid forcing samples into two groups when there is actually only one group in the testing data set.
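One way to build such a combined data set is sketched below. It assumes both inputs are feature-by-sample matrices with feature names as row names; the helper name combine_sketch is hypothetical and not part of the package.

```r
# Hypothetical helper (not part of PRPS): keep the features shared by
# both data sets and bind the samples (columns) side by side.
combine_sketch <- function(traindat, testdat) {
  shared <- intersect(rownames(traindat), rownames(testdat))
  cbind(as.matrix(traindat)[shared, , drop = FALSE],
        as.matrix(testdat)[shared, , drop = FALSE])
}

train_toy <- matrix(1:6, nrow = 3,
                    dimnames = list(c("geneA", "geneB", "geneC"),
                                    c("tr1", "tr2")))
test_toy <- matrix(7:10, nrow = 2,
                   dimnames = list(c("geneB", "geneC"), c("te1", "te2")))
full_toy <- combine_sketch(train_toy, test_toy)
```

The combined matrix could then be supplied as newdat; any batch effect correction would need to be done before this step.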

The LPS calculation is based on Wright 2003. The formula is straightforward: LPS(X_i) = \sum_j a_j x_{ij}. Here a_j is the weight of the jth selected feature, and x_{ij} is the value of that feature for the ith sample.
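The weighted sum above can be written in a few lines of base R. This is a minimal sketch, assuming the weights are a named numeric vector and the data matrix has features in rows and samples in columns; lps_score_sketch is a hypothetical name, not the package's function.

```r
# Hypothetical helper (not part of PRPS): one LPS score per sample,
# computed as the sum over selected features of weight * feature value.
lps_score_sketch <- function(weights, dat) {
  dat <- as.matrix(dat)
  feats <- names(weights)
  stopifnot(all(feats %in% rownames(dat)))
  # crossprod(X, w) is t(X) %*% w: a samples-by-1 matrix of scores
  drop(crossprod(dat[feats, , drop = FALSE], weights))
}

toy <- matrix(c(1, 2, 3,
                4, 5, 6,
                9, 9, 9),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("geneA", "geneB", "geneC"),
                              c("s1", "s2", "s3")))
w <- c(geneA = 2, geneB = -1)  # geneC is not a selected feature
scores <- lps_score_sketch(w, toy)
```

Features that are not among the selected ones (geneC here) do not contribute to the score.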

When an Empirical Bayesian probability is calculated, by default, the 1st group in the input mean and sd vectors is treated as the test group. We first calculate, for each of the two groups, the density value of a sample assuming that it belongs to that group, and then use the following formula to get the Empirical Bayesian probability: prob(x) = d_test(x)/(d_test(x) + d_ref(x)). Here prob(x) is the Empirical Bayesian probability of a given sample, d_test(x) is the density value assuming that the sample belongs to the test group, and d_ref(x) is the density value assuming that the sample belongs to the reference group. In the current function, however, we calculate Empirical Bayesian probabilities in both directions.
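The probability formula can be sketched as follows. This sketch assumes that each group's LPS scores follow a normal distribution described by its mean and sd, which is an assumption made for illustration; the helper name eb_prob_sketch and the column names are hypothetical.

```r
# Hypothetical helper (not part of PRPS). Assumes normal score
# distributions; the 1st entry of means/sds is the test group.
eb_prob_sketch <- function(score, means, sds) {
  d_test <- stats::dnorm(score, mean = means[1], sd = sds[1])
  d_ref <- stats::dnorm(score, mean = means[2], sd = sds[2])
  # Both directions: probability of the test group and of the reference group
  cbind(prob_test = d_test / (d_test + d_ref),
        prob_ref = d_ref / (d_test + d_ref))
}

probs <- eb_prob_sketch(score = c(-2, 0, 2),
                        means = c(2, -2), sds = c(1, 1))
```

In this toy setting a score halfway between the two group means gets probability 0.5 for each group, and the two probabilities of any sample sum to 1.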

Value

A data frame with the LPS score, the Empirical Bayesian probabilities for the two groups, and the classification
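A result of this shape can be sketched as below. The column names, the group labels, and the use of >= at the cutoff are assumptions made for illustration, not the package's documented output; classify_sketch is a hypothetical helper.

```r
# Hypothetical helper (not part of PRPS): assign a group when its
# probability reaches classProbCut, otherwise label the sample UNCLASS.
classify_sketch <- function(prob_test, prob_ref, classProbCut = 0.9,
                            labels = c("test", "ref")) {
  ifelse(prob_test >= classProbCut, labels[1],
         ifelse(prob_ref >= classProbCut, labels[2], "UNCLASS"))
}

# Toy scores and probabilities for three samples
res <- data.frame(
  LPS_score = c(3.1, 0.2, -2.8),
  prob_test = c(0.95, 0.50, 0.02),
  prob_ref = c(0.05, 0.50, 0.98),
  row.names = c("s1", "s2", "s3")
)
res$class <- classify_sketch(res$prob_test, res$prob_ref, classProbCut = 0.9)
```

Because the same cutoff is applied to both groups, a sample whose two probabilities both fall below classProbCut (s2 here) ends up as UNCLASS.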

Author(s)

Aixiang Jiang

References

Wright G, Tan B, Rosenwald A, Hurt EH, Wiestner A, Staudt LM. A gene expression-based method to diagnose clinically distinct subgroups of diffuse large B cell lymphoma. Proc Natl Acad Sci U S A. 2003 Aug 19;100(17):9991-6.


ajiangsfu/PRPS documentation built on April 29, 2023, 10:13 p.m.