PRPStesting: PRPS testing

View source: R/PRPStesting.R

PRPStestingR Documentation

PRPS testing

Description

This is the function to calculate PRPS (Probability ratio based classification predication score) scores for a testing data set with a given PRPS training object.The selected features with their weights, and two classes' PRPS score distribution parametes are extracted from the given PRPS training object.

Usage

PRPStesting(
  PRPStrainObj,
  newdat,
  standardization = FALSE,
  classProbCut = 0.9,
  imputeNA = FALSE,
  byrow = TRUE,
  imputeValue = c("median", "mean")
)

Arguments

PRPStrainObj

a PRPS training object, which is the output from function PRPStraining

newdat

a new data matrix or data frame, which is comparable to training data set, with columns for samples and rows for features

standardization

a logic variable to indicate if standardization is needed before classification score calculation

classProbCut

a numeric variable within (0,1), which is a cutoff of Empirical Bayesian probability, often used values are 0.8 and 0.9, default value is 0.9. Only one value is used for both groups, the samples that are not included in either group will be assigned as UNCLASS

imputeNA

a logic variable to indicate if NA imputation is needed, if it is TRUE, NA imputation is processed before any other steps, the default is FALSE

byrow

a logic variable to indicate direction for imputation, default is TRUE, which will use the row data for imputation

imputeValue

a character variable to indicate which value to be used to replace NA, default is "median", the median value of the chose direction with "byrow" data to be used

Details

This is the function to calculate PRPS scores and make classification and Empirical Bayesian probabilities for a testing new data set. However, this new data set should be comparable to the training data set as much as possible. Within PRPStraining and within this current PRPStesting functions, standardization step is included as an option to minimize the difference between training and testing data sets, but this step is only done to make distributions of each selected features comparable. Be aware that this feature-wise standardization cannot make the sample-wise distributions comparable. For example, the training data set must have two classification groups, however, the proportion of one group sample might be much less than the other group in the testing data set compared to the training data set, or even worse, the testing data set might only contain one classification group only. This is the common problem for classification and feature-wise standardization cannot solve the problem. In order to solve the problem, we should make data comparable as much as possbile before classification step. For example, use the same pre-processing settings and make suitable batch effect correction. For classification with PRPS approach, we also suggest to combine traing and testing data together as "newdat" for this PRPStesting function, to avoid forcing two groups' classification while there is actual only one group in the testing group. However, if we know that the testing and training data sets are not comparable, we should set isCompToTrain as FALSE and choose an approoriate group1ratioPrior as a prior for PRPS score calculation. PRPS calculation is based on Ennishi 2018. The fomula is PRPS(X_i) = \sum (|a_j| log10(P1(x_ij)/P0(x_ij))) Here, a_j represents the jth selected feature weights, and x_ij is the corresponding feature value for the ith sample, P1 and P0 are the probabilities that the ith sample belongs to two different groups. The therotic cutoff is 0 to make classification calls based on PRPS score, alternatively, we can use Bayes approach to make calls. When we calculate a Empirical Bayes' probability, the 1st group in the input mean and sd vectors is treated as the test group. When calculate the probabilities, we first calcualte probability that a sample belongs to either group, and then use the following formula to get Empirical Bayes' probability: prob(x) = d_test(x)/(d_test(x) + d_ref(x)) Here prob(x) is the Empirical Bayes' probability of a given sample, d_test(x) is the density value that a given sample belongs to the test group, d_ref(x) is the density value that a given sample belongs to the reference group. Notice that the test and reference group is just the relative grouping, in fact, for this step, we often need to calculate Empirical Bayes' probabilities for a given sample from two different standing points.

Value

A data frame with PRPS scores, Empirical Bayesian probabilites for two groups and classification, and classification based on 0 natural cutoff on PRPS scores.

Author(s)

Aixiang Jiang

References

Ennishi D, Jiang A, Boyle M, Collinge B, Grande BM, Ben-Neriah S, Rushton C, Tang J, Thomas N, Slack GW, Farinha P, Takata K, Miyata-Takata T, Craig J, Mottok A, Meissner B, Saberi S, Bashashati A, Villa D, Savage KJ, Sehn LH, Kridel R, Mungall AJ, Marra MA, Shah SP, Steidl C, Connors JM, Gascoyne RD, Morin RD, Scott DW. Double-Hit Trait Expression Signature Defines a Distinct Subgroup of Germinal Center B-Cell-Like Diffuse Large B-Cell Lymphoma. J Clin Oncol. 2018 Dec 3:JCO1801583. doi: 10.1200/JCO.18.01583.

Wright G, Tan B, Rosenwald A, Hurt EH, Wiestner A, Staudt LM. A trait expression-based method to diagnose clinically distinct subgroups of diffuse large B cell lymphoma. Proc Natl Acad Sci U S A. 2003 Aug 19;100(17):9991-6.


ajiangsfu/PRPS documentation built on April 29, 2023, 10:13 p.m.