PStesting: PS testing

PStesting    R Documentation

Description

Calculates PS (Prediction Strength) scores for a testing data set using a given PS training object. The selected features, their weights, and the PS score distribution parameters of the two classes are extracted from the given PS training object.

Usage

PStesting(
  PStrainObj,
  newdat,
  classProbCut = 0.9,
  imputeNA = FALSE,
  byrow = TRUE,
  imputeValue = c("median", "mean")
)

Arguments

PStrainObj

a PS training object, which is the output from function PStraining

newdat

a new data matrix or data frame, comparable to the training data set, with samples in columns and features in rows

classProbCut

a numeric value within (0,1) that serves as the cutoff on the empirical Bayesian probability; commonly used values are 0.8 and 0.9, and the default is 0.9. A single value is used for both groups, and samples that are not assigned to either group are labeled UNCLASS

imputeNA

a logical value indicating whether NA imputation is needed; if TRUE, NA imputation is performed before any other step. The default is FALSE

byrow

a logical value indicating the direction of imputation; the default is TRUE, which uses the row data for imputation

imputeValue

a character value indicating which statistic replaces NA; the default is "median", that is, the median of the data along the direction chosen with "byrow"
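
The interplay of imputeNA, byrow, and imputeValue can be illustrated with a minimal base R sketch. The helper impute_na_sketch and the toy matrix are hypothetical and written only for illustration; they are not the package's internal code, which may differ in detail.

```r
# Hypothetical helper (not part of PRPS): replace each NA with the median or
# mean of its row (byrow = TRUE) or of its column (byrow = FALSE).
impute_na_sketch <- function(dat, byrow = TRUE,
                             imputeValue = c("median", "mean")) {
  imputeValue <- match.arg(imputeValue)
  dat <- as.matrix(dat)
  fill <- if (imputeValue == "median") median else mean
  margin <- if (byrow) 1 else 2
  # One replacement value per row (or column), ignoring missing entries
  fillers <- apply(dat, margin, fill, na.rm = TRUE)
  # Row/column positions of every missing entry
  na_pos <- which(is.na(dat), arr.ind = TRUE)
  if (nrow(na_pos) > 0) {
    dat[na_pos] <- fillers[na_pos[, margin]]
  }
  dat
}

# Toy data: 3 features (rows) x 4 samples (columns)
toy <- matrix(c(1, 2, NA, 4,
                5, NA, 7, 9,
                2, 4, 6, 8),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("gene", 1:3), paste0("s", 1:4)))

imputed_row_median <- impute_na_sketch(toy, byrow = TRUE, imputeValue = "median")
imputed_col_mean   <- impute_na_sketch(toy, byrow = FALSE, imputeValue = "mean")
```

With byrow = TRUE, a missing value is filled from the other samples of the same feature; with byrow = FALSE, it is filled from the other features of the same sample.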

Details

This function calculates PS scores and makes classifications for a new testing data set, which should be as comparable to the training data set as possible. The PS algorithm (Golub et al., 1999) requires standardization, so standardization is built into this function and cannot be turned off. However, this standardization only makes the distributions of the individual selected features comparable; it cannot make the sample-wise distributions comparable.

For example, the training data set must contain two classification groups, but in the testing data set one group may make up a much smaller proportion of the samples than it does in the training data set, or, even worse, the testing data set may contain only one classification group. This is a common problem in classification, and feature-wise standardization cannot solve it.

To reduce the problem, make the data as comparable as possible before the classification step, for example by using the same pre-processing settings and applying suitable batch effect correction. For classification with the PS approach, we also suggest combining the training and testing data as "newdat" for this PStesting function, so that a two-group classification is not forced when the testing data actually contain only one group.
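
The one-group problem can be seen in a small base R sketch. It assumes the feature-wise standardization is a row-wise z-score; the helper standardize_rows and the toy matrices are hypothetical, and the package's own standardization may differ.

```r
# Assumed form of feature-wise standardization: z-score within each row
standardize_rows <- function(x) t(scale(t(x)))

# Toy training data: samples tr1-tr3 are group 1, tr4-tr6 are group 2
train <- rbind(geneA = c(2, 3, 4, 10, 11, 12),
               geneB = c(9, 8, 10, 1, 2, 3))
colnames(train) <- paste0("tr", 1:6)

# Toy testing data: all three samples resemble group 2 only
test <- rbind(geneA = c(10, 11, 12.5),
              geneB = c(2, 1, 3))
colnames(test) <- paste0("te", 1:3)

# Standardizing the testing data alone centers every feature at zero,
# so the one-group samples are spread over both sides of zero
z_alone <- standardize_rows(test)

# Standardizing training and testing data together keeps the testing
# samples on the group 2 side for both features
z_comb <- standardize_rows(cbind(train, test))[, colnames(test)]

sign(z_alone)
sign(z_comb)
```

In this toy case the testing samples have mixed signs when standardized alone but consistent signs when standardized together with the training data, which is the motivation for passing the combined data as "newdat".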

The PS score is calculated as PS = (V_win − V_lose)/(V_win + V_lose), where V_win and V_lose are the vote totals for the winning and losing features/traits for a given sample.
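
As a worked example of the formula, the following base R sketch computes a PS score for a single sample. The values, weights, and midpoints are made up, and the per-feature vote weight * (value - midpoint) follows the weighted-voting idea of Golub et al. (1999) as an assumption; it is not necessarily how the package computes votes internally.

```r
# Hypothetical inputs for ONE sample: standardized feature values,
# feature weights, and the midpoint between the two class means
x       <- c(geneA = 1.2, geneB = -0.8, geneC = 0.3)
weights <- c(geneA = 2.0, geneB = -1.5, geneC = 1.0)
midpt   <- c(geneA = 0.0, geneB = 0.0,  geneC = 0.5)

# Signed vote per feature (assumed Golub-style weighted vote):
# geneA = 2.4, geneB = 1.2, geneC = -0.2
votes <- weights * (x - midpt)

# Vote totals for each side
v_pos <- sum(votes[votes > 0])       # 2.4 + 1.2 = 3.6
v_neg <- sum(abs(votes[votes < 0]))  # 0.2

# The winning side is the one with the larger total
v_win  <- max(v_pos, v_neg)
v_lose <- min(v_pos, v_neg)

ps <- (v_win - v_lose) / (v_win + v_lose)  # 3.4 / 3.8, about 0.895
```

A score near 1 means almost all votes agree on one class, while a score near 0 means the votes are nearly balanced.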

Value

A data frame with PS scores and classifications
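
How classProbCut can turn scores into classifications is sketched below in base R. The sketch assumes that each group's PS scores follow a normal distribution, that the two groups have equal prior weight, and that the probability is the ratio of the two densities. The helper classify_sketch, the group labels, the distribution parameters, and the output column names are all hypothetical and do not describe the actual return value of PStesting.

```r
# Hypothetical helper: probability of group 1 from two assumed normal
# score distributions, then a single cutoff applied to both groups
classify_sketch <- function(ps, mean1, sd1, mean0, sd0,
                            classProbCut = 0.9,
                            labels = c("group1", "group0")) {
  d1 <- dnorm(ps, mean = mean1, sd = sd1)
  d0 <- dnorm(ps, mean = mean0, sd = sd0)
  prob1 <- d1 / (d1 + d0)
  # Samples that reach the cutoff for neither group stay unclassified
  cls <- ifelse(prob1 >= classProbCut, labels[1],
                ifelse(1 - prob1 >= classProbCut, labels[2], "UNCLASS"))
  data.frame(PS_score = ps, prob_group1 = prob1, PS_class = cls,
             stringsAsFactors = FALSE)
}

# Three made-up scores: clearly group 1, ambiguous, clearly group 0
res <- classify_sketch(c(0.8, 0.02, -0.7),
                       mean1 = 0.6, sd1 = 0.2,
                       mean0 = -0.6, sd0 = 0.2)
res
```

Lowering classProbCut (for example to 0.8) shrinks the UNCLASS band around the point where the two densities are equal.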

Author(s)

Aixiang Jiang

References

Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science. 1999;286:531–7


ajiangsfu/PRPS documentation built on April 29, 2023, 10:13 p.m.