This function trains a Linear Predictor Score model, given pre-computed coefficients. It uses data with known classes to fit the model.
It can be called in several ways, and not all arguments are mandatory. See the 'Examples' section.
data |
Continuous data used to retrieve classes, as a |
coeff |
Pre-computed coefficients for the model, as returned by |
response |
Already known classes for the samples provided in |
k |
Single |
threshold |
Single |
formula |
A |
method |
Single character value, to be passed to |
... |
Further arguments are passed to |
An object of (S3) class "LPS" :
coeff |
Named numeric vector, the coefficients used in the model. |
classes |
Character vector, the labels of the two groups to be predicted. |
scores |
List of two numeric vectors, training dataset scores sorted by group. |
means |
Numeric vector, score means of each group in the training dataset. |
sds |
Numeric vector, score |
ovl |
Numeric value, overlapping coefficient as returned by |
k |
Integer value, number of features selected in the model (if relevant). |
p.threshold |
Numeric value, threshold used for feature selection (if relevant). |
p.method |
Character value, p-value correction used for feature selection (if relevant). |
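As an illustration of how the 'scores', 'means' and 'sds' elements relate to 'coeff', here is a minimal base R sketch of a linear predictor score in the sense of Wright et al. (a weighted sum of feature values). It does not call the package and is not claimed to reproduce its internals; the simulated data, the feature names and the coefficient values are all hypothetical.

```r
set.seed(42)

# Simulated training data: 30 samples (rows) x 4 features (columns)
expr <- matrix(
  rnorm(120), nrow = 30, ncol = 4,
  dimnames = list(NULL, c("g1", "g2", "g3", "g4"))
)
group <- factor(rep(c("A", "B"), each = 15))

# Hypothetical named coefficients for two selected features
coeff <- c(g1 = 2.5, g3 = -1.8)

# Linear predictor score: weighted sum of the selected features, per sample
score <- as.vector(expr[, names(coeff), drop = FALSE] %*% coeff)

# Scores split by group, with their means and standard deviations
scores <- split(score, group)
means <- sapply(scores, mean)
sds <- sapply(scores, sd)
```

Here 'scores' is a list of two numeric vectors (one per group), while 'means' and 'sds' are numeric vectors named after the group labels.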
As expression values are directly used in the score, gene centering and scaling are strongly recommended. For Affymetrix raw expression values (strictly positive, linear and absolute), Wright et al. suggest a multiplicative centering on a median of 1000 followed by a log2 transformation. For log-ratios, gene centering and scaling should not be necessary, as they are naturally 0-centered.
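The normalization described above can be sketched in base R as follows. This is only an illustration on simulated data: it assumes that the multiplicative centering is applied per sample (an interpretation, not something stated here), and the object names are hypothetical.

```r
set.seed(1)

# Simulated strictly positive "raw" values: 20 samples (rows) x 5 features (columns)
raw <- matrix(rlnorm(100, meanlog = 6, sdlog = 1), nrow = 20, ncol = 5)

# Multiplicative centering: rescale each sample so that its median is 1000
# (assumption: centering is applied per sample, i.e. per row here)
sample.medians <- apply(raw, 1, median)
centered <- raw * (1000 / sample.medians)

# log2 transformation
logged <- log2(centered)

# Gene centering and scaling: each column gets mean 0 and standard deviation 1
scaled <- scale(logged, center = TRUE, scale = TRUE)
```

The resulting 'scaled' matrix keeps samples in rows and features in columns, which is the layout used in the 'Examples' section.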
Using a numeric matrix as 'data' and a factor as 'response' is the fastest way to compute coefficients, if time consumption matters (as in cross-validation schemes). 'formula' is there only for consistency with R modeling functions, and to provide 'response' together with 'k' or 'threshold' through a single argument.
Sylvain Mareschal
Radmacher MD, McShane LM, Simon R. A paradigm for class prediction using gene expression profiles. J Comput Biol. 2002;9(3):505-11.
Wright G, Tan B, Rosenwald A, Hurt EH, Wiestner A, Staudt LM. A gene expression-based method to diagnose clinically distinct subgroups of diffuse large B cell lymphoma. Proc Natl Acad Sci U S A. 2003 Aug 19;100(17):9991-6.
Bohers E, Mareschal S, Bouzelfen A, Marchand V, Ruminy P, Maingonnat C, Menard AL, Etancelin P, Bertrand P, Dubois S, Alcantara M, Bastard C, Tilly H, Jardin F. Targetable activating mutations are very frequent in GCB and ABC diffuse large B-cell lymphoma. Genes Chromosomes Cancer. 2014 Feb;53(2):144-53.
# Data with features in columns
data(rosenwald)
group <- rosenwald.cli$group
expr <- t(rosenwald.expr)
# NA imputation (feature's mean to minimize impact)
f <- function(x) { x[ is.na(x) ] <- round(mean(x, na.rm=TRUE), 3); x }
expr <- apply(expr, 2, f)
# Coefficients
coeff <- LPS.coeff(data=expr, response=group)
# 10 best features (straightforward)
m <- LPS(data=expr, coeff=coeff, response=group, k=10)
# 10 best features (formula)
### 'k' MUST be an integer, or will be understood as a 'threshold'
### Numbers are "numeric", enforce integer with "L" or "as.integer"
m <- LPS(data=as.data.frame(expr), coeff=coeff, formula=group~10L)
k <- as.integer(10)
m <- LPS(data=as.data.frame(expr), coeff=coeff, formula=group~k)
# FDR threshold
thr <- 0.01
m <- LPS(data=expr, coeff=coeff, response=group, threshold=thr)
m <- LPS(data=as.data.frame(expr), coeff=coeff, formula=group~0.01)
m <- LPS(data=as.data.frame(expr), coeff=coeff, formula=group~thr)
# Custom model
m <- LPS(data=expr, coeff=coeff[ c("27481","17013") ,], response=group, k=2)
m <- LPS(data=as.data.frame(expr), coeff=coeff, formula=group~`27481`+`17013`)
### Notice backticks in formula for syntactically invalid names
# Complete model
m <- LPS(data=expr, coeff=coeff, response=group, k=ncol(expr))
m <- LPS(data=expr, coeff=coeff, response=group, threshold=1)
### m <- LPS(data=as.data.frame(expr), coeff=coeff, formula=group~.)
### The last is correct but (really) slow on large datasets
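The integer requirement on 'k' noted in the examples follows from how R types numeric literals. The short base R check below shows the difference; it is independent of the package.

```r
# A plain number literal is stored as a double ("numeric"), not as an integer
plain <- 10
is.integer(plain)      # FALSE

# The "L" suffix or as.integer() yields an integer
suffixed <- 10L
coerced <- as.integer(10)
is.integer(suffixed)   # TRUE
is.integer(coerced)    # TRUE
```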