pepa-imputation: Heuristic to choose the value of the hyperparameter (fudge...

fudge2LRT R Documentation

Heuristic to choose the value of the hyperparameter (fudge factor) used to regularize the variance estimator in the likelihood ratio statistic

Description

fudge2LRT: heuristic to choose the value of the hyperparameter (fudge factor) used to regularize the variance estimator in the likelihood ratio statistic (as implemented in samLRT). We follow the heuristic described in [1] and adapt the code of the fudge2 function in the siggenes R package.

[1] Tusher, Tibshirani and Chu, Significance analysis of microarrays applied to the ionizing radiation response, PNAS 2001 98: 5116-5121 (Apr 24).

samLRT computes a regularized version of the likelihood ratio statistic. The regularization adds a user-supplied fudge factor s1 to the variance estimator. This is straightforward when using a fixed effect model (cases 'numeric' and 'lm') but requires more care when using a mixed model.
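To make the effect of the fudge factor concrete, here is a minimal sketch for the 'numeric' case only, where the inputs are averages of squared residuals. The helper `regularized_llr` and its exact formula are assumptions made for illustration (a Gaussian-style log ratio of variance estimates with s1 added to both); the statistic computed by samLRT may differ in constants and in how the mixed model case is handled.

```r
# Hypothetical helper, not part of the package: regularized log likelihood
# ratio for the fast ('numeric') case.
regularized_llr <- function(mse.h0, mse.h1, N, s1 = 0) {
  # Assumption: the statistic depends on the data only through the two
  # variance estimates, and s1 is added to both of them.
  (N / 2) * (log(mse.h0 + s1) - log(mse.h1 + s1))
}

# Two toy proteins with the same variance ratio but very different scales.
mse.h0 <- c(2.0, 0.02)
mse.h1 <- c(1.0, 0.01)
N <- 10  # number of samples times number of peptides

llr.raw <- regularized_llr(mse.h0, mse.h1, N, s1 = 0)
llr.reg <- regularized_llr(mse.h0, mse.h1, N, s1 = 0.5)

print(llr.raw)  # identical for both proteins
print(llr.reg)  # the low-variance protein is shrunk much more strongly
```

Under this assumed form, the unregularized statistic cannot distinguish the two proteins, whereas the regularized one penalizes the protein whose apparent effect rests on a very small variance estimate.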

Usage

fudge2LRT(
  lmm.res.h0,
  lmm.res.h1,
  cc,
  n,
  p,
  s,
  alpha = seq(0, 1, 0.05),
  include.zero = TRUE
)

LH0(X, y1, y2)

LH1(X, y1, y2, j)

LH0.lm(X, y1, y2)

LH1.lm(X, y1, y2, j)

samLRT(lmm.res.h0, lmm.res.h1, cc, n, p, s1)

pepa.test(X, y, n1, n2, global = FALSE, use.lm = FALSE)

Arguments

lmm.res.h0

a vector of objects containing the estimates (used to compute the statistic) under H0 for each connected component. If the fast version of the estimator was used (as implemented in this package), lmm.res.h0 is a vector containing averages of squared residuals. If a fixed effect model was used, it is a vector of lm objects, and if a mixed effect model was used, it is a vector of lmer objects.

lmm.res.h1

similar to lmm.res.h0, a vector of objects containing the estimates (used to compute the statistic) under H1 for each protein.

cc

a list containing the indices of peptides and proteins belonging to each connected component.

n

the number of samples used in the test

p

the number of proteins in the experiment

s

a vector containing the maximum likelihood estimate of the variance for the chosen model. When using the fast version of the estimator implemented in this package, this is the same as the input lmm.res.h1. For other models (e.g., mixed models), it can be obtained from samLRT.

alpha

a vector of proportions used to build candidate values for the regularizer. The candidates are the quantiles of s at these proportions. Defaults to seq(0, 1, 0.05).

include.zero

logical value indicating whether 0 should be included in the list of candidates. Defaults to TRUE.

X

Binary q x p design matrix for q peptides and p proteins. X_(ij)=1 if peptide i belongs to protein j, 0 otherwise.

y1

n.pep*n.samples matrix giving the observed counts for each peptide in each sample from condition 1.

y2

n.pep*n.samples matrix giving the observed counts for each peptide in each sample from condition 2.

j

the index of the protein being tested, i.e., the protein that has different expression in the two conditions under H1.

s1

the fudge factor to be added to the variance estimate

y

q x n matrix representing the log intensities of q peptides among n MS samples.

n1

number of samples under condition 1. It is assumed that the first n1 columns of y correspond to observations under condition 1.

n2

number of samples under condition 2.

global

if TRUE, the test statistic for each protein uses all residuals, including the ones for peptides in different connected components. This can be much faster, as it does not require computing the connected components. However, the p-values are not well calibrated in this case, as it amounts to adding a ridge to the test statistic. Calibrating the p-value would require knowing the amplitude of the ridge, which in turn would require computing the connected components.

use.lm

if TRUE (and if global=FALSE), use lm() rather than the result in Proposition 1 to compute the test statistic
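As an illustration of the inputs described above, the following sketch builds a toy design matrix X, a log-intensity matrix y, and the connected components of the peptide-protein graph. The helper `connected_components` and the layout of its result (a list with `peptides` and `proteins` entries) are assumptions made for this example; the format of cc expected by the package functions may differ.

```r
# Toy design: 4 peptides (rows) and 3 proteins (columns).
# Peptide 2 is shared by proteins 1 and 2, which links them together.
X <- matrix(c(1, 0, 0,
              1, 1, 0,
              0, 1, 0,
              0, 0, 1),
            nrow = 4, byrow = TRUE)

# Hypothetical helper: connected components of the bipartite graph encoded by X.
connected_components <- function(X) {
  p <- ncol(X)
  seen <- rep(FALSE, p)
  comps <- list()
  for (start in seq_len(p)) {
    if (seen[start]) next
    prots <- start
    repeat {
      # Peptides attached to the current proteins, then proteins sharing them.
      peps <- which(rowSums(X[, prots, drop = FALSE]) > 0)
      grown <- union(prots, which(colSums(X[peps, , drop = FALSE]) > 0))
      if (length(grown) == length(prots)) break
      prots <- grown
    }
    seen[prots] <- TRUE
    comps[[length(comps) + 1L]] <- list(peptides = peps, proteins = sort(prots))
  }
  comps
}

cc <- connected_components(X)

# Log intensities: q = 4 peptides, n = 6 samples, first n1 columns = condition 1.
set.seed(42)
n1 <- 3
n2 <- 3
y <- matrix(rnorm(nrow(X) * (n1 + n2)), nrow = nrow(X))

# With the package loaded, the call documented above would then be:
# res <- pepa.test(X, y, n1, n2, global = FALSE, use.lm = FALSE)
```

In this toy case, proteins 1 and 2 form one component with peptides 1 to 3, and protein 3 forms its own component with peptide 4.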

Value

(same as the fudge2 function of siggenes):

  • s.zero: the value of the fudge factor s0.

  • alpha.hat: the optimal quantile of the 's' values. If s0=0, 'alpha.hat' will not be returned.

  • vec.cv: the vector of the coefficients of variation. Following Tusher et al. (2001), the optimal 'alpha' quantile is given by the quantile that leads to the smallest CV of the modified test statistics.

  • msg: a character string summarizing the most important information about the fudge factor.
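The selection rule can be sketched as follows. This is an illustration on the classical SAM statistic d = r / (s + s0) from Tusher et al. (2001), not on the likelihood ratio statistic, and it is not the package code: the helper `choose_fudge`, the binning into `n.bins` groups, and the reduced set of returned fields are all assumptions.

```r
# Hypothetical helper: pick the fudge factor minimizing the coefficient of
# variation of the spread of the statistic across bins of s.
choose_fudge <- function(r, s, alpha = seq(0, 1, 0.05), include.zero = TRUE,
                         n.bins = 10) {
  # Candidate values are quantiles of s, optionally preceded by 0.
  cand <- as.numeric(quantile(s, alpha))
  if (include.zero) cand <- c(0, cand)

  # Group the features by the magnitude of their variance estimate.
  breaks <- unique(quantile(s, seq(0, 1, length.out = n.bins + 1)))
  bin <- cut(s, breaks = breaks, include.lowest = TRUE)

  cv <- vapply(cand, function(s0) {
    d <- r / (s + s0)                # regularized statistic
    spread <- tapply(d, bin, mad)    # robust spread within each bin
    sd(spread, na.rm = TRUE) / mean(spread, na.rm = TRUE)
  }, numeric(1))

  best <- which.min(cv)
  list(s.zero = cand[best], vec.cv = cv, candidates = cand)
}

# Toy data: strictly positive scale estimates and matching numerators.
set.seed(1)
s <- sqrt(rchisq(200, df = 4) / 4)
r <- rnorm(200, sd = s)

fit <- choose_fudge(r, s)
print(fit$s.zero)
```

The idea behind the rule is that a good fudge factor makes the spread of the statistic roughly independent of s, which is what a small CV across bins indicates.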

  • llr.sam: a numeric vector containing the regularized log likelihood ratio statistic for each protein.

  • s: a vector containing the maximum likelihood estimate of the variance for the chosen model. When using the fast version of the estimator implemented in this package, this is the same as the input lmm.res.h1.

  • lh1.sam: a numeric vector containing the regularized log likelihood under H1 for each protein.

  • lh0.sam: a numeric vector containing the regularized log likelihood under H0 for each connected component.

  • sample.sizes: a numeric vector containing the sample size (number of biological samples times number of peptides) for each protein. This number is the same for all proteins within each connected component.

A list of the following elements:

  • llr: log likelihood ratio statistic (maximum likelihood version).

  • llr.map: log likelihood ratio statistic (maximum a posteriori version).

  • llr.pv: p-value for llr.

  • llr.map.pv: p-value for llr.map.

  • mse.h0: mean squared error under H0.

  • mse.h1: mean squared error under H1.

  • s: selected regularization hyperparameter for llr.map.

  • wchi2: weight used to make llr.map chi2-distributed under H0.

Author(s)

Thomas Burger, Laurent Jacob


samWieczorek/Dapar2 documentation built on May 13, 2022, 9:23 a.m.