ShrinkSeq: Simultaneous shrinkage by empirically estimating multiple...
In markvdwiel/ShrinkBayes: Bayesian analysis of high-dimensional omics data, either Gaussian or counts

ShrinkSeq

R Documentation

Simultaneous shrinkage by empirically estimating multiple priors, Poisson and Negative Binomial likelihoods plus zero-inflated versions thereof

Description

This function implements the iterative joint procedure as described in Van de Wiel et al. (2012). It recursively applies INLA, and fits parameters of multiple priors by an EM-type algorithm.

Usage

ShrinkSeq(form, dat, fams = "zinb", shrinkfixed = NULL, 
shrinkaddfixed = NULL, shrinkrandom = NULL, shrinkaddrandom=NULL,shrinkdisp = TRUE, 
shrinkp0 = FALSE, mixtdisp = FALSE, mixtrand = FALSE, excludefornull=NULL, fixedmeanzero = FALSE, 
addfixedmeanzero = TRUE, curvedisp = FALSE, maxiter = 15,  
ntag=ifelse(is.null(excludefornull),c(100,200,500,1000),c(1000)), ntagcurve = 5000, 
fixed = c(0, 1/10), addfixed = c(0, 1/10), randomprec = c(1,10^(-5)), addrandomprec = c(1,10^(-5)),
logdisp = c(0, 0.01), diracprob0=ifelse((mixtrand | !is.null(excludefornull)), 0.8, 0.2), logitp0 = c(0, 0.01), 
fixedseed = TRUE, ndraw = 10000, safemode = TRUE, tol = ifelse((mixtrand | !is.null(excludefornull)),0.005,0.01), 
tolrand = 0.02, mliktol = 0.1, designlist=NULL,...)

Arguments

`form`	Formula. See `inla` and `f` for specification of the model formula.
`dat`	Matrix, data frame or list containing the data. Rows are features, columns are samples. For lists: each component represents a feature.
`fams`	Character string. Either equal to `"poisson"`, `"zip"` (zero-inflated Poisson), `"nb"` (negative binomial), or `"zinb"` (zero-inflated negative binomial): likelihood to be used.
`shrinkfixed`	String. Name of the fixed parameter for which a Gaussian prior is fit.
`shrinkaddfixed`	String or character vector. Name(s) of additional fixed parameter(s) for which a Gaussian prior is fit.
`shrinkrandom`	String. Name of Gaussian random effects parameter for which an inverse-Gamma prior is fit to the variance
`shrinkaddrandom`	String or character vector. Name(s) of additional random effects parameter(s) for which inverse-Gamma priors are fit to the variances
`shrinkdisp`	Boolean. Fit a prior to the log-overdispersion parameter of the (zero-inflated) negative binomial?
`shrinkp0`	Boolean. Fit a Gaussian prior to the logit(zero inflation) parameter of the zero-inflated poisson or negative binomial?
`mixtdisp`	Boolean. Add a point mass on zero to the Gaussian prior of the log-overdispersion parameter? Should be FALSE when `mixtrand=TRUE`.
`mixtrand`	Boolean. Add a point mass on zero to the inverse-Gamma prior for the random effect? Should be FALSE when `mixtdisp=TRUE`.
`excludefornull`	String or character vector. Name(s) of parameter(s) to be excluded from the given model in `form` resulting in the null mode
`fixedmeanzero`	Boolean. Fix the mean of the Gaussian prior of the `shrinkfixed` parameter to zero?
`addfixedmeanzero`	Boolean or list of booleans. Fix the mean(s) of the Gaussian prior of the `shrinkaddfixed` parameter(s) to zero?
`curvedisp`	Boolean. Apply a curvature prior for log-overdispersion of the negative binomial?
`maxiter`	Integer. The maximum number of iterations in the iterative marginal procedure
`ntag`	Integer vector. Consecutive number of data rows used for fitting the priors. Rows are sampled at random with a fixed seed.
`ntagcurve`	Integer vector. Only relevant if `curvedisp=TRUE`. Consecutive number of data rows used for fitting a curvature prior for overdispersion. Rows are sampled at random with a fixed seed.
`fixed`	Numeric vector of length 2. Mean and precision of initial Gaussian prior of `shrinkfixed`
`addfixed`	Numeric vector of length 2 or list of vectors of size 2. Mean(s) and precision(s) of initial Gaussian prior(s) of `shrinkaddfixed`
`randomprec`	Numeric vector of length 2. Shape and rate of initial inverse-Gamma prior of `shrinkrandom`
`addrandomprec`	Numeric vector of length 2. Shape and rate of initial inverse-Gamma priors of `shrinkaddrandom`
`logdisp`	Numeric vector of length 2. Mean and precision of initial Gaussian prior of log-overdispersion
`diracprob0`	Numeric between 0 and 1. Initial mixture proportion for zero-component of the prior for `shrinkdisp` or for the random effects variance. Only relevant when `mixtdisp=TRUE` or `mixtrand=TRUE`
`logitp0`	Numeric vector of length 2. Mean and precision of initial Gaussian prior of logit (zero-inflation) parameter
`fixedseed`	Boolean. Use a fixed seed for sampling the features?
`ndraw`	Integer. The number of random draws from the empirical mixture of posteriors
`safemode`	Boolean. If `TRUE`, only use features for which both models provide a fit, in case of a mixture prior.
`tol`	Numeric. Tolerance for the Kolmogorov-Smirnov distance between two succesive iterations.
`tolrand`	Numeric. Tolerance for the Kolmogorov-Smirnov distance between two succesive iterations for the prior of the random effect variance .
`mliktol`	Numeric. Relative tolerance in percentage between two succesive iterations for the total marginal likelihood.
`designlist`	List. Components are data frames containting the variables in `form`. Length of list should equal nr of features in `data`. If `NULL` design is assumed to be the same for all features (and inferred from `form`)
`...`	Further arguments passed on to `inla`.

Details

If a prior is fit for a particular parameter, this effectuates Bayesian shrinkage for that parameter. The iterative algorithm operates on an increasing number of tags, which are indicated in ntag. The algorithm stops if the following conditions are satisfied: 1. it has processed on the largest element of ntag 2a. It has reached maxiter OR 2b. For all non-random effects priors the Kolmogorov-Smirnov (KS) distance between two consecutive fits of the prior is smaller than tol AND the KS distance between two consecutive fits of the random effects variance is smaller than tolrand OR 2c. The relative difference (Total log marg lik - Previous total log marg lik)/ abs(Total log marg lik) < mliktol.

Computing time is largely determined by the largest element in ntag and maxiter. We advise to use at least max(ntag)=1000 and max(ntag)=2500 when either mistdisp=TRUE or mixtrand=TRUE. About shrinkfixed and shrinkaddfixed: if one of the fixed parameters is the primary parameter of interest use shrinkfixed for this one, and, when desired, shrinkaddfixed for an another one.

Value

A list object containing:

`pmlist`	Final estimates of the parameters of the (fitted) priors
`ksall`	Kolmogorv-Smirnov distances between consecutive estimated priors
`paraall`	List object containing all consecutive parameter estimates
`curvedispfun`	Function that links log-overdispersion with log-mean; only relevant when `curvedisp=TRUE`
`inputpar`	Input parameters used
`addfixed`	Parameter values of additional fixed parameters. Passed on for convenience.
`addrandom`	Parameter values of additional random effects parameters. Passed on for convenience.
`typelik`	Type of likelihood used

Author(s)

Mark A. van de Wiel

References

Van de Wiel MA, Leday GGR, Pardo L, Rue H, Van der Vaart AW, Van Wieringen WN (2012). Bayesian analysis of RNA sequencing data by estimating multiple shrinkage priors. Biostatistics.

Van de Wiel MA, Neerincx M, Buffart TE, Sie D, Verheul HMW (2014). ShrinkBayes: a versatile R-package for analysis of count-based sequencing data in complex study designs. BMC Bioinformatics. 15(1):116.

Examples

## Not run: 
#For an example that combines posteriors (using a mixture prior for overdispersion like in the CAGE example in Van de Wiel et al., 2012): 
#see CombinePosteriors
#To empty current R memory and enlarge memory.limit (Windows) consider, before applying 'ShrinkBayes', using 
#rm(list=ls());gc();
#memory.limit(size = 4000)

#loads brain sequencing data (CAGE); 10,000 rows (tag clusters), 25 samples
data(CAGEdata10000)
CAGEdata <- CAGEdata10000

#FOR FIRST TRY: USE 1000 DATA ROWS ONLY
CAGEdata <- CAGEdata[1:1000,]

#loads design of the brain study; includes covariates
data(design_brain)

#retrieves covariates from the design matrix
pers <- design_brain$pers  #persons
batch<-design_brain$batch   #batch
groupfac <- design_brain$groupfac #group (= brain region)

#Number of cpus to use in (parallel) computations
ncpus2use <- 10

#redefines baseline level as '0' such that it is in further analysis always used as baseline. 
groupfac <- BaselineDef("groupfac",baselinegroup="1")

#By default only comparisons with the baseline are included. Use the following convenience function to create the other 
#pair-wise comparisons for a given factor when this contains 3 or more levels (groups)
lincombvec <- AllComp("groupfac")

#Formula as used by INLA
form = y ~ 1 + groupfac + batch + f(pers,model="iid") #'batch' and 'group' as fixed effect, 'pers' as random effect

#Simultaneous hrinkage for 'groupfac' and 'pers'. In addition, the negative binomial overdispersion is shrunken by default (see shrinkdisp argument). 
#In the example below we allow a mixture prior for overdispersion (mixtdisp=TRUE). This increases computing time by a factor 2.
#INCREASE maxiter FOR OBTAINING FINAL RESULTS (DEFAULT = 15)
shrinksimul <- ShrinkSeq(form=form, dat=CAGEdata,shrinkfixed="groupfac",shrinkrandom="pers",ncpus=ncpus2use, maxiter=3)

#Fit all using the priors resulting from ShrinkSeq and zi-nb likelihood; also find posteriors for the linear combinations defined above
fitzinb <- FitAllShrink(form,dat=CAGEdata,fams="zinb",shrinksimul,ncpus=ncpus2use,lincomb=lincombvec)

#Find nonparametric prior for all group-wise differences. These include the linear combinations. 
#INCREASE maxiter FOR OBTAINING FINAL RESULTS (DEFAULT = 15)
npprior <- NonParaUpdatePrior(fitall=fitzinb,modus="fixed", includeP0 = FALSE, shrinkpara="groupfac", shrinklc=TRUE,ncpus=ncpus2use, maxiter = 3)

#Update posteriors using the nonparametric prior
nppostshr <- NonParaUpdatePosterior(fitzinb,npprior,ncpus=ncpus2use)

#Compute local fdrs ( = posterior null (tail-)probabilities under shrinkage prior)
#Here P(theta <= 0.1) is computed, assuming {theta<=0.1} is the null-domain.
lfdr <- SummaryWrap(nppostshr, thr = 0.1)

#Compute Bayesian FDRs; all BFDRs for each comparison [matrix format]
BFDRs <- BFDR(lfdr)

#Compute Bayesian FDRs; BFDR for multiple comparisons
BFDRmult <- BFDR(lfdr,multcomp=TRUE)

## End(Not run)

markvdwiel/ShrinkBayes documentation built on March 27, 2022, 7:47 p.m.

markvdwiel/ShrinkBayes index

README.md ShrinkBayes

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

markvdwiel/ShrinkBayes
Bayesian analysis of high-dimensional omics data, either Gaussian or counts

ShrinkSeq: Simultaneous shrinkage by empirically estimating multiple...
In markvdwiel/ShrinkBayes: Bayesian analysis of high-dimensional omics data, either Gaussian or counts

Simultaneous shrinkage by empirically estimating multiple priors, Poisson and Negative Binomial likelihoods plus zero-inflated versions thereof

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

Examples

Related to ShrinkSeq in markvdwiel/ShrinkBayes...

R Package Documentation

Browse R Packages

We want your feedback!

markvdwiel/ShrinkBayes Bayesian analysis of high-dimensional omics data, either Gaussian or counts

ShrinkSeq: Simultaneous shrinkage by empirically estimating multiple... In markvdwiel/ShrinkBayes: Bayesian analysis of high-dimensional omics data, either Gaussian or counts

Simultaneous shrinkage by empirically estimating multiple priors, Poisson and Negative Binomial likelihoods plus zero-inflated versions thereof

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

Examples

Related to ShrinkSeq in markvdwiel/ShrinkBayes...

R Package Documentation

Browse R Packages

We want your feedback!

markvdwiel/ShrinkBayes
Bayesian analysis of high-dimensional omics data, either Gaussian or counts

ShrinkSeq: Simultaneous shrinkage by empirically estimating multiple...
In markvdwiel/ShrinkBayes: Bayesian analysis of high-dimensional omics data, either Gaussian or counts