discTMix: Fit a discrete mixture of (noncentral) t-distributions

Description

Fits a discrete mixture of a central t-distribution and a specified number of noncentral t-distributions. This is a slightly modified version of tMixture from the OCplus package.
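To make the model concrete, the mixture density can be sketched as below. This is only an illustration: mix_density is a hypothetical helper, not a function of this package, and the degrees of freedom (18, as for two groups of 10 samples) are an assumed example value.

```r
# Hypothetical helper (not part of the package): density of a discrete
# mixture of one central and several noncentral t-distributions.
mix_density <- function(t, df, p0, p1, delta) {
  stopifnot(length(p1) == length(delta),
            isTRUE(all.equal(p0 + sum(p1), 1)))
  dens <- p0 * dt(t, df = df)                     # central (null) component
  for (j in seq_along(p1)) {
    # noncentral component j with noncentrality parameter delta[j]
    dens <- dens + p1[j] * dt(t, df = df, ncp = delta[j])
  }
  dens
}

# Example values are assumptions chosen for illustration only.
dens <- mix_density(seq(-6, 6, by = 0.5), df = 18,
                    p0 = 0.8, p1 = c(0.1, 0.1), delta = c(-2, 2))
```

Here p0 is the weight of the central component and p1 holds the weights of the noncentral components, matching the argument names used below.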

Usage

discTMix(tstat, n1 = 10, n2 = n1, nq, p0, p1, D, delta, paired = FALSE,
         tbreak, ext = TRUE, threshold.delta = 0.75, ...)

Arguments

tstat

the vector of genewise t-statistics

n1

number of samples in the first group

n2

number of samples in the second group

nq

the number of components in the mixture that is fitted

p0

a starting value for the proportion of non-differentially expressed genes.

p1

a vector with starting values for the proportions of genes that are differentially expressed with effect size D.

D

a vector of starting values for the effect sizes of the differentially expressed genes, corresponding to the proportions p1.

delta

a vector of starting values for the effect sizes of the differentially expressed genes, expressed as non-centrality parameters; this is just a different way of specifying D, though if both are given, delta will get priority.

paired

a logical value indicating whether the t-statistics are paired (TRUE) or two-sample (FALSE).

tbreak

either the number of equally spaced bins for tabulating tstat, or the explicit break points for the bins, much like the breaks argument of the function cut; the default value is the square root of the number of genes.

ext

a logical value indicating whether to extend the bins, i.e. to set the lowest bin limit to -infinity and the largest bin limit to infinity.

threshold.delta

mixture components with an estimated absolute non-centrality parameter delta below this value are considered to be too small for independent estimation; these components and their corresponding p1 are pooled with the null-component and p0, see Details.

...

additional arguments that are passed to optim to control the optimization.
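The binning controlled by tbreak and ext can be sketched roughly as follows. This is an illustration under stated assumptions, not the package's internal code: the t-statistics are simulated, and rounding the square root to an integer number of bins is an assumption.

```r
set.seed(1)
tstat <- rt(400, df = 18)   # simulated t-statistics, for illustration only

# Default described for tbreak: square root of the number of genes
# (rounding to a whole number of bins is an assumption).
nbins  <- round(sqrt(length(tstat)))
breaks <- seq(min(tstat), max(tstat), length.out = nbins + 1)

# ext = TRUE: extend the outermost bins to -infinity and infinity.
breaks[1] <- -Inf
breaks[length(breaks)] <- Inf

# Tabulate the statistics into the bins, as with the breaks argument of cut.
counts <- table(cut(tstat, breaks = breaks))
```

With the outer limits extended, every t-statistic falls into some bin, so the counts sum to the number of genes.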

Details

The only parameter that must be specified is nq. If nothing else is given, the proportions are distributed equally between p0 and the p1, and the noncentrality parameters are set up symmetrically around zero; e.g. nq=5 leads to equal proportions of 0.2 and noncentrality parameters -2, -1, 1, and 2. If any of p1, D, or delta is specified, nq is redundant and will be ignored (with a warning). In general, discTMix will make a valiant effort to deduce valid starting values from any combination of nq, p0, p1, D, and delta specified by the user, and will complain if that is not possible.
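The default starting values described above can be reproduced with a short sketch. default_starts is a hypothetical helper, not part of the package, and it covers only odd nq, since the text does not say how an odd number of noncentral components would be placed.

```r
# Hypothetical helper (not part of the package): default starting values
# for an odd number of mixture components nq, following the nq = 5 example.
default_starts <- function(nq) {
  stopifnot(nq >= 3, nq %% 2 == 1)
  k <- (nq - 1) / 2                       # noncentral components per side
  list(p0    = 1 / nq,                    # equal share for the null component
       p1    = rep(1 / nq, nq - 1),       # equal shares for the others
       delta = c(-(k:1), 1:k))            # symmetric around zero
}

s <- default_starts(5)
# s$p0 is 0.2, s$p1 is four times 0.2, s$delta is -2, -1, 1, 2
```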

The fitting problem that this function tries to solve is badly conditioned, and its solution will in general depend on the precise set of starting values, so multiple runs from different starting values are usually a good idea. We have found, however, that the model seems fairly robust to misspecification of the number of components, at least when estimating p0. When too many components are specified, some of the nominally noncentral t-distributions describing the behaviour of differentially expressed genes are fitted with noncentrality parameters very close to zero, and the true p0 gets spread out between the nominal p0 and the almost-central components. Adding up these contributions usually gives a similar solution to re-fitting the model with fewer components. The cutoff for the size of noncentrality parameters that can be estimated realistically is specified via threshold.delta, whose default value is based on a small simulation study reported in Pawitan et al. (2005); see Examples. (Note that the AIC can also be helpful in determining the number of components.)
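The pooling of almost-central components into the null proportion can be sketched as follows. collapse_p0 is a hypothetical helper written for illustration, and the input values are made up; the package's own implementation may differ in detail.

```r
# Hypothetical helper (not part of the package): pool components whose
# absolute noncentrality parameter is below threshold.delta into p0.
collapse_p0 <- function(p0, p1, delta, threshold.delta = 0.75) {
  stopifnot(length(p1) == length(delta))
  small <- abs(delta) < threshold.delta   # components too close to central
  list(p0.raw = p0,                       # proportion before pooling
       p0.est = p0 + sum(p1[small]),      # proportion after pooling
       pooled = small)                    # which components were pooled
}

# Made-up fit: the first noncentral component is almost central.
fit <- collapse_p0(p0 = 0.5, p1 = c(0.25, 0.125, 0.125),
                   delta = c(0.25, -2, 2))
# fit$p0.est is 0.75: the raw 0.5 plus the 0.25 of the pooled component
```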

Value

A list with class discTMix, with the following components:

p0.est

the estimated proportion of non-differentially expressed genes, after collapsing components with estimated non-centrality sizes below threshold.delta.

p0.raw and pi0

the estimated proportion of non-differentially expressed genes before collapsing the components.

p1

the estimated proportions of differentially expressed genes corresponding to the effect sizes, relating to p0.raw.

D

effect sizes of the differentially expressed genes in multiples of the gene-by-gene standard deviation.

delta

effect sizes of the differentially expressed genes expressed as the noncentrality parameter of the corresponding noncentral t-distribution.

AIC

the AIC value for the maximum likelihood fit.

opt

the output from optim, giving details about the optimization process.

data

a list containing tstat and df.
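The two effect-size scales, D and delta, are related through the sample sizes. The sketch below uses the textbook relation for equal-variance two-sample and paired t-statistics; that discTMix uses exactly this conversion is an assumption, and D_to_delta is a hypothetical helper, not part of the package.

```r
# Hypothetical helper (not part of the package): convert standardized
# effect sizes D into noncentrality parameters delta.
# Assumes the textbook relations:
#   two-sample: delta = D / sqrt(1/n1 + 1/n2)
#   paired:     delta = D * sqrt(n1), with n1 the number of pairs
D_to_delta <- function(D, n1, n2 = n1, paired = FALSE) {
  if (paired) D * sqrt(n1) else D / sqrt(1 / n1 + 1 / n2)
}

delta <- D_to_delta(c(-1, 1), n1 = 10)                  # -sqrt(5), sqrt(5)
delta_paired <- D_to_delta(1, n1 = 16, paired = TRUE)   # 4
```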

Author(s)

Long Qu, who slightly modified tMixture by Y. Pawitan and A. Ploner from the OCplus package.

References

Pawitan Y, Krishna Murthy KR, Michiels S, Ploner A (2005) Bias in the estimation of false discovery rate in microarray studies, Bioinformatics.

See Also

tstatistics, EOC, optim, fitted.discTMix


gitlongor/pi0 documentation built on May 17, 2019, 5:29 a.m.