discTMix: Fit a discrete mixture of (noncentral) t-distributions
In gitlongor/pi0: Estimating the Proportion of True Null Hypotheses for FDR


Fits a discrete mixture of a central t-distribution and a specified number of noncentral t-distributions. This is a slightly modified version of tMixture from the OCplus package.
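To make the model concrete, the mixture density can be sketched in base R as below. This is only an illustration of the model described above, not the code used by discTMix; the helper name `mix_density` and the numeric values are hypothetical.

```r
# Hypothetical helper (not part of pi0 or OCplus): density of a mixture of
# one central t and several noncentral t components with common df.
mix_density <- function(t, df, p0, p1, delta) {
  stopifnot(length(p1) == length(delta),
            abs(p0 + sum(p1) - 1) < 1e-8)   # proportions must sum to one
  dens <- p0 * dt(t, df)                     # central (null) component
  for (j in seq_along(p1)) {
    dens <- dens + p1[j] * dt(t, df, ncp = delta[j])  # noncentral components
  }
  dens
}

# Illustrative values only: 80% null genes, 10% each at delta = -2 and 2
mix_density(c(-2, 0, 2), df = 18, p0 = 0.8, p1 = c(0.1, 0.1), delta = c(-2, 2))
```

With symmetric `delta` and equal `p1`, the resulting density is symmetric around zero.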

discTMix(tstat, n1 = 10, n2 = n1, nq, p0, p1, D, delta, paired = FALSE, tbreak, ext = TRUE, threshold.delta = 0.75, ...)

`tstat`	the vector of genewise t-statistics
`n1`	number of samples in the first group
`n2`	number of samples in the second group
`nq`	the number of components in the mixture that is fitted
`p0`	a starting value for the proportion of non-differentially expressed genes.
`p1`	a vector with starting values for the proportions of genes that are differentially expressed with effect size `D`.
`D`	a vector of starting values for the effect sizes of the differentially expressed genes, corresponding to the proportions `p1`.
`delta`	a vector of starting values for the effect sizes of the differentially expressed genes, expressed as non-centrality parameters; this is just a different way of specifying `D`, though if both are given, `delta` will get priority.
`paired`	a logical value indicating whether the t-statistics are two-sample or paired.
`tbreak`	either the number of equally spaced bins for tabulating `tstat`, or the explicit break points for the bins, very much like the argument `breaks` to function `cut`; the default value is the square root of the number of genes.
`ext`	a logical value indicating whether to extend the bins, i.e. to set the lowest bin limit to -infinity and the largest bin limit to infinity.
`threshold.delta`	mixture components with an estimated absolute non-centrality parameter `delta` below this value are considered to be too small for independent estimation; these components and their corresponding `p1` are pooled with the null-component and `p0`, see Details.
`...`	additional arguments that are passed to `optim` to control the optimization.
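The arguments `D` and `delta` describe the same effect sizes on two scales. The sketch below shows the textbook conversion for a two-sample, equal-variance t-statistic; that discTMix uses exactly this factor is an assumption, the helper name `D_to_delta` is hypothetical, and the paired case is not covered.

```r
# Hypothetical helper: convert effect sizes D (in standard deviations) to
# noncentrality parameters for a two-sample t-statistic.
# Assumption: the standard relation delta = D * sqrt(n1 * n2 / (n1 + n2)).
D_to_delta <- function(D, n1 = 10, n2 = n1) {
  D * sqrt(n1 * n2 / (n1 + n2))
}

# With the default group sizes, an effect of one standard deviation
# corresponds to a noncentrality parameter of sqrt(5).
D_to_delta(c(-1, 1))
```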

The only parameter that must be specified is nq. If nothing else is given, the proportions are distributed equally between p0 and the p1, and the noncentrality parameters are set up symmetrically around zero, e.g. nq=5 leads to equal proportions of 0.2 and noncentrality parameters -2, -1, 1, and 2. If any of p1, D, or delta is specified, nq is redundant and will be ignored (with a warning). discTMix will in general make a valiant effort to deduce valid starting values from any combination of nq, p0, p1, D, and delta specified by the user, and will complain if that is not possible.
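The default starting values for the nq=5 example above can be reproduced with a few lines of R. This is a sketch of the stated rule only, not the function's internal code; the helper name `default_starts` is hypothetical, and it handles only an even number of noncentral components because the text does not say how other cases are laid out.

```r
# Hypothetical helper reproducing the documented nq = 5 example:
# equal proportions and noncentrality parameters symmetric around zero.
default_starts <- function(nq) {
  k <- nq - 1                        # number of noncentral components
  stopifnot(nq >= 3, k %% 2 == 0)    # only the even-k case is sketched here
  half <- as.numeric(seq_len(k / 2))
  list(p0    = 1 / nq,
       p1    = rep(1 / nq, k),
       delta = c(-rev(half), half))
}

str(default_starts(5))
```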

The fitting problem that this function tries to solve is badly conditioned, and the solution will in general depend on the precise set of starting values. Multiple runs from different starting values are usually a good idea. We have found, however, that the model seems fairly robust towards misspecification of the number of components, at least when estimating p0. When too many components are specified, some of the nominally noncentral t-distributions describing the behaviour of differentially expressed genes are fitted with noncentrality parameters very close to zero, and the true p0 gets spread out between the nominal p0 and the almost-central components. Adding up these different contributions usually gives a similar solution to re-fitting the model with fewer components. The cutoff for the size of non-centrality parameters that can be estimated realistically is specified via threshold.delta, whose default value is based on a small simulation study reported in Pawitan et al. (2005); see Examples. (Note that the AIC can also be helpful in determining the number of components.)
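The pooling of almost-central components into the null proportion can be illustrated as follows. This is a minimal sketch of the rule described above with made-up numbers; the helper name `collapse_small` and its return layout are hypothetical and do not mirror the object returned by discTMix.

```r
# Hypothetical helper: add the proportions of components whose estimated
# |delta| falls below the threshold to the raw null proportion.
collapse_small <- function(p0.raw, p1, delta, threshold.delta = 0.75) {
  stopifnot(length(p1) == length(delta))
  small <- abs(delta) < threshold.delta      # components too close to zero
  list(p0.est = p0.raw + sum(p1[small]),     # pooled null proportion
       pooled = small)                       # which components were pooled
}

# Made-up fit: the middle component (delta = 0.3) is pooled with the null
res <- collapse_small(p0.raw = 0.70,
                      p1     = c(0.05, 0.15, 0.10),
                      delta  = c(-2.1, 0.3, 2.4))
res$p0.est
```

In this example the pooled estimate is the raw proportion 0.70 plus the 0.15 assigned to the almost-central component.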

A list with class discTMix, with the following components:

`p0.est`	the estimated proportion of non-differentially expressed genes, after collapsing components with estimated non-centrality sizes below `threshold.delta`.
`p0.raw`, `pi0`	the estimated proportion before collapsing the components.
`p1`	the estimated proportions of differentially expressed genes corresponding to the effect sizes, relating to `p0.raw`.
`D`	effect sizes of the differentially expressed genes in multiples of the gene-by-gene standard deviation.
`delta`	effect sizes of the differentially expressed genes expressed as the noncentrality parameter of the corresponding noncentral t-distribution.
`AIC`	the AIC value for the maximum likelihood fit.
`opt`	the output from `optim`, giving details about the optimization process.
`data`	a list containing `tstat` and `df`.