ztobins: Binning of z-scores and estimation of the probabilities in... (repfdr: Replicability Analysis for Multiple Studies of High Dimension)

Description

For each study, the function discretizes the z-scores into bins and estimates the probabilities in each bin for the null and non-null states.
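The discretization step for a single study can be sketched in base R. This is a minimal sketch, not repfdr's code: it assumes equally spaced breaks over the range of the data (n.bins breaks, hence n.bins - 1 bins) and a bin cap of floor(sqrt(m)) for m z-scores unless force.bin.number = TRUE. The helper bin_zscores is hypothetical.

```r
# Hypothetical sketch of binning one study's z-scores (not repfdr's code).
# Assumptions: equally spaced breaks over range(z); cap of floor(sqrt(m)).
bin_zscores <- function(z, n.bins = 120, force.bin.number = FALSE) {
  # Assumed cap on the number of breaks, overridden by force.bin.number
  if (!force.bin.number) {
    n.bins <- min(n.bins, floor(sqrt(length(z))))
  }
  # n.bins equally spaced breaks spanning the data give n.bins - 1 bins
  breaks <- seq(min(z), max(z), length.out = n.bins)
  # all.inside = TRUE maps every z (including the extremes) to 1..(n.bins - 1)
  bins <- findInterval(z, breaks, all.inside = TRUE)
  list(breaks = breaks,
       bins = bins,
       counts = tabulate(bins, nbins = n.bins - 1))
}

set.seed(1)
z <- c(rnorm(900), rnorm(100, mean = 3))  # mostly null, some signal
res <- bin_zscores(z)
length(res$breaks)  # 31, since floor(sqrt(1000)) = 31
```

The cap keeps the expected count per bin from becoming too small when there are few z-scores; with force.bin.number = TRUE the sketch uses all 120 breaks.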

The function can also produce diagnostic plots of the model fit (disabled by default). These should be checked for misfit of the model to the data before the output is used in repfdr. See the description of the diagnostic plots under plot.diagnostics below.
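The Q-Q diagnostic described under plot.diagnostics compares a linear fit of the normal Q-Q points with the line expected for a standard normal null. A rough numeric version of that check is sketched below. The least-squares fit through all points and the helper name qq_line_fit are assumptions; repfdr may fit its red line differently.

```r
# Hypothetical numeric version of the Q-Q check: a least-squares line
# through the normal Q-Q points, to compare against intercept 0, slope 1.
qq_line_fit <- function(z) {
  theoretical <- qnorm(ppoints(length(z)))  # standard normal quantiles
  fit <- lm(sort(z) ~ theoretical)          # observed vs. theoretical
  coef(fit)                                 # intercept and slope
}

set.seed(42)
z.null <- rnorm(5000)            # null that is standard normal
z.wide <- rnorm(5000, sd = 1.5)  # null that is too wide
qq_line_fit(z.null)  # intercept near 0, slope near 1
qq_line_fit(z.wide)  # slope near 1.5: null is not N(0, 1)
```

A slope clearly different from 1 corresponds to the case where the red line departs from the dashed black line.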

Usage

 ztobins(zmat, n.association.status = 3, n.bins = 120, type = 0, df = 7,
         central.prop = 0.5, pi0 = NULL, plot.diagnostics = FALSE,
         trim.z = FALSE, trim.z.upper = 8, trim.z.lower = -8,
         force.bin.number = FALSE, pi.using.plugin = FALSE,
         pi.plugin.lambda = 0.05)

Arguments

 zmat  Matrix of z-scores of the features (rows) in each study (columns).

 n.association.status  Either 2 for no-association/association, or 3 for no-association/negative-association/positive-association.

 n.bins  Number of bins in the discretization of the z-score axis (the number of bins is n.bins - 1). If the number of z-scores per study is small, n.bins is set to a number lower than the default of 120 (approximately the square root of the number of z-scores). To override this cap on the number of bins (and create a sparse discretization of the data), use force.bin.number = TRUE.

 type  Type of fitting used for f: 0 is a natural spline, 1 is a polynomial, in either case with df degrees of freedom (so the total degrees of freedom, including the intercept, is df + 1).

 df  Degrees of freedom for fitting the estimated density f(z).

 central.prop  Central proportion of the z-scores used as the zero-assumption region (the region assumed to contain only null z-scores) to estimate pi0.

 pi0  Proportion of null hypotheses. The default, NULL, estimates pi0 automatically for every study. Alternatively, supply a vector of values between 0 and 1 whose length equals the number of studies (columns of zmat); these values are then used as pi0.

 plot.diagnostics  If TRUE, shows diagnostic plots of the density estimation for each study. The first plot is a histogram of the counts in each bin (white bars), along with the fitted density in green. Pink bars represent the observed count in each bin minus the number of null hypotheses expected by the model (truncated at zero). Red and orange dashed lines represent the estimated densities of the non-null distributions fitted by the spline. A blue dashed line represents the density component of the z-scores of null SNPs, N(0,1). The second plot is the normal Q-Q plot of the z-scores, converted to the normal scale using qnorm. For a valid fit, the points should coincide with the displayed linear fit. A departure from the linear fit could indicate either a null distribution that is not standard normal (a problem), or an extreme number of non-null p-values (the signal is not sparse; the output is still valid). A black dashed line marks the expected fit for the standard normal distribution (with a single black dot at the point (0,0)). If the linear fit of the Q-Q plot (red line) does not match the dashed black line, the null distribution of the data is not standard normal. Misfit in either plot should be investigated by the user before using the output in repfdr. Default is FALSE.

 trim.z  If TRUE, z-scores above trim.z.upper or below trim.z.lower are trimmed at their respective limits. Default is FALSE.

 trim.z.upper  Upper bound for trimming z-scores. Default is 8.

 trim.z.lower  Lower bound for trimming z-scores. Default is -8.

 force.bin.number  Set to TRUE to allow a discretization with n.bins > sqrt(nrow(zmat)).

 pi.using.plugin  Logical flag indicating whether the proportion of null hypotheses should be estimated using the plug-in estimator (default is FALSE). The plug-in estimator is (sum(Pvalues > pi.plugin.lambda) + 1)/(m * (1 - pi.plugin.lambda)), where m is the number of p-values.

 pi.plugin.lambda  Parameter used by the plug-in estimator of the proportion of null hypotheses, for one-sided tests. Default is 0.05. This should be set to the type I error rate used for hypothesis testing.
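The plug-in estimator given for pi.using.plugin can be written directly as an R function. This is a sketch of the stated formula only; the helper name plugin_pi0 is hypothetical, and how repfdr converts z-scores to p-values internally is not shown here.

```r
# Plug-in estimate of the null proportion, following the formula above:
#   (sum(p > lambda) + 1) / (m * (1 - lambda))
# plugin_pi0 is a hypothetical helper; the raw formula is not capped at 1.
plugin_pi0 <- function(pvalues, lambda = 0.05) {
  m <- length(pvalues)
  (sum(pvalues > lambda) + 1) / (m * (1 - lambda))
}

p <- c(0.01, 0.02, 0.5, 0.9)
plugin_pi0(p)  # (2 + 1) / (4 * 0.95), about 0.789
```

The "+ 1" in the numerator makes the estimate slightly conservative, which matters most when m is small.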
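Reading "trimmed at their respective limits" as clamping (values beyond a bound are set to that bound rather than dropped), the effect of trim.z can be sketched as follows. The helper trim_z is hypothetical.

```r
# Hypothetical sketch of trim.z: clamp z-scores to [lower, upper].
# Out-of-range values are replaced by the bound, not removed.
trim_z <- function(z, lower = -8, upper = 8) {
  pmin(pmax(z, lower), upper)
}

trim_z(c(-12, -3, 0, 5, 40))  # -8 -3 0 5 8
```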

Details

This utility function produces the first two arguments of the main function repfdr (pdf.binned.z and binned.z.mat).

Value

A list with:

 pdf.binned.z  A 3-dimensional array containing, for each study (first dimension), the probability of a z-score falling in each bin (second dimension) under each hypothesis status (third dimension). The third dimension is of size 2 or 3, depending on the number of association states: if the association can be either null or in only one direction, the size is 2; if the association can be null, positive, or negative, the size is 3.

 binned.z.mat  A matrix of the bin numbers of each z-score (rows) in each study (columns).

 breaks.matrix  A matrix with n.bins + 1 rows and ncol(zmat) columns, giving the discretization chosen for each study. Values are the breaks between bins; the first and last values are the outer edges of the outermost bins.

 df  Number of degrees of freedom used for spline fitting of the density.

 proportions  Matrix with n.association.status rows and ncol(zmat) columns, giving the estimated proportion of each component for each study.

 PlotWarnings  Vector of length ncol(zmat) holding the warnings given for each study (also shown in each study's plots and printed to the console). If no warning was given for a study, the value is NA.
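To make the indexing of pdf.binned.z concrete, the toy example below builds an array with the same study x bin x state layout. The bin probabilities come from made-up normal components and are not how ztobins estimates them; the state order (null, negative, positive) is assumed from the order listed under n.association.status.

```r
# Toy array with the study x bin x state layout of pdf.binned.z.
# The N(mu, 1) components are made up; ztobins estimates these differently.
n.studies <- 2
breaks <- seq(-6, 6, length.out = 25)  # 24 bins of width 0.5
n.bins.used <- length(breaks) - 1
# Assumed state order for n.association.status = 3
component.means <- c(null = 0, negative = -2.5, positive = 2.5)

bin_probs <- function(mu) {
  p <- diff(pnorm(breaks, mean = mu))  # mass of N(mu, 1) in each bin
  p / sum(p)                           # renormalize to the binned range
}

pdf.toy <- array(NA_real_, dim = c(n.studies, n.bins.used, 3))
for (s in seq_len(n.studies)) {
  for (k in seq_along(component.means)) {
    pdf.toy[s, , k] <- bin_probs(component.means[[k]])
  }
}

null.study1 <- pdf.toy[1, , 1]  # bin probabilities under the null, study 1
apply(pdf.toy, c(1, 3), sum)    # every (study, state) slice sums to 1
```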