Fit a specific Gaussian mixture distribution.
Z |
an n x 2 matrix; Z[i,1], Z[i,2] are the Z_d and Z_a scores respectively for the ith SNP |
pars |
vector containing initial values of |
weights |
SNP weights to adjust for LD; output from LDAK procedure |
C |
a term C log( |
fit_null |
set to TRUE to fit null model with forced |
maxit |
maximum number of iterations before algorithm halts |
tol |
how small a change in pseudo-likelihood halts the algorithm |
sgm |
force |
one_way |
if TRUE, fits a single-Gaussian for category 3, rather than the symmetric model. Requires signed Z scores. |
syscov |
if subgroup proportions in the case group do not match those in the population, Z_d and Z_a scores must be transformed. This leads to a systematic correlation (see function |
accel |
attempts to accelerate the fitting process by taking larger steps. |
verbose |
prints current parameters with frequency defined by |
incl_z |
set to TRUE to include input arguments |
em |
set to TRUE to use E-M algorithm, FALSE to use R's |
control |
parameters passed to R's |
save |
history to a file with frequency defined by |
b_int |
save or print current |
The mixture distribution simultaneously models two sets of GWAS summary statistics arising from a control group and two case subgroups that together make up a disease case group of interest. The values Z_a are Z-scores from comparing the control group with the combined case group, and the values Z_d are Z-scores from comparing one case subgroup with the other, independent of controls.
We expect that SNPs can be classified into three categories, corresponding to the three two-dimensional Gaussians in the joint distribution of Z_a and Z_d. These three categories are: SNPs not associated with the phenotype and not differentiating subtypes; SNPs associated with the phenotype but not differentiating subtypes; and SNPs differentiating subtypes.
Each of these three categories gives rise to a Gaussian mixture component with a different shape. We are interested in whether the data support the hypothesis that SNPs in the third category additionally differentiate cases and controls. Formally, we assume:
pdf(Z_a,Z_d) = pi0 G0 + pi1 G1 + (1-pi0-pi1) G2
where G0, G1 are bivariate Gaussians with mean (0,0) and covariance matrices (1,0;0,1), (sigma1^2,0;0,1) respectively, and G2 is an equally-weighted mixture of two Gaussians with mean (0,0) and covariance matrices (sigma2^2,rho;rho,tau^2) and (sigma2^2,-rho;-rho,tau^2).
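The density above can be evaluated directly in base R. The following is a minimal sketch, not the package's own code: the helper names `dbvn0` and `mix_density` are hypothetical, the covariance matrices are assumed to be ordered (Z_a, Z_d) as in pdf(Z_a,Z_d), and the input matrix is assumed to have columns (Z_d, Z_a) as described for the argument Z. Note that rho enters as the off-diagonal covariance entry, not as a correlation.

```r
# Density of a zero-mean bivariate Gaussian with covariance matrix
# (va, cv; cv, vd), evaluated at the points (za, zd).
# Hypothetical helper, written out explicitly to avoid extra packages.
dbvn0 <- function(za, zd, va, vd, cv = 0) {
  dt <- va * vd - cv^2                                  # determinant
  q <- (vd * za^2 - 2 * cv * za * zd + va * zd^2) / dt  # quadratic form
  exp(-q / 2) / (2 * pi * sqrt(dt))
}

# Three-category mixture density.
# Z: n x 2 matrix with columns (Z_d, Z_a); pars = c(pi0, pi1, tau, sigma1, sigma2, rho)
mix_density <- function(Z, pars) {
  zd <- Z[, 1]; za <- Z[, 2]
  pi0 <- pars[1]; pi1 <- pars[2]; tau <- pars[3]
  sigma1 <- pars[4]; sigma2 <- pars[5]; rho <- pars[6]
  g0 <- dbvn0(za, zd, 1, 1)
  g1 <- dbvn0(za, zd, sigma1^2, 1)
  # G2: equally-weighted mixture of the +rho and -rho components
  g2 <- 0.5 * dbvn0(za, zd, sigma2^2, tau^2, rho) +
        0.5 * dbvn0(za, zd, sigma2^2, tau^2, -rho)
  pi0 * g0 + pi1 * g1 + (1 - pi0 - pi1) * g2
}

# Illustrative parameter values only (not fitted to any data)
pars <- c(0.8, 0.15, 2, 1.5, 1.2, 0.5)
Z <- cbind(Z_d = c(0, 1.2, -1.2), Z_a = c(0, 0.7, 0.7))
mix_density(Z, pars)
```

Because G2 mixes the +rho and -rho components equally, the density is unchanged when the sign of Z_d is flipped, so the second and third rows of `Z` give the same value.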
The model is thus characterised by the vector pars=(pi0,pi1,tau,sigma1,sigma2,rho). Under the null hypothesis that SNPs which differentiate subtypes are not in general associated with the phenotype, we have sigma2=1, rho=0.
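To see what the parameter vector and the null constraint mean in practice, it can help to simulate Z scores from the model. The sketch below is an illustration only: `simulate_3g` is a hypothetical helper, not part of the package, and the parameter values are arbitrary. It assumes |rho| < sigma2 * tau so that the category-3 covariance matrix is positive definite.

```r
# Simulate n SNPs from the three-category model.
# pars = c(pi0, pi1, tau, sigma1, sigma2, rho); returns columns (Z_d, Z_a).
simulate_3g <- function(n, pars) {
  pi0 <- pars[1]; pi1 <- pars[2]; tau <- pars[3]
  sigma1 <- pars[4]; sigma2 <- pars[5]; rho <- pars[6]
  # Category of each SNP: 1 = G0, 2 = G1, 3 = G2
  k <- sample(1:3, n, replace = TRUE, prob = c(pi0, pi1, 1 - pi0 - pi1))
  # Sign of the covariance in category 3 (+rho or -rho, equally likely)
  s <- sample(c(-1, 1), n, replace = TRUE)
  sd_a <- c(1, sigma1, sigma2)[k]   # standard deviation of Z_a by category
  sd_d <- c(1, 1, tau)[k]           # standard deviation of Z_d by category
  # Correlation implied by covariance +/-rho; zero outside category 3
  r <- ifelse(k == 3, s * rho / (sigma2 * tau), 0)
  e1 <- rnorm(n); e2 <- rnorm(n)
  za <- sd_a * e1
  zd <- sd_d * (r * e1 + sqrt(1 - r^2) * e2)
  cbind(Z_d = zd, Z_a = za)
}

set.seed(1)
pars_full <- c(0.8, 0.1, 2, 1.5, 1.3, 0.6)  # arbitrary illustrative values
pars_null <- pars_full
pars_null[5:6] <- c(1, 0)                   # null hypothesis: sigma2 = 1, rho = 0
Z_full <- simulate_3g(5000, pars_full)
Z_null <- simulate_3g(5000, pars_null)
```

Under `pars_null`, SNPs in the third category have unit variance in Z_a and no covariance with Z_d, so their Z_a scores are indistinguishable from those of unassociated SNPs.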
This function finds the maximum pseudo-likelihood estimators for the parameters of these three Gaussians, and the mixing parameters representing the proportion of SNPs in each category.
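As a rough illustration of the objective being maximised, the sketch below computes a weighted log pseudo-likelihood and the posterior category probabilities that an E-M step would use. This is an assumption about the general form, not the package's implementation: it takes the pseudo-likelihood to be the weighted sum of log mixture densities, it omits the term involving C, and the names `bvn0` and `weighted_pl` are hypothetical.

```r
# Zero-mean bivariate Gaussian density with covariance (va, cv; cv, vd)
bvn0 <- function(za, zd, va, vd, cv = 0) {
  dt <- va * vd - cv^2
  exp(-(vd * za^2 - 2 * cv * za * zd + va * zd^2) / (2 * dt)) /
    (2 * pi * sqrt(dt))
}

# Z: n x 2 matrix with columns (Z_d, Z_a)
# pars = c(pi0, pi1, tau, sigma1, sigma2, rho)
# weights: per-SNP weights (assumed to enter as multipliers of the log density)
weighted_pl <- function(Z, pars, weights = rep(1, nrow(Z))) {
  zd <- Z[, 1]; za <- Z[, 2]
  pi0 <- pars[1]; pi1 <- pars[2]; tau <- pars[3]
  sigma1 <- pars[4]; sigma2 <- pars[5]; rho <- pars[6]
  # Weighted component densities, one column per category
  comp <- cbind(
    pi0 * bvn0(za, zd, 1, 1),
    pi1 * bvn0(za, zd, sigma1^2, 1),
    (1 - pi0 - pi1) * 0.5 * (bvn0(za, zd, sigma2^2, tau^2, rho) +
                             bvn0(za, zd, sigma2^2, tau^2, -rho))
  )
  dens <- rowSums(comp)
  list(
    logl = sum(weights * log(dens)),  # weighted log pseudo-likelihood
    post = comp / dens                # E-step: P(category | Z) for each SNP
  )
}

# Illustrative call with arbitrary values
Z <- cbind(Z_d = c(0.1, 3.0, -0.4), Z_a = c(0.2, 1.0, 2.5))
out <- weighted_pl(Z, c(0.8, 0.1, 2, 1.5, 1.3, 0.6), weights = c(1, 0.5, 0.8))
```

In an E-M scheme, the rows of `out$post` would be combined with the weights to update the mixing proportions and covariance parameters, and iteration would stop once `out$logl` changes by less than a tolerance or an iteration limit is reached.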
a list of objects (class 3Gfit): pars is the vector of fitted parameters, history is a matrix of fitted parameters and pseudo-likelihood at each stage in the E-M algorithm, logl is the joint pseudo-likelihood of Z_a and Z_d, logl_a is the pseudo-likelihood of Z_a alone (used for adjusting PLR), z_ad is the n x 2 matrix of Z_d and Z_a scores, weights is the vector of weights used to generate the model, and hypothesis is 0 or 1 depending on the value of fit_null.
Chris Wallace and James Liley