fit.cond: fit.cond


View source: R/fit.3g.R

Description

Fit a specific bivariate Gaussian mixture distribution, conditioning on one variable

Usage

fit.cond(Z, pars = c(0.8, 0.1, 2, 3, 1, 0), C = 1, weights = rep(1,
  dim(Z)[1]), fit_null = FALSE, one_way = FALSE, syscov = 0, sgm = 0.8,
  fixpi1 = TRUE, incl_z = FALSE, method = "L-BFGS-B",
  control = list(factr = 10))

Arguments

Z

an n x 2 matrix; Z[i,1], Z[i,2] are the Z_d and Z_a scores respectively for the ith SNP

pars

vector containing initial values of pi0, pi1, tau, sigma1, sigma2, rho.

C

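a penalty term C*log(pi0*pi1*pi2), where pi2 = 1 - pi0 - pi1, is added to the log-likelihood so that the model remains well specified: the term tends to minus infinity as any mixture weight approaches zero, so no component can vanish from the fit.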

weights

SNP weights to adjust for LD; output from LDAK procedure

fit_null

set to TRUE to fit the null model with forced rho=0, sigma2=1 (the null hypothesis given under Details)

one_way

if TRUE, fits a single-Gaussian for category 3, rather than the symmetric model. Requires signed Z scores.

syscov

if subgroup proportions in the case group do not match those in the population, Z_d and Z_a scores must be transformed. This leads to a systematic correlation (see function syscor). This parameter forces adjustment of the fitted model to allow for this correlation.

sgm

force sigma1 ≥ sgm, sigma2 ≥ sgm, tau ≥ sgm. True marginal variances should never be less than 1, but some variation should be allowed, hence a default below 1.

fixpi1

set to TRUE to fix pi1 when fitting.

incl_z

set to TRUE to include the input arguments Z and weights in the output. If FALSE, these are set to NULL.

control

additional parameters passed to the R function optim.
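
The arguments C, weights, sgm, method and control can be read together as a weighted, penalised likelihood maximised under box constraints. The sketch below illustrates that combination on a toy one-dimensional, two-component mixture. It is not the objective used inside fit.cond: the function toy_nll, the simulated data and the two-weight penalty log(pi0*(1-pi0)) are all assumptions made for illustration, chosen by analogy with the three-weight penalty described for C.

```r
# Toy illustration only: a 1-D mixture pi0*N(0,1) + (1-pi0)*N(0,sigma^2).
# This is NOT the objective used inside fit.cond.
set.seed(1)
n <- 2000
is_null <- runif(n) < 0.8
z <- ifelse(is_null, rnorm(n, 0, 1), rnorm(n, 0, 3))
w <- rep(1, n)   # per-observation weights (all equal here)
C <- 1           # multiplier of the penalty on the mixture weights
sgm <- 0.8       # lower bound on the fitted standard deviation

# Negative weighted, penalised log-likelihood in par = c(pi0, sigma)
toy_nll <- function(par, z, w, C) {
  pi0 <- par[1]
  sigma <- par[2]
  dens <- pi0 * dnorm(z, 0, 1) + (1 - pi0) * dnorm(z, 0, sigma)
  -(sum(w * log(dens)) + C * log(pi0 * (1 - pi0)))
}

# Box constraints are passed to optim through lower/upper,
# which the "L-BFGS-B" method supports.
fit <- optim(c(0.5, 2), toy_nll, z = z, w = w, C = C,
             method = "L-BFGS-B",
             lower = c(0.01, sgm), upper = c(0.99, Inf),
             control = list(factr = 10))
fit$par
```

In this sketch the bound sgm plays the same role as in the argument list: the optimiser can never return a standard deviation below it, whatever the data suggest.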

Details

The mixture distribution jointly models two sets of GWAS summary statistics derived from a control group and two case subgroups that together make up the disease case group of interest. The values Z_a are Z-scores from comparing the control group with the combined case group; the values Z_d are Z-scores from comparing one case subgroup with the other, independently of controls.

We expect that SNPs can be classified into three categories, corresponding to the three two-dimensional Gaussians in the joint distribution of Z_a and Z_d. These three categories are: SNPs not associated with the phenotype and not differentiating subtypes; SNPs associated with the phenotype but not differentiating subtypes; and SNPs differentiating subtypes.

Each of these three categories gives rise to a Gaussian mixture component with a different shape. We are interested in whether the data support the hypothesis that SNPs in the third category also differentiate cases and controls. Formally, we assume:

(Z_a, Z_d) ~ pi0 G0 + pi1 G1 + (1 - pi0 - pi1) G2

where G0, G1 are bivariate Gaussians with mean (0,0) and covariance matrices (1,0;0,1), (sigma1^2,0;0,1) respectively, and G2 is an equally-weighted mixture of two Gaussians with mean (0,0) and covariance matrices (sigma2^2,rho;rho,tau^2) and (sigma2^2,-rho;-rho,tau^2). Matrices are written row by row, with rows separated by semicolons and entries ordered (Z_a, Z_d).
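
The density defined above can be evaluated directly in base R. The following is a minimal sketch, not the package's implementation: the helper names dbvn0 and mix_density are hypothetical, and the columns of X follow the order of the formula (Z_a first), which is the reverse of the column order documented for the Z argument.

```r
# Zero-mean bivariate normal density with covariance matrix S,
# evaluated at each row of the n x 2 matrix X.
dbvn0 <- function(X, S) {
  Sinv <- solve(S)
  q <- rowSums((X %*% Sinv) * X)   # quadratic form x' S^{-1} x for each row
  exp(-q / 2) / (2 * pi * sqrt(det(S)))
}

# Mixture density at the rows of X = cbind(Z_a, Z_d).
# pars = c(pi0, pi1, tau, sigma1, sigma2, rho), as in the text.
mix_density <- function(X, pars) {
  pi0 <- pars[1]; pi1 <- pars[2]; tau <- pars[3]
  sigma1 <- pars[4]; sigma2 <- pars[5]; rho <- pars[6]
  S0  <- diag(2)
  S1  <- matrix(c(sigma1^2, 0, 0, 1), 2, 2)
  S2p <- matrix(c(sigma2^2,  rho,  rho, tau^2), 2, 2)
  S2m <- matrix(c(sigma2^2, -rho, -rho, tau^2), 2, 2)
  g2 <- 0.5 * dbvn0(X, S2p) + 0.5 * dbvn0(X, S2m)   # symmetric component G2
  pi0 * dbvn0(X, S0) + pi1 * dbvn0(X, S1) + (1 - pi0 - pi1) * g2
}

X <- rbind(c(0, 0), c(1, -2), c(1, 2))
d <- mix_density(X, c(0.8, 0.1, 2, 3, 1, 0.5))
d
```

Because G2 mixes the two covariance matrices with equal weight, the density is unchanged when the sign of Z_d is flipped, so the last two rows of X receive the same value.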

The model is thus characterised by the vector pars=(pi0,pi1,tau,sigma1,sigma2,rho). Under the null hypothesis that SNPs which differentiate subtypes are not in general associated with the phenotype, we have sigma2=1, rho=0.
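
To make the null hypothesis concrete, the sketch below simulates from the model under null and non-null parameter vectors and compares the variance of Z_a among SNPs drawn from G2. It assumes nothing about the package internals: sim_mix is a hypothetical helper, and the non-null values sigma2 = 2, rho = 1 are chosen purely for illustration.

```r
set.seed(42)

# Draw n points from the three-component model.
# Returns columns Z_a, Z_d and the component label (0, 1 or 2 for G0, G1, G2).
sim_mix <- function(n, pars) {
  pi0 <- pars[1]; pi1 <- pars[2]; tau <- pars[3]
  sigma1 <- pars[4]; sigma2 <- pars[5]; rho <- pars[6]
  k <- sample(0:2, n, replace = TRUE, prob = c(pi0, pi1, 1 - pi0 - pi1))
  e1 <- rnorm(n)
  e2 <- rnorm(n)
  s <- sample(c(-1, 1), n, replace = TRUE)   # sign of the covariance within G2
  za <- c(1, sigma1, sigma2)[k + 1] * e1
  b <- s * rho / sigma2                      # gives Cov(Z_a, Z_d) = s * rho in G2
  zd <- ifelse(k == 2, b * e1 + sqrt(tau^2 - (rho / sigma2)^2) * e2, e2)
  cbind(Z_a = za, Z_d = zd, component = k)
}

null_pars <- c(0.8, 0.1, 2, 3, 1, 0)   # sigma2 = 1, rho = 0
alt_pars  <- c(0.8, 0.1, 2, 3, 2, 1)   # sigma2 > 1, rho != 0 (illustrative)

x0 <- sim_mix(1e5, null_pars)
x1 <- sim_mix(1e5, alt_pars)

# Variance of Z_a among SNPs drawn from G2:
# close to 1 under the null, close to sigma2^2 = 4 under this alternative.
v0 <- var(x0[x0[, "component"] == 2, "Z_a"])
v1 <- var(x1[x1[, "component"] == 2, "Z_a"])
c(null = v0, alternative = v1)
```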

In estimating the null distribution of the test statistic, frequently the only available null test cases have tau=1. This can cause false-positives if


jamesliley/subtest documentation built on May 18, 2019, 11:21 a.m.