fit.3g: fit.3g

Description Usage Arguments Details Value Author(s) Examples

View source: R/fit.3g.R

Description

Fit a specific Gaussian mixture distribution.

Usage

1
2
3
4
5
fit.3g(Z, pars = c(0.8, 0.1, 2, 2, 3, 0.5), weights = rep(1, dim(Z)[1]),
  C = 1, fit_null = FALSE, maxit = 10000, tol = 1e-04, sgm = 0.8,
  one_way = FALSE, syscov = 0, accel = TRUE, verbose = TRUE,
  file = NULL, n_save = 20, incl_z = TRUE, em = TRUE,
  control = list(factr = 10))

Arguments

Z

an n x 2 matrix; Z[i,1], Z[i,2] are the Z_d and Z_a scores respectively for the ith SNP

pars

vector containing initial values of pi0, pi1, tau, sigma1, sigma2, rho.

weights

SNP weights to adjust for LD; output from LDAK procedure

C

a term C log(pi0*pi1*pi2) is added to the likelihood so the model is specified.

fit_null

set to TRUE to fit null model with forced rho=0, tau=0

maxit

maximum number of iterations before algorithm halts

tol

how small a change in pseudo-likelihood halts the algorithm

sgm

force sigma1 ≥ sgm, sigma2 ≥ sgm, tau ≥ sgm. True marginal variances should never be less than 1, but some variation should be allowed.

one_way

if TRUE, fits a single-Gaussian for category 3, rather than the symmetric model. Requires signed Z scores.

syscov

if subgroup proportions in the case group do not match those in the population, Z_d and Z_a scores must be transformed. This leads to a systematic correlation (see function syscor). This parameter forces adjustment of the fitted model to allow for this correlation.

accel

attempts to accelerate the fitting process by taking larger steps.

verbose

prints current parameters with frequency defined by n_save

incl_z

set to TRUE to include input arguments Z and weights in output. If FALSE these are set to null.

em

set to TRUE to use E-M algorithm, FALSE to use R's optim function.

control

parameters passed to R's optim function; only used if em=FALSE

save

history to a file with frequency defined by n_save

b_int

save or print current pars every b_int iterations

Details

The mixture distribution simultaneously models two sets of GWAS summary statistics arising from a control group and two case groups comprising subgroups of a disease case group of interest. The values Z_a correspond to Z-scores arising from comparing the control group with the combined case group, and the values Z_d from comparing one case subgroup with the other, independent of controls.

We expect that SNPs can be classified into three categories, corresponding to the three two-dimensional Gaussians in the joint distribution of Z_a and Z_d. These three categories are: SNPs not associated with the phenotype and not differentiating subtypes; SNPs associated with the phenotypebut not differentiating subtypes; and SNPs differentiating subtypes.

Each of these three categories gives rise to a mixture Gaussian with a different shape. We are interested in whether the data support evidence that SNPs in the third category additionally differentiate cases and controls. Formally, we assume:

pdf(Z_a,Z_d) = pi0 G0 + pi1 G1 + (1-pi0 - pi1) G2

where G0, G1 are bivariate Gaussians with mean (0,0) and covariance matrices (1,0;0,1), (sigma1^2,0;0,1) respectively, and G2 is an equally-weighted mixture of two Gaussians with mean (0,0) and covariance matrices (sigma2^2,rho;rho,tau^2 ) and (sigma2^2,-rho;-rho,tau^2 ).

The model is thus characterised by the vector pars=(pi0,pi1,tau,sigma1,sigma2,rho). Under the null hypothesis that SNPs which differentiate subtypes are not in general associated with the phenotype, we have sigma2=1, rho=0.

This function finds the maximum pseudo-likelihood estimators for the paramaters of these three Gaussians, and the mixing parameters representing the proportion of SNPs in each category.

Value

a list of six objects (class 3Gfit): pars is the vector of fitted parameters, history is a matrix of fitted parameters and pseudo-likelihood at each stage in the E-M algorithm, logl is the joint pseudo-likelihood of Z_a and Z_d, logl_a is the pseudo-likelihood of Z_a alone (used for adjusting PLR), z_ad is n x 2 matrix of Z_d and Z_a scores, weights is the vector of weights used to generate the model, and hypothesis is 0 or 1 depending on the value of fit_null.

Author(s)

Chris Wallace and James Liley

Examples

1
2
3
4
5
nn=100000
Z=abs(rbind(rmnorm(0.8*nn,varcov=diag(2)), rmnorm(0.15*nn,varcov=rbind(c(1,0),c(0,2^2))), rmnorm(0.05*nn,varcov=rbind(c(3^2,2),c(2,4^2))))); weights=runif(nn)
yy=fit.3g(Z,pars=c(0.7,0.2,2.5,1.5,3,1),weights=weights,incl_z=TRUE)
yy$pars
plot(yy,rlim=2)

jamesliley/subtest documentation built on May 18, 2019, 11:21 a.m.