fit.3g: fit.3g
In jamesliley/subtest: Test whether two putative subgroups of a disease phenotype show evidence of differential genetic basis.

Description Usage Arguments Details Value Author(s) Examples

Fit a specific Gaussian mixture distribution.

fit.3g(Z, pars = c(0.8, 0.1, 2, 2, 3, 0.5), weights = rep(1, dim(Z)[1]),
  C = 1, fit_null = FALSE, maxit = 10000, tol = 1e-04, sgm = 0.8,
  one_way = FALSE, syscov = 0, accel = TRUE, verbose = TRUE,
  file = NULL, n_save = 20, incl_z = TRUE, em = TRUE,
  control = list(factr = 10))

`Z`	an n x 2 matrix; Z[i,1], Z[i,2] are the Z_d and Z_a scores respectively for the ith SNP
`pars`	vector containing initial values of `pi0`, `pi1`, `tau`, `sigma1`, `sigma2`, `rho`.
`weights`	SNP weights to adjust for LD; output from LDAK procedure
`C`	a term C log(`pi0``pi1``pi2`) is added to the likelihood so the model is specified.
`fit_null`	set to TRUE to fit null model with forced `rho`=0, `tau`=0
`maxit`	maximum number of iterations before algorithm halts
`tol`	how small a change in pseudo-likelihood halts the algorithm
`sgm`	force `sigma1` ≥ sgm, `sigma2` ≥ sgm, `tau` ≥ sgm. True marginal variances should never be less than 1, but some variation should be allowed.
`one_way`	if TRUE, fits a single-Gaussian for category 3, rather than the symmetric model. Requires signed Z scores.
`syscov`	if subgroup proportions in the case group do not match those in the population, Z_d and Z_a scores must be transformed. This leads to a systematic correlation (see function `syscor`). This parameter forces adjustment of the fitted model to allow for this correlation.
`accel`	attempts to accelerate the fitting process by taking larger steps.
`verbose`	prints current parameters with frequency defined by `n_save`
`incl_z`	set to TRUE to include input arguments `Z` and `weights` in output. If FALSE these are set to null.
`em`	set to TRUE to use E-M algorithm, FALSE to use R's `optim` function.
`control`	parameters passed to R's `optim` function; only used if `em`=FALSE
`save`	history to a file with frequency defined by `n_save`
`b_int`	save or print current `pars` every `b_int` iterations

The mixture distribution simultaneously models two sets of GWAS summary statistics arising from a control group and two case groups comprising subgroups of a disease case group of interest. The values Z_a correspond to Z-scores arising from comparing the control group with the combined case group, and the values Z_d from comparing one case subgroup with the other, independent of controls.

We expect that SNPs can be classified into three categories, corresponding to the three two-dimensional Gaussians in the joint distribution of Z_a and Z_d. These three categories are: SNPs not associated with the phenotype and not differentiating subtypes; SNPs associated with the phenotypebut not differentiating subtypes; and SNPs differentiating subtypes.

Each of these three categories gives rise to a mixture Gaussian with a different shape. We are interested in whether the data support evidence that SNPs in the third category additionally differentiate cases and controls. Formally, we assume:

pdf(Z_a,Z_d) = pi0 G0 + pi1 G1 + (1-pi0 - pi1) G2

where G0, G1 are bivariate Gaussians with mean (0,0) and covariance matrices (1,0;0,1), (sigma1^2,0;0,1) respectively, and G2 is an equally-weighted mixture of two Gaussians with mean (0,0) and covariance matrices (sigma2^2,rho;rho,tau^2 ) and (sigma2^2,-rho;-rho,tau^2 ).

The model is thus characterised by the vector pars=(pi0,pi1,tau,sigma1,sigma2,rho). Under the null hypothesis that SNPs which differentiate subtypes are not in general associated with the phenotype, we have sigma2=1, rho=0.

This function finds the maximum pseudo-likelihood estimators for the paramaters of these three Gaussians, and the mixing parameters representing the proportion of SNPs in each category.

a list of six objects (class 3Gfit): pars is the vector of fitted parameters, history is a matrix of fitted parameters and pseudo-likelihood at each stage in the E-M algorithm, logl is the joint pseudo-likelihood of Z_a and Z_d, logl_a is the pseudo-likelihood of Z_a alone (used for adjusting PLR), z_ad is n x 2 matrix of Z_d and Z_a scores, weights is the vector of weights used to generate the model, and hypothesis is 0 or 1 depending on the value of fit_null.

Chris Wallace and James Liley

nn=100000
Z=abs(rbind(rmnorm(0.8*nn,varcov=diag(2)), rmnorm(0.15*nn,varcov=rbind(c(1,0),c(0,2^2))), rmnorm(0.05*nn,varcov=rbind(c(3^2,2),c(2,4^2))))); weights=runif(nn)
yy=fit.3g(Z,pars=c(0.7,0.2,2.5,1.5,3,1),weights=weights,incl_z=TRUE)
yy$pars
plot(yy,rlim=2)

jamesliley/subtest documentation built on May 18, 2019, 11:21 a.m.

jamesliley/subtest index

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

jamesliley/subtest
Test whether two putative subgroups of a disease phenotype show evidence of differential genetic basis.

fit.3g: fit.3g
In jamesliley/subtest: Test whether two putative subgroups of a disease phenotype show evidence of differential genetic basis.

Description

Usage

Arguments

Details

Value

Author(s)

Examples

Related to fit.3g in jamesliley/subtest...

R Package Documentation

Browse R Packages

We want your feedback!

jamesliley/subtest Test whether two putative subgroups of a disease phenotype show evidence of differential genetic basis.

fit.3g: fit.3g In jamesliley/subtest: Test whether two putative subgroups of a disease phenotype show evidence of differential genetic basis.

Description

Usage

Arguments

Details

Value

Author(s)

Examples

Related to fit.3g in jamesliley/subtest...

R Package Documentation

Browse R Packages

We want your feedback!

jamesliley/subtest
Test whether two putative subgroups of a disease phenotype show evidence of differential genetic basis.

fit.3g: fit.3g
In jamesliley/subtest: Test whether two putative subgroups of a disease phenotype show evidence of differential genetic basis.