gemma: GEMMA
In timflutre/rutilstimflutre: Timothee Flutre's personal R code

gemma

R Documentation

GEMMA

Description

See Zhou & Stephens (Nature Genetics, 2012), and Zhou et al (PLoS Genetics, 2013).

Usage

gemma(
  model = "ulmm",
  y,
  X,
  snp.coords,
  recode.genos = TRUE,
  alleles = NULL,
  maf = 0.01,
  K.c = NULL,
  W = NULL,
  weights = NULL,
  exe = Sys.which("gemma"),
  out.dir = getwd(),
  task.id = "gemma",
  verbose = 1,
  clean = "none",
  seed = 1859,
  burnin = 1000,
  nb.iters = 7000,
  thin = 10
)

Arguments

`model`	name of the model to fit (ulmm/bslmm)
`y`	vector of phenotypes with genotype names
`X`	matrix of bi-allelic SNP genotypes encoded, for each SNP, in number of copies of its second allele, i.e. as allele doses in 0,1,2, with genotypes in rows and SNPs in columns; the "second" allele is arbitrary, it corresponds to the second column of `alleles`, which can be the minor or the major allele
`snp.coords`	data.frame with SNPs as row names and two columns named "coord" and "chr"
`recode.genos`	if TRUE, SNP genotypes in X will be recoded so that the minor allele is counted
`alleles`	data.frame with SNPs in rows (names as row names) and alleles in columns (exactly 2 columns are required); the second column should correspond to the allele which number of copies is counted at each SNP in `X`; if NULL, fake alleles will be generated; if it is a matrix, it will be silently converted into a data.frame
`maf`	SNPs with minor allele frequency strictly below this threshold will be discarded
`K.c`	kinship matrix; if NULL, will be estimated using X via `estimGenRel` with `relationships="additive"` and `method="zhou"`
`W`	matrix of covariates with genotypes in rows (names as row names), a first column of 1 for the intercept and, if needed, a second column of covariates values; if NULL, a column of 1 will be used for the intercept
`weights`	vector of positive weights with genotype names
`exe`	path to the executable of GEMMA
`out.dir`	directory in which the output files will be saved
`task.id`	identifier of the task (used in temporary and output file names)
`verbose`	verbosity level (0/1)
`clean`	remove files: none, some (temporary only), all (temporary and results)
`seed`	seed for the pseudo-random number generator for GEMMA
`burnin`	number of iterations to discard as burn-in (if model="bslmm")
`nb.iters`	number of iterations (if model="bslmm")
`thin`	thining (if model="bslmm")

Value

list with the following components: recoded, cmd, log, global mean, pve, tests (and hyperparams and params for BSLMM)

Note

P. Carbonetto showed how the PVE of a single SNP can be obtained, assuming no covariate, x is centered and Cov(x,beta) is zero: pve <- var(x) * (beta^2 + se^2)/var(y)

Author(s)

Timothee Flutre [aut,cre], Dalel Ahmed [ctb]

Examples

## Not run: ## simulate genotypes
set.seed(1859)
I <- 200
P <- 2000
X <- simulGenosDose(nb.genos=I, nb.snps=P)

## make fake SNP coordinates and alleles
nb.snps.per.chr <- ncol(X) / 2
snp.coords <- data.frame(coord=rep(1:nb.snps.per.chr, 2),
                         chr=rep(c("chr1","chr2"), each=nb.snps.per.chr),
                         row.names=colnames(X),
                         stringsAsFactors=FALSE)
alleles <- data.frame(major=rep("A", ncol(X)),
                      minor="a",
                      row.names=colnames(X),
                      stringsAsFactors=FALSE)

## simulate phenotypes (only additive effects)
modelA <- simulBvsr(Q=1, X=X, pi=0.01, pve=0.7, sigma.a2=1)
summary(abs(modelA$a[modelA$gamma == 1]))

## test SNPs one by one with the univariate LMM
fit.u <- gemma(model="ulmm", y=modelA$Y[,1], X=X, snp.coords=snp.coords,
               alleles=alleles, W=modelA$W, out.dir=tempdir(), clean="all")
fit.u$global.mean
fit.u$pve
cor(modelA$a[modelA$gamma == 1], fit.u$tests$beta[modelA$gamma == 1])
cols <- rep("black",ncol(X)); cols[modelA$gamma==1] <- "red"
pvadj.AA <- qqplotPval(fit.u$tests$p_wald, col=cols, ctl.fdr.bh=TRUE,
                       plot.signif=TRUE)
t(binaryClassif(known.nulls=modelA$gamma == 0,
                called.nulls=pvadj.AA$pv.bh > 0.05))
qtl <- names(sort(modelA$a[modelA$gamma == 1], decreasing=TRUE))[1]
boxplotCandidateQtl(y=modelA$Y[,1], X=X, snp=qtl, show.points=TRUE, main=qtl)
slope <- fit.u$tests[qtl,"beta"] *
  ifelse(fit.u$recode[qtl], -1, 1)
abline(a=fit.u$global.mean["beta.hat"] - mean(X[,qtl]) * slope,
       b=slope, col="red")

## compute the PVE of a single SNP (assuming no covariate)
(pve.qtl <- var(X[,qtl]) * (fit.u$tests[qtl,"beta"]^2 + fit.u$tests[qtl,"se"]^2) / var(modelA$Y[,1]))

## same but per chrom
## fit.u2 <- gemmaUlmmPerChr(y=modelA$Y[,1], X=X, snp.coords=snp.coords,
##                           alleles=alleles, W=modelA$W,
##                           out.dir=tempdir(), clean="all")

## fit all SNPs jointly with the BSLMM
burnin <- 10^3
nb.iters <- 10^4
thin <- 10^2
fit.bs <- gemma(model="bslmm", y=modelA$Y[,1], X=X, snp.coords=snp.coords,
                alleles=alleles, W=modelA$W, out.dir=tempdir(), clean="all",
                burnin=burnin, nb.iters=nb.iters, thin=thin)
posterior.samples <- coda::mcmc(data=fit.bs$hyperparams, start=burnin + 1,
                                end=burnin + nb.iters, thin=thin)
summary(posterior.samples)

## simulate phenotypes (only dominant effects)
set.seed(1859)
modelD <- simulBvsr(Q=1, X=X, pi=0.01, pve=0.7, sigma.a2=0, sigma.d2=1)

## test SNPs as "additive" one by one with the univariate LMM
A.z <- estimGenRel(X=X, relationships="additive", method="zhou")
fit.u.DA <- gemma(model="ulmm", y=modelD$Y[,1], X=X,
                  snp.coords, alleles,
                  K.c=A.z, W=modelD$W, out.dir=tempdir(), clean="all")
cols <- rep("black",ncol(X)); cols[modelD$gamma==1] <- "red"
pvadj.DA <- qqplotPval(fit.u.DA$tests$p_wald, col=cols, ctl.fdr.bh=TRUE,
                       plot.signif=TRUE)
t(binaryClassif(known.nulls=modelD$gamma == 0,
                called.nulls=pvadj.DA$pv.bh > 0.05))

## test SNPs as "dominant" one by one with the univariate LMM
X.D <- recodeIntoDominant(X=X)
fit.u.DD <- gemma(model="ulmm", y=modelD$Y[,1], X=X.D,
                  snp.coords, alleles,
                  K.c=A.z, W=modelD$W, out.dir=tempdir(), clean="all")
cols <- rep("black",ncol(X)); cols[modelD$gamma==1] <- "red"
pvadj.DD <- qqplotPval(fit.u.DD$tests$p_wald, col=cols, ctl.fdr.bh=TRUE,
                       plot.signif=TRUE)
t(binaryClassif(known.nulls=modelD$gamma == 0,
                called.nulls=pvadj.DD$pv.bh > 0.05))

## End(Not run)

timflutre/rutilstimflutre documentation built on Feb. 12, 2025, 11:35 p.m.