assoc: Association analysis for HapAlleles In GHap: Genome-Wide Haplotyping

Description

Given a GHap.lmm object (as supplied by the ghap.lmm function), this function returns association statistics for HapAlleles.

Usage

 1 2 ghap.assoc(response, haplo, weights=NULL, gc = TRUE, only.active.alleles = TRUE, ncores = 1) 

Arguments

 response A vector of phenotypes. The vector must be named and all names must be present in the GHap.haplo object. weights A numeric vector with weights for phenotypes. These weights are treated as diagonal elements of the inverse weight matrix. If not supplied, the analysis is carried out assuming all observations are equally important. haplo A GHap.haplo object. only.active.alleles A logical value specifying whether calculations should be reported only for active haplotype alleles (default = TRUE). gc A logical value specifying whether genomic control should be performed (default = TRUE). Currently, this option does not take effect on HapBlocks. ncores A numeric value specifying the number of cores to be used in parallel computations (default = 1).

Details

This function uses least squares regression to test each HapAllele at a time for association with phenotypes. The fixed effect, error variance and test statistic of a given HapAllele are estimated as:

\hat{a}_i = (\mathbf{x}_i'\mathbf{x}_i)^{-1}\mathbf{x}_i'\mathbf{y}

VAR(\hat{a}_i) = (\mathbf{x}_i'\mathbf{x}_i)^{-1}\hat{σ}_e^2

t_i^2 = \frac{\hat{a}_i^2}{VAR(\hat{a}_i)}

Under the null hypothesis that the regression coefficient is zero t_i^2 \sim χ^2(ν = 1). This function supports repeated measures, and records are dully mapped against the vectors of HapGenotypes.

If the vector of responses comprises adjusted records (i.e., residuals) from a linear mixed model, the regression analysis approximates the model implemented in other GWAS tools. However, the user must be aware of two known caveats associated with this approach. First, by pre-adjusting records instead of estimating HapAllele effects based on generalized least squares equations we ignore covariance structure and therefore bias the estimates downwards (Svishcheva et al., 2012). Second, each HapAllele being tested was also potentially included in the kinship matrix in the mixed model analysis, such that the HapAllele is included twice in the model: as fixed and random effect. This problem is known as proximal contamination (Listgarten et al., 2012). In the first case, we can use genomic control to recover p-values to an unbiased scale (Devlin and Roeder, 1999; Amin et al., 2007). However, not much can be done regarding the estimates of the effects. As a general recommendation, if the user is only interested in p-values, the ghap.assoc analysis should be sufficient. When effect estimates are of interest, the user can include the candidate HapAllele as a fixed effect in the full model in ghap.lmm. For the second case, a leave-one-chromosome-out (LOCO analysis) procedure can mitigate proximal contamination (Yang et al., 2014).

Value

The function returns a data.frame with the following columns:

 BLOCK Block alias. CHR Chromosome name. BP1 Block start position. BP2 Block end position. ALLELE Haplotype allele identity. BETA BLUE for the fixed effect of the haplotype allele. SE Standard error for the fixed effect. FREQ Frequency of the haplotype allele. CHISQ.OBS Observed value for the test statistics. If gc = TRUE (default), these values are scaled by the inflation factor. Inflation is computed through regression of observed quantiles onto expected quantiles. In order to avoid overestimation, only HapAlleles with test statistics within three standard deviations from the mean are used to compute the inflation factor. CHISQ.EXP Expected values for the test statistics. logP log10(1/P) or -log10(P) for the fixed effect.

Author(s)

Yuri Tani Utsunomiya <[email protected]>

References

N. Amin et al. A Genomic Background Based Method for Association Analysis in Related Individuals. PLoS ONE. 2007. 2:e1274.

Y. Da. Multi-allelic haplotype model based on genetic partition for genomic prediction and variance component estimation using SNP markers. BMC Genet. 2015. 16:144.

B. Devlin and K. Roeder. Genomic control for association studies. Biometrics. 1999. 55:997-1004.

J. Listgarten et al. Improved linear mixed models for genome-wide association studies. Nat. Methods. 2012. 9:525-526.

G. R. Svishcheva et al. Rapid variance components-based method for whole-genome association analysis. Nat Genet. 2012. 44:1166-1170.

J. Yang et al. Advantages and pitfalls in the application of mixed-model association methods. Nat. Genet. 2014. 46: 100-106.

Examples

  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 # #### DO NOT RUN IF NOT NECESSARY ### # # # Copy the example data in the current working directory # ghap.makefile() # # # Load data # phase <- ghap.loadphase("human.samples", "human.markers", "human.phase") # # # Subset data - markers with maf > 0.05 # maf <- ghap.maf(phase, ncores = 2) # markers <- phase$marker[maf > 0.05] # phase <- ghap.subsetphase(phase, unique(phase$id), markers) # # # Generate blocks of 5 markers sliding 5 markers at a time # blocks.mkr <- ghap.blockgen(phase, windowsize = 5, slide = 5, unit = "marker") # # # Generate matrix of haplotype genotypes # ghap.haplotyping(phase, blocks.mkr, batchsize = 100, ncores = 2, outfile = "human") # # # Load haplotype genotypes # haplo <- ghap.loadhaplo("human.hapsamples", "human.hapalleles", "human.hapgenotypes") # # # ### RUN ### # # Subset common haplotypes in Europeans # EUR.ids <- haplo$id[haplo$pop %in% c("TSI","CEU")] # haplo <- ghap.subsethaplo(haplo,EUR.ids,rep(TRUE,times=haplo$nalleles)) # hapstats <- ghap.hapstats(haplo, ncores = 2) # common <- hapstats$TYPE %in% c("REGULAR","MAJOR") & # hapstats$FREQ > 0.05 & # hapstats$FREQ < 0.95 # haplo <- ghap.subsethaplo(haplo,EUR.ids,common) # # #Compute relationship matrix # K <- ghap.kinship(haplo, batchsize = 100) # # # Quantitative trait with 50% heritability # # Unbalanced repeated measurements (0 to 30) # # Two major haplotypes accounting for 50% of the genetic variance # myseed <- 123456789 # set.seed(myseed) # major <- sample(which(haplo$allele.in == TRUE),size = 2) # g2 <- runif(n = 2, min = 0, max = 1) # g2 <- (g2/sum(g2))*0.5 # sim <- ghap.simpheno(haplo, kinship = K, h2 = 0.5, g2 = g2, nrep = 30, # balanced = FALSE, major = major, seed = myseed) # # #Fit model using REML # model <- ghap.lmm(fixed = phenotype ~ 1, random = ~ individual, # covmat = list(individual = K), data = sim$data) # # # ### RUN ### # # #HapAllele GWAS using GEBVs as response # pheno <- model$random$individual # gwas1 <- ghap.assoc(response = pheno, haplo = haplo, ncores = 4) # # #HapAllele GWAS using GEBVs as response # #Weight observations by number of repeated measurements # pheno <- model$random$individual # w <- table(sim$data$individual) # w <- w + mean(w) # w <- w[names(pheno)] # gwas2 <- ghap.assoc(response = pheno, haplo = haplo, ncores = 4, weights = w) # # #HapAllele GWAS using residuals as response # pheno <- model$residuals # names(pheno) <- sim$data$individual # gwas3 <- ghap.assoc(response = pheno, haplo = haplo, ncores = 4) # # #Plot results # plot(gwas1$BP1/1e+6,gwas1$logP,pch=20,col="darkgreen",ylim=c(0,20), # xlab="Position (in Mb)",ylab=expression(-log[10](p))) # points(gwas2$BP1/1e+6,gwas2$logP,pch=20,col="gray") # points(gwas3$BP1/1e+6,gwas3$logP,pch=20,col="blue") # abline(v=haplo$bp1[major]/1e+6,lty=3) # abline(h=-log10(0.05/nrow(gwas1)),lty=3) # legend("topleft",legend = c("GEBVs","weighted GEBVs","residuals"), # pch = 20,col=c("darkgreen","gray","blue")) 

GHap documentation built on May 29, 2017, 9:56 p.m.