R/example/methods.R
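
# Illustrative sketch only; not part of the LocusXcanR package. It shows one
# way to express the locus-assignment and locus-category rules described in
# the "Fine-mapping loci and locus categories" text of the Methods panel.
# Assumptions (hypothetical, not confirmed by the source): the function names,
# the input columns (gene, chr, pos, p), the use of a single gene position,
# and the reading of "within a 1Mb window" as |pos - sentinel pos| <= 1e6.
assign_loci_sketch <- function(genes, window_bp = 1e6) {
  stopifnot(all(c("gene", "chr", "pos", "p") %in% names(genes)))
  stopifnot(!anyNA(genes[c("chr", "pos", "p")]))
  genes$locus <- rep(NA_integer_, nrow(genes))
  genes$sentinel <- rep(FALSE, nrow(genes))
  locus_id <- 0L
  while (anyNA(genes$locus)) {
    unassigned <- which(is.na(genes$locus))
    # The most significant gene not yet assigned becomes the next sentinel.
    s <- unassigned[which.min(genes$p[unassigned])]
    locus_id <- locus_id + 1L
    # Unassigned genes on the same chromosome and inside the window join it.
    in_window <- is.na(genes$locus) &
      genes$chr == genes$chr[s] &
      abs(genes$pos - genes$pos[s]) <= window_bp
    genes$locus[in_window] <- locus_id
    genes$sentinel[s] <- TRUE
  }
  genes
}

# Categories 1-3 are single-gene loci and 4-6 are multi-gene loci; within each
# group the order is strict replication, lenient replication, no replication.
# Assumptions: one replication p-value per locus is supplied by the caller,
# and a missing p-value is treated as "no replication".
locus_category_sketch <- function(n_genes, replication_p,
                                  strict = 0.05 / 239,  # about 2.09E-04
                                  lenient = 0.05) {
  base <- ifelse(n_genes > 1, 3L, 0L)
  has_p <- !is.na(replication_p)
  level <- ifelse(has_p & replication_p < strict, 1L,
                  ifelse(has_p & replication_p < lenient, 2L, 3L))
  base + level
}

# Example usage (commented out so that sourcing this file has no side effects):
# toy <- data.frame(gene = c("A", "B", "C"), chr = c(1, 1, 2),
#                   pos = c(1.0e6, 1.5e6, 1.0e6), p = c(1e-10, 1e-8, 1e-7))
# assign_loci_sketch(toy)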

shiny::tabPanel("Methods",
  shiny::h3(shiny::strong("Methods")),
  "A current draft of the manuscript describing these results and detailing the methods can be found at an imaginary place.",
  " All cohorts included in the analysis are described individually below. We analyze 10 hematological phenotypes (platelet count, red blood cell count, hematocrit, hemoglobin, mean corpuscular volume, red cell distribution width, white blood cell count, monocyte count, neutrophil count, and lymphocyte count) across all cohorts.",
  
  shiny::hr(),
  shiny::h4(shiny::strong("PrediXcan method:")),
  "PrediXcan (26258848) is a gene-based association test that prioritizes genes that are likely to be causal for the phenotype. It implements an elastic net-based method for selecting variants associated with gene expression in a given reference panel, and then uses those variants to predict gene expression in a cohort with only genotype data. We downloaded the PrediXcan software (see URLs) along with its prepackaged weights for gene expression data from PredictDB (see URLs). Weights for gene expression using RNA sequencing data were obtained from the Genotype-Tissue Expression project (23715323) (whole blood, genes= ; and EBV transformed lymphocytes, genes=; n=), Depression Genes and Networks (DGN) (24092820) (whole blood, genes=11538, n=922), and Multi-Ethnic Study of Atherosclerosis (Europeans only, monocytes, genes=, n=) (23900078). Imputed genotypes for all cohorts were filtered for imputation quality based on R2 > 0.3; variants not meeting this threshold were excluded from the analysis. We use DGN as our primary reference panel for all TWAS analyses as it is the largest single whole blood RNA-seq dataset.",
  
  shiny::hr(),
  shiny::h4(shiny::strong("Included cohorts:")),
  "These TWAS analyses were limited to participants of self-reported white or European ancestry, for comparability with the DGN European ancestry eQTL panel (including the LD information used as input to the R Shiny application; see R Shiny Methods) and with the largest single-ancestry blood cell trait GWAS.",
  
  shiny::h5(shiny::strong("Genetic Epidemiology Research on Adult Health and Aging (GERA)")),
  "The GERA cohort includes over 100,000 adults who are members of the Kaiser Permanente Medical Care Plan, Northern California Region (KPNC) and consented to research on the genetic and environmental factors that affect health and disease, linking together clinical data from electronic health records, survey data on demographic and behavioral factors, and environmental data with genetic data.  The GERA cohort was formed by including all self-reported racial and ethnic minority participants with saliva samples (19%); the remaining participants were drawn sequentially and randomly from non-Hispanic White participants (81%). Genotyping was completed as previously described (26092718) using 4 different custom Affymetrix Axiom arrays with ethnic-specific content to increase genomic coverage. Genotype data were imputed to 1000 Genomes Phase 3. Principal components analysis was used to characterize genetic structure in this European sample (26092716). Hematological measures were extracted from medical records. In individuals with multiple measurements, the first visit with complete white blood cell differential (if any) was used for each participant. Otherwise, the first visit was used. In total, 54,542 participants with hematological measures were included in the analysis.",
  
  shiny::h5(shiny::strong("Women's Health Initiative (WHI)")),
  "WHI originally enrolled 161,808 women aged 50-79 between 1993 and 1998 at 40 centers across the US, comprising both a clinical trial arm (with three trials: hormone therapy, dietary modification, and calcium/vitamin D) and an observational study arm (9492970). WHI recruited a socio-demographically diverse population representative of US women in this age range. Two WHI extension studies conducted additional follow-up on consenting women from 2005-2010 and 2010-2015. Genotyping was available on some WHI participants through the WHI SNP Health Association Resource (SHARe), which used the Affymetrix 6.0 array (~906,600 SNPs, 946,000 copy number variation probes), and on other participants through the MEGA array (https://www.biorxiv.org/content/10.1101/188094v2). Imputation and association analysis were performed separately in individuals with Affymetrix only, MEGA only, and both Affymetrix and MEGA data. For variants with both Affymetrix and MEGA genotypes available, MEGA genotypes were used. In total, 18,100 self-reported white women with hematological phenotypes were included. Six sub-cohorts from the WHI study were included in the meta-analysis, and phenotypes were not collected uniformly across the cohorts. Sample size information for each phenotype is provided in Supplementary Table 5.",
  
  shiny::h5(shiny::strong("Atherosclerosis Risk in Communities (ARIC)")),
  "The ARIC study was initiated in 1987 and recruited participants aged 45-64 years from 4 field centers (Forsyth County, NC; Jackson, MS; northwestern suburbs of Minneapolis, MN; Washington County, MD) in order to study cardiovascular disease and its risk factors (2646917), including the participants of self-reported European ancestry included here. Standardized physical examinations and interviewer-administered questionnaires were conducted at baseline (1987-89), three triennial follow-up examinations, a fifth examination in 2011-13, and a sixth examination in 2016-17. Genotyping was performed through the CARe consortium Affymetrix 6.0 array (20400780). ARIC European ancestry genotype data were imputed to the Haplotype Reference Consortium (HRC) panel. In total, 9,345 European ancestry participants with hematological phenotypes were included in the analysis. All phenotypes were adjusted for study site, age, age squared, sex, and the top ten principal components (PCs) and were inverse normalized.",
  
  shiny::h5(shiny::strong("BioMe")),
  "Details will be coming soon.",
  
  shiny::hr(),
  shiny::h4(shiny::strong("Conditional analysis:")),
  "For each statistically significant TWAS gene-trait association, the effect of predicted gene expression was conditioned on a set of previously reported GWAS sentinel variants from (32888494) meeting the following criteria: 1) the sentinel variant fell within a 1Mb region of the TWAS gene, 2) the trait with which the GWAS variant was associated matched the TWAS analytical trait or was within the same trait category as the analytical trait (platelets, red blood cell indices {hematocrit, hemoglobin, mean corpuscular volume, red blood cell count, red blood cell distribution width}, white blood cell indices {white blood cell count, neutrophils, monocytes, lymphocytes}), and 3) the GWAS variant met an imputation quality threshold of R2 > 0.3. We used a modified version of the cpgen R package (see cpgen Methods) to perform the conditional analysis, accounting for a PLINK KING-robust kinship matrix (20926424), which used only genotyped variants and excluded those with minor allele frequency less than 5% and those missing more than 1% of SNPs.",
  
  shiny::hr(),
  shiny::h4(shiny::strong("Meta-analysis and replication with ARIC, WHI, BioMe:")),
  "In order to replicate the conditionally significant gene-trait associations, we tested each association via a meta-analysis of the ARIC, WHI, and BioMe cohorts. As described above, PrediXcan was used for gene expression imputation and association in each cohort separately, and the meta-analysis association test was conducted using METAL (20616382).",
  shiny::br(),
  shiny::br(),
  "Replication of the GERA significant gene-trait associations was performed using meta-analyzed TWAS results from ARIC, WHI, and BioMe. Nine gene-trait associations remained statistically significant after conditional analysis; for this set of genes, we defined a Bonferroni-corrected replication threshold of p-value < 5.56E-03 (0.05/9). For the fine-mapping analysis, statistical significance of replicated genes was assessed using two different thresholds: a stringent threshold Bonferroni-corrected for all 239 statistically significant TWAS gene-trait associations at p-value < 2.09E-04 (0.05/239), and a more lenient threshold at p-value < 0.05.",
  
  shiny::hr(),
  shiny::h4(shiny::strong("FOCUS:")),
  "We used the Fine-mapping Of CaUsal gene Sets (FOCUS) (30926970) software to fine-map TWAS statistics at genomic risk regions. As input, we used GWAS summary data from GERA along with eQTL weights from PredictDB Depression Genes and Networks whole blood data, and an LD reference panel from 1000 Genomes Phase 3. The software outputs a credible set of genes at each locus which can be used to explain observed genomic risk.",
  
  shiny::hr(),
  shiny::h4(shiny::strong("Fine-mapping loci and locus categories:")),
  "Fine-mapping loci refers to fine-mapping analysis of trait-specific genomic locations that contain, and are centered at, sentinel TWAS genes. That is, we take the set of trait-specific statistically significant TWAS genes, select the most significant gene in the set (the sentinel gene), and assign it to a locus along with any other statistically significant TWAS genes within a 1Mb window of the sentinel gene. We then select the next most significant TWAS gene which has not yet been assigned to a locus and continue in this fashion until all statistically significant TWAS genes have been assigned to a locus.
          We define locus categories based on whether the locus contains a single gene or multiple genes and whether the locus replicates in TWAS meta-analysis at either a lenient or strict threshold. Thus, locus categories are defined as follows: 1=single gene locus, strict replication (p < 2.09E-04); 2=single gene locus, replication (p < 0.05); 3=single gene locus, no replication; 4=multi gene locus, strict replication (p < 2.09E-04); 5=multi gene locus, replication (p < 0.05); 6=multi gene locus, no replication.",
  
  shiny::hr(),
  shiny::h4(shiny::strong("R Shiny:")),
  "We use the R Shiny package to produce the web application which displays our GERA TWAS results. The IdeogramTrack (https://rdrr.io/bioc/Gviz/man/IdeogramTrack-class.html) uses Genome Reference Consortium Human Build 37 (GRCh37) and UCSC cytogenetic bands from http://hgdownload.cse.ucsc.edu/goldenpath/hg19/database/. All GERA TWAS results were produced using PrediXcan as described above, all GERA GWAS results were produced using XXX, and GERA conditional analysis results were produced using cpgen as described below. Known GWAS sentinel variants were obtained from PMID: 32888494. Model weights and model variants were taken from our primary DGN reference panel from PredictDB (or secondary reference panels GWB, GTL, or MSA from PredictDB). Correlation of predicted expression among genes at the locus was calculated using R's cor() function, and LD among variants was computed using plink --r2 (https://zzz.bwh.harvard.edu/plink/ld.shtml). We used the ggplot2 package to produce all figures except the network visualization, which used the visNetwork package. Tables were produced using the DT package (https://www.rdocumentation.org/packages/DT/versions/0.16).",
  
  shiny::hr(),
  shiny::h4(shiny::strong("cpgen:")),
  "We used the R package cpgen to perform conditional analysis of TWAS-significant genes, while accounting for a KING kinship matrix. However, cpgen is designed in such a way that it performs eigenvalue decomposition on the cohort sample for every function call. Since we had 239 TWAS-significant associations, this would have required eigenvalue decomposition on a sample of N ~ 55,000 for each of those 239 associations, a computationally burdensome calculation. Thus, we slightly modified the cpgen script. Specifically, we computed the eigenvalue decomposition on the GERA sample outside of the cpgen script (for each phenotype), and then subsequently loaded the appropriate eigenvectors and eigenvalues into the program, modifying the script so that it could take these eigenvectors and eigenvalues as input.
          ",
  shiny::br(),
  shiny::br(),
)
amanda-tapia/LocusXcanR documentation built on March 9, 2021, 7:36 p.m.