In jinghuazhao/pQTLtools: A Protein Quantitative Trait Locus Toolkit

hg19 (build 37) reference data

Details of the reference data are shown as follows,

options(width=200)
rm(list=ls())
load("~/cambridge-ceu/turboman/turboman_hg19_reference_data.rda")
ls()
refgene_gene_coordinates_hg19_eur <- refgene_gene_coordinates_h19
head(ld_block_breaks_pickrell_hg19_eur)
dim(ld_block_breaks_pickrell_hg19_eur)
head(refgene_gene_coordinates_hg19_eur)
dim(refgene_gene_coordinates_hg19_eur)
save(ld_block_breaks_pickrell_hg19_eur,refgene_gene_coordinates_hg19_eur,file="turboman_hg19_reference_data.rda")

liftover

The script is as follows,

liftover <- function(chr_start_end_snpid)
# chain import currently only handles local, uncompressed file paths
# https://hgdownload.cse.ucsc.edu/goldenpath/hg19/liftOver/hg19ToHg38.over.chain.gz
# https://hgdownload.soe.ucsc.edu/goldenPath/hg38/liftOver/hg38ToHg19.over.chain.gz
{
  HPC_WORK <- Sys.getenv("HPC_WORK")
  f <- file.path(HPC_WORK,"bin","hg19ToHg38.over.chain")
  chain <- rtracklayer::import.chain(f)
  names(chain) <- gsub("chr","",names(chain))
  require(GenomicRanges)
  gr <- with(chr_start_end_snpid, GenomicRanges::GRanges(seqnames=chr,IRanges::IRanges(start,end),snpid=snpid))
  seqlevelsStyle(gr) <- "NCBI"
  gr38 <- rtracklayer::liftOver(gr, chain)
  if (all(is.na(gr38$seqnames))) {
    warning("Liftover failed for some SNPs.")
  }
  as.data.frame(gr38)
}

library(dplyr)
library(valr)
ld <- ld_block_breaks_pickrell_hg19_eur %>%
      dplyr::mutate(end=start,snpid=paste(chr,start,end,sep=":")) %>%
      dplyr::select(chr,start,end,snpid)
ld38 <- liftover(ld)
ld_block_breaks_pickrell_hg38_eur <- dplyr::left_join(ld, ld38, by="snpid") %>%
                                     dplyr::transmute(chr,start=start.y) %>%
                                     dplyr::filter(!is.na(start))
head(ld_block_breaks_pickrell_hg38_eur)
dim(ld_block_breaks_pickrell_hg38_eur)

refGene for hg38 (build 38)

refGene38 <- read.delim("~/tests/turboman/refGene38.tsv") %>%
             setNames(c("chrom","start","end","gene"))
refgene_gene_coordinates_hg38_eur <- valr::bed_merge(dplyr::group_by(refGene38,gene)) %>%
                                     valr::bed_sort() %>%
                                     setNames(c("chromosome", "gene_transcription_start", "gene_transcription_stop", "gene_name")) %>%
                                     dplyr::mutate(chromosome=gsub("chr","",chromosome),
                                                   gene_transcription_midposition=(gene_transcription_start+gene_transcription_stop)/2)
head(refgene_gene_coordinates_hg38_eur)
dim(refgene_gene_coordinates_hg38_eur)

hg38 reference data

LD blocks and refGene are saved into a .rda file.

save(ld_block_breaks_pickrell_hg38_eur,refgene_gene_coordinates_hg38_eur,file="turboman_hg38_reference_data.rda")
dir(pattern="*rda")

Remarks

The original reference data contains LD blocks as well refGene information and changes have been made so that 1. Only the boundaries of the LD blocks are lifted over from hg19 to hg38. 2. RefGene data would have been updated, so are replaced using external data. 3. For consistency, refgene_gene_coordinates_h19 is renamed as refgene_gene_coordinates_hg19_eur, which would be reflected in the function but a fuzzy prefix refgene_gene_coordinates_h would accommodate the original .rda. 4. The elongated list per gene is merged, such that

chr|start|end|gene ---|-----|---|----------- 1|66533360|66743095|SGIP1 1|66533592|66690375|SGIP1 1|66533592|66743095|SGIP1 1|66534152|66690375|SGIP1 1|66534152|66743095|SGIP1 1|66534152|66729362|SGIP1

becomes 1|66533360|66743095|SGIP1 as composed to 1|66999251|67216822| SGIP1 (hg19).

jinghuazhao/pQTLtools documentation built on May 18, 2024, 12:14 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com