An overview of pQTLdata

set.seed(0)
knitr::opts_chunk$set(
  out.extra = 'style="display:block; margin: auto"',
  fig.align = "center",
  fig.path = "pQTLdata/",
  collapse = TRUE,
  comment = "#>",
  dev = "png")
pkgs <- c("dplyr", "grid", "EnsDb.Hsapiens.v75", "ensembldb", "IRanges", "knitr", "org.Hs.eg.db", "S4Vectors", "VennDiagram")
for (p in pkgs) if (length(grep(paste("^package:", p, "$", sep=""), search())) == 0) {
    if (!requireNamespace(p)) warning(paste0("This vignette needs package `", p, "'; please install"))
}
invisible(suppressMessages(lapply(pkgs, require, character.only = TRUE)))

This package intends to gather information, meta-data and relevant scripts in proteogenomic analysis.

Collections

As used in several years of proteomic analysis and for future extensions, the collections are in two locations:

While library(help=pQTLdata) displays the general information, ? pQTLdata can give a list of data objects in the package.

Protein panels

Caprion

As has been the norm, no snapshot upon data release was provided which consequently requires substantial effort and the notable ones are highlighted here.

accession_list <- c("P04745", "Q9GZN8", "P0C0L5", "Q96N11", "P48960",
                    "Q9NUQ9", "P04062", "P69905", "P62805", "O14745",
                    "P23381", "P54577")
updated_list <- dplyr::filter(pQTLdata::caprion,Accession %in% accession_list) |>
                dplyr::ungroup() |>
                dplyr::select(Gene,Gene.orig,Protein,Accession,ensGenes,chr,start,end)
knitr::kable(updated_list,caption="Updated information on Caprion")

which again is useful for extracting data from GTEx v8.

Olink

This includes 12 qPCR panels, 15 Target 96 panels and Explore panels.

SomaScan

Both SomaScanV4.1 and the latest SomaScan11k are directly from SomaLogic.

SWATH-MS

This panel has been used in an experimental data acquisition and analysis.

Overlap of proteins

It is of interest to compare some of these,

suppressMessages(library(dplyr))
suppressMessages(library(VennDiagram))
uniprot <- list(Olink=pull(pQTLdata::Olink_Explore_HT,UniProt.ID),
                SomaScan=pull(pQTLdata::SomaScanV4.1,UniProt.ID),
                Caprion=pull(pQTLdata::caprion,Accession))
lapply(uniprot,head)
olink_somascan_caprion <- VennDiagram::venn.diagram(uniprot,filename = NULL,disable.logging = TRUE,
                                                    cex = 2.5, cat.cex = 2.5, cat.pos = c(0,0,180),
                                                    height=8,width=8,units="in")
grid.newpage()
grid.draw(olink_somascan_caprion)
suppressMessages(library(dplyr))
suppressMessages(library(VennDiagram))
uniprot <- list(Olink=pull(pQTLdata::Olink_Explore_HT,UniProt.ID),
                SomaScan=pull(pQTLdata::SomaScan11k,UniProt.ID),
                Caprion=pull(pQTLdata::caprion,Accession))
lapply(uniprot,head)
olink_somascan_caprion <- VennDiagram::venn.diagram(uniprot,filename = NULL,disable.logging = TRUE,
                                                    cex = 2.5, cat.cex = 2.5, cat.pos = c(0,0,180),
                                                    height=8,width=8,units="in")
grid.newpage()
grid.draw(olink_somascan_caprion)

Published data

This associates with several panels, including SomaScan (SomaScan160410)@sun18, Olink qPCR inflammation (inf1) @zhao23 and Seer (seer1980) @suhre24. In particular, inf1 contains updates from the original release by Olink.

knitr::kable(pQTLdata::inf1,caption="Olink/inflammation panel")

Reference data

We showcase EnsDb.Hsapiens.v75 from Bioconductor.

ensembldb::metadata(EnsDb.Hsapiens.v75)
ensembldb::keytypes(EnsDb.Hsapiens.v75)
exon_info <- ensembldb::exons(EnsDb.Hsapiens.v75)
gene_info <- ensembldb::genes(EnsDb.Hsapiens.v75)
transcript_info <- ensembldb::transcripts(EnsDb.Hsapiens.v75)
colnames(S4Vectors::mcols(gene_info))
colnames(S4Vectors::mcols(transcript_info))
overlaps <- IRanges::findOverlaps(transcript_info, gene_info)
overlapping_transcripts <- transcript_info[queryHits(overlaps)]
overlapping_genes <- gene_info[subjectHits(overlaps)]
overlap_data <- data.frame(
  transcript_id = mcols(overlapping_transcripts)$tx_id,
  gene_id = S4Vectors::mcols(overlapping_genes)$gene_id,
  gene_name = S4Vectors::mcols(overlapping_genes)$gene_name,
  start = pmax(start(overlapping_transcripts), start(overlapping_genes)),
  end = pmin(end(overlapping_transcripts), end(overlapping_genes))
)
gene_symbols <- c("BRCA1", "TP53")
gene_data <- subset(overlap_data,gene_name%in%gene_symbols,select=-c(gene_id,gene_name))
cols <- c("UNIPROTID","PROTEINID","GENEID","GENENAME","SEQNAME","TXID")
info <- ensembldb::select(EnsDb.Hsapiens.v75, keys = gene_symbols, 
                          columns = cols, keytype = "SYMBOL") |>
        dplyr::left_join(head(gene_data,15),by=c('TXID'='transcript_id')) |>
        subset(!is.na(UNIPROTID)&!is.na(start)&!is.na(end))
knitr::kable(head(info,15),caption="Annotation for BRCA1 and TP53")
keytypes(org.Hs.eg.db)
uniprot_ids <- ensembldb::select(org.Hs.eg.db, keys = gene_symbols, columns = "UNIPROT", keytype = "SYMBOL")

where org.Hs.eg.db is more focused on genes.

Scripts

An analysis involving COVID-19 data is in Olink/ directory, while the scripts/ directory records data generation which potentially can be extended. Specifically, docs.sh operates with GitHub while cran.sh builds, installs, and checks for compliance with the Comprehensive R Archive Network (CRAN).

URLs

References {-}

The EndNote/ directory includes references in @sun18 and @suhre20 formatted in EndNote.



Try the pQTLdata package in your browser

Any scripts or data that you put into this service are public.

pQTLdata documentation built on April 12, 2025, 1:34 a.m.