suppressPackageStartupMessages(library("dplyr"))
suppressPackageStartupMessages(library("BiocStyle"))
suppressPackageStartupMessages(library("org.Hs.eg.db"))
suppressPackageStartupMessages(library("GO.db"))

Introduction

The HPA project

From the Human Protein Atlas [@Uhlen2005; @Uhlen2010] site:

The Swedish Human Protein Atlas project, funded by the Knut and Alice Wallenberg Foundation, has been set up to allow for a systematic exploration of the human proteome using Antibody-Based Proteomics. This is accomplished by combining high-throughput generation of affinity-purified antibodies with protein profiling in a multitude of tissues and cells assembled in tissue microarrays. Confocal microscopy analysis using human cell lines is performed for more detailed protein localisation. The program hosts the Human Protein Atlas portal with expression profiles of human proteins in tissues and cells.

The hpar package provides access to HPA data from the R interface. It also distributes the following data sets:

Several flat files are distributed by the HPA project and are available within the package as data.frames; other datasets are available through a search query on the HPA website. The description below is taken from the HPA site:

The hpar::allHparData() function returns a table describing all available datasets (see below).

HPA data usage policy

The use of data and images from the HPA in publications and presentations is permitted provided that the following conditions are met:

Installation

hpar is available through the Bioconductor project. Details about the package and the installation procedure can be found on its landing page. To install it using the dedicated Bioconductor infrastructure, run:

## install BiocManager only once (skipped if already installed)
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
## install hpar
BiocManager::install("hpar")

After installation, hpar has to be explicitly loaded with

library("hpar")

so that all the package's functionality and data are available to the user.

The hpar package

Data sets

A table describing all datasets available in the package can be accessed with the allHparData() function.

hpa_data <- allHparData()
DT::datatable(hpa_data)

The Title variable gives the names of the datasets that can be downloaded locally and cached as part of the ExperimentHub infrastructure. For example, the normal tissue data are loaded with hpaNormalTissue():

head(normtissue <- hpaNormalTissue())

As the HPA data are distributed as part of the ExperimentHub infrastructure, it is also possible to query ExperimentHub directly for relevant datasets.

library("ExperimentHub")
eh <- ExperimentHub()
query(eh, "hpar")

HPA interface

Each dataset described above is a data.frame and can easily be manipulated using base R or tidyverse functionality.

names(normtissue)
## Number of genes
length(unique(normtissue$Gene))
## Number of cell types
length(unique(normtissue$Cell.type))
## Number of tissues
length(unique(normtissue$Tissue))
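The same counts can be obtained for several columns in a single call with base R. This is a minimal sketch on a small, hypothetical data.frame (toy_tissue) that only mimics the Gene, Tissue and Cell.type columns of normtissue; the values are made up for illustration, and with the real data one would pass normtissue instead.

```r
## Hypothetical toy data mimicking a few columns of hpaNormalTissue()
toy_tissue <- data.frame(
    Gene      = c("ENSG00000000003", "ENSG00000000003",
                  "ENSG00000000003", "ENSG00000000005"),
    Tissue    = c("adrenal gland", "kidney", "liver", "kidney"),
    Cell.type = c("glandular cells", "cells in tubules",
                  "hepatocytes", "cells in tubules"),
    stringsAsFactors = FALSE
)

## Number of distinct values in each selected column
n_distinct_values <- sapply(toy_tissue[c("Gene", "Tissue", "Cell.type")],
                            function(x) length(unique(x)))
n_distinct_values
```

On these toy data, this reports 2 genes, 3 tissues and 3 cell types.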

## Number of genes highly and reliably expressed in each cell type
## in each tissue.
library("dplyr")
normtissue |>
    filter(Reliability == "Approved",
           Level == "High") |>
    count(Cell.type, Tissue) |>
    arrange(desc(n)) |>
    head()
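For readers who prefer base R, the filter/count/arrange pipeline above can be sketched with subset(), aggregate() and order(). The toy_tissue data.frame below is hypothetical and only mimics the relevant columns of normtissue. Like count(), this counts rows, which corresponds to a number of genes only if each gene appears once per cell type and tissue.

```r
## Hypothetical toy data mimicking hpaNormalTissue() columns
toy_tissue <- data.frame(
    Gene        = c("G1", "G2", "G3", "G1", "G2"),
    Tissue      = c("kidney", "kidney", "kidney", "liver", "liver"),
    Cell.type   = c("cells in tubules", "cells in tubules", "cells in tubules",
                    "hepatocytes", "hepatocytes"),
    Level       = c("High", "High", "Low", "High", "High"),
    Reliability = c("Approved", "Approved", "Approved", "Approved", "Uncertain"),
    stringsAsFactors = FALSE
)

## Keep highly expressed, approved entries (as in the dplyr pipeline)
sel <- subset(toy_tissue, Reliability == "Approved" & Level == "High")

## Count rows per cell type/tissue pair, then sort by decreasing count
counts <- aggregate(Gene ~ Cell.type + Tissue, data = sel, FUN = length)
names(counts)[names(counts) == "Gene"] <- "n"
counts <- counts[order(counts$n, decreasing = TRUE), ]
counts
```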

We will illustrate additional datasets using the TSPAN6 (tetraspanin 6) gene (ENSG00000000003) as an example.

id <- "ENSG00000000003"
subcell <- hpaSubcellularLoc()
rna <- rnaGeneCellLine()

## Combine protein immunohistochemistry data with the subcellular
## location and RNA expression levels.
filter(normtissue, Gene == id) |>
    full_join(filter(subcell, Gene == id)) |>
    full_join(filter(rna, Gene == id)) |>
    head()
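Without a by argument, dplyr's full_join() joins on all columns shared by the two tables and keeps every row from both, filling missing values with NA. Base R's merge() with all = TRUE performs the same kind of full (outer) join. The sketch below uses two hypothetical toy tables and names the join column explicitly, which avoids matching on unintended shared columns.

```r
## Hypothetical toy tables sharing only the Gene column
prot <- data.frame(Gene = c("G1", "G2"), Tissue = c("kidney", "liver"),
                   stringsAsFactors = FALSE)
loc  <- data.frame(Gene = c("G2", "G3"),
                   Main.location = c("Cytosol", "Nucleoplasm"),
                   stringsAsFactors = FALSE)

## Full (outer) join on Gene: all genes from both tables are kept,
## and columns without a match are filled with NA
combined <- merge(prot, loc, by = "Gene", all = TRUE)
combined
```

Here G1 has no location and G3 has no tissue, so both get NA in the corresponding column.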

It is also possible to directly open the HPA page for a specific gene (see figure below).

browseHPA(id)

The HPA web page for the tetraspanin 6 gene (ENSG00000000003).

HPA release information

Information about the HPA release used to build the installed hpar package can be accessed with getHpaVersion, getHpaDate and getHpaEnsembl. Full release details can be found on the HPA release history page.

getHpaVersion()
getHpaDate()
getHpaEnsembl()

A small use case

Let's compare the subcellular localisation annotation obtained from the HPA subcellular location data set and the information available in the Bioconductor annotation packages.

id <- "ENSG00000001460"
filter(subcell, Gene == id)
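When a gene is annotated to several compartments, they are listed in a single string. Assuming, as in the HPA download files, that the Main.location column uses ";" as separator, the sketch below expands such strings into one row per gene and location. The toy_subcell table is hypothetical.

```r
## Hypothetical toy data mimicking hpaSubcellularLoc(); multiple
## locations are assumed to be separated by ";" in Main.location
toy_subcell <- data.frame(Gene = c("G1", "G2"),
                          Main.location = c("Nucleoplasm;Cytosol",
                                            "Golgi apparatus"),
                          stringsAsFactors = FALSE)

## Split each string, then repeat each gene once per location
locs <- strsplit(toy_subcell$Main.location, ";", fixed = TRUE)
long <- data.frame(Gene = rep(toy_subcell$Gene, lengths(locs)),
                   Location = unlist(locs),
                   stringsAsFactors = FALSE)
long
```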

Below, we first extract all cellular component (CC) GO terms available for id from the org.Hs.eg.db human annotation package and then retrieve their term names using the GO.db database.

library("org.Hs.eg.db")
library("GO.db")
ans <- AnnotationDbi::select(org.Hs.eg.db, keys = id,
                             columns = c("ENSEMBL", "GO", "ONTOLOGY"),
                             keytype = "ENSEMBL")
ans <- ans[ans$ONTOLOGY == "CC", ]
ans
sapply(as.list(GOTERM[ans$GO]), slot, "Term")
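A crude way to compare the two annotations is a case-insensitive set comparison of the HPA location names and the GO term names. The two vectors below are hypothetical placeholders standing in for the outputs of the chunks above. HPA and GO use different vocabularies and levels of granularity, so exact string matching will miss related terms; this is only meant as a first pass.

```r
## Hypothetical inputs: HPA locations and GO CC term names for one gene
hpa_locs <- c("Nucleoplasm", "Cytosol")
go_terms <- c("cytosol", "nucleus", "plasma membrane")

## Case-insensitive set comparison of the two annotations
shared   <- intersect(tolower(hpa_locs), tolower(go_terms))
hpa_only <- setdiff(tolower(hpa_locs), tolower(go_terms))
go_only  <- setdiff(tolower(go_terms), tolower(hpa_locs))
comparison <- list(shared = shared, hpa_only = hpa_only, go_only = go_only)
comparison
```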

Session information

sessionInfo()


lgatto/hpar documentation built on Dec. 5, 2022, 6:29 a.m.