knitr::opts_chunk$set(
    error = FALSE,
    warning = FALSE,
    message = FALSE,
    crop = NULL
)
BiocStyle::markdown()

Introduction

The r Githubpkg("kevinrue/xCellData") package provides an R / Bioconductor resource for obtaining and representing 489 cell type gene signatures from [@Aran2017].

This package uses the r Githubpkg("kevinrue/unisets") Sets class to represent the collection of signatures. However, the data itself is distributed with the package as a GMT file, which may be parsed and imported by other packages (e.g. r Biocpkg("GSEABase") GeneSetCollection, r Githubpkg("Kayla-Morrell/GeneSet") tbl_geneset).
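As a rough illustration of what importing a GMT file involves, the sketch below writes a toy GMT file and parses it with base R, then (only if GSEABase is installed) imports the same file with GSEABase::getGmt(). The signature names, gene names, and the read_gmt() helper are made up for this example; the location of the GMT file inside xCellData is not shown here, so a temporary file stands in for it.

```r
# Toy GMT file: one signature per line, tab-separated as
# <name> <description> <gene 1> <gene 2> ...
# Names and genes are invented for illustration only.
gmt_file <- tempfile(fileext = ".gmt")
writeLines(c(
    "sigA\tna\tGENE1\tGENE2\tGENE3",
    "sigB\tna\tGENE2\tGENE4"
), gmt_file)

# Hypothetical base R parser: split each line on tabs, drop the name and
# description fields, and name each gene vector by its signature.
read_gmt <- function(path) {
    fields <- strsplit(readLines(path), "\t", fixed = TRUE)
    genes <- lapply(fields, function(x) x[-(1:2)])
    names(genes) <- vapply(fields, `[`, character(1), 1)
    genes
}
signatures <- read_gmt(gmt_file)

# Optional: import the same file as a GSEABase GeneSetCollection.
if (requireNamespace("GSEABase", quietly = TRUE)) {
    gsc <- GSEABase::getGmt(gmt_file)
}
```

The base R route returns a plain named list, which is often enough for quick checks; a dedicated class such as GeneSetCollection is preferable when gene identifier types and metadata matter.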

Data preprocessing

The script used to download and preprocess the data is distributed with the package. You can find it at the following location:

system.file(package = "xCellData", "scripts", "makeData.R")

Briefly, the script downloads "Additional file 3: The 489 cell type gene signatures. (XLSX 417 kb)" from the https://genomebiology.biomedcentral.com website and reformats the content of the published Microsoft Excel file into a GMT text file.
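The reformatting step can be pictured with the following minimal sketch, which writes a named list of signatures to a GMT text file. It is not the code in makeData.R: the write_gmt() helper, the "na" description field, and the toy signatures are assumptions made for illustration, and the download and Excel parsing steps are left out.

```r
# Hypothetical helper: write a named list of gene signatures as GMT,
# i.e. one tab-separated line per signature:
# <name> <description> <gene 1> <gene 2> ...
write_gmt <- function(signatures, path, description = "na") {
    lines <- vapply(names(signatures), function(name) {
        paste(c(name, description, signatures[[name]]), collapse = "\t")
    }, character(1))
    writeLines(unname(lines), path)
    invisible(path)
}

# Toy input standing in for the content of the published spreadsheet.
toy <- list(
    sigA = c("GENE1", "GENE2"),
    sigB = c("GENE3")
)
out <- tempfile(fileext = ".gmt")
write_gmt(toy, out)
readLines(out)
```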

Workflow

Loading the data

We use the xCellData() function to parse the GMT file distributed with the package into a r Githubpkg("kevinrue/unisets") Sets object.

library(xCellData)
library(unisets)
xsig <- xCellData()
xsig

Using the data

The signatures may then be used for downstream analyses such as cell type annotation.

For instance, the Sets object can be split into a list of signatures, for use in functions such as lapply.

as.list(xsig)
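To show the kind of per-signature computation this enables, here is a small sketch that uses a toy list in place of as.list(xsig). The signatures, the detected gene vector, and the choice of overlap fraction as a summary are all assumptions for illustration, not a recommended annotation method.

```r
# Toy stand-in for as.list(xsig): a named list of character vectors.
signatures <- list(
    sigA = c("GENE1", "GENE2", "GENE3"),
    sigB = c("GENE2", "GENE4")
)

# Hypothetical set of genes detected in a sample.
detected <- c("GENE2", "GENE3", "GENE5")

# Genes of each signature that are also detected in the sample.
overlap <- lapply(signatures, intersect, detected)

# Fraction of each signature that is detected.
fraction <- vapply(
    signatures,
    function(genes) mean(genes %in% detected),
    numeric(1)
)
```

Using vapply() rather than sapply() for the summary guarantees a numeric vector with one value per signature, even if the list is empty.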

One may also inspect the number of genes in each signature.

dat <- setLengths(xsig)
hist(
    dat, breaks = 100, xlim = c(0, max(dat)),
    main = "Distribution of signature sizes", xlab = "Number of genes"
)

Examples of packages using r Githubpkg("kevinrue/xCellData") include:

Session information

sessionInfo()

References



kevinrue/xCellData documentation built on Feb. 2, 2020, 1:13 a.m.