This package provides resources for exploring large transcriptomics datasets from human induced pluripotent stem cell (iPSC) lines, run large numbers of differential expression analyses across random groups of individuals, in order to approximate power and guide experimental design. This vignette will guide you through its main usage.
Due to changes in underlying packages and in the permutation scheme (to enable analyses with a higher number of individuals), the results now produced by the packages will not be exactly the same as those published in the paper. This does not change the conclusions, but the precise sensitivity/false positive estimates will be different, meaning that you should not compare new results directly with older ones (but rather re-generate everything using the same versions). The main results from the paper (using the HipSci dataset) are reproduced using up-to-date packages here.
Some recent changes:
- the pre-filtering rule used in the paper is now the default (the documentation was contradictory)
- the permutation scheme now allows for larger numbers of individuals
- a wrapper was added from the dream method (see ?dreamWrapper
)
- the getSensitivityMatrices
function has been updated to use the ComplexHeatmap
package if available, making more professional plots.
To install the package, download it and install it using the following R command:
BiocManager::install("plger/iPSCpoweR", dependencies=TRUE, build_vignettes=TRUE)
Building the vignette might take a couple of minutes. Once the package is installed, you can load it and access this vignette:
library(iPSCpoweR)
vignette("iPSCpoweR")
The main dataset included in this package was released by the Human Induced Pluripotent Stem Cell Initiative (HipSci), quantified with Salmon v6.1, using FMD indexes and the Refseq transcript annotation. You can use the getSamplesInfo()
function to have an overview of the samples, and data("hipsci_annotation")
to access the samples' annotation.
The package's functions take care of loading and handling the data. If you wish to access the data for other purposes, see the getTxExpr
and getGeneExpr
functions. To aggregate technical replicates, see the aggByClone
function.
By default, the HipSci data is used. The data from Carcamo-Orive et al. (Cell Stem Cell 2016) is also included in the package to allow its usage in the permutation DEA analyses. To do so, you must first load the GSE79636 object, which contains both the expression and annotation matrices:
data("GSE79636")
summary(GSE79636)
## Length Class Mode
## annotation 20 data.frame list
## dat 317 data.frame list
Then you can pass this object to any of the DEA permutation functions through the res
argument:
DEA.permutateIndividuals(nbIndividuals = 3, res=GSE79636)
A custom dataset could be used in a similar fashion. You must simply pass to the res
argument a list containing two slots:
dat
: the expression matrix, with samples as columns and features as row names.annotation
: the annotation matrix, with each row corresponding to a column in dat
, and with at least the columns sex
and individual
.The package runs differential expression analyses (DEA) on random groups of samples using two functions, related to types of grouping:
By default, the permutation functions are multithreaded, using detetedCores()-1
threads. This can be adjusted through the ncores
arguments, and multithreading can be disabled using ncores=1
.
The DEA.permutateIndividuals
function constitutes, for each permutation, random groups of sex-balanced individuals. By default, the function will run up to 50 DEA (depending on how many different permutations are possible), each on random but sex-balanced groups of 2 individuals, using 2 iPSC clones per individual.
The (maximum) number of permutations can be changed with the maxTests
argument; the number of individual per group can be changed using the nbIndividuals
argument, and the number of clones (either 1 or 2) can be changed using the nbClone
argument.
The default function does not add true/known differential expression (necessary for estimates of sensitivity), and the results can accordingly only be used for specificity analysis. To include differential expression and assess the proportion of it detected, set addDE=TRUE
(see ?DEA.permutateIndividuals
for more details). We'll run this example single-threaded (ncores=1
):
DEA.permutateIndividuals(nbIndividuals = 3, maxTests = 10, nbClone = 2, addDE=TRUE, ncores = 1)
## Aggregating transcript counts to gene-level...
## Running 10 differential expression analyses
## Results saved in 3indiv.vs.3indiv.2.RData
By default, the results (both a R object and summary figures) will be saved in the current working directory, although this behavior can be changed by setting doSave=FALSE, returnResults=TRUE
.
A number of other parameters can be altered; see ?DEA.permutateIndividuals
for more information.
The DEA.permutateClones
function selects random individuals for each permutation, and puts one iPSC clone of each individual in each of two groups, so that each group contains different clones from the same set of individuals. By default, the function will run up to 300 DEA (can be changed with the maxTests
argument), using groups based on 2 individuals (can be changed using the nbIndividuals
argument).
The output, as well as most of the options (including the addition of differential expression) are as in the DEA.permutateIndividuals
function. The function however has an additional feature, controlled by the paired
argument: if set to TRUE, the differential expression will not be performed using the classical method (edgeR with exact test), but using GLM in order to pair the clones of the same individuals, i.e. using the model ~individual+group
. By default, pairing is disabled.
For more information, see ?DEA.permutateClones
.
if doSave=TRUE
(default), 3 or 4 output files are produced, each with either of the following prefix:
[x]indiv.vs.[x]indiv.[y]
for results of the DEA.permutateIndividuals
function, where x is the number of individuals per group and y the number of clones (e.g. 2indiv.vs.2indiv.1).clones.[x]indiv.[paired]
for results of the DEA.permutateClones
function, where x is the number of individuals and [paired] indicates if a paired analysis was performed.The output files are:
spurious
differentially-expressed genes for each permutation.spurious
differentially-expressed genes.addDE=TRUE
) a heatmap of the sensitivity at different foldchanges and expression levels, as produced by the getSensitivityMatrix()
function.The simplest way of extracting the distributions of true positives (TP) and false positives (FP) from the results of permutation DEA is through the
readPermResults
function:
pm <- readPermResults("3indiv.vs.3indiv.2.RData")
The function returns a list, with each element being the results of one set of permuations. In other words, multiple filenames could be passed to the readPermResults
function, but right now we have only one:
lapply(pm[[1]],FUN=head)
## $nbComps
## [1] 10
##
## $FP
## V1 V2 V3 V4 V5 V6
## 31 286 126 37 41 38
##
## $TP
## V1 V2 V3 V4 V5 V6
## 51 55 47 50 42 48
##
## $DEGs
## FDR.below.05 absLog2FC logMeanCount
## SPDYE3 10 2.321928 2.744059
## CYP46A1 10 2.321928 2.722561
## LOC100101478 10 2.321928 3.816600
## RNF212 10 2.321928 3.864638
## C19orf40 10 2.321928 4.805020
## NEIL1 10 1.584963 4.816916
Each such object is itself a list containing:
We can for instance get the mean number of false positives, as well as (assuming that addDE
was enabled) true positives and FDR:
cbind(
FP=sapply(pm, FUN=function(x) mean(x$FP)),
TP=sapply(pm, FUN=function(x) mean(x$TP)),
FDR=sapply(pm, FUN=function(x) mean(x$FP/(x$FP+x$TP)))
)
The object(s) in this list can also be used to plot sensitivity matrices using getSensitivityMatrices
:
The packages includes a function to plot a Receiver Operating Characteristic (ROC) curve representing the results of a permutation DEA analysis (assuming that addDE
was enabled). The function must not be called on the results of the readPermResults
function, which do not include individual p-values, but directly on the results of the permuation analysis, e.g.:
getPermROC("3indiv.vs.3indiv.2.RData")
By default, the function will plot the median sensitivity and specificity at sliding p-values (the line and points), as well as the 0.05 and 0.95 quantiles across the different permutations (the shaded area). You can disable the shaded area with qprobs=NA
. See ?getPermROC
for more options.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.