In plger/iPSCpoweR: iPSCpoweR: transcriptomics-based evaluation of experimental designs for iPSC disease modeling

%\VignetteEngine{knitr::rmarkdown} %\VignetteIndexEntry{iPSCpoweR}

iPSCpoweR - Introduction

This package provides resources for exploring large transcriptomics datasets from human induced pluripotent stem cell (iPSC) lines, run large numbers of differential expression analyses across random groups of individuals, in order to approximate power and guide experimental design. This vignette will guide you through its main usage.

News

Due to changes in underlying packages and in the permutation scheme (to enable analyses with a higher number of individuals), the results now produced by the packages will not be exactly the same as those published in the paper. This does not change the conclusions, but the precise sensitivity/false positive estimates will be different, meaning that you should not compare new results directly with older ones (but rather re-generate everything using the same versions).

Some recent changes: - the pre-filtering rule used in the paper is now the default (the documentation was contradictory) - the permutation scheme now allows for larger numbers of individuals - a wrapper was added from the dream method (see ?dreamWrapper) - the getSensitivityMatrices function has been updated to use the ComplexHeatmap package if available, making more professional plots.

Installing the package

To install the package, download it and install it using the following R command:

install.packages("path/to/iPSCpoweR.tar.gz", repos=NULL)

Alternatively, if you have devtools installed you can install the package directly from the git repository using:

library(devtools)
install_git("https://github.com/plger/iPSCpoweR", build_vignettes=TRUE)

Building the vignette might take a couple of minutes. Once the package is installed, you can load it and access this vignette:

library(iPSCpoweR)
vignette("iPSCpoweR")

Data included in the package

The main dataset included in this package was released by the Human Induced Pluripotent Stem Cell Initiative (HipSci), quantified with Salmon v6.1, using FMD indexes and the Refseq transcript annotation. You can use the getSamplesInfo() function to have an overview of the samples, and data("hipsci_annotation") to access the samples' annotation.

The package's functions take care of loading and handling the data. If you wish to access the data for other purposes, see the getTxExpr and getGeneExpr functions. To aggregate technical replicates, see the aggByClone function.

Using the GSE79636 dataset

The data from Carcamo-Orive et al. (Cell Stem Cell 2016) is also included in the package to allow its usage in the permutation DEA analyses. To do so, you must first load the GSE79636 object, which contains both the expression and annotation matrices:

library(iPSCpoweR)
data("GSE79636")
summary(GSE79636)

Then you can pass this object to any of the DEA permutation functions through the res argument:

DEA.permutateIndividuals(nbIndividuals = 3, res=GSE79636)

Using a custom dataset

A custom dataset could be used in a similar fashion. You must simply pass to the res argument a list containing two slots:

dat : the expression matrix, with samples as columns and features as row names.
annotation : the annotation matrix, with each row corresponding to a column in dat, and with at least the columns sex and individual.

Permutation DEA analysis across the HipSci open dataset

The package runs differential expression analyses (DEA) on random groups of samples using two functions, related to types of grouping:

Permutation schemes

By default, the permutation functions are multithreaded, using detetedCores()-1 threads. This can be adjusted through the ncores arguments, and multithreading can be disabled using ncores=1.

DEA.permutateIndividuals

The DEA.permutateIndividuals function constitutes, for each permutation, random groups of sex-balanced individuals. By default, the function will run up to 300 DEA (depending on how many different permutations are possible), each on random but sex-balanced groups of 2 individuals, using 2 iPSC clones per individual.

The (maximum) number of permutations can be changed with the maxTests argument; the number of individual per group can be changed using the nbIndividuals argument, and the number of clones (either 1 or 2) can be changed using the nbClone argument.

The default function does not add true/known differential expression (necessary for estimates of sensitivity), and the results can accordingly only be used for specificity analysis. To include differential expression and assess the proportion of it detected, set addDE=TRUE (see ?DEA.permutateIndividuals for more details). We'll run this example single-threaded (ncores=1):

DEA.permutateIndividuals(nbIndividuals = 3, maxTests = 10, nbClone = 2, addDE=TRUE, ncores = 1)

By default, the results (both a R object and summary figures) will be saved in the current working directory, although this behavior can be changed by setting doSave=FALSE, returnResults=TRUE.

A number of other parameters can be altered; see ?DEA.permutateIndividuals for more information.

DEA.permutateClones

The DEA.permutateClones function selects random individuals for each permutation, and puts one iPSC clone of each individual in each of two groups, so that each group contains different clones from the same set of individuals. By default, the function will run up to 300 DEA (can be changed with the maxTests argument), using groups based on 2 individuals (can be changed using the nbIndividuals argument).

The output, as well as most of the options (including the addition of differential expression) are as in the DEA.permutateIndividuals function. The function however has an additional feature, controlled by the paired argument: if set to TRUE, the differential expression will not be performed using the classical method (edgeR with exact test), but using GLM in order to pair the clones of the same individuals, i.e. using the model ~individual+group. By default, pairing is disabled.

For more information, see ?DEA.permutateClones.

Output files

if doSave=TRUE (default), 3 or 4 output files are produced, each with either of the following prefix:

[x]indiv.vs.[x]indiv.[y] for results of the DEA.permutateIndividuals function, where x is the number of individuals per group and y the number of clones (e.g. 2indiv.vs.2indiv.1).
clones.[x]indiv.[paired] for results of the DEA.permutateClones function, where x is the number of individuals and [paired] indicates if a paired analysis was performed.

The output files are:

*.RData : a R object containing all the results as well as the call.
*Nsig.svg : a histogram of the number of spurious differentially-expressed genes for each permutation.
*logFC.svg : a histogram of the foldchange of the spurious differentially-expressed genes.
*sensitivity.svg : (only if addDE=TRUE) a heatmap of the sensitivity at different foldchanges and expression levels, as produced by the getSensitivityMatrix() function.

Extracting true and false positives from the results

The simplest way of extracting the distributions of true positives (TP) and false positives (FP) from the results of permutation DEA is through the readPermResults function:

pm <- readPermResults("3indiv.vs.3indiv.2.RData")

The function returns a list, with each element being the results of one set of permuations. In other words, multiple filenames could be passed to the readPermResults function, but right now we have only one:

lapply(pm[[1]],FUN=head)

We can get the mean number of false positives, as well as (assuming that addDE was enabled) true positives and FDR:

cbind(
  FP=sapply(pm, FUN=function(x) mean(x$FP)),
  TP=sapply(pm, FUN=function(x) mean(x$TP)),
  FDR=sapply(pm, FUN=function(x) mean(x$FP/(x$FP+x$TP)))
)

The object(s) in this list can also be used to plot sensitivity matrices using getSensitivityMatrices:

getSensitivityMatrices(pm)

Plotting a ROC curve

The packages includes a function to plot a Receiver Operating Characteristic (ROC) curve representing the results of a permutation DEA analysis (assuming that addDE was enabled). The function must not be called on the results of the readPermResults function, which do not include individual p-values, but directly on the results of the permuation analysis, e.g.:

getPermROC("3indiv.vs.3indiv.2.RData")

By default, the function will plot the median sensitivity and specificity at sliding p-values (the line and points), as well as the 0.05 and 0.95 quantiles across the different permutations (the shaded area). You can disable the shaded area with qprobs=NA. See ?getPermROC for more options.

Note of caution

Because of multiple testing correction, the significance of a statistical test performed on one gene depends on that of other genes. This makes the kind of power analysis that iPSCpoweR does tricky, because the FDR/q-values obtained are influence by the magnitude of the spiked-in differential expression. For this reason, the sensitivity and specificity given by the simulations should be interpreted with caution. Depending on the magnitude of the changes in your particular case, your real power might be greater or lower than that given by simulations. The simulations results should therefore chiefly be used to compare experimental settings and analysis methods, and to get a rough idea of your power.

Using a different DEA method in the permutations

The permutation analysis calls the edgeRwrapper function to perform the DEA by default. This function can be replaced by a custom function, or by some of the other functions already implemented (see below), using the DEAfunc argument of either DEA.permutateIndividuals or DEA.permutateClones. For custom functions, the output should be a data.frame with genes as row.names (in the same order in which they were initially given), and the following columns: logFC, PValue, FDR. The best way to get started writing your own function is too look at edgeRwrapper or voomWrapper.

Using the duplicateCorrelation approach for multiple clones

To enable the use of limma::duplicateCorrelation (and treat individuals as random effects) in DEA.permutateIndividuals, make sure you set DEAfunc=voomWrapper and nested=TRUE. This feature is not available for edgeR and requires more than one clone per individual.

For example:

DEA.permutateIndividuals(nbIndividuals = 3, nbClone = 2, addDE = T, 
    filter = function(x) {
        sum(x > 10) > 2
    }, nested = T, DEAfunc = voomWrapper)

This approach was the best performing in our study.

Summing clones' read counts before DEA

You can use the function voomWrapperSumReps, which does the same thing as the normal voomWrapper function, except that it sums the read counts of clones of the same individual before running the analysis.

Using mixed models

Four functions are available to use mixed models(-like) approaches, considering the individual as a random effect:

vstLmerWrapper runs DESeq2's variance-stabilizing transformation (VST), fits the mixed model ~1+(1|individual)+group (assuming that it is called with nested=TRUE), and uses the drop1 test to assess statistical significance.
voomLmerWrapper performs similarly, except that it passes the counts through voom instead of DESeq2's VST,
glmmWrapper fits the mixed moded ~1+(1|individual)+group using MASS::glmmPLQ with a quasipoisson dispersion model.
dreamWrapper fits the linear mixed moded ~1+(1|individual)+group using variancePartition with observational weights.

All these methods should be used with the nested argument, e.g.:

DEA.permutateIndividuals( nbIndividuals = 3, 
    nbClone = 2,
    addDE = T, 
    filter = function(x) { sum(x > 10) > 2 }, 
    nested = T, 
    DEAfunc = vstLmerWrapper )

If nested=F, the corresponding model without random effect will be used.

Of note, all four methods are very slow, and according to our study less accurate than using limma::duplicateCorrelation as discussed above.

Reproducing the analysis of variance

The analysis of the proportion of transcriptional variance explained by difference between individuals can be reproduced using the transcriptionalVarianceExplained function. By default, mixed models are used, treating the individual as a random effect variable. For more detail, see ?transcriptionalVarianceExplained.

To reproduce the analysis of variance in cellular morphology, see ?cellphenoVarianceExplained.

Reporting issues

Please report issues on the github repository.

plger/iPSCpoweR documentation built on Feb. 2, 2022, 1:37 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

plger/iPSCpoweR
iPSCpoweR: transcriptomics-based evaluation of experimental designs for iPSC disease modeling

In plger/iPSCpoweR: iPSCpoweR: transcriptomics-based evaluation of experimental designs for iPSC disease modeling

iPSCpoweR - Introduction

News

Installing the package

Data included in the package

Using the GSE79636 dataset

Using a custom dataset

Permutation DEA analysis across the HipSci open dataset

DEA.permutateIndividuals

DEA.permutateClones

Output files

Extracting true and false positives from the results

Plotting a ROC curve

Note of caution

Using a different DEA method in the permutations

Using the duplicateCorrelation approach for multiple clones

Summing clones' read counts before DEA

Using mixed models

Reproducing the analysis of variance

Reporting issues

R Package Documentation

Browse R Packages

We want your feedback!

plger/iPSCpoweR iPSCpoweR: transcriptomics-based evaluation of experimental designs for iPSC disease modeling

In plger/iPSCpoweR: iPSCpoweR: transcriptomics-based evaluation of experimental designs for iPSC disease modeling

iPSCpoweR - Introduction

News

Installing the package

Data included in the package

Using the GSE79636 dataset

Using a custom dataset

Permutation DEA analysis across the HipSci open dataset

DEA.permutateIndividuals

DEA.permutateClones

Output files

Extracting true and false positives from the results

Plotting a ROC curve

Note of caution

Using a different DEA method in the permutations

Using the duplicateCorrelation approach for multiple clones

Summing clones' read counts before DEA

Using mixed models

Reproducing the analysis of variance

Reporting issues

R Package Documentation

Browse R Packages

We want your feedback!

plger/iPSCpoweR
iPSCpoweR: transcriptomics-based evaluation of experimental designs for iPSC disease modeling