add_predictions: Add predicted phenotypes to a recount rse object

View source: R/add_predictions.R

Description

Shannon Ellis et al. (2017) predicted phenotypes from expression data for the samples in the recount2 project. This function adds those predictions to the colData() slot of a RangedSummarizedExperiment-class object.

Usage

add_predictions(rse, is_tcga = FALSE, version = "latest", verbose = TRUE)

Arguments

rse

A RangedSummarizedExperiment-class object as downloaded with download_study. If this argument is not specified, the function will return the full predictions table.

is_tcga

Set to TRUE only when rse is from TCGA. Otherwise set to FALSE (default).

version

The version number for the predicted phenotypes data. It has to match one of the available numbers at https://github.com/leekgroup/recount-website/blob/master/predictions/. Feel free to check if there is a newer version than the default. The version used is printed as part of the file name.

verbose

If TRUE, print a message stating where the predictions file is being downloaded to.

Details

If you use these predicted phenotypes please cite the Ellis et al bioRxiv pre-print available at https://www.biorxiv.org/content/early/2017/06/03/145656. See citation details with citation('recount').

Value

A RangedSummarizedExperiment-class object with the prediction columns appended to the colData() slot. The predicted phenotypes are:

sex

male or female,

samplesource

cell_line or tissue,

tissue

tissue predicted based on the 30 tissues in GTEx,

sequencingstrategy

single or paired end sequencing.

For each of the predicted phenotypes there are several columns as described next:

reported_phenotype

NA when not available,

predicted_phenotype

NA when we did not predict, "Unassigned" when prediction was ambiguous,

accuracy_phenotype

accuracy is assigned per dataset, based on comparison to the samples for which we had reported phenotype information. As a result, there are three distinct values per predictor across all studies, one for each dataset (GTEx, TCGA, SRA).
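
The column layout described above can be explored with a small sketch. It assumes the three columns for a phenotype are named by joining the prefixes reported_, predicted_ and accuracy_ with the phenotype name (for example predicted_sex); check names(colData(rse)) on a real object to confirm. The toy_coldata table and all values in it, including the accuracy number, are made up for illustration and stand in for the colData() of an object returned by add_predictions().

```r
## Toy stand-in for colData(rse) after add_predictions(). The column names
## follow an assumed naming pattern and every value here is a placeholder.
toy_coldata <- data.frame(
    reported_sex = c("male", NA, "female", "female"),
    predicted_sex = c("male", "female", "female", "Unassigned"),
    accuracy_sex = c(0.9, 0.9, 0.9, 0.9),
    stringsAsFactors = FALSE
)

## Build the three column names for one phenotype
phenotype <- "sex"
cols <- paste0(c("reported_", "predicted_", "accuracy_"), phenotype)

## Keep only samples with an unambiguous prediction:
## drop NA (no prediction made) and "Unassigned" (ambiguous prediction)
predicted <- toy_coldata[[cols[2]]]
usable <- !is.na(predicted) & predicted != "Unassigned"

## Compare reported and predicted values where both are available
reported <- toy_coldata[[cols[1]]]
both <- usable & !is.na(reported)
agreement <- mean(reported[both] == predicted[both])

## Cross-tabulate reported against predicted, keeping NA as its own level
table(reported = reported, predicted = predicted, useNA = "ifany")
```

With a real object, the same steps would presumably apply after converting the result of colData() with as.data.frame(). Filtering on both NA and "Unassigned" matters because the two values mean different things: no prediction was attempted versus a prediction that was ambiguous.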

Author(s)

Leonardo Collado-Torres

References

Ellis et al, bioRxiv, 2017. https://www.biorxiv.org/content/early/2017/06/03/145656

Examples

## Load recount, which provides add_predictions() and the example data
library("recount")

## Add the predictions to an example rse_gene object
rse_gene <- add_predictions(rse_gene_SRP009615)

## Explore the predictions
colData(rse_gene)

## Download all the latest predictions
PredictedPhenotypes <- add_predictions()

recount documentation built on Dec. 20, 2020, 2:01 a.m.