reference_set_extended R Documentation

Whole blood gene expression measurements for 19 genes across 3,264 samples

Description

A data set containing gene expression measurements from whole blood for a signature of 19 genes. This signature comprises the 7 SRS genes defined by Davenport et al. plus a further 12 genes identified by Cano-Gamez et al. from microarray and RNA-seq data using canonical correlation analysis.

Usage

reference_set_extended

Format

A data frame with 3264 rows and 19 variables:

ENSG00000144659

SLC25A38, cosine-scaled, batch corrected gene expression level

ENSG00000103423

DNAJA3, cosine-scaled, batch corrected gene expression level

ENSG00000135372

NAT10, cosine-scaled, batch corrected gene expression level

ENSG00000079134

THOC1, cosine-scaled, batch corrected gene expression level

ENSG00000135972

MRPS9, cosine-scaled, batch corrected gene expression level

ENSG00000087157

PGS1, cosine-scaled, batch corrected gene expression level

ENSG00000165006

UBAP1, cosine-scaled, batch corrected gene expression level

ENSG00000111667

USP5, cosine-scaled, batch corrected gene expression level

ENSG00000182670

TTC3, cosine-scaled, batch corrected gene expression level

ENSG00000097033

SH3GLB1, cosine-scaled, batch corrected gene expression level

ENSG00000165733

BMS1, cosine-scaled, batch corrected gene expression level

ENSG00000103264

FBXO31, cosine-scaled, batch corrected gene expression level

ENSG00000152219

ARL14EP, cosine-scaled, batch corrected gene expression level

ENSG00000100814

CCNB1IP1, cosine-scaled, batch corrected gene expression level

ENSG00000127334

DYRK2, cosine-scaled, batch corrected gene expression level

ENSG00000131355

ADGRE3, cosine-scaled, batch corrected gene expression level

ENSG00000137337

MDC1, cosine-scaled, batch corrected gene expression level

ENSG00000156414

TDRD9, cosine-scaled, batch corrected gene expression level

ENSG00000115085

ZAP70, cosine-scaled, batch corrected gene expression level

...
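Because the columns are named by Ensembl gene ID, a lookup from ID to gene symbol can make the data frame easier to read. The following is a minimal sketch in base R, transcribed from the table above. The names `signature_genes` and `cols` are illustrative and are not objects provided by the package.

```r
# Ensembl gene ID -> gene symbol, transcribed from the Format table above.
# `signature_genes` is an illustrative name, not an object exported by the package.
signature_genes <- c(
  ENSG00000144659 = "SLC25A38",
  ENSG00000103423 = "DNAJA3",
  ENSG00000135372 = "NAT10",
  ENSG00000079134 = "THOC1",
  ENSG00000135972 = "MRPS9",
  ENSG00000087157 = "PGS1",
  ENSG00000165006 = "UBAP1",
  ENSG00000111667 = "USP5",
  ENSG00000182670 = "TTC3",
  ENSG00000097033 = "SH3GLB1",
  ENSG00000165733 = "BMS1",
  ENSG00000103264 = "FBXO31",
  ENSG00000152219 = "ARL14EP",
  ENSG00000100814 = "CCNB1IP1",
  ENSG00000127334 = "DYRK2",
  ENSG00000131355 = "ADGRE3",
  ENSG00000137337 = "MDC1",
  ENSG00000156414 = "TDRD9",
  ENSG00000115085 = "ZAP70"
)

# Example: translate a vector of column names into gene symbols.
cols <- c("ENSG00000115085", "ENSG00000127334")
unname(signature_genes[cols])
```

The same lookup could be applied to `colnames()` of the data frame, assuming its column names match the Ensembl IDs listed above.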

Details

This data set is formed of 1,609 samples from healthy individuals and 1,655 samples from sepsis patients.

Sepsis patients were recruited as a part of the Genomic Advances in Sepsis (GAinS) study in Oxford, UK. Of these, 676 were profiled using the Illumina HumanHT microarray, 864 using polyA-based RNA-sequencing, and 115 using qPCR.

Data from healthy individuals were collected from a number of publicly available sources. In particular, 991 Illumina HumanHT microarray samples were obtained from the SHIP-TREND consortium, 518 Illumina HumanHT microarray samples were obtained from the DILGOM cohort (an extension of the FINRISK study), and 100 polyA-based RNA-sequencing samples were obtained from the Dutch 500FG cohort.
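The sample counts quoted above can be tabulated to confirm that they add up to the group totals and to the number of rows in the data frame. This is a small base R sketch; the object names `cohorts` and `group_totals` and the short platform labels are illustrative only.

```r
# Sample counts transcribed from the Details text above.
cohorts <- data.frame(
  cohort   = c("GAinS", "GAinS", "GAinS", "SHIP-TREND", "DILGOM", "500FG"),
  group    = c("sepsis", "sepsis", "sepsis", "healthy", "healthy", "healthy"),
  platform = c("microarray", "RNA-seq", "qPCR", "microarray", "microarray", "RNA-seq"),
  n        = c(676, 864, 115, 991, 518, 100)
)

# Totals per group: expected to match 1,655 sepsis and 1,609 healthy samples.
group_totals <- tapply(cohorts$n, cohorts$group, sum)
group_totals

# Grand total: expected to match the 3,264 rows of the data frame.
sum(cohorts$n)
```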

RNA-seq data were log-transformed and any relevant batch effects were removed using the ComBat algorithm. Finally, the 19 signature genes were extracted from each cohort and the data were integrated using the mutual nearest neighbour (mNN) algorithm.

The values reported in this data set were obtained after mNN alignment, and thus they represent cosine-scaled, batch-corrected values.
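To illustrate what cosine scaling does, here is a minimal base R sketch. It assumes that "cosine-scaled" refers to the usual cosine normalisation applied before mNN correction, in which each sample's expression vector is divided by its Euclidean (L2) norm. The toy matrix and the helper `cosine_scale` are made up for illustration and are not part of the package. Since the values in this data set were also batch-corrected, their rows may not have exactly unit length.

```r
# Toy expression matrix: 3 samples (rows) x 4 genes (columns).
# The values are made up for illustration only.
expr <- matrix(
  c(2, 4, 4, 1,
    1, 1, 1, 1,
    0, 3, 4, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("sample", 1:3), paste0("gene", 1:4))
)

# Cosine scaling (hypothetical helper): divide each sample's vector
# by its Euclidean (L2) norm.
cosine_scale <- function(x) {
  norms <- sqrt(rowSums(x^2))
  # A vector of length nrow(x) is recycled down the columns,
  # so every element in row i is divided by norms[i].
  x / norms
}

scaled <- cosine_scale(expr)

# After scaling, every sample (row) has unit length.
sqrt(rowSums(scaled^2))
```

Scaling each sample to unit length puts samples measured on different platforms on a comparable scale, so that distances between them depend on relative expression across genes and not on overall magnitude.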

The main use of this data set is to serve as a reference to which new input samples can be aligned before prediction of SRS group using random forest models.


jknightlab/SepstratifieR documentation built on March 19, 2022, 9:43 p.m.