VSTexprTCGA: Gene Expression in Human tumours

VSTexprTCGA    R Documentation

Description

This is data from a study of human tumours by The Cancer Genome Atlas (TCGA) Research Network. The five tumour types in this data set are Breast, Colon, Kidney, Lung, and Prostate. There are normalized gene expression values for 4000 genes from 1000 samples, 200 samples per tumour type.

Usage

VSTexprTCGA

Format

A data frame with 1000 observations (rows) and 4001 variables (columns).

Column      Name                                  Data type   Description                    Values
[,1]        classes                               factor      5 different types of cancer    (Breast...Prostate)
[,2:4001]   ABCF1_23...LOC100271836_100271836     numeric     Gene expression data           (8.109048 - 21.8406)
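
The layout above (class label in column 1, gene expression values in the remaining columns) suggests a simple split into a response and a predictor matrix before model fitting. The following is a minimal sketch on a small simulated stand-in; `toy` and its gene names are hypothetical and not part of the data set. Replacing `toy` with `VSTexprTCGA` should give the same split on the real data.

```r
# Simulated stand-in with the same layout as VSTexprTCGA:
# a factor `classes` in column 1, numeric expression columns after it.
set.seed(1)
tumour_types <- c("Breast", "Colon", "Kidney", "Lung", "Prostate")
toy <- data.frame(
  classes = factor(rep(tumour_types, each = 4)),
  matrix(rnorm(20 * 3, mean = 12), nrow = 20,
         dimnames = list(NULL, c("geneA_1", "geneB_2", "geneC_3")))
)

# Response: tumour type; predictors: every column except the first
y <- toy$classes
X <- as.matrix(toy[, -1])

# Sanity checks on the split
stopifnot(is.factor(y), is.numeric(X), nrow(X) == length(y))
dim(X)
range(X)
```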

Details

The data has been used in exercises for supervised learning in BIN315. The gene expression values were normalized using the varianceStabilizingTransformation function from the DESeq2 package.
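
As an illustration of the normalization step, the sketch below applies varianceStabilizingTransformation to simulated counts generated by DESeq2's makeExampleDESeqDataSet. It is not the original preprocessing script: the simulated data, the `blind = TRUE` setting, and the transpose to a samples-by-genes layout are assumptions made for the example.

```r
library(DESeq2)

set.seed(1)
# Simulated count data: 1000 genes x 12 samples (not TCGA data)
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)

# blind = TRUE estimates dispersions without using the design
# (an assumption; the setting used for VSTexprTCGA is not documented here)
vsd <- varianceStabilizingTransformation(dds, blind = TRUE)

# assay() returns a genes x samples matrix; transpose so that
# rows are samples and columns are genes, as in VSTexprTCGA
expr <- t(assay(vsd))
dim(expr)
```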

Source

This data is a subset of data provided by the National Cancer Institute in the US (specifically RNA-seq data from The Cancer Genome Atlas Pan-Cancer analysis project). Data subsetting was first done by Torgeir Rhoden Hvidsten. Additionally, 4000 of the 13946 genes were selected using the splsda function from the mixOmics package.
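
A gene selection of this kind could be sketched as follows, using simulated data. The number of components, the `keepX` values, and the way selected genes are pooled across components are assumptions for illustration only; the settings used to pick the 4000 genes are not stated here.

```r
library(mixOmics)

set.seed(1)
# Simulated expression matrix: 60 samples x 200 genes, 3 classes
# (hypothetical data, not TCGA)
X <- matrix(rnorm(60 * 200), nrow = 60,
            dimnames = list(NULL, paste0("gene", 1:200)))
Y <- factor(rep(c("A", "B", "C"), each = 20))

# Give class "A" a shifted mean in the first five genes
X[Y == "A", 1:5] <- X[Y == "A", 1:5] + 2

# Sparse PLS-DA keeping 20 genes on each of 2 components (assumed values)
fit <- splsda(X, Y, ncomp = 2, keepX = c(20, 20))

# Pool the genes selected on either component and subset the matrix
selected <- unique(unlist(lapply(1:2, function(k) selectVar(fit, comp = k)$name)))
X_sub <- X[, selected]
dim(X_sub)
```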

References

The Cancer Genome Atlas Research Network, Weinstein, J., Collisson, E. et al. (2013) The Cancer Genome Atlas Pan-Cancer analysis project. Nat Genet, 45, 1113-1120.

Examples


# Summary of the first six variables
summary(VSTexprTCGA[, 1:6])

# Number of cases per tumour type
table(VSTexprTCGA$classes)


thoree/stat340 documentation built on June 30, 2024, 4:04 p.m.