Description Format Author(s) Source References Examples
Salmon-generated transcript-level abundance estimates summarized to gene level using tximport, along with raw counts, gene lengths, and clinical annotations for 275 human primary colorectal tissue samples across three phenotypes (tumor, normal adjacent-to-tumor, and healthy), represented as a SummarizedExperiment. Abundance estimates were derived from single-end RNA-seq.
A SummarizedExperiment object containing 3 assays of matrices, each 37,361 rows x 275 columns. Each row is a gene and each column is a sample.
The SummarizedExperiment object also includes a colData S4Vectors::DFrame object with 275 rows and 27 columns. Each row is a sample and each column is a field. The fields are described below.
dirName: name of directory into which raw data for sample was downloaded, serves as unique identifier
projId: NCBI BioProject identifier for projects registered in the BioProject database, or common name of projects listed in other databases
subId: subject identifier
sampId: sample identifier
sampType: sample type, which indicates the phenotype of the sample
dist_cm: relative distance in centimeters between the tumor and the site from which the sample was obtained, NA for healthy samples and tumor-adjacent samples without measurements, 0 for tumor samples
sex: reported sex of subject, NA for missing values
race: reported ancestry of subject, NA for missing values
tStage: stage of tumor associated with sample, NA for healthy samples and tumor or tumor-adjacent samples with missing values
ageAtDiagDays: age in days at time of diagnosis (for subjects with tumors) or biopsy collection (for healthy subjects), NA for missing values
daysToDeath: time in days from diagnosis to death for subjects with tumors, NA for survivors in TCGA data set and missing values in other data sets
sampSite: anatomic subsite, where right refers to cecum and ascending, transverse refers to transverse, left refers to descending and sigmoid, rectum refers to rectum, NA for missing values
wt_kg: subject weight in kilograms, NA for missing values
ht_cm: subject height in centimeters, NA for missing values
rnaMethod: method of enriching for mRNA during library preparation, either polyA for oligo(dT) selection or riboD for ribosomal depletion
rin: RNA integrity number for sample, NA for missing values
format: RNA sequencing read format, paired for paired-end, single for single-end
sequencer: identifier of instrument used for sequencing, taken from FASTQ header, NA for missing values
platform: name of Illumina instrument model used for sequencing
study: name assigned to data set for purpose of identifying data source
percDup: duplication level of reads on a single-end basis as measured by FastQC, presented as a percentage of total single-end reads per individual FASTQ file
percGc: GC content as a percentage of all nucleotides sequenced as measured by FastQC
seqLen: length in nucleotides of reads (for single-end) or fragments (for paired-end) for a given sample
rdProc: number of reads processed by Salmon, where processed means an attempt at quasi-mapping was performed
rdMap: number of reads quasi-mapped to the transcriptome by Salmon
percMap: reads quasi-mapped to the transcriptome as a percentage of all reads processed
data: abbreviated name of repository from which raw FASTQ files were downloaded, gdc means Genomic Data Commons, sradbg means Sequence Read Archive via dbGaP, srapub means Sequence Read Archive directly, bcuva means BarcUVa-Seq
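As a rough illustration of how some of the numeric fields above relate to one another, the sketch below builds a small toy table in base R and recomputes percMap from rdProc and rdMap. The three samples and all of their values are invented for illustration, the 365.25 days-per-year conversion and the 80 percent cutoff are assumptions, and the derived columns ageAtDiagYears and lowMap are hypothetical names that are not part of the dataset. With the real object, one would presumably start from the colData table instead (for example as.data.frame(colData(se)), where se is a placeholder name for the loaded SummarizedExperiment).

```r
# Toy stand-in for the colData table: three hypothetical samples, invented values.
cd <- data.frame(
  dirName       = c("s1", "s2", "s3"),
  ageAtDiagDays = c(21915, 18262.5, NA),
  rdProc        = c(40000000, 25000000, 10000000),
  rdMap         = c(30000000, 20000000, 9000000),
  stringsAsFactors = FALSE
)

# percMap as described above: quasi-mapped reads as a percentage of processed reads.
cd$percMap <- 100 * cd$rdMap / cd$rdProc

# Convert age from days to years (365.25 days per year is an assumption).
# Missing ages stay NA.
cd$ageAtDiagYears <- cd$ageAtDiagDays / 365.25

# Flag samples below a hypothetical mapping-rate cutoff of 80 percent.
cd$lowMap <- cd$percMap < 80

print(cd[, c("dirName", "percMap", "ageAtDiagYears", "lowMap")])
```

The same column arithmetic should carry over to the real table, since percMap is defined as a function of rdMap and rdProc, but the cutoff and the unit conversion are choices for the analyst rather than properties of the data.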
Chris Dampier
See inst/scripts/make-data.R for full details on generating this dataset from source files.
Dampier, C.H., Devall, M., Jennelle, L.T., Diez-Obrero, V., Plummer, S.J., Moreno, V., Casey, G. Oncogenic Features in Histologically Normal Mucosa: Novel Insights Into Field Effect From a Mega-Analysis of Colorectal Transcriptomes. Clinical and Translational Gastroenterology. 2020 Jul; 11(7): e00210.