ExpressionSet including expression data and phenotypic
information about the samples.
The expression data is saved in the
assayData slot of the
ExpressionSet. It is a gene-by-sample matrix, containing a subset of
data from an in vitro stimulation of bovine macrophages with different
mycobacterial strains. Column names are sample names, and row names are
Ensembl gene identifiers of the Bos taurus species. Each cell contains
the log2-transformed normalised expression level of each gene in each sample.
The phenotypic information is saved in the
phenoData slot of the
ExpressionSet. Row names are sample names and columns contain
descriptive information about each sample, including experimental factors (e.g.
Treatment, Timepoint, Animal).
Gene expression was measured in poly-A purified strand-specific RNA libraries using the RNA-Sequencing Illumina® HiSeq® 2000 platform as paired-end 2 × 90 nucleotide reads. Raw reads from pooled RNA libraries were first deconvoluted according to sample-specific nucleotide barcodes. Read pairs containing adapter sequence in either read mate were discarded, as were read pairs of low overall quality in either mate. Paired-end reads from each filtered individual library were aligned to the Bos taurus reference genome (B. taurus UMD3.1.71 genome release) using the STAR aligner software. For each library, raw counts for each gene based on sense strand data were obtained using the featureCounts software from the Subread package. The featureCounts parameters were set to unambiguously assign uniquely aligned paired-end reads in a stranded manner to the exons of genes within the Bos taurus reference genome annotation (B. taurus UMD3.1.71 genome annotation). The gene count outputs were further processed using the edgeR Bioconductor package.
The gene expression quantitation pipeline within the edgeR package was customised to: (1) filter out all bovine rRNA genes; (2) filter out genes that did not reach the minimum threshold of one count per million (CPM) in at least ten individual libraries (the number of biological replicates); (3) calculate normalisation factors for each library using the trimmed mean of M-values (TMM) method; (4) log2-transform CPM values based on the normalised library size.
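The four steps above can be sketched with standard edgeR calls. This is a minimal illustration rather than the authors' actual script: the count matrix is simulated, the `is_rRNA` annotation vector is hypothetical, and the reading of the filter as "CPM > 1 in at least 10 libraries", along with the default prior count used by `cpm(log = TRUE)`, are assumptions.

```r
# Sketch only: simulated counts stand in for the featureCounts output.
library(edgeR)

set.seed(1)
n_genes <- 200
n_libs <- 20
counts <- matrix(
  rnbinom(n_genes * n_libs, mu = 50, size = 2),
  nrow = n_genes,
  dimnames = list(
    paste0("gene", seq_len(n_genes)),
    paste0("lib", seq_len(n_libs))
  )
)
counts[1:20, ] <- 0L                 # 20 unexpressed genes, to exercise the filter

# Hypothetical annotation: flag 5 genes as rRNA
is_rRNA <- rep(FALSE, n_genes)
is_rRNA[21:25] <- TRUE

# (1) Remove rRNA genes
y <- DGEList(counts = counts[!is_rRNA, ])

# (2) Keep genes above 1 CPM in at least 10 libraries (assumed interpretation)
keep <- rowSums(cpm(y) > 1) >= 10
y <- y[keep, , keep.lib.sizes = FALSE]

# (3) TMM normalisation factors for each library
y <- calcNormFactors(y, method = "TMM")

# (4) log2-CPM based on the normalised library sizes
log_cpm <- cpm(y, log = TRUE)
dim(log_cpm)
```

The resulting `log_cpm` matrix has the same gene-by-sample layout as the expression matrix stored in the assayData slot.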
To generate this test data subset, we extracted 100 genes from the original
dataset of 12,121 genes. We kept all 7 genes associated with the GO term "GO:0034142"
(i.e. "toll-like receptor 4 signaling pathway") present in the original
full-size filtered-normalised dataset, all 3 Ensembl
gene identifiers annotated to the gene symbol 'RPL36A', and finally another
90 random genes, making a total of 100 genes measured in 117 samples.
Samples include all 10 biological replicates collected at four different
data(targets). The TLR4 pathway was found in the full
dataset as the top-ranking biological pathway discriminating the different
mycobacterial infections (unpublished observations).
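The gene selection described above (7 + 3 + 90 = 100 genes) can be sketched in base R. The identifiers below are placeholders rather than real Ensembl gene identifiers, the seed is arbitrary, and the sketch assumes that the two fixed gene sets do not overlap, as the stated total implies.

```r
# Sketch only: placeholder identifiers stand in for the 12,121 Ensembl gene IDs.
set.seed(42)                                   # arbitrary seed for a reproducible draw
all_genes <- sprintf("GENE%05d", 1:12121)

tlr4_genes   <- all_genes[1:7]                 # hypothetical stand-ins for the 7 GO:0034142 genes
rpl36a_genes <- all_genes[8:10]                # hypothetical stand-ins for the 3 'RPL36A' identifiers

# Draw 90 further genes at random from those not already selected
fixed_genes  <- c(tlr4_genes, rpl36a_genes)
random_genes <- sample(setdiff(all_genes, fixed_genes), 90)

subset_genes <- c(fixed_genes, random_genes)
length(subset_genes)
```

Sampling from `setdiff()` rather than from the full identifier vector guarantees that no gene is selected twice.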
assayData is a matrix of expression levels for 100 genes (rows)
measured in 117 samples (columns).
rownames are Ensembl gene identifiers of the
Bos taurus species.
colnames are sample identifiers.
phenoData is a data frame with 117 samples and 7 descriptive fields
(e.g. experimental factors) in the columns listed below:
rownames are unique identifiers. Here, sample names.
File contains local filenames where the RNAseq counts were
Sample contains individual sample name.
Animal contains the unique identifier of the animal
corresponding to the biological replicate, stored as a factor.
Treatment contains the infection status of the sample,
stored as a factor (CN: Control, MB: M. bovis, TB:
Time contains the time of measurement in hours
post-infection, stored as a factor.
Group contains a combination of the Treatment and Time
factors above, stored as a factor itself.
Timepoint contains the time of measurement, stored as a
numeric value. This field is useful to use on the X-axis of expression
plots. See function
Publication in review process.
# Load the data
data(AlvMac)

# Structure of the data
str(AlvMac)

# Dimensions (rows, columns) of the data
dim(AlvMac)

# Subset of first 5 features and 5 samples
AlvMac[1:5, 1:5]

# Phenotypic information
pData(AlvMac)

# Phenotypic information about factor "Group"
AlvMac$Group

# Conversion of a factor to a character vector
as.character(AlvMac$Group)

# Number of samples (rows) and annotations (columns)
dim(pData(AlvMac))
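The relationship between the Treatment, Time, Group and Timepoint fields of phenoData can be illustrated on a toy data frame, without loading the dataset. The time values, the `_` separator and the data frame itself are hypothetical and only mimic the layout described for phenoData; the treatment codes CN, MB and TB are taken from the field description.

```r
# Sketch only: a toy phenotype table, not the actual AlvMac phenoData.
pheno <- expand.grid(
  Treatment = c("CN", "MB", "TB"),
  Time = c("2", "24"),               # hypothetical hours post-infection
  stringsAsFactors = TRUE
)

# Group combines the Treatment and Time factors into a single factor
pheno$Group <- interaction(pheno$Treatment, pheno$Time, sep = "_")

# Timepoint is the numeric version of Time, suitable for a continuous X-axis.
# Converting via as.character() avoids returning the internal level codes.
pheno$Timepoint <- as.numeric(as.character(pheno$Time))

levels(pheno$Group)
str(pheno)
```

Calling `as.numeric()` directly on a factor returns its integer level codes rather than the values it displays, which is why the conversion goes through `as.character()` first.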