View source: R/rna_seq_analysis.R
rna_seq_analysis | R Documentation |
This function allows you to run a statistical analysis of an RNA-seq experiment
using Bioconductor package
edgeR.
Inputs are tables of summarized reads generated using
featureCounts (part of the lab's
RNA-seq analysis pipeline running on NYU's HPC) for each sample in the experiment.
The presence or absence of biological replicates is automatically inferred from the
conditionNames
argument: conditions aggregate samples as replicates. If
you provide as many different conditionNames
as there are sampleNames
each sample is taken as a single condition (no replicates).
The output includes tables of CPM, TPM and, if included, differential expression
(DE) analysis for selected pairs of samples (see "Value" section below for more
details).
Running without replicates: This function allows you to run the analysis on
experiments without biological replicate libraries. While the calculation of CPM and
TPM has no requirement for replicates, you should obviously avoid doing DE analysis
in the absence of replicates. The statistical model used by
edgeR
to perform DE analysis relies on biological replicates to estimate biological
variability. While there is no satisfactory alternative to actual biological replication,
you can still run a preliminary DE analysis without replicates. The approach
taken here relies on providing an estimate of biological variation that is reasonable
for RNA-seq experiments using S. cerevisiae. This is a better alternative
to simply assuming that biological variability is absent. Typical values for
the common BCV (square root-dispersion) for datasets arising from well-controlled
experiments are 0.1 for data on genetically identical model organisms, so that
value is used here.
rna_seq_analysis(pathToFiles, sampleNames, conditionNames, batchNames = NULL,
pairwiseDE = FALSE, outputFilePrefix)
pathToFiles |
A list of strings corresponding to the full path to the featureCounts output files for all samples in the experiment. No default. |
sampleNames |
A list of strings corresponding to the names of all samples
in the experiment in the same order as the files in |
conditionNames |
A list of strings corresponding to the experimental groups/conditions
for each sample/library. Will be the input to edgeR's |
batchNames |
Optional list of strings corresponding to the experimental batch for each
sample/library. Will be part of the design matrix for differential expression testing and
will define batch groups, if any. Defaults to |
pairwiseDE |
Logical indicating whether to test differential expression (DE).
Defaults to |
outputFilePrefix |
Optional string to be added as prefix to output file names. No default. |
The output includes several tables saved as .csv files in a directory named "RNA-seq_analysis" written to the working directory:
CPM: Compositional bias-normalized Counts Per
Million (output of edgeR's cpm
function). Useful
for intersample feature expression comparison (not feature length-normalized).
TPM: Compositional bias and feature length-normalized Transcripts Per Million. Useful for within-sample feature expression comparison.
DE: A Differential Expression table comparing the first sample
to all others. Includes log2 fold change (logFC
), average log2 counts per million
(logCPM
), two-sided p-value (PValue
) and false discovery rate (FDR
).
Note: Fold change differences (logFC
) between samples not directly ompared
in the table can be obtained by subtracting their reported logFC
(to the first sample).
For example, if the first sample is sample1
and we want the logFC
between
sample2
and sample3
, simply calculate the difference:
logFC.sample3_vs_sample2
= logFC.sample3
- logFC.sample2
A Multi-Dimensional Scaling plot of all samples is also saved in the output directory as a .pdf file.
## Not run:
rna_seq_analysis(pathToFiles = list('AH119-2h_featureCounts.txt', 'AH119-3h_featureCounts.txt',
'AH8104-2h_featureCounts.txt', 'AH8104-3h_featureCounts.txt'),
sampleNames = list('AH119_2h', 'AH119_3h', 'AH8104_2h', 'AH8104_3h'),
conditionNames = list('WT_2h', 'WT_3h', 'dot1_2h', 'dot1_3h'),
outputFilePrefix = 'dot1_noReplicates')
rna_seq_analysis(pathToFiles = list('AH119-2h_featureCounts.txt', 'AH119-3h_featureCounts.txt',
'AH8104-2h_featureCounts.txt', 'AH8104-3h_featureCounts.txt'),
sampleNames = list('AH119_2h', 'AH119_3h', 'AH8104_2h', 'AH8104_3h'),
conditionNames = list('WT_2h', 'WT_3h', 'dot1_2h', 'dot1_3h'),
pairwiseDE = TRUE, outputFilePrefix = 'dot1_noReplicates')
rna_seq_analysis(pathToFiles = list('AH119-A_featureCounts.txt', 'AH119-B_featureCounts.txt',
'AH8104-A_featureCounts.txt', 'AH8104-B_featureCounts.txt'),
sampleNames = list('AH119_A', 'AH119_B', 'AH8104_A', 'AH8104_B'),
conditionNames = list('WT', 'WT', 'dot1', 'dot1'),
batchNames = list('batch1', 'batch2', 'batch1', 'batch2')
pairwiseDE = TRUE)
## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.