View source: R/rnaseq_related.R
run_deseq_analysis    R Documentation
Uses DESeq2::DESeq() to perform DE analysis.
DESeq2 is a popular method for differential expression (DE) analysis of RNA-seq data. Running DESeq, however, involves several data wrangling steps both before and after the run. For example, before running DESeq, genes may need to be filtered out based on the number of reads mapped to them across replicates. Similarly, after the run, the user needs to set cutoffs (log2fc, pvalue and padj) to define up- and down-regulated genes. For large RNA-seq experiments involving several DE comparisons, such sub-setting and varying cutoffs create messy, less readable code. This function helps keep things tidy and improves code readability. In addition, the output of this function can subsequently be used by several other functions of this package.
run_deseq_analysis(
counts,
column_geneid,
sample_info,
group_numerator,
group_denominator,
delim = "\t",
comment_char = "#",
min_counts = 10,
min_replicates = 1,
cutoff_lfc = 1,
cutoff_pval = 0.05,
cutoff_padj = 0.01,
regul_based_upon = 1,
print_rows_all_zero = FALSE
)
counts
: a character string giving the path to a count file, or a dataframe of raw counts for each gene. See the details below for the required format of the count file and the count dataframe.
column_geneid
: a character string denoting a column of geneid in
sample_info
: a character string denoting the name of a sample information file, or a dataframe. Either one must have at least two columns WITHOUT column names. The first column denotes sample names and the second column denotes the group name for each sample in the first column. For e.g.
group_numerator
: a character vector denoting the sample groups to use in the numerator when calculating fold change.
group_denominator
: a character vector denoting the sample groups to use in the denominator when calculating fold change.
delim
: a character denoting the delimiter for
comment_char
: a character denoting comment lines in the count file. Only valid if
min_counts
: a numeric value, default 10, denoting the minimum counts a gene must have to be considered for differential expression analysis.
min_replicates
: a numeric value, default 1, denoting the minimum number of samples within a sample group that must have
cutoff_lfc
: minimal threshold for log2 fold change, default 1 (2 fold).
cutoff_pval
: threshold for pvalue, default 0.05. The p-value threshold will be applied only when
cutoff_padj
: threshold for padj, default 0.01. The padj threshold will be applied only when
regul_based_upon
: one of the numeric choices 1, 2, or 3. if 1 ... if 2 ... if 3 ...
print_rows_all_zero
: logical, default FALSE, denoting whether to print genes with value 0 in all columns.
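The pre-filtering controlled by min_counts and min_replicates can be pictured with a small base R sketch. The helper filter_genes_sketch() and the toy data are hypothetical and not part of the package. The rule used here (keep a gene when at least one sample group has min_replicates or more samples with at least min_counts reads) is an assumption made for illustration; the package's own rule may differ.

```r
# Hypothetical helper illustrating a min_counts / min_replicates style filter.
# Assumption: a gene is kept when, in at least one sample group, at least
# `min_replicates` samples have `min_counts` or more reads.
filter_genes_sketch <- function(count_matrix, groups,
                                min_counts = 10, min_replicates = 1) {
  stopifnot(ncol(count_matrix) == length(groups))
  passes <- count_matrix >= min_counts
  # For each group, flag genes with enough passing samples.
  per_group <- sapply(unique(groups), function(g) {
    rowSums(passes[, groups == g, drop = FALSE]) >= min_replicates
  })
  # Force a genes x groups matrix, even when there is a single gene.
  per_group <- matrix(per_group, nrow = nrow(count_matrix))
  keep <- rowSums(per_group) > 0
  count_matrix[keep, , drop = FALSE]
}

# Made-up counts: 3 genes, 2 control and 2 treatment samples.
toy <- matrix(
  c(12, 15,  0,  1,
     3,  2,  4,  1,
     0,  0, 30, 25),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), c("c1", "c2", "t1", "t2"))
)
toy_groups <- c("control", "control", "treatment", "treatment")

filtered <- filter_genes_sketch(toy, toy_groups,
                                min_counts = 10, min_replicates = 2)
rownames(filtered)
```

In this toy data geneB never reaches 10 reads in any sample, so it is the only gene dropped.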
For the argument counts, the user can provide either a character string denoting a file or a dataframe object. The required format in each case is explained below.
Count file
: A count file is a table of raw read counts, usually derived from .bam files, for different genomic features (e.g. genes, transcripts etc.). Data in the count file must be in a tabular format with a valid column delimiter (e.g. tab, comma etc.). The first row and first column will be considered as column names and row names respectively. Values for column names and row names are usually character strings or a combination of characters, numbers and special characters such as _ or . Both row names and column names must have unique values.
Count dataframe
: Count data in a dataframe format, with the same requirements for row names and column names as explained for the count file.
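As an illustration of the format described above, the following base R sketch builds a small count dataframe and checks the stated requirements. The gene ids, sample names and counts are made up, and the checks are hypothetical; they are not the validation performed by the package.

```r
# Toy count table: the first column holds gene ids, the remaining columns
# hold raw read counts per sample. All names and values are made up.
toy_counts <- data.frame(
  gene_id = c("geneA", "geneB", "geneC"),
  control_rep1 = c(120L, 0L, 35L),
  control_rep2 = c(98L, 2L, 41L),
  treatment_rep1 = c(300L, 1L, 12L),
  treatment_rep2 = c(275L, 0L, 9L),
  stringsAsFactors = FALSE
)

# Hypothetical checks mirroring the requirements in the text:
# unique gene ids, unique column names, numeric count columns.
ids_unique <- !anyDuplicated(toy_counts$gene_id)
cols_unique <- !anyDuplicated(colnames(toy_counts))
counts_numeric <- all(vapply(toy_counts[-1], is.numeric, logical(1)))
```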
The return object is a dataframe (tibble) in which each row denotes a unique differential comparison. There are 8 columns in total, as explained below.
de_comparisons
: Stores the name of the differential comparison for each row.
numerator
: Stores the names of the samples used as numerator for the differential comparison in each row.
denominator
: Stores the names of the samples used as denominator for the differential comparison in each row.
norm_counts
: This is a named-list column. Each row in this column is a list of two elements containing normalised gene expression values, in dataframe format, for the numerator and denominator samples. The first column of the dataframe is gene_id and subsequent columns are gene expression values in replicates of the corresponding samples. These normalised gene expression values are obtained using the counts slot of a DESeqDataSet object, e.g. counts(dds, normalized = TRUE).
dsr
: This is a named-list column that stores an object of class DESeq2::DESeqResults() for the differential comparison in each row.
dsr_tibble
: This is a named-list column that stores the output of DESeq2::DESeqResults() in dataframe format for the differential comparison in each row.
dsr_tibble_deg
: The data in this column is the same as in the column dsr_tibble, except that it contains two extra columns, signif and regul. Values in signif specify the statistical and fold change significance of the gene, while values in regul denote whether the gene is up or down regulated.
deg_summmary
: This is a named-list column. Each element of the list is a dataframe summarizing the number of differentially expressed genes for the differential comparison in each row.
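How cutoffs could translate into signif and regul style columns is sketched below in base R. This is not the package's implementation: classify_deg_sketch() is a hypothetical helper, the rule (absolute log2 fold change at or above cutoff_lfc and padj at or below cutoff_padj) is an assumption, and the labels "significant", "not_significant" and "other" are invented for illustration. The column names log2FoldChange and padj follow DESeq2 results.

```r
# Hypothetical helper: label genes from a DESeq2-like results dataframe.
# Assumption: a gene is significant when |log2FoldChange| >= cutoff_lfc
# and padj <= cutoff_padj. The p-value cutoff is ignored in this sketch.
classify_deg_sketch <- function(res_df, cutoff_lfc = 1, cutoff_padj = 0.01) {
  # NA values (which DESeq2 can report) are treated as not significant.
  is_signif <- !is.na(res_df$padj) & !is.na(res_df$log2FoldChange) &
    abs(res_df$log2FoldChange) >= cutoff_lfc & res_df$padj <= cutoff_padj
  res_df$signif <- ifelse(is_signif, "significant", "not_significant")
  res_df$regul <- ifelse(!is_signif, "other",
                         ifelse(res_df$log2FoldChange > 0, "up", "down"))
  res_df
}

# Made-up results for four genes.
toy_res <- data.frame(
  gene_id = c("geneA", "geneB", "geneC", "geneD"),
  log2FoldChange = c(2.5, -1.8, 0.3, 3.0),
  padj = c(0.001, 0.004, 0.0001, NA)
)

classified <- classify_deg_sketch(toy_res)
table(classified$regul)
```

Here geneC fails the fold change cutoff and geneD has a missing padj, so only geneA and geneB receive an up or down label.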
library(magrittr) # provides the %>% pipe used below

count_file <- system.file("extdata", "toy_counts.txt", package = "parcutils")
count_data <- readr::read_delim(count_file, delim = "\t", show_col_types = FALSE)

# sample names are the count column names, minus the first (gene id) column
sample_info <- count_data %>% colnames() %>% .[-1] %>%
  tibble::tibble(samples = ., groups = rep(c("control", "treatment1", "treatment2"), each = 3))

res <- run_deseq_analysis(counts = count_data,
                          sample_info = sample_info,
                          column_geneid = "gene_id",
                          cutoff_lfc = 1,
                          cutoff_pval = 0.05,
                          group_numerator = c("treatment1", "treatment2"),
                          group_denominator = c("control"))
res

## all comparisons
print(res$de_comparisons)

## DESeq result object(s)
print(res$dsr)

## DESeq result data frame
print(res$dsr_tibble)

## DESeq result data frame with DEG assigned, look at the columns 'signif' and 'regul'
print(res$dsr_tibble_deg)

## DEG summary
print(res$deg_summmary)