View source: R/differential_expression.R
differential_expression | R Documentation |
differential_expression compares the raw read counts of gene expression data between different groups of samples to test for differential gene expression. It differs from model_gene_expression in that it incorporates the changes needed by the LCCC-Bioinformatics group. SAM and edgeR have been dropped here, and the old DESeq (i.e. not DESeq2) is set aside until it is needed.
differential_expression(
my_dt = NULL,
analysis_method = "DESeq2",
base_file_name = NULL,
base_title = NULL,
core_number = round(parallel::detectCores()/2),
deseq2_results_cooksCutoff = NULL,
deseq2_results_independentFiltering = TRUE,
gene_expression_cols = NULL,
gmt_file_log2fc_cutoffs = c(0),
gmt_file_fdr_cutoffs = c(0.2, 0.05),
gmt_file_pvalue_cutoffs = c(0.05),
gmt_ref = "Gene signature created from custom analysis.",
imported_annotation = NULL,
low_gene_count_cutoff = NULL,
low_gene_count_method = max,
my_grouping = NULL,
output_dir = NULL,
patient_key_col = NULL,
p_adjust_method = "BH",
sample_key_col = "Run_ID",
up_suffix = "__Up",
down_suffix = "__Down"
)
my_dt |
Data table (or data frame) with gene counts as columns and samples as rows, including a grouping column with at least two groups and an ID column (its name is given by the sample_key_col parameter). |
analysis_method |
String indicating which method should be used for the analysis. The intended choices are DESeq2, DESeq, SAM, or edgeR, but currently only DESeq2 is supported. |
base_file_name |
String to specify the file name. |
core_number |
Integer to indicate the number of cores that should be used. |
deseq2_results_cooksCutoff |
Set to |
deseq2_results_independentFiltering |
Gets passed to |
gene_expression_cols |
Character vector with the names of the columns with genes in them. |
gmt_file_log2fc_cutoffs |
Numeric vector of cutoffs to use for the log2 fold change for genes included in signatures. Up and down signatures will be generated for each combination of log2fc, fdr and pvalue cutoffs. Defaults to c(0) which equates to no filtering by fold change. |
gmt_file_fdr_cutoffs |
Numeric vector of cutoffs to use for the FDR-adjusted p-value. For each cutoff, two gene signatures will be made from the genes whose FDR-adjusted p-value is below it: one for up genes and one for down. |
gmt_file_pvalue_cutoffs |
Numeric vector of cutoffs to use for the p-value. For each cutoff, two gene signatures will be made from the genes whose p-value is below it: one for up genes and one for down. |
gmt_ref |
String indicating what should go in the reference part of the gmt file. |
imported_annotation |
Character vector to include what steps were done to the data prior to this analysis. This module will add on to those steps. |
low_gene_count_cutoff |
Numeric value giving the count threshold below which genes are removed. NULL removes no genes. |
low_gene_count_method |
The summary function applied to each gene column in my_dt (a function reference, not a character string) to decide which genes fall below low_gene_count_cutoff and are removed. Defaults to max (i.e. if the largest count for a given gene is below the cutoff, the gene is excluded). |
my_grouping |
This string is the name of the column you want to use to split the data into groups. |
output_dir |
Path to the output directory. This will be calculated automatically if left blank. |
patient_key_col |
String matching the name of the column that should be used to identify the unique patients. If included, pairwise sample comparison will be performed. |
p_adjust_method |
String method name to pass to |
sample_key_col |
String matching the name of the column that should be used to identify the unique sample identifiers. |
up_suffix |
String to append to the end of 'Up' signatures. |
down_suffix |
String to append to the end of 'Down' signatures. |
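The input layout and the low-count filter described in the arguments above can be sketched in base R. This is a minimal illustration, not the package's implementation: the toy counts, the Group column name, and the helper filter_low_count_genes are all hypothetical, and the helper assumes that a gene is kept when its summary value is at or above the cutoff.

```r
# Toy input (made-up counts): one row per sample, one column per gene,
# plus an ID column (Run_ID) and a grouping column (Group, hypothetical name).
my_dt <- data.frame(
  Run_ID = c("S1", "S2", "S3", "S4"),
  Group  = c("Control", "Control", "Treated", "Treated"),
  GeneA  = c(120L, 98L, 310L, 275L),
  GeneB  = c(0L, 2L, 1L, 3L),
  GeneC  = c(45L, 60L, 12L, 9L),
  stringsAsFactors = FALSE
)
gene_expression_cols <- c("GeneA", "GeneB", "GeneC")

# Hypothetical helper: summarise each gene column with `method` and keep
# the genes whose summary is at or above `cutoff`. A NULL cutoff keeps all.
filter_low_count_genes <- function(dt, gene_cols, cutoff = NULL, method = max) {
  if (is.null(cutoff)) {
    return(gene_cols)
  }
  counts <- as.data.frame(dt)[, gene_cols, drop = FALSE]
  gene_summary <- vapply(counts, function(x) as.numeric(method(x)), numeric(1))
  names(gene_summary)[gene_summary >= cutoff]
}

# GeneB never exceeds 3 counts, so it falls below a cutoff of 10.
kept_genes <- filter_low_count_genes(my_dt, gene_expression_cols, cutoff = 10)
print(kept_genes)
```

Passing the function itself (max, median, mean) rather than its name mirrors the "function reference, not character" requirement of low_gene_count_method.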
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ differential_expression ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This function uses either the DESeq or the DESeq2 method. DESeq is recommended for single-cell data. Note that, per the analysis_method argument, only DESeq2 is currently supported.
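Because my_dt stores samples as rows while DESeq2 expects genes as rows, some reshaping has to happen before the analysis. The following is a rough sketch of that step under stated assumptions: the toy counts and the Group column name are made up, and how differential_expression prepares its inputs internally is not documented here. Only DESeq2::DESeqDataSetFromMatrix is called, and only when DESeq2 is installed.

```r
# Toy input in the my_dt layout (made-up counts): samples as rows, genes as columns.
my_dt <- data.frame(
  Run_ID = c("S1", "S2", "S3", "S4"),
  Group  = c("Control", "Control", "Treated", "Treated"),
  GeneA  = c(120L, 98L, 310L, 275L),
  GeneB  = c(15L, 22L, 18L, 30L),
  stringsAsFactors = FALSE
)
gene_cols <- c("GeneA", "GeneB")

# DESeq2 expects genes as rows and samples as columns, so transpose the counts.
count_matrix <- t(as.matrix(my_dt[, gene_cols]))
colnames(count_matrix) <- my_dt$Run_ID

# Sample-level metadata, with rows in the same order as the matrix columns.
col_data <- data.frame(
  Group = factor(my_dt$Group, levels = c("Control", "Treated")),
  row.names = my_dt$Run_ID
)

# Build the DESeq2 data set only when the package is available.
if (requireNamespace("DESeq2", quietly = TRUE)) {
  dds <- DESeq2::DESeqDataSetFromMatrix(
    countData = count_matrix,
    colData   = col_data,
    design    = ~ Group
  )
}
```

From such an object, DESeq2::DESeq() fits the model and DESeq2::results() extracts the statistics; results() accepts cooksCutoff, independentFiltering, and pAdjustMethod arguments, which is presumably where the similarly named parameters of this function end up.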
A list containing several outputs:
volcano_pack - to be used by view_volcano_plot;
gmt_id_path - path to a gmt file with genes represented by ids (typically entrez);
gmt_symbol_path - path to a gmt file with genes represented by symbols (typically hgnc);
stats_path - path to the stats file with Gene_Name, Fold_Change, pValue, FDR_pValue.
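To make the relationship between the stats columns and the gmt signatures concrete, here is a small base R sketch. It is an illustration under assumptions, not the function's actual code: the values are made up, Fold_Change is assumed to be on the log2 scale, the helpers make_signatures and to_gmt_lines and the base name toy_comparison are hypothetical, and the FDR column is computed with stats::p.adjust purely for the example. The gmt layout used (set name, description, then genes, tab-separated) is the standard one.

```r
# Toy stats table (made-up values) with the column names listed above.
stats <- data.frame(
  Gene_Name   = c("GeneA", "GeneB", "GeneC", "GeneD"),
  Fold_Change = c(1.8, -2.1, 0.3, -0.4),  # assumed log2 scale
  pValue      = c(0.001, 0.004, 0.20, 0.03),
  stringsAsFactors = FALSE
)
# Benjamini-Hochberg adjustment, matching the "BH" default of p_adjust_method.
stats$FDR_pValue <- p.adjust(stats$pValue, method = "BH")

# Hypothetical helper: split significant genes into up and down signatures.
make_signatures <- function(stats, log2fc_cutoff = 0, fdr_cutoff = 0.05,
                            base_name = "toy_comparison",
                            up_suffix = "__Up", down_suffix = "__Down") {
  significant <- stats$FDR_pValue < fdr_cutoff
  up   <- stats$Gene_Name[significant & stats$Fold_Change >  log2fc_cutoff]
  down <- stats$Gene_Name[significant & stats$Fold_Change < -log2fc_cutoff]
  out <- list(up, down)
  names(out) <- paste0(base_name, c(up_suffix, down_suffix))
  out
}

# Hypothetical helper: one gmt line per signature (name, reference, genes).
to_gmt_lines <- function(signatures, gmt_ref) {
  vapply(names(signatures), function(nm) {
    paste(c(nm, gmt_ref, signatures[[nm]]), collapse = "\t")
  }, character(1), USE.NAMES = FALSE)
}

signatures <- make_signatures(stats, log2fc_cutoff = 0, fdr_cutoff = 0.05)
gmt_lines  <- to_gmt_lines(signatures, "Gene signature created from custom analysis.")
cat(gmt_lines, sep = "\n")
```

Repeating the call for every combination of the log2fc, fdr, and pvalue cutoff vectors would give one up and one down signature per combination, as the gmt_file_* arguments describe.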
Fix to match by patient ids.
Setup design_var and contrasts
Can only match two samples at this point.
view_heatmap
view_volcano_plot