differential_expression: Does a differential gene expression analysis.

View source: R/differential_expression.R

differential_expression    R Documentation

Does a differential gene expression analysis.

Description

differential_expression compares the raw read counts of gene expression data between groups of samples to test for differential gene expression. It differs from model_gene_expression in that it incorporates the changes the LCCC-Bioinformatics group needs. SAM and edgeR are dropped here, and the original DESeq (i.e. not DESeq2) is set aside until it is needed.

Usage

differential_expression(
  my_dt = NULL,
  analysis_method = "DESeq2",
  base_file_name = NULL,
  base_title = NULL,
  core_number = round(parallel::detectCores()/2),
  deseq2_results_cooksCutoff = NULL,
  deseq2_results_independentFiltering = TRUE,
  gene_expression_cols = NULL,
  gmt_file_log2fc_cutoffs = c(0),
  gmt_file_fdr_cutoffs = c(0.2, 0.05),
  gmt_file_pvalue_cutoffs = c(0.05),
  gmt_ref = "Gene signature created from custom analysis.",
  imported_annotation = NULL,
  low_gene_count_cutoff = NULL,
  low_gene_count_method = max,
  my_grouping = NULL,
  output_dir = NULL,
  patient_key_col = NULL,
  p_adjust_method = "BH",
  sample_key_col = "Run_ID",
  up_suffix = "__Up",
  down_suffix = "__Down"
)

Arguments

my_dt

Data table (or data frame) with gene counts as columns and samples as rows, including a grouping column with at least two groups and an id column (named by the sample_key_col parameter).
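
As a rough illustration of this layout, the sketch below builds a toy table in base R. The gene names, sample ids, the grouping column name "Treatment", and the counts are all made up for illustration; only the default id column name "Run_ID" comes from the Usage section.

```r
# Toy input in the layout described for my_dt: one row per sample,
# one column per gene, plus a sample id column and a grouping column.
# All names and values here are illustrative assumptions.
set.seed(1)
my_dt <- data.frame(
  Run_ID = paste0("S", 1:6),                          # sample_key_col
  Treatment = rep(c("control", "treated"), each = 3), # my_grouping
  GENE_A = rpois(6, lambda = 50),
  GENE_B = rpois(6, lambda = 5),
  GENE_C = rpois(6, lambda = 200),
  stringsAsFactors = FALSE
)
gene_expression_cols <- c("GENE_A", "GENE_B", "GENE_C")

# Basic sanity checks before handing the table to an analysis:
# unique sample ids, at least two groups, numeric gene columns.
stopifnot(
  !anyDuplicated(my_dt$Run_ID),
  length(unique(my_dt$Treatment)) >= 2,
  all(vapply(my_dt[gene_expression_cols], is.numeric, logical(1)))
)
```

The checks are not part of the package; they only mirror the requirements stated in the argument description.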

analysis_method

String naming the method used for the analysis. The planned choices are DESeq2, DESeq, SAM, and edgeR; currently only DESeq2 is supported.

base_file_name

String specifying the base file name.

core_number

Integer to indicate the number of cores that should be used.
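
The default in the Usage section is round(parallel::detectCores()/2). Two base R behaviors are worth knowing when relying on it: parallel::detectCores() can return NA on some platforms, and round(0.5) is 0 in R, so the expression would evaluate to 0 on a single-core machine. Whether the function guards against either case is not stated here; the sketch below is one way a caller might compute a safe value to pass in explicitly.

```r
# Compute a core count to pass as core_number.
# The guards are a caller-side precaution, not documented package behavior.
detected <- parallel::detectCores()
core_number <- if (is.na(detected)) {
  1L                              # fall back to one core if detection fails
} else {
  max(1L, round(detected / 2))    # never go below one core
}
```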

deseq2_results_cooksCutoff

Set to Inf or FALSE to disable the resetting of p-values to NA. Passed to the cooksCutoff argument of DESeq2::results.

deseq2_results_independentFiltering

Passed to the independentFiltering argument of DESeq2::results.

gene_expression_cols

Character vector with the names of the columns with genes in them.

gmt_file_log2fc_cutoffs

Numeric vector of cutoffs to use for the log2 fold change for genes included in signatures. Up and down signatures will be generated for each combination of log2fc, fdr and pvalue cutoffs. Defaults to c(0) which equates to no filtering by fold change.

gmt_file_fdr_cutoffs

Numeric vector of FDR cutoffs. For each cutoff, two gene signatures are made from the genes whose FDR-adjusted p-value is below it: one for up genes and one for down genes.

gmt_file_pvalue_cutoffs

Numeric vector of p-value cutoffs. For each cutoff, two gene signatures are made from the genes whose p-value is below it: one for up genes and one for down genes.
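
To make the cutoff logic concrete, here is a minimal base R sketch that splits a toy results table into up and down gene lists for one FDR cutoff and one log2 fold change cutoff. The table, the column name log2_fold_change, and the symmetric handling of the fold change cutoff are assumptions for illustration; the package's internal selection code is not shown on this page.

```r
# Toy results table; values and the log2_fold_change column name are made up.
stats <- data.frame(
  Gene_Name = c("GENE_A", "GENE_B", "GENE_C", "GENE_D"),
  log2_fold_change = c(2.1, -1.4, 0.3, -0.2),
  FDR_pValue = c(0.01, 0.03, 0.04, 0.60),
  stringsAsFactors = FALSE
)

fdr_cutoff <- 0.05
log2fc_cutoff <- 0   # 0 means no filtering by fold change

# Genes under the FDR cutoff, split by direction of change.
significant <- stats$FDR_pValue < fdr_cutoff
up_genes <- stats$Gene_Name[significant & stats$log2_fold_change > log2fc_cutoff]
down_genes <- stats$Gene_Name[significant & stats$log2_fold_change < -log2fc_cutoff]
```

With these toy values, raising log2fc_cutoff to 1 would drop GENE_C from the up list, since its log2 fold change is only 0.3.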

gmt_ref

String indicating what should go in the reference part of the gmt file.

imported_annotation

Character vector describing the steps applied to the data before this analysis. This module appends its own steps to it.

low_gene_count_cutoff

Numeric value indicating if/where to remove genes with low counts. NULL removes no genes.

low_gene_count_method

The summary function applied to each gene column in my_dt (a function reference, not a character string) to decide which genes fall below low_gene_count_cutoff and are removed. Defaults to max (i.e. if the largest count for a given gene is below the cutoff, the gene is excluded).
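
The filtering rule described above can be sketched in base R as follows. The counts and the cutoff value are made up, and this is only an illustration of the stated rule, not the package's own code.

```r
# Toy count table: rows are samples, columns are genes (values made up).
counts <- data.frame(
  GENE_A = c(0, 2, 1),
  GENE_B = c(10, 40, 25),
  GENE_C = c(3, 0, 9)
)

low_gene_count_cutoff <- 5
low_gene_count_method <- max   # a function reference, not the string "max"

# Summarize each gene column, then keep genes whose summary
# reaches the cutoff; genes below the cutoff are dropped.
gene_summary <- vapply(counts, low_gene_count_method, numeric(1))
keep <- gene_summary >= low_gene_count_cutoff
filtered <- counts[, keep, drop = FALSE]
```

Swapping in a stricter summary such as median changes the outcome: GENE_C has a maximum of 9 but a median of 3, so it survives the max rule and would be removed under median.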

my_grouping

String naming the column used to split the data into groups.

output_dir

Path to the output directory. This will be calculated automatically if left blank.

patient_key_col

String matching the name of the column that should be used to identify the unique patients. If included, pairwise sample comparison will be performed.
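
Before requesting a paired comparison, it can help to confirm that every patient has exactly one sample in each group. The check below is a caller-side sketch in base R; the column names Patient_ID and Timepoint and the sample values are hypothetical, and the package is not known to perform this check itself.

```r
# Toy sample sheet with one pre and one post sample per patient (made up).
samples <- data.frame(
  Run_ID = paste0("S", 1:6),
  Patient_ID = rep(c("P1", "P2", "P3"), times = 2),
  Timepoint = rep(c("pre", "post"), each = 3),
  stringsAsFactors = FALSE
)

# Cross-tabulate patients against groups; a fully paired design
# has exactly one sample in every patient-by-group cell.
pair_table <- table(samples$Patient_ID, samples$Timepoint)
is_fully_paired <- all(pair_table == 1)
```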

p_adjust_method

String method name to pass to stats::p.adjust for FDR correction
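
For reference, this is what stats::p.adjust does with the default "BH" (Benjamini-Hochberg) method on a small, made-up vector of p-values. The example uses only base R and says nothing about where in the pipeline the package applies the correction.

```r
# Made-up raw p-values for five genes.
p_values <- c(0.001, 0.01, 0.03, 0.04, 0.2)

# Benjamini-Hochberg adjustment; other method names accepted by
# stats::p.adjust include "bonferroni", "holm", and "BY".
fdr <- stats::p.adjust(p_values, method = "BH")

# Adjusted values are approximately 0.005, 0.025, 0.05, 0.05, 0.2.
n_below <- sum(fdr < 0.1)
```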

sample_key_col

String matching the name of the column that should be used to identify the unique sample identifiers.

up_suffix

String to append to the end of 'Up' signatures

down_suffix

String to append to the end of 'Down' signatures
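
The sketch below shows how the suffixes and gmt_ref could fit into a GMT file, where each line is a tab-separated record of set name, description, and genes. The base signature name and the gene lists are hypothetical, and placing gmt_ref in the description field is an assumption about what "the reference part" means.

```r
# Build one GMT line: name <TAB> description <TAB> gene1 <TAB> gene2 ...
make_gmt_line <- function(set_name, reference, genes) {
  paste(c(set_name, reference, genes), collapse = "\t")
}

base_name <- "treated_vs_control_fdr_0.05"   # hypothetical signature name
gmt_ref <- "Gene signature created from custom analysis."
up_suffix <- "__Up"
down_suffix <- "__Down"

gmt_lines <- c(
  make_gmt_line(paste0(base_name, up_suffix), gmt_ref, c("GENE_A", "GENE_C")),
  make_gmt_line(paste0(base_name, down_suffix), gmt_ref, "GENE_B")
)

# Write to a temporary file so the example leaves nothing behind.
gmt_path <- tempfile(fileext = ".gmt")
writeLines(gmt_lines, gmt_path)
```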

Details

This function uses either the DESeq or the DESeq2 method, although only DESeq2 is currently supported (see analysis_method). DESeq is recommended for single cell data.

Value

A list containing the following outputs:

  1. volcano_pack - to be used by view_volcano_plot;

  2. gmt_id_path - path to a gmt file with genes represented by ids (typically entrez);

  3. gmt_symbol_path - path to a gmt file with genes represented by symbols (typically hgnc);

  4. stats_path - path to stats of Gene_Name, Fold_Change, pValue, FDR_pValue
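
A call might look like the untested sketch below, which wraps the function so the example can be defined without running an analysis. It uses only argument names from the Usage section and element names from the list above; the column names "Run_ID" and "Treatment" are assumptions about the input table, and which arguments are actually required is not stated on this page.

```r
# Sketch only: calling this wrapper requires the binfotron package.
# "Treatment" is a hypothetical grouping column in my_dt.
run_differential_expression <- function(my_dt, gene_cols, output_dir) {
  result <- binfotron::differential_expression(
    my_dt = my_dt,
    analysis_method = "DESeq2",
    gene_expression_cols = gene_cols,
    my_grouping = "Treatment",
    sample_key_col = "Run_ID",
    output_dir = output_dir
  )
  # Pick out the file paths named in the Value section.
  list(
    stats_path = result$stats_path,
    gmt_id_path = result$gmt_id_path,
    gmt_symbol_path = result$gmt_symbol_path
  )
}
```

The volcano_pack element is left out of the returned paths because it is meant to be handed to view_volcano_plot rather than read from disk.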

Writes

  • stats file

Todos

  • Fix to match by patient ids.

  • Setup design_var and contrasts

Limitations

  • Can only match two samples at this point.

See also

  • view_heatmap

  • view_volcano_plot


Benjamin-Vincent-Lab/binfotron documentation built on Oct. 1, 2024, 8:33 p.m.