module_ge: module_ge

View source: R/module_ge.R

module_ge    R Documentation

module_ge

Description

Analyses differential gene expression from RNA-seq raw counts and plots PCA, volcano plots and gene set enrichment analysis (GSEA) results for the desired comparisons. The enrichment step also includes Gene Set Variation Analysis (GSVA).

Usage

module_ge(
  counts,
  genes_id,
  metadata,
  response,
  design,
  colors = c("orange", "black"),
  ref_level,
  shrink = "apeglm",
  biomart,
  fold_change = 2,
  p.adj = 0.05,
  gmt,
  gsea_pvalue = 0.2,
  gsva_gmt = "hallmark",
  kcdf = "Gaussian",
  method = "gsva",
  row.names = TRUE,
  col.names = TRUE
)

Arguments

counts

Data frame that contains gene expression data as raw counts.

genes_id

Name of the column that contains gene identifiers. Should be one of the following: 'entrezgene_id', 'ensembl_gene_id' or 'hgnc_symbol'.

metadata

Data frame that contains supporting variables to the data.

response

Unquoted name of the variable indicating the groups to analyse.

design

Variables in the design formula, given as a character string of the form 'Var1 + Var2 + ... + Var_n'.

colors

Character vector indicating the colors of the different groups to compare. The default has two values: orange and black.

ref_level

Character vector of two elements: the first is the name of the column that contains the reference level, and the second is the name of the level to use as the reference when calculating differential gene expression.

shrink

Name of the shrinkage method to apply: "apeglm", "ashr", "normal" or "none". Use "none" to skip shrinkage. Default value is "apeglm".

biomart

Data frame containing a biomaRt query with the following attributes: ensembl_gene_id, hgnc_symbol, entrezgene_id, transcript_length, refseq_mrna. For Mus musculus data, external_gene_name must be retrieved and that column renamed to hgnc_symbol. biomaRt queries shipped with GEGVIC: 'ensembl_biomart_GRCh37', 'ensembl_biomart_GRCh38_p13', 'ensembl_biomart_GRCm38_p6' and 'ensembl_biomart_GRCm39'.

fold_change

Integer that defines the fold change threshold above which a gene is considered differentially expressed.

p.adj

Numeric value that defines the maximum adjusted p-value at which a gene is considered differentially expressed.

gmt

A data frame containing the gene sets to analyse using GSEA. This object should be obtained with the read.gmt function from the clusterProfiler package.

gsea_pvalue

Numeric value that defines the adjusted p-value cutoff used during GSEA. Set to 0.2 by default.

gsva_gmt

Path to the gmt file that contains the gene sets of interest. By default the parameter is set to 'hallmark', which provides all HALLMARK gene sets from MSigDB (version 7.5.1).

kcdf

Character string denoting the kernel to use during the non-parametric estimation of the cumulative distribution function of expression levels across samples when method = "gsva". The default is "Gaussian", because GEGVIC transforms raw counts with the variance-stabilizing transformation (vst). Other options are "Poisson" or "none".

method

Name of the method used to perform Gene Set Variation Analysis. The options are: 'gsva', 'ssgsea' or 'zscore'. Default value is 'gsva'.

row.names

Logical value that determines whether row names are shown in the heatmap.

col.names

Logical value that determines whether column names are shown in the heatmap.
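
The design, ref_level, fold_change and p.adj arguments can be easier to read with a small base R sketch. It is an illustration only, not the package's internal code. The toy metadata, the results table and its values are invented, and the significance rule (absolute log2 fold change above log2(fold_change), adjusted p-value below p.adj) is an assumption about how the thresholds are commonly applied.

```r
# Toy metadata; MSI_status mirrors the response variable used in the Examples.
metadata <- data.frame(
  sample = c("s1", "s2", "s3", "s4"),
  MSI_status = c("MSI", "MSS", "MSI", "MSS"),
  stringsAsFactors = FALSE
)

# design: a string such as 'Var1 + Var2' can be turned into a one-sided formula.
design <- "MSI_status"
design_formula <- as.formula(paste("~", design))

# ref_level: c(column name, reference level). relevel() makes that level the
# baseline of the factor, so comparisons are expressed against it.
ref_level <- c("MSI_status", "MSS")
metadata[[ref_level[1]]] <- relevel(factor(metadata[[ref_level[1]]]),
                                    ref = ref_level[2])

# Hypothetical differential expression results (values invented).
res <- data.frame(
  gene = c("g1", "g2", "g3", "g4"),
  log2FoldChange = c(2.5, -1.8, 0.4, 1.2),
  padj = c(0.001, 0.03, 0.2, 0.04)
)

fold_change <- 2
p.adj <- 0.05

# Assumed rule: |log2FC| above log2(fold_change) and adjusted p-value below p.adj.
res$significant <- abs(res$log2FoldChange) > log2(fold_change) & res$padj < p.adj
```

Under this assumed rule, fold_change = 2 corresponds to a log2 fold change cutoff of 1 in either direction.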

Value

Returns ggplot objects (containing PCA, Volcano plot and GSEA analyses) and a list of data frames containing the results data.

Examples

tables_module_ge <- module_ge(counts = sample_counts,
                              genes_id = 'ensembl_gene_id',
                              metadata = sample_metadata,
                              response = MSI_status,
                              design = 'MSI_status',
                              colors = c('orange', 'black'),
                              ref_level = c('MSI_status', 'MSS'),
                              shrink = 'apeglm',
                              biomart = ensembl_biomart_GRCh38_p13,
                              fold_change = 2,
                              p.adj = 0.05,
                              gmt = 'inst/extdata/c2.cp.reactome.v7.5.1.symbols.gmt',
                              gsea_pvalue = 0.2,
                              gsva_gmt = 'hallmark',
                              method = 'gsva',
                              kcdf = 'Gaussian',
                              row.names = TRUE,
                              col.names = TRUE)
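
The gmt argument is documented as a data frame of gene sets in the layout produced by clusterProfiler::read.gmt, which has one row per gene with the columns term and gene. The sketch below builds that layout with base R only, so it runs without clusterProfiler installed. The file contents and gene set names are made up, and with clusterProfiler available read.gmt would normally be called on the file path instead.

```r
# Write a tiny two-set GMT file (set names and genes are invented).
gmt_path <- tempfile(fileext = ".gmt")
writeLines(c(
  "SET_A\tdescription_a\tGENE1\tGENE2\tGENE3",
  "SET_B\tdescription_b\tGENE2\tGENE4"
), gmt_path)

# Each GMT line holds: set name, description, then member genes (tab-separated).
fields <- strsplit(readLines(gmt_path), "\t", fixed = TRUE)

# Build a long data frame with one row per (gene set, gene) pair.
gmt_df <- do.call(rbind, lapply(fields, function(x) {
  data.frame(term = x[1], gene = x[-(1:2)], stringsAsFactors = FALSE)
}))
```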


oriolarques/GEGVIC documentation built on Oct. 30, 2024, 10:44 p.m.