aggregate_mapped_genes: Aggregate/expand a gene matrix by gene mappings

aggregate_mapped_genes    R Documentation

Description

Aggregate/expand a gene matrix (gene_df) using a gene mapping data.frame (gene_map). Mappings are supported across all of the scenarios that can occur during within-species and between-species gene mapping:

  • 1 gene : 1 gene

  • many genes : 1 gene

  • 1 gene : many genes

  • many genes : many genes

For more details on how aggregation/expansion is performed, please see: many2many_rows.
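
To make the four scenarios concrete, the following base R sketch labels each row of a toy gene_map by counting how many outputs each input has and how many inputs each output has. The toy gene names and the scenario column are illustrative assumptions; this is not how many2many_rows is implemented.

```r
# Toy gene map; column names follow the defaults of input_col / output_col.
gene_map <- data.frame(
  input_gene    = c("A", "B", "C", "D",  "D",  "E",  "E",  "F",  "F"),
  ortholog_gene = c("a", "b", "b", "d1", "d2", "e1", "e2", "e1", "e2"),
  stringsAsFactors = FALSE
)

# Number of outputs per input gene, and of inputs per output gene,
# looked up for every row of the map.
n_out <- as.integer(table(gene_map$input_gene)[gene_map$input_gene])
n_in  <- as.integer(table(gene_map$ortholog_gene)[gene_map$ortholog_gene])

# Label each row as "<inputs> : <outputs>" (hypothetical column, for illustration).
gene_map$scenario <- paste(
  ifelse(n_in  > 1, "many", "1"),
  ifelse(n_out > 1, "many", "1"),
  sep = " : "
)
print(gene_map)
```

This labelling is per row; a real map may need to be inspected per connected group of genes when mappings overlap only partially.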

Usage

aggregate_mapped_genes(
  gene_df,
  gene_map = NULL,
  input_col = "input_gene",
  output_col = "ortholog_gene",
  input_species = "human",
  output_species = input_species,
  method = c("gprofiler", "homologene", "babelgene"),
  agg_fun = "sum",
  agg_method = c("monocle3", "stats"),
  aggregate_orthologs = TRUE,
  transpose = FALSE,
  mthreshold = 1,
  target = "ENSG",
  numeric_ns = "",
  as_integers = FALSE,
  as_sparse = TRUE,
  as_DelayedArray = FALSE,
  dropNA = TRUE,
  sort_rows = FALSE,
  verbose = TRUE
)

Arguments

gene_df

Input matrix where row names are genes.

gene_map

A data.frame that maps the current gene names to new gene names. This function's behaviour will adapt to different situations as follows:

  • gene_map=<data.frame> :
    When a data.frame containing the gene key:value columns (specified by input_col and output_col, respectively) is provided, this will be used to perform aggregation/expansion.

  • gene_map=NULL and input_species!=output_species :
    A gene_map is automatically generated by map_orthologs to perform inter-species gene aggregation/expansion.

  • gene_map=NULL and input_species==output_species :
    A gene_map is automatically generated by map_genes to perform within-species gene symbol standardization and aggregation/expansion.
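
The three cases above amount to a simple decision rule. The sketch below mirrors that rule in base R with a hypothetical helper, mapping_route, which only reports which route the documentation describes. It does not call orthogene and is not the package's internal logic.

```r
# Hypothetical helper (not part of orthogene): returns the name of the
# mapping route described in the documentation for a given set of inputs.
mapping_route <- function(gene_map = NULL,
                          input_species = "human",
                          output_species = input_species) {
  # Case 1: a user-supplied data.frame takes precedence.
  if (!is.null(gene_map)) {
    stopifnot(is.data.frame(gene_map))
    return("user-supplied gene_map")
  }
  # Case 2: different species, so an ortholog map is generated.
  if (!identical(input_species, output_species)) {
    return("map_orthologs")
  }
  # Case 3: same species, so gene names are standardized.
  "map_genes"
}

mapping_route(input_species = "mouse", output_species = "human")
mapping_route(input_species = "mouse")
```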

input_col

Column name within gene_map with gene names matching the row names of gene_df.

output_col

Column name within gene_map with gene names that you wish to map the row names of gene_df onto.

input_species

Name of the input species (e.g. "mouse", "fly"). Use map_species to return a full list of available species.

output_species

Name of the output species (e.g. "human", "chicken"). Use map_species to return a full list of available species.

method

R package to use for gene mapping:

  • "gprofiler" : Slower but more species and genes.

  • "homologene" : Faster but fewer species and genes.

  • "babelgene" : Faster but fewer species and genes. Also gives consensus scores for each gene mapping based on several different data sources.

agg_fun

Aggregation function (default: "sum").

agg_method

Aggregation method: "monocle3" or "stats".

aggregate_orthologs

[Optional] After performing an initial round of many:many aggregation/expansion with many2many_rows, ensure each orthologous gene only appears in one row by using the aggregate_rows function (default: TRUE).
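
As a rough illustration of what collapsing rows so that each output gene appears once looks like with agg_fun = "sum", the base R sketch below uses rowsum on a toy matrix. The gene names and the groups vector are assumptions for illustration; this is a stand-in, not the "monocle3" or "stats" code path used by the package.

```r
# Toy expression matrix: rows are input genes, columns are samples.
X <- matrix(
  c(1, 2,
    3, 4,
    5, 6),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("Gene1a", "Gene1b", "Gene2"), c("cell1", "cell2"))
)

# Output gene assigned to each input row (many:1 for GENE1).
groups <- c("GENE1", "GENE1", "GENE2")

# Sum all rows that share an output gene, so each gene appears in one row.
X_sum <- rowsum(X, group = groups)
print(X_sum)
```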

transpose

Transpose gene_df before mapping genes.

mthreshold

Maximum number of results per initial alias to show (default: 1).

target

Target namespace (default: "ENSG").

numeric_ns

Namespace to use for fully numeric IDs (list of available namespaces).

as_integers

Force all values in the matrix to become integers, by applying floor (default: FALSE).
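
Note that floor rounds down rather than to the nearest integer, so a value such as 0.9 becomes 0. A minimal base R sketch follows; the final storage-mode conversion is an assumption added for illustration and may not match what the package does internally.

```r
# Toy matrix with non-integer values (filled column by column).
X <- matrix(c(0.9, 1.5, 2.0, 3.7), nrow = 2)

# floor() always rounds towards negative infinity: 0.9 -> 0, 3.7 -> 3.
X_int <- floor(X)

# Assumption: also store the values as R integers rather than doubles.
storage.mode(X_int) <- "integer"
print(X_int)
```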

as_sparse

Convert aggregated matrix to sparse matrix.

as_DelayedArray

Convert aggregated matrix to DelayedArray.

dropNA

Drop genes assigned to NA in groupings.
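
The effect of dropping unmapped genes can be sketched in base R as follows: rows whose grouping is NA are removed before aggregation. The toy data and variable names are illustrative assumptions, not the package's implementation.

```r
# Toy matrix: "GeneX" has no mapping, so its group is NA.
X <- matrix(
  1:6, nrow = 3, byrow = TRUE,
  dimnames = list(c("Gene1", "GeneX", "Gene2"), c("cell1", "cell2"))
)
groups <- c("GENE1", NA, "GENE2")

# Keep only rows with a non-missing group, then aggregate by group.
keep   <- !is.na(groups)
X_kept <- rowsum(X[keep, , drop = FALSE], group = groups[keep])
print(X_kept)
```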

sort_rows

Sort gene_df rows alphanumerically.

verbose

Print messages.

Value

Aggregated matrix

Examples

library(orthogene)

#### Aggregate within species: gene synonyms ####
data("exp_mouse_enst")
X_agg <- aggregate_mapped_genes(gene_df = exp_mouse_enst,
                                input_species = "mouse")

#### Aggregate across species: gene orthologs ####
data("exp_mouse")
X_agg2 <- aggregate_mapped_genes(gene_df = exp_mouse,
                                 input_species = "mouse",
                                 output_species = "human",
                                 method = "homologene")

neurogenomics/orthogene documentation built on Jan. 30, 2024, 4:44 a.m.