orthogene: Interspecies gene mapping

many2many_rows

R Documentation

Expand/aggregate rows of matrix for many:many mappings

Description

Expand/aggregate rows of a matrix with any combination of many:many mappings. This method ensures that the total counts for each gene remain the same regardless of how many genes it is split or condensed into. This makes many:many mappings possible, which standard aggregation functions cannot handle because they all require many:1 scenarios.
Internally, this is done as follows:

1. Identify genes that appear more than once in gene_map[[input_col]].
2. For each gene identified, split its row into multiple rows, where the number of new rows is equal to the number of times that gene appears within gene_map[[input_col]]. In the expanded matrix, each new row is equal to the original row divided by the number of new rows. This means that the gene's counts are split equally amongst the new rows, in a column-specific manner.
   Thus, the column sums of the output matrix will be equal to the column sums of the input matrix. In the case of gene expression count matrices, this means that the total counts remain equal between matrices, without being forced to drop genes with many:many mappings (as is the case with most other aggregation methods).
3. Map the row names of the expanded matrix onto the orthologous gene names in gene_map[[output_col]] (by default, gene_map$ortholog_gene).
4. [Optional] When aggregate_orthologs=TRUE, aggregate rows of the expanded/mapped matrix so that there is only 1 row per ortholog gene, using aggregate_rows. The arguments agg_fun, agg_method, as_sparse, as_DelayedArray, and dropNA are all passed on to aggregate_rows if this step is selected.
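The expansion and renaming steps can be illustrated with a minimal base R sketch. This is not the package's internal code: the toy matrix, the gene names, and the helper variables n_copies, idx, and X_expanded are all hypothetical and exist only to show why column sums are preserved.

```r
# Toy count matrix: 3 genes x 2 samples (hypothetical data).
X <- matrix(c(10, 4,
               6, 2,
               8, 8),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2")))

# Toy gene map: geneA maps to two orthologs (1:many),
# geneB and geneC map to the same ortholog (many:1).
gene_map <- data.frame(
  input_gene    = c("geneA", "geneA", "geneB", "geneC"),
  ortholog_gene = c("ORTH1", "ORTH2", "ORTH3", "ORTH3"),
  stringsAsFactors = FALSE
)

# Count how many times each input gene appears in the map.
n_copies <- table(gene_map$input_gene)

# Build one row per map entry; each new row is the source row
# divided by the number of copies of its gene.
idx <- match(gene_map$input_gene, rownames(X))
X_expanded <- X[idx, , drop = FALSE] /
  as.numeric(n_copies[gene_map$input_gene])

# Rename the rows to the ortholog gene names.
rownames(X_expanded) <- gene_map$ortholog_gene

# The expansion leaves the column sums unchanged.
stopifnot(isTRUE(all.equal(colSums(X), colSums(X_expanded))))
```

In this toy case geneA's counts (10, 4) are split into two rows of (5, 2), so each sample's total is the same before and after expansion.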

Usage

many2many_rows(
  X,
  gene_map,
  input_col = "input_gene",
  output_col = "ortholog_gene",
  agg_fun = "sum",
  agg_method = c("monocle3", "stats"),
  as_sparse = TRUE,
  as_DelayedArray = FALSE,
  dropNA = TRUE,
  aggregate_orthologs = TRUE,
  verbose = TRUE
)

Arguments

`X`	Input matrix.
`gene_map`	A data.frame generated by map_orthologs, with columns mapping `input_col` to `output_col`.
`input_col`	Column name within `gene_map` with gene names matching the row names of `X`.
`output_col`	Column name within `gene_map` with gene names that you wish to map the row names of `X` onto.
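`agg_fun`	Aggregation function, used when `aggregate_orthologs=TRUE` (default: `"sum"`).
`agg_method`	Aggregation method, used when `aggregate_orthologs=TRUE`: `"monocle3"` (default) or `"stats"`.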
`as_sparse`	Convert aggregated matrix to sparse matrix.
`as_DelayedArray`	Convert aggregated matrix to DelayedArray.
`dropNA`	Drop genes assigned to `NA` in `groupings` (the argument of aggregate_rows that holds the group assigned to each row).
`aggregate_orthologs`	[Optional] After performing an initial round of many:many aggregation/expansion with many2many_rows, ensure each orthologous gene only appears in one row by using the aggregate_rows function (default: `TRUE`).
`verbose`	Print messages.
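To illustrate what `aggregate_orthologs=TRUE` with `agg_fun="sum"` is described as achieving, the sketch below collapses duplicated row names with base R's rowsum. This is only a stand-in under stated assumptions: it is not the aggregate_rows implementation, and the matrix X_expanded is hypothetical toy data.

```r
# Hypothetical expanded/mapped matrix in which ORTH3 appears twice.
X_expanded <- matrix(c(5, 2,
                       5, 2,
                       6, 2,
                       8, 8),
                     nrow = 4, byrow = TRUE,
                     dimnames = list(c("ORTH1", "ORTH2", "ORTH3", "ORTH3"),
                                     c("s1", "s2")))

# Sum rows that share an ortholog name, leaving one row per ortholog.
X_agg <- rowsum(X_expanded, group = rownames(X_expanded))

# Every ortholog now appears once and the column sums are unchanged.
stopifnot(!any(duplicated(rownames(X_agg))))
stopifnot(isTRUE(all.equal(colSums(X_expanded), colSums(X_agg))))
```

With a sum, aggregation keeps per-sample totals intact; other aggregation functions (for example a mean) would not generally preserve column sums.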

Value

Expanded/aggregated matrix.

Source

data("exp_mouse")
X <- exp_mouse
gene_map <- orthogene:::map_orthologs(genes = rownames(exp_mouse),
                                      input_species = "mouse",
                                      method = "homologene")
X_agg <- orthogene:::many2many_rows(X = X,
                                    gene_map = gene_map)
sum(duplicated(rownames(exp_mouse))) # 0
sum(duplicated(gene_map$input_gene)) # 46
sum(duplicated(gene_map$ortholog_gene)) # 56
sum(duplicated(rownames(X_agg))) # 56

neurogenomics/orthogene documentation built on April 17, 2025, 9:30 p.m.