gene-naming-conversions: Gene Naming Conversion Functions

HumanToMouseGenesymbol R Documentation


Description

These functions convert gene symbols between human and mouse, and convert between gene symbols and Ensembl IDs for either species. They can use a local database, which is faster and does not depend on BioMart being reachable, or query the remote Ensembl BioMart database.

Usage

HumanToMouseGenesymbol(
  human_genes,
  mirror = NULL,
  local.mode = T,
  keep.seq = F,
  match = T
)

MouseToHumanGenesymbol(
  mouse_genes,
  mirror = NULL,
  local.mode = T,
  keep.seq = F,
  match = T
)

EnsemblToGenesymbol(
  Ensembl,
  spe = getOption("spe"),
  mirror = NULL,
  local.mode = T,
  keep.seq = F,
  match = T,
  keep.orig.id = F
)

GenesymbolToEnsembl(
  Genesymbol,
  spe = getOption("spe"),
  mirror = NULL,
  local.mode = T,
  keep.seq = F,
  match = T
)
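In EnsemblToGenesymbol and GenesymbolToEnsembl, 'spe' defaults to getOption("spe"), so the species can be set once per session instead of on every call. The base R sketch below shows how such an option-backed default behaves. which_species() is a hypothetical stand-in, not a SeuratExtend function, and its error when the option is unset is an assumption of the sketch only.

```r
# Hypothetical stand-in for a function whose 'spe' default comes from a global option
which_species <- function(spe = getOption("spe")) {
  if (is.null(spe)) stop("set 'spe' or call options(spe = ...) first")
  match.arg(spe, c("human", "mouse"))
}

old <- options(spe = "mouse")   # set the session-wide default
which_species()                 # uses the option: "mouse"
which_species(spe = "human")    # an explicit argument overrides the option
options(old)                    # restore the previous options
```

Setting the option once keeps repeated calls short, while an explicit 'spe' argument still takes priority for a single call.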

Arguments

human_genes

A vector or matrix of human gene symbols to be converted to mouse gene symbols or Ensembl IDs. If a matrix is provided, the conversion is applied to the row names representing genes.

mirror

Optional alternative Ensembl BioMart mirror to use if the main site is inaccessible. Only used when 'local.mode' is FALSE. Possible mirrors are 'www', 'useast', and 'asia'. Default: NULL.

local.mode

Boolean flag indicating whether to use a local database (TRUE) or query BioMart online (FALSE). The local database is recommended because it is faster and does not depend on BioMart being reachable. Default: TRUE.

keep.seq

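Boolean flag to determine whether to maintain a one-to-one mapping between input and output identifiers, so that each input corresponds to a single output. This is useful when the output must line up with the input, for example when renaming the rows of an expression matrix. Default: FALSE.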

match

Boolean flag to specify whether to return a matching table or a simple vector of results. When TRUE, the function returns a data frame showing how inputs match to outputs; when FALSE, it returns only the output identifiers. Default: TRUE.

mouse_genes

A vector or matrix of mouse gene symbols to be converted to human gene symbols or Ensembl IDs. If a matrix is provided, the conversion is applied to the row names representing genes.

Ensembl

A vector or matrix of Ensembl IDs to be converted to gene symbols. If a matrix is provided, the conversion is applied to the row names representing genes.

spe

The species whose identifiers are being converted: 'human' or 'mouse'. Defaults to getOption("spe"), so the species can be set once per session with options(spe = "human") or options(spe = "mouse").

keep.orig.id

Only used by EnsemblToGenesymbol. Some Ensembl IDs have no corresponding gene symbol. If TRUE, rows for these IDs are kept in the gene expression matrix under their original Ensembl ID instead of being dropped, which preserves all rows for downstream analyses such as PCA or clustering. Default: FALSE.

Genesymbol

A vector or matrix of gene symbols to be converted to Ensembl IDs. If a matrix is provided, the conversion is applied to the row names representing genes.
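The effect of 'keep.orig.id' on a matrix can be pictured with a small base R sketch. rename_rows() and toy_lookup are hypothetical: the lookup holds made-up IDs (ENSG_A, ENSG_B, ENSG_C) rather than the package's database, and the sketch assumes, as the description of 'keep.orig.id' suggests, that rows without a symbol are dropped when it is FALSE.

```r
# Hypothetical Ensembl-ID-to-symbol lookup (made-up values, not the package database)
toy_lookup <- c(ENSG_A = "GENEA", ENSG_B = "GENEB")

# Hypothetical helper: rename matrix rows from Ensembl IDs to symbols
rename_rows <- function(mat, lookup, keep.orig.id = FALSE) {
  ids <- rownames(mat)
  symbols <- unname(lookup[ids])          # NA where an ID has no symbol
  missing <- is.na(symbols)
  if (keep.orig.id) {
    symbols[missing] <- ids[missing]      # fall back to the original Ensembl ID
  } else {
    mat <- mat[!missing, , drop = FALSE]  # drop rows without a symbol
    symbols <- symbols[!missing]
  }
  rownames(mat) <- symbols
  mat
}

m <- matrix(1:6, nrow = 3,
            dimnames = list(c("ENSG_A", "ENSG_B", "ENSG_C"), c("cell1", "cell2")))
rownames(rename_rows(m, toy_lookup))                       # "GENEA" "GENEB"
rownames(rename_rows(m, toy_lookup, keep.orig.id = TRUE))  # "GENEA" "GENEB" "ENSG_C"
```

Keeping the unmatched row means the matrix keeps all of its original rows, which matters when the row set must stay fixed for downstream steps.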

Details

These functions are intended for analyses that combine data from different genomic databases or compare human and mouse data. By default they use a local database; setting 'local.mode' to FALSE queries BioMart online instead, optionally through the mirror given in 'mirror'.

Value

Depending on the function used and parameters set, the output can vary:

- A vector of converted gene identifiers, if 'match' is set to FALSE.

- A named vector where names are the input identifiers and values are the converted identifiers, if 'keep.seq' is set to TRUE.

- A data.frame showing detailed matches between input and output identifiers, if 'match' is set to TRUE.

- A matrix with converted gene identifiers as row names, if the input was a matrix and 'keep.seq' is TRUE. This output type is particularly useful for converting entire gene expression matrices for subsequent analyses.
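The difference between a match table and a one-to-one result can be sketched in base R. toy_map, match_table() and one_to_one() are hypothetical: the lookup is a toy table in which GENE1 and GENE2 are placeholders, and picking the first match for each input is an assumption about how a one-to-one mapping might be chosen, not a description of the package internals.

```r
# Hypothetical human-to-mouse lookup; GENE1 maps to two placeholder mouse symbols
toy_map <- data.frame(
  human = c("CD14", "CD3D", "CD79A", "GENE1", "GENE1"),
  mouse = c("Cd14", "Cd3d", "Cd79a", "Gene1a", "Gene1b"),
  stringsAsFactors = FALSE
)

# Match table: every matching row, so one-to-many inputs appear more than once
match_table <- function(genes, map) {
  map[map$human %in% genes, , drop = FALSE]
}

# One-to-one: first match per input (NA if none), named by the input identifiers
one_to_one <- function(genes, map) {
  setNames(map$mouse[match(genes, map$human)], genes)
}

genes <- c("CD14", "GENE1", "GENE2")
match_table(genes, toy_map)  # three rows: CD14 once, GENE1 twice
one_to_one(genes, toy_map)   # named vector: Cd14, Gene1a, NA
```

The one-to-many input (GENE1) appears twice in the match table but once in the named vector, and the input with no match (GENE2) becomes NA, so the named vector always has one entry per input.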

Examples

# Load Seurat and SeuratExtend ('pbmc' is the example Seurat object used throughout)
# and prepare a vector of human gene symbols
library(Seurat)
library(SeuratExtend)
human_genes <- VariableFeatures(pbmc)[1:6]
print(human_genes)

# Default usage: Convert human gene symbols to mouse gene symbols and return a detailed match table
HumanToMouseGenesymbol(human_genes)

# Simplified output without match details
HumanToMouseGenesymbol(human_genes, match = FALSE)

# Ensure one-to-one correspondence with named vector output
GenesymbolToEnsembl(human_genes, spe = "human", keep.seq = TRUE)

# Convert a gene expression matrix from human to mouse gene symbols
human_matr <- GetAssayData(pbmc)[human_genes, 1:4]
print(human_matr)
mouse_matr <- HumanToMouseGenesymbol(human_matr)
print(mouse_matr)

# Convert mouse gene symbols to human gene symbols
mouse_genes <- c("Cd14", "Cd3d", "Cd79a")
MouseToHumanGenesymbol(mouse_genes, match = FALSE)

# Convert human gene symbols to Ensembl IDs and back, ensuring one-to-one correspondence
ens_ids <- GenesymbolToEnsembl(human_genes, spe = "human", keep.seq = TRUE)
print(ens_ids)
back_to_symbols <- EnsemblToGenesymbol(ens_ids, spe = "human", keep.seq = TRUE)
print(back_to_symbols)

# Fetch Ensembl IDs using an online BioMart database when local conversion is not sufficient
online_ens_ids <- GenesymbolToEnsembl(human_genes, spe = "human", local.mode = FALSE, keep.seq = TRUE)
print(online_ens_ids)

huayc09/SeuratExtend documentation built on July 15, 2024, 6:22 p.m.