infer_species: Infer species from gene names

View source: R/infer_species.R

infer_speciesR Documentation

Infer species from gene names

Description

Infers which species the genes within gene_df is from. Iteratively test the percentage of gene_df genes that match with the genes from each test_species.

Usage

infer_species(
  gene_df,
  gene_input = "rownames",
  test_species = c("human", "monkey", "rat", "mouse", "zebrafish", "fly"),
  method = c("homologene", "gprofiler", "babelgene"),
  make_plot = TRUE,
  show_plot = TRUE,
  verbose = TRUE
)

Arguments

gene_df

Data object containing the genes (see gene_input for options on how the genes can be stored within the object).
Can be one of the following formats:

  • matrix :
    A sparse or dense matrix.

  • data.frame :
    A data.frame, data.table. or tibble.

  • codelist :
    A list or character vector.

Genes, transcripts, proteins, SNPs, or genomic ranges can be provided in any format (HGNC, Ensembl, RefSeq, UniProt, etc.) and will be automatically converted to gene symbols unless specified otherwise with the ... arguments.
Note: If you set method="homologene", you must either supply genes in gene symbol format (e.g. "Sox2") OR set standardise_genes=TRUE.

gene_input

Which aspect of gene_df to get gene names from:

  • "rownames" :
    From row names of data.frame/matrix.

  • "colnames" :
    From column names of data.frame/matrix.

  • <column name> :
    From a column in gene_df, e.g. "gene_names".

test_species

Which species to test for matches with. If set to NULL, will default to a list of humans and 5 common model organisms. If test_species is set to one of the following options, it will automatically pull all species from that respective package and test against each of them:

  • "homologene" : 20+ species (default)

  • "gprofiler" : 700+ species

  • "babelgene" : 19 species

method

R package to use for gene mapping:

  • "gprofiler" : Slower but more species and genes.

  • "homologene" : Faster but fewer species and genes.

  • "babelgene" : Faster but fewer species and genes. Also gives consensus scores for each gene mapping based on a several different data sources.

make_plot

Make a plot of the results.

show_plot

Print the plot of the results.

verbose

Print messages.

Value

An ordered dataframe of test_species from best to worst matches.

Examples

 
data("exp_mouse") 
matches <- orthogene::infer_species(gene_df = exp_mouse[1:200,])

neurogenomics/orthogene documentation built on Jan. 30, 2024, 4:44 a.m.