check_ewce_genelist_inputs: check_ewce_genelist_inputs

check_ewce_genelist_inputs    R Documentation

check_ewce_genelist_inputs

Description

check_ewce_genelist_inputs checks that the hits and bg gene lists passed to EWCE are set up correctly. It checks that both lists are of an appropriate length, that all hits are present in bg, and that the species match. If the species do not match, the gene lists are reduced to 1:1 orthologs.
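
The length and hits-in-background checks described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: sketch_check_genelists and the toy gene vectors are hypothetical, and here hits missing from bg are simply dropped, which may differ from what EWCE does.

```r
# Hypothetical helper, not EWCE's implementation: it mirrors only the
# length and hits-in-background checks described in the text.
sketch_check_genelists <- function(hits, bg, min_genes = 4) {
    hits <- unique(as.character(hits))
    bg <- unique(as.character(bg))
    # Assumption: hits absent from the background are dropped
    hits <- hits[hits %in% bg]
    if (length(hits) < min_genes) {
        stop("At least ", min_genes, " hit genes must be present in bg.")
    }
    list(hits = hits, bg = bg)
}

# Toy data invented for this sketch
toy_bg <- c("Apoe", "Gfap", "Aqp4", "Snap25", "Pvalb", "Mbp")
toy_hits <- c("Apoe", "Gfap", "Aqp4", "Mbp", "NotAGene")
checked <- sketch_check_genelists(toy_hits, toy_bg)
checked$hits
```

The sketch leaves out species matching and ortholog conversion, which the real function handles through its species and method arguments.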

Usage

check_ewce_genelist_inputs(
  sct_data,
  hits,
  bg = NULL,
  genelistSpecies = NULL,
  sctSpecies = NULL,
  sctSpecies_origin = sctSpecies,
  output_species = "human",
  method = "homologene",
  geneSizeControl = FALSE,
  standardise_sct_data = TRUE,
  standardise_hits = FALSE,
  min_genes = 4,
  verbose = TRUE
)

Arguments

sct_data

List generated using generate_celltype_data.

hits

List of gene symbols containing the target gene list. Will automatically be converted to human gene symbols if geneSizeControl=TRUE.

bg

List of gene symbols containing the background gene list (including hit genes). If bg=NULL, an appropriate gene background will be created automatically.

genelistSpecies

Species that hits genes came from (no longer limited to just "mouse" and "human"). See list_species for all available species.

sctSpecies

Species that sct_data is currently formatted as (no longer limited to just "mouse" and "human"). See list_species for all available species.

sctSpecies_origin

Species that the sct_data originally came from, regardless of its current gene format (e.g. it was previously converted from mouse to human gene orthologs). This is used for computing an appropriate background.

output_species

Species to convert sct_data and hits to (Default: "human"). See list_species for all available species.

method

R package to use for gene mapping:

  • "gprofiler" : Slower but more species and genes.

  • "homologene" : Faster but fewer species and genes.

  • "babelgene" : Faster but fewer species and genes. Also gives consensus scores for each gene mapping based on several different data sources.

geneSizeControl

Whether you want to control for GC content and transcript length. Recommended if the gene list originates from genetic studies (Default: FALSE). If set to TRUE, then hits must be from humans.

standardise_sct_data

Should sct_data be standardised? If TRUE:

  • When sctSpecies!=output_species the sct_data will be checked for object formatting and the genes will be converted to the orthologs of the output_species with standardise_ctd (which calls map_genes internally).

  • When sctSpecies==output_species, the sct_data will be checked for object formatting with standardise_ctd, but the gene names will remain untouched.

standardise_hits

Should hits be standardised? If TRUE:

  • When genelistSpecies!=output_species, the genes will be converted to the orthologs of the output_species with convert_orthologs.

  • When genelistSpecies==output_species, the genes will be standardised with map_genes.

If FALSE, hits will be passed on to subsequent steps as-is.
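
The branching described for standardise_hits can be summarised as a small decision function. This is a minimal sketch: sketch_hits_step is a hypothetical name, and it only returns the name of the step the documentation mentions without calling any package function.

```r
# Hypothetical helper: reports which step the documentation describes
# for a given combination of arguments. It performs no gene mapping.
sketch_hits_step <- function(genelistSpecies,
                             output_species = "human",
                             standardise_hits = FALSE) {
    if (!standardise_hits) {
        return("as-is")
    }
    if (genelistSpecies != output_species) {
        "convert_orthologs"
    } else {
        "map_genes"
    }
}

step_convert <- sketch_hits_step("mouse", "human", standardise_hits = TRUE)
step_map <- sketch_hits_step("human", "human", standardise_hits = TRUE)
step_default <- sketch_hits_step("mouse", "human")
```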

min_genes

Minimum number of genes in a gene list to test.

verbose

Print messages.

Value

A list containing:

  • hits: Array of MGI/HGNC gene symbols containing the target gene list.

  • bg: Array of MGI/HGNC gene symbols containing the background gene list.

Examples

ctd <- ewceData::ctd()
example_genelist <- ewceData::example_genelist()

# Called from "bootstrap_enrichment_test()" and "generate_bootstrap_plots()"
checkedLists <- EWCE::check_ewce_genelist_inputs(
    sct_data = ctd,
    hits = example_genelist,
    sctSpecies = "mouse",
    genelistSpecies = "human"
)

NathanSkene/EWCE documentation built on April 10, 2024, 1:02 a.m.