ewce_para: EWCE parallel

View source: R/ewce_para.R

ewce_para    R Documentation

EWCE parallel

Description

Runs EWCE in parallel on multiple gene lists.

Usage

ewce_para(
  ctd,
  gene_data,
  list_name_column = "hpo_id",
  gene_column = "gene_symbol",
  list_names = unique(gene_data[[list_name_column]]),
  reps = 100,
  annotLevel = 1,
  force_new = FALSE,
  genelistSpecies = "human",
  sctSpecies = "human",
  bg = get_bg(species1 = genelistSpecies, species2 = sctSpecies, overwrite = force_new),
  min_genes = 4,
  save_dir_tmp = tempdir(),
  parallel_boot = FALSE,
  cores = 1,
  verbose = FALSE,
  ...
)

Arguments

ctd

CellTypeDataset generated using generate_celltype_data.

gene_data

data frame of gene list names and genes (see get_gene_lists).

list_name_column

The name of the gene_data column that has the gene list names.

gene_column

The name of the gene_data column that contains the genes.

list_names

character vector of gene list names.
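As a rough illustration of how gene_data, list_name_column, gene_column and list_names relate, the sketch below builds a toy data frame and splits it into one gene vector per list using base R. The list names and gene symbols are made up, and this is not necessarily how ewce_para extracts the lists internally.

```r
# Toy gene_data in long format: one row per (gene list, gene) pair.
# List names and gene symbols are illustrative only.
gene_data <- data.frame(
  hpo_id = c("list_A", "list_A", "list_A", "list_B", "list_B", "list_C"),
  gene_symbol = c("G1", "G2", "G3", "G2", "G4", "G5"),
  stringsAsFactors = FALSE
)
list_name_column <- "hpo_id"
gene_column <- "gene_symbol"

# Default for list_names, as given in the Usage section
list_names <- unique(gene_data[[list_name_column]])

# One character vector of unique genes per gene list
gene_lists <- lapply(
  stats::setNames(list_names, list_names),
  function(nm) {
    unique(gene_data[[gene_column]][gene_data[[list_name_column]] == nm])
  }
)
```

Passing a shorter list_names vector restricts the run to a subset of the lists present in gene_data.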

reps

Number of random gene lists to generate (Default: 100, but should be >=10,000 for publication-quality results).
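The effect of reps on p-value resolution can be seen in a toy resampling test. The sketch below is a simplified illustration of the general idea, not EWCE's actual implementation: the specificity values, the gene names and the mean-based test statistic are all assumptions made for the example.

```r
set.seed(1)

# Hypothetical specificity of 200 background genes in a single cell type
bg <- paste0("gene", seq_len(200))
specificity <- stats::setNames(stats::runif(length(bg)), bg)

# Hypothetical target list of 10 genes
hits <- bg[1:10]
observed <- mean(specificity[hits])

# Draw `reps` random gene lists of the same size from the background
reps <- 100
boot <- vapply(seq_len(reps), function(i) {
  mean(specificity[sample(bg, length(hits))])
}, numeric(1))

# Empirical p-value: fraction of random lists scoring at least as high
p <- sum(boot >= observed) / reps

# With this estimator, p can only change in steps of 1/reps
resolution <- 1 / reps
```

Under this estimator, reps = 100 limits p to multiples of 0.01, which is one reason a much larger reps is advisable for results that will be reported.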

annotLevel

An integer indicating which level of sct_data to analyse (Default: 1).

force_new

Overwrite previous results in the save_dir_tmp.

genelistSpecies

Species that the hits genes came from (no longer limited to just "mouse" and "human"). See list_species for all available species.

sctSpecies

Species that sct_data is currently formatted as (no longer limited to just "mouse" and "human"). See list_species for all available species.

bg

List of gene symbols containing the background gene list (including hit genes). If bg=NULL, an appropriate gene background will be created automatically.

min_genes

Minimum number of genes per list (Default: 4).
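A minimal sketch of a min_genes filter, assuming that lists with fewer than min_genes unique genes are dropped and that the threshold is inclusive. The documentation does not say at which stage ewce_para counts genes (for example before or after matching to the CellTypeDataset), so the toy lists below only show the general idea.

```r
# Toy gene lists (names and symbols are illustrative only)
gene_lists <- list(
  list_A = c("G1", "G2", "G3", "G4", "G5"),
  list_B = c("G2", "G4"),
  list_C = c("G1", "G3", "G5", "G6")
)
min_genes <- 4

# Count unique genes per list and keep lists that meet the threshold
n_genes <- vapply(gene_lists, function(g) length(unique(g)), integer(1))
kept <- gene_lists[n_genes >= min_genes]
```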

save_dir_tmp

Folder to save intermediate results files to (one file per gene list). Set to NULL to skip saving temporary files.

parallel_boot

Parallelise at the level of bootstrap iterations, rather than across gene lists.

cores

Integer number of cores to run in parallel (e.g. 8).
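Parallelising across gene lists can be pictured with parallel::mclapply from base R, as in the sketch below. Whether ewce_para uses this function internally is an assumption, and run_one_list is a hypothetical stand-in for the per-list enrichment test.

```r
library(parallel)

# Stand-in for a per-list enrichment test (hypothetical helper)
run_one_list <- function(genes) {
  data.frame(n_genes = length(genes))
}

# Toy gene lists (names and symbols are illustrative only)
gene_lists <- list(
  list_A = c("G1", "G2", "G3", "G4"),
  list_B = c("G2", "G4", "G5", "G6", "G7")
)

# mc.cores > 1 relies on forking, which is unavailable on Windows,
# so fall back to a single core there
cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Parallelise across gene lists: each worker handles whole lists
res <- mclapply(gene_lists, run_one_list, mc.cores = cores)
```

With parallel_boot = TRUE the parallelism would instead sit inside each per-list test, across bootstrap iterations, which may suit runs with few gene lists and a large reps.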

verbose

Print messages.

...

Arguments passed on to EWCE::bootstrap_enrichment_test

sct_data

List generated using generate_celltype_data.

hits

List of gene symbols containing the target gene list. Will automatically be converted to human gene symbols if geneSizeControl=TRUE.

sctSpecies_origin

Species that the sct_data originally came from, regardless of its current gene format (e.g. it was previously converted from mouse to human gene orthologs). This is used for computing an appropriate background.

output_species

Species to convert sct_data and hits to (Default: "human"). See list_species for all available species.

method

R package to use for gene mapping:

  • "gprofiler" : Slower but more species and genes.

  • "homologene" : Faster but fewer species and genes.

  • "babelgene" : Faster but fewer species and genes. Also gives consensus scores for each gene mapping based on several different data sources.

no_cores

Number of cores to parallelise bootstrapping reps over.

geneSizeControl

Whether you want to control for GC content and transcript length. Recommended if the gene list originates from genetic studies (Default: FALSE). If set to TRUE, then hits must be from humans.

controlledCT

[Optional] If set to the name of a cell type (rather than NULL), the bootstrapping controls for expression within that cell type.

mtc_method

Multiple-testing correction method (passed to p.adjust).
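For reference, this is how a method name is used by stats::p.adjust. The p-values and cell type names below are hypothetical, and "BH" is chosen only as an example of a valid method name (see stats::p.adjust.methods for the full set).

```r
# Hypothetical raw p-values, one per cell type
p <- c(astrocytes = 0.001, microglia = 0.02,
       neurons = 0.04, oligodendrocytes = 0.5)

# Benjamini-Hochberg adjustment; names are carried over to the result
q <- stats::p.adjust(p, method = "BH")

# Any string in stats::p.adjust.methods is accepted as `method`
valid_methods <- stats::p.adjust.methods
```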

sort_results

Sort enrichment results from smallest to largest p-values.

standardise_sct_data

Should sct_data be standardised? If TRUE:

  • When sctSpecies!=output_species the sct_data will be checked for object formatting and the genes will be converted to the orthologs of the output_species with standardise_ctd (which calls map_genes internally).

  • When sctSpecies==output_species, the sct_data will be checked for object formatting with standardise_ctd, but the gene names will remain untouched.

standardise_hits

Should hits be standardised? If TRUE:

  • When genelistSpecies!=output_species, the genes will be converted to the orthologs of the output_species with convert_orthologs.

  • When genelistSpecies==output_species, the genes will be standardised with map_genes.

If FALSE, hits will be passed on to subsequent steps as-is.

localHub

If working offline, add the argument localHub=TRUE to work with a local, non-updated hub. It will only have resources available that have previously been downloaded. If offline, please also see the BiocManager vignette section on offline use to ensure proper functionality.

store_gene_data

Store sampled gene data for every bootstrap iteration. When the number of bootstrap reps is very high (>=100k) and/or the number of genes in hits is very high, you may want to set store_gene_data=FALSE to avoid using excessive amounts of memory.

Value

Paths to saved results at "(save_dir_tmp)/(list_name).rds" (when save_dir_tmp is not NULL), or a nested list of results (when save_dir_tmp is NULL).
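When file paths are returned, the results can be read back with readRDS. The sketch below writes toy objects to a temporary folder first so that it runs on its own; the structure of the objects that ewce_para actually saves is not described here, so the data frames and their column names (CellType, p) are placeholders.

```r
# Stand-in for a folder of per-list result files (toy objects only)
save_dir_tmp <- file.path(tempdir(), "ewce_para_demo")
dir.create(save_dir_tmp, showWarnings = FALSE, recursive = TRUE)
toy_results <- list(
  list_A = data.frame(CellType = c("ct1", "ct2"), p = c(0.01, 0.6)),
  list_B = data.frame(CellType = c("ct1", "ct2"), p = c(0.30, 0.02))
)

# Save one .rds file per gene list and collect the named paths
res_files <- vapply(names(toy_results), function(nm) {
  path <- file.path(save_dir_tmp, paste0(nm, ".rds"))
  saveRDS(toy_results[[nm]], path)
  path
}, character(1))

# Read every file back into a named list, one element per gene list
results <- lapply(res_files, readRDS)
```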

Examples

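# Load phenotype-to-gene annotations; each hpo_id defines one gene list
gene_data <- HPOExplorer::load_phenotype_to_genes()
# Load an example CellTypeDataset
ctd <- MSTExplorer::load_example_ctd()
# Restrict the run to the first three gene lists
list_names <- unique(gene_data$hpo_id)[seq(3)]
# reps = 10 keeps the example fast; use far more reps for real analyses
res_files <- ewce_para(ctd = ctd,
                       gene_data = gene_data,
                       list_names = list_names,
                       reps = 10)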

neurogenomics/MultiEWCE documentation built on Jan. 16, 2025, 12:54 a.m.