run_genespace: The GENESPACE pipeline

run_genespace    R Documentation

The GENESPACE pipeline

Description

run_genespace runs the entire GENESPACE pipeline from beginning to end with a single function call.

Usage

run_genespace(
  gsParam,
  overwrite = FALSE,
  overwriteBed = overwrite,
  overwriteSynHits = overwrite,
  overwriteInBlkOF = TRUE,
  makePairwiseFiles = FALSE
)

Arguments

gsParam

A list of genespace parameters created by init_genespace.

overwrite

logical, should all raw files except the orthofinder results be overwritten?

overwriteBed

logical, should the bed file be re-created and overwritten?

overwriteSynHits

logical, should the annotated blast files be overwritten?

overwriteInBlkOF

logical, should in-block orthogroups be overwritten?

makePairwiseFiles

logical, should pairwise hits in blocks files be generated?
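
In the Usage block, overwriteBed and overwriteSynHits default to the value of overwrite, while overwriteInBlkOF defaults to TRUE on its own. The sketch below illustrates how R resolves such defaults. resolve_overwrite is a hypothetical stand-in that copies only the argument defaults shown in Usage; it is not part of GENESPACE and touches no files.

```r
# Hypothetical stand-in: mirrors the default arguments from Usage and
# simply reports the flag values that would be in effect.
resolve_overwrite <- function(overwrite = FALSE,
                              overwriteBed = overwrite,
                              overwriteSynHits = overwrite,
                              overwriteInBlkOF = TRUE,
                              makePairwiseFiles = FALSE) {
  c(overwrite = overwrite,
    overwriteBed = overwriteBed,
    overwriteSynHits = overwriteSynHits,
    overwriteInBlkOF = overwriteInBlkOF,
    makePairwiseFiles = makePairwiseFiles)
}

# All defaults: only the in-block orthogroup flag is TRUE.
defaults <- resolve_overwrite()

# Setting overwrite alone also switches on the bed and synHits flags.
allRaw <- resolve_overwrite(overwrite = TRUE)

# An explicit value takes precedence over the inherited default.
keepBed <- resolve_overwrite(overwrite = TRUE, overwriteBed = FALSE)

print(rbind(defaults, allRaw, keepBed))
```

Under these defaults, a call such as run_genespace(gsParam, overwrite = TRUE, overwriteBed = FALSE) would presumably re-create the annotated blast files while leaving the existing bed file alone.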

Details

The function calls required to run the full genespace pipeline are listed below. See each function for a detailed description. Also see 'init_genespace' for details on parameter specifications.

  1. 'run_orthofinder' runs orthofinder or finds and copies over data from a previous run.

  2. 'set_syntenyParams' converts parameters in the gsParam list into a matrix of file paths and parameters for each pairwise combination of query and target genomes.

  3. 'annotate_bed' reads in all of the bed files, concatenates them and adds some important additional information, including gene rank order, orthofinder IDs, orthogroup information, tandem array identity etc.

  4. 'annotate_blast' reads in all the blast files and adds information from the annotated/combined bed file.

  5. 'synteny' is the main engine for genespace. It flags syntenic blocks and makes dotplots.

  6. 'build_synOGs' integrates syntenic orthogroups across all blast files.

  7. 'run_orthofinderInBlk' optionally re-runs orthofinder within each syntenic block, returning phylogenetically hierarchical orthogroups (HOGs).

  8. 'integrate_synteny' interpolates syntenic position of all genes across all genomes.

  9. 'pangenes' combines positional and orthogroup information into a single matrix anchored to the gene order coordinates of a single reference.

  10. 'plot_riparian' is the primary genespace plotting routine, which stacks the genomes and connects syntenic regions to color-coded reference chromosomes.
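
The ordering above can be written down as a plain character vector, which is handy when working out which steps follow a given one. This is only a sketch: the step names are copied from the list, steps_from is a hypothetical helper rather than a GENESPACE function, and the idea that later steps build on earlier output is an assumption suggested by the descriptions, not something this page states.

```r
# Step names copied from the list above, in the order given there.
gsSteps <- c(
  "run_orthofinder", "set_syntenyParams", "annotate_bed", "annotate_blast",
  "synteny", "build_synOGs", "run_orthofinderInBlk", "integrate_synteny",
  "pangenes", "plot_riparian")

# Hypothetical helper (not part of GENESPACE): return the named step and
# every step listed after it.
steps_from <- function(step, steps = gsSteps) {
  i <- match(step, steps)
  if (is.na(i)) stop("unknown step: ", step)
  steps[i:length(steps)]
}

# Steps from 'synteny' onward, in pipeline order.
downstream <- steps_from("synteny")
print(downstream)
```

The helper only reports names; it does not call any of the functions themselves.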

Value

a gsParam list.

Examples

## Not run: 
###############################################
# -- change paths to those valid on your system
genomeRepo <- "~/path/to/store/rawGenomes"
wd <- "~/path/to/genespace/workingDirectory"
path2mcscanx <- "~/path/to/MCScanX/"
###############################################

dir.create(genomeRepo)
dir.create(wd)
rawFiles <- download_exampleData(filepath = genomeRepo)

parsedPaths <- parse_annotations(
  rawGenomeRepo = genomeRepo,
  genomeDirs = c("human", "chicken"),
  genomeIDs = c("human", "chicken"),
  presets = "ncbi",
  genespaceWd = wd)

gpar <- init_genespace(
  wd = wd, nCores = 4,
  path2mcscanx = path2mcscanx)

out <- run_genespace(gpar)

## End(Not run)


jtlovell/GENESPACE documentation built on Jan. 25, 2025, 6:39 a.m.