View source: R/run_genespace.R
run_genespace | R Documentation |
run_genespace
Run the entire GENESPACE pipeline from beginning to end
with one function call.
run_genespace(
gsParam,
overwrite = FALSE,
overwriteBed = overwrite,
overwriteSynHits = overwrite,
overwriteInBlkOF = TRUE,
makePairwiseFiles = FALSE
)
gsParam |
A list of genespace parameters created by init_genespace. |
overwrite |
logical, should all raw files be overwritten, except for orthofinder results? |
overwriteBed |
logical, should the bed file be re-created and overwritten? |
overwriteSynHits |
logical, should the annotated blast files be overwritten? |
overwriteInBlkOF |
logical, should in-block orthogroups be overwritten? |
makePairwiseFiles |
logical, should pairwise hits in blocks files be generated? |
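The usage signature shows that overwriteBed and overwriteSynHits default to the value of overwrite, while overwriteInBlkOF defaults to TRUE independently. The following minimal sketch illustrates how such defaults resolve in R. The function show_overwrite_flags is a hypothetical stand-in that copies only the argument defaults from the usage above; it is not part of GENESPACE and does not run any pipeline step.

```r
# Hypothetical stand-in: mirrors only the overwrite-related defaults
# shown in the run_genespace usage; it does no actual work.
show_overwrite_flags <- function(overwrite = FALSE,
                                 overwriteBed = overwrite,
                                 overwriteSynHits = overwrite,
                                 overwriteInBlkOF = TRUE) {
  # Defaults are evaluated lazily inside the function, so
  # overwriteBed and overwriteSynHits pick up whatever value
  # was supplied for overwrite unless they are set explicitly.
  c(overwrite = overwrite,
    overwriteBed = overwriteBed,
    overwriteSynHits = overwriteSynHits,
    overwriteInBlkOF = overwriteInBlkOF)
}

# All defaults: nothing is overwritten except in-block orthogroups
flags_default <- show_overwrite_flags()

# Setting overwrite = TRUE also switches on the two dependent flags
flags_all <- show_overwrite_flags(overwrite = TRUE)

# A dependent flag can still be set on its own
flags_mixed <- show_overwrite_flags(overwrite = TRUE, overwriteBed = FALSE)

print(rbind(flags_default, flags_all, flags_mixed))
```

In practice this means a call such as run_genespace(gsParam, overwrite = TRUE) should not need overwriteBed or overwriteSynHits set separately, provided the installed version matches the usage shown here.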
The function calls required to run the full genespace pipeline are listed below, in the order in which they are run. See each function for a detailed description. Also see 'init_genespace' for details on parameter specifications.
'run_orthofinder' runs orthofinder or finds and copies over data from a previous run.
'set_syntenyParams' converts parameters in the gsParam list into a matrix of file paths and parameters for each pairwise combination of query and target genomes.
'annotate_bed' reads in all of the bed files, concatenates them, and adds important additional information, including gene rank order, orthofinder IDs, orthogroup information, and tandem array identity.
'annotate_blast' reads in all of the blast files and adds information from the annotated/combined bed file.
'synteny' is the main engine for genespace. It flags syntenic blocks and makes dotplots.
'build_synOGs' integrates syntenic orthogroups across all blast files.
'run_orthofinderInBlk' optionally re-runs orthofinder within each syntenic block, returning phylogenetically hierarchical orthogroups (HOGs).
'integrate_synteny' interpolates the syntenic position of all genes across all genomes.
'pangenes' combines positional and orthogroup information into a single matrix anchored to the gene order coordinates of a single reference.
'plot_riparian' is the primary genespace plotting routine, which stacks the genomes and connects syntenic regions to color-coded reference chromosomes.
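To make "each pairwise combination of query and target genomes" in the 'set_syntenyParams' step concrete, here is a minimal base R sketch that enumerates such combinations for the two example genomes. The column names (query, target, isSelf) are hypothetical and chosen for illustration, and including self-comparisons is an assumption; the actual layout produced by set_syntenyParams may differ.

```r
# Genome IDs as used in the example for this function
genomeIDs <- c("human", "chicken")

# Enumerate every ordered query/target pair (illustrative layout only;
# this is not the structure that set_syntenyParams returns)
pairs <- expand.grid(
  query = genomeIDs,
  target = genomeIDs,
  stringsAsFactors = FALSE)

# Flag within-genome comparisons (assumed here to be part of the grid)
pairs$isSelf <- pairs$query == pairs$target

print(pairs)
```

With n genomes, a grid built this way has n^2 rows, n of which are self-comparisons.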
A gsParam list.
## Not run:
###############################################
# -- change paths to those valid on your system
genomeRepo <- "~/path/to/store/rawGenomes"
wd <- "~/path/to/genespace/workingDirectory"
path2mcscanx <- "~/path/to/MCScanX/"
###############################################
dir.create(genomeRepo)
dir.create(wd)
rawFiles <- download_exampleData(filepath = genomeRepo)
parsedPaths <- parse_annotations(
rawGenomeRepo = genomeRepo,
genomeDirs = c("human", "chicken"),
genomeIDs = c("human", "chicken"),
presets = "ncbi",
genespaceWd = wd)
gpar <- init_genespace(
wd = wd, nCores = 4,
path2mcscanx = path2mcscanx)
out <- run_genespace(gpar)
## End(Not run)