View source: R/init_genespace.R
init_genespace | R Documentation |
init_genespace
Searches for desired genome files in the
raw genome repo directory.
init_genespace(
wd,
genomeIDs = NULL,
ploidy = 1,
ignoreTheseGenomes = NULL,
path2orthofinder = "orthofinder",
path2diamond = "diamond",
path2mcscanx = "MCScanX",
onewayBlast = FALSE,
orthofinderInBlk = any(ploidy > 1),
useHOGs = TRUE,
rawOrthofinderDir = NA,
diamondUltraSens = FALSE,
nCores = min(c(detectCores()/2, 16)),
maxOgPlaces = 8,
blkSize = 5,
nGaps = 5,
blkRadius = blkSize * 5,
synBuff = 100,
arrayJump = ceiling(synBuff/2),
onlyOgAnchors = TRUE,
nSecondaryHits = 0,
nGapsSecond = nGaps * 2,
blkSizeSecond = blkSize,
blkRadiusSecond = blkRadius,
onlyOgAnchorsSelf = TRUE,
onlyOgAnchorsSecond = FALSE,
maskBuffer = 500,
onlySameChrs = FALSE,
dotplots = "check",
outgroup = ignoreTheseGenomes,
nSecondHits = nSecondaryHits,
synBuffSecond = NULL,
orthofinderMethod = NULL,
speciesIDs = NULL,
minPepLen = NULL,
versionIDs = NULL,
rawGenomeDir = NULL,
diamondMode = NULL,
overwrite = NULL,
gffString = NULL,
pepString = NULL,
verbose = NULL
)
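Several defaults in the usage above are written as expressions of other arguments (for example, blkRadius = blkSize * 5), so changing one value shifts the dependent ones unless they are set explicitly. The sketch below illustrates this with a hypothetical stand-in function, `derived_defaults`, which copies only a few of those defaults from the signature. It is not part of GENESPACE and does no other work.

```r
# Hypothetical stand-in: copies a subset of the default expressions shown in
# the usage block so their interaction can be inspected in isolation.
derived_defaults <- function(ploidy = 1,
                             blkSize = 5,
                             nGaps = 5,
                             synBuff = 100,
                             orthofinderInBlk = any(ploidy > 1),
                             blkRadius = blkSize * 5,
                             arrayJump = ceiling(synBuff / 2),
                             nGapsSecond = nGaps * 2,
                             blkSizeSecond = blkSize,
                             blkRadiusSecond = blkRadius) {
  # R evaluates default expressions lazily inside the function, so each one
  # sees whatever values the caller supplied for the arguments it refers to.
  list(
    orthofinderInBlk = orthofinderInBlk,
    blkRadius = blkRadius,
    arrayJump = arrayJump,
    nGapsSecond = nGapsSecond,
    blkSizeSecond = blkSizeSecond,
    blkRadiusSecond = blkRadiusSecond
  )
}

# All defaults: blkRadius is 25, arrayJump is 50, nGapsSecond is 10
d1 <- derived_defaults()

# Raising blkSize also raises blkRadius and both secondary-scan values
d2 <- derived_defaults(blkSize = 10)

# Any genome with ploidy > 1 switches orthofinderInBlk on by default
d3 <- derived_defaults(ploidy = c(1, 2))
```

Setting a dependent argument explicitly, for example `derived_defaults(blkSize = 10, blkRadius = 20)`, overrides its default expression.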
wd |
file.path where the analysis will be run |
genomeIDs |
character vector of length > 1, matching length of speciesIDs, versions and ploidy. Specifies the name to assign to each genome. This vector must be unique and can be any string that begins with a letter (a-z, A-Z) and is alphanumeric. '.' and '_' are allowed as long as they are not the first character. |
ploidy |
integer vector specifying the ploidy of the genome assemblies. This is usually half of the actual ploidy; that is, an inbred diploid is usually represented by a haploid genome assembly. |
ignoreTheseGenomes |
character string matching one of the genomeIDs that will be used in the orthofinder -og run but not in the synteny search. Suggested to ensure that there is an outgroup that predates any WGD that the user would like to study. |
path2orthofinder |
character string coercible to a file path that points to the orthofinder executable. If orthofinder is in the path, specify with "orthofinder" |
path2diamond |
character string coercible to a file path that points to the diamond executable. If diamond is in the path, specify with "diamond" |
path2mcscanx |
see path2orthofinder, except to the mcscanx directory. This must contain the MCScanX_h folder. |
onewayBlast |
logical of length 1, specifying whether one-way blasts should be run via 'orthofinder -1 ...'. This replaces orthofinderMethod = "fast", but uses 'diamond2 --more-sensitive' whereas the previous method used the --fast specification. Offers substantial speed improvements in large runs with little loss of fidelity. |
orthofinderInBlk |
logical, should orthofinder be re-run within syntenic regions? Highly recommended for polyploids. When called, HOGs within blocks replace global HOGs or OGs. See useHOGs for more information. |
useHOGs |
logical of length 1 or NA, specifying whether to use phylogenetically hierarchical orthogroups (HOGs) or raw orthogroups. By default (NA), this is decided internally by 'annotate_bed', where the orthogroup type with members that best match the genome ploidy is used. In general, HOGs should be used for any run where all genomes are haploid, since they have been shown to have ~20 However, in cases where we want both homeologs, HOGs may be problematic and probably should not be used for syntenic region calculations. That said, HOGs are always used for within-block orthofinder, which is also the default when any genomes have ploidy > 1. So, the only way to use the deprecated orthogroups.tsv for pan-genome calculation is to set useHOGs = FALSE AND orthofinderInBlk = FALSE. |
rawOrthofinderDir |
file.path of length 1, specifying the location of an existing raw orthofinder run. Defaults to $wd/orthofinder, but can be any path pointing to a valid orthofinder run. If not a valid path, this is ignored. |
diamondUltraSens |
logical of length 1, specifying whether the diamond mode run within orthofinder should be --more-sensitive (default, FALSE) or --ultra-sensitive. |
nCores |
integer of length 1 specifying the number of parallel processes to run |
maxOgPlaces |
integer of length 1, specifying the max number of unique placements that an orthogroup can have before being excluded from synteny |
blkSize |
integer of length 1, specifying the -s param to mcscanx |
nGaps |
integer of length 1, specifying the -m param to mcscanx for the primary MCScanX run. This acts on the results from the initial MCScanX run. |
blkRadius |
integer of length 1, specifying the search radius in 2d clustering to assign hits to the same block. This is a sensitive parameter as smaller values will result in more blocks, gaps and SV. Typically using 2x or greater blkSize is fine. |
synBuff |
Numeric > 0, specifying the distance from an anchor to consider a hit syntenic. This parameter is also used to limit the search radius in dbscan-based blk calculation. Larger values will return larger tandem arrays but also may permit inclusion of spurious non-syntenic networks |
arrayJump |
integer of length 1, specifying the maximum distance in gene rank order between two genes in the same tandem array |
onlyOgAnchors |
logical, should only hits in orthogroups be considered for anchors? |
nSecondaryHits |
integer of length 1, specifying the number of secondary hits to look for after masking the primary syntenic regions |
nGapsSecond |
see nGaps, but passed to secondary hits after masking primary hits. |
blkSizeSecond |
see blkSize, but passed to the secondary scan if nSecondaryHits > 0. |
blkRadiusSecond |
see blkRadius, but passed to the secondary scan if nSecondaryHits > 0. |
onlyOgAnchorsSelf |
logical, should only hits in orthogroups be considered for anchors in self-hits (particularly in polyploids)? |
onlyOgAnchorsSecond |
logical, should only hits in orthogroups be considered for anchors in secondary blocks? |
maskBuffer |
numeric (default = 500), the minimum distance from an existing block at which a secondary block (or a homeolog block within a polyploid genome) can be created. |
onlySameChrs |
logical - should synteny be only considered between chromosomes with the same name? |
dotplots |
character string either "always", "never", or "check". Default (check) only writes a dotplot if there are < 10k unique chromosome combinations (facets). "always" means that dotplots are made regardless of facet numbers, which can be very slow in some instances. "never" is by far the fastest method, but also never produces dotplots. |
outgroup |
deprecated in V1. See ignoreTheseGenomes. |
nSecondHits |
integer of length 1, specifying the number of blast hits to include after masking. |
synBuffSecond |
see synBuff. Applied only to synteny construction of secondary hits. |
orthofinderMethod |
deprecated in V1. See onewayBlast. |
speciesIDs |
deprecated in V1. See 'parse_annotations'. |
minPepLen |
deprecated in V1. All genes in the peptide fasta are used. |
versionIDs |
deprecated in V1. See 'parse_annotations'. |
rawGenomeDir |
deprecated in V1. See 'parse_annotations'. |
diamondMode |
deprecated in V1. 'fast' mode is no longer available. --ultra-sensitive is available via diamondUltraSens. |
overwrite |
deprecated in V1. Results are never over-written. |
gffString |
deprecated in V1. See 'parse_annotations'. |
pepString |
deprecated in V1. See 'parse_annotations'. |
verbose |
deprecated in V1. All updates are printed to the console |
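The naming rules given for genomeIDs above (unique, beginning with a letter, otherwise alphanumeric with '.' and '_' allowed) can be checked before a run. The helper below, `check_genome_ids`, is a hypothetical sketch of one reading of those rules. It is not a GENESPACE function, and the package's own checks may differ.

```r
# Hypothetical validator for the genomeIDs rules described in the arguments.
check_genome_ids <- function(genomeIDs) {
  stopifnot(is.character(genomeIDs))
  # First character must be a letter; the rest may be letters, digits,
  # '.' or '_' (assumed interpretation of "alphanumeric").
  okFormat <- grepl("^[A-Za-z][A-Za-z0-9._]*$", genomeIDs)
  list(
    badFormat = genomeIDs[!okFormat],
    duplicated = unique(genomeIDs[duplicated(genomeIDs)])
  )
}

# Example IDs are invented for illustration only
res <- check_genome_ids(c("human", "2mouse", "_rat", "chicken.v1", "human"))
# res$badFormat holds "2mouse" and "_rat"; res$duplicated holds "human"
```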
Simple directory parser to find and check the paths to all annotation and assembly files.
A list containing paths to the raw files. If a file is not found, path is returned as null and a warning is printed.
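The return behavior described above (a list of paths, with NULL and a warning for any file that is not found) can be sketched in base R. The helper `check_paths` and the file names below are hypothetical and only illustrate the pattern. They do not reproduce how init_genespace locates files.

```r
# Hypothetical helper: look up named files in a directory, returning the
# path when the file exists and NULL (with a warning) when it does not.
check_paths <- function(dir, fileNames) {
  out <- lapply(fileNames, function(f) {
    p <- file.path(dir, f)
    if (file.exists(p)) {
      return(p)
    }
    warning("could not find file: ", p, call. = FALSE)
    NULL
  })
  names(out) <- fileNames
  out
}

# Demonstration in a temporary directory containing one of two expected files
tmp <- tempfile("gs_demo_")
dir.create(tmp)
invisible(file.create(file.path(tmp, "genomeA.fa")))
paths <- suppressWarnings(check_paths(tmp, c("genomeA.fa", "genomeB.fa")))
```

Because missing entries are NULL rather than absent, `names(paths)` still lists every requested file, and `is.null(paths[["genomeB.fa"]])` identifies the one that was not found.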
## Not run:
# coming soon
## End(Not run)