runFdogBusco: This function is used as a modul in the function...

Description Usage Arguments Value Examples

View source: R/busco.R

Description

This function is used as a modul in the function computeReport() when using socre mode "busco". The function computeReport() take the path to the genome fasta file and the annotation file (if the user did not input it, it will be computed) of the interested genome as one of its inputs and arrange the files into the equivalent folder of the core directory. The genome fasta file will be copied and saved into the query_taxon folder. A symbolic link of the annotation file will be created and saved in the folder weight_dir. After all the files of the interested genome were arranged into the folder, the function runFdogBusco or the function runFdog will be called, depent on the using score mode, the function will take the path to the core directory and specify the path of the folder core_orthologs, weight_dir, query_taxon, blast_dir as the input hmmpath, weightpath, searchpath, blastpath for fDOG and run fDOG on this inputs to search ortholog on the interested genome for the equivalent inputed core set. When the search with fDOG is finished, the function will calculate the FAS score with fdogFAS or the length of each ortholog sequence based on the using score mode. The function with return the phylogenetic profile of the interested genome to the core set within the FAS scores or the lengths

Usage

1
2
3
4
runFdogBusco(
    root, coreSet, extend = FALSE, scoreMode, priorityList = NULL, cpu,
    blastDir = NULL, weightDir = NULL, cleanup = FALSE,
    reFdog = FALSE, fdogDir = NULL, ppDir = NULL)

Arguments

root

The path to the core directory, where the core set is stored within weight_dir, blast_dir, etc.

coreSet

The name of the interested core set. The core directory can contains more than one core set and the user must specify the interested core set. The core set will be stored in the folder core_orthologs in subfolder, specify them by the name of the subfolder

extend

The output of the function is a phylogenetic profile of the interested genome. It contains 4 files, .phyloprofile, .extended.fa, _reverse.domains and _forward.domains. If extend = TRUE, the files will be appended into the old files in the folder output of the core directory or in the inputed folder by the user with the argument ppDir. If there is no old files in the folder, the output files of the function will be writen in the new files..

scoreMode

the mode determines the method to scoring the founded ortholog and how to classify them. Choices: 1, 2, 3, "busco"

priorityList

A list contains one or many genome ID of the genomes, which were used to build the core set. The genome ID of this list will be stored with an priority order, the tool look at into the fasta file of each core group and determine with the priority order to determine the references species for each core group.

cpu

determines the cores that fDOG and fdogFAS will uses to be run parallel

blastDir

The user can replace the blast_dir folder in the core directory by specifying it in this argument

weightDir

The user can replace the weight_dir folder in the core directory by specifying it in this argument

output

The directory which contains the output directory

cleanup

The fDOG's output is a set of phylogenetic profile of each core group to the interested genome. The phylogenetic profile will be stored into a folder in the core set. The function will merge all the small phylogenetic profile, calculate the FAS score or length to have the whole phylogenetic profile of the interested genome to the core set. This fDOG's output can be reused for all score modes. When cleanup is set to TRUE, the fDOG's output will not be stored to be reused but to be removed

reFdog

If it already exist a fDOG's output for a specific core group the tool will skip this core group and go to the next core group. If reFdog is set to TRUE, the tool will remove all the existed fDOG's output and rerun fDOG for all core groups of the set

fdogDir

Normally the fDOG's output will be stored in the folder fdogout in the core directory, but the user can specify the folder for fDOG's output by specify the path to it in this argument. Notice here, is the fDOG's output folder will contains the subfolder, equivalent to the name of the interested genome, for example, the folder can contain "HUMAN@9606@3" and "AMPQU@400682@2", for a completeness checking on an interested genome, which has a subfolder in the fDOG's output folder with the same name, the function will look into the subfolder to find the existed fDOG's output

ppDir

The user can replace the default folder output in the core directory, where the phylogenetic profiles are stored by his folder. The user can specify the path to his folder in this argument

Value

phylogenetic profile of the genome in data.frame

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
## Take the demo data
coreFolder <- system.file("extdata", "sample", package = "fCAT")
genome <- system.file("extdata", "HUMAN@9606@3.fa", package = "fCAT")
fasAnno <- system.file("extdata", "HUMAN@9606@3.json", package = "fCAT")

## Place seed
placeSeed(genome, fasAnno, coreFolder)

## Test runFdogBusco
pp <- runFdogBusco(coreFolder, "test",
    extend = FALSE, scoreMode = "busco",
    priorityList = c("HUMAN@9606@1"), cpu = 4, output = getwd()
)

print.data.frame(pp)

## Delete seed
fastaFolder <- paste(coreFolder, "/query_taxon/HUMAN@9606@3", sep = "")
annoPath <- paste(coreFolder, "/weight_dir/HUMAN@9606@3.json", sep = "")
unlink(fastaFolder, recursive = TRUE)
file.remove(annoPath)

giangnguyen0709/fCAT documentation built on Feb. 10, 2021, 4:31 a.m.