checkCompleteness: Check the completeness of a genome of interest


View source: R/checkCompleteness.R

Description

Checks the completeness of a genome of interest against a chosen core set.

Usage

checkCompleteness(
        genome, fasAnno = NULL, coreDir, coreSet, extend = FALSE, 
        scoreMode, refSpecList = NULL, cpu = 4, blastDir = NULL, 
        weightDir = NULL, output = NULL, cleanup = FALSE, redo = FALSE
)

Arguments

genome

Path to the FASTA file of the genome of interest.

fasAnno

Path to the FAS annotation file of the genome. Optional; defaults to NULL.

coreDir

Path to the core directory, which holds the core set together with its supporting folders (weight_dir, blast_dir, etc.).

coreSet

Name of the core set of interest. A core directory can contain more than one core set, so the user must specify which one to use. Each core set is stored as a subfolder of the core_orthologs folder; specify it by the name of that subfolder.
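A minimal sketch of how the available core sets could be listed, assuming the core_orthologs folder sits directly inside the core directory and each core set is one subfolder of it. The helper name listCoreSets is hypothetical and not part of fCAT.

```r
# Hypothetical helper (not part of fCAT): list the core sets in a core directory.
# Assumes the layout <coreDir>/core_orthologs/<coreSet>/.
listCoreSets <- function(coreDir) {
  orthoDir <- file.path(coreDir, "core_orthologs")
  if (!dir.exists(orthoDir)) return(character(0))
  # Only the immediate subfolders; each one is treated as a core set
  list.dirs(orthoDir, full.names = FALSE, recursive = FALSE)
}

# Build a toy core directory with two core sets in a temporary folder
coreDir <- file.path(tempdir(), "coreSketch")
dir.create(file.path(coreDir, "core_orthologs", "test"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(coreDir, "core_orthologs", "other"), recursive = TRUE, showWarnings = FALSE)

listCoreSets(coreDir)
```

A check such as `"test" %in% listCoreSets(coreDir)` is one way to fail early on a misspelled core set name before any search is started.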

extend

The output of the function is a phylogenetic profile of the genome of interest, consisting of four files: .phyloprofile, .extended.fa, _reverse.domains and _forward.domains. If extend = TRUE, these files are appended to the existing files in the output folder of the core directory, or in the folder supplied by the user through the argument ppDir. If no existing files are found in that folder, the output is written to new files.
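The append-or-create behaviour described for extend can be sketched as follows. This is an illustration under assumptions, not fCAT's implementation: the helper writeOrAppend is hypothetical, and the toy columns geneID and orthoID stand in for whatever columns the real files contain.

```r
# Hypothetical helper (not part of fCAT): append rows to an existing
# tab-separated file, or create the file with a header if it does not exist.
writeOrAppend <- function(df, path) {
  fileExists <- file.exists(path)
  write.table(df, path, sep = "\t", quote = FALSE, row.names = FALSE,
              col.names = !fileExists, append = fileExists)
  invisible(path)
}

path <- tempfile(fileext = ".phyloprofile")
writeOrAppend(data.frame(geneID = "1001", orthoID = "a"), path)  # creates the file with a header
writeOrAppend(data.frame(geneID = "1002", orthoID = "b"), path)  # appends one row, no header
read.delim(path)
```

Writing the header only when the file is created keeps the appended file readable as a single table.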

scoreMode

The mode that determines how the found orthologs are scored and classified. Choices: 1, 2, 3, "busco".
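The page lists the accepted values without describing each mode, so the sketch below only shows how a caller might validate scoreMode before starting a long run. The function checkScoreMode is hypothetical and says nothing about how the modes differ internally.

```r
# Hypothetical validator (not part of fCAT): accept only the documented
# score modes 1, 2, 3 and "busco".
checkScoreMode <- function(scoreMode) {
  allowed <- c("1", "2", "3", "busco")
  if (length(scoreMode) != 1 || !(as.character(scoreMode) %in% allowed)) {
    stop("scoreMode must be one of 1, 2, 3 or \"busco\"", call. = FALSE)
  }
  invisible(scoreMode)
}

checkScoreMode(2)        # passes silently
checkScoreMode("busco")  # passes silently
```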

refSpecList

A list of one or more IDs of the genomes that were used to build the core set, ordered by priority. For each core group, the tool looks into the group's FASTA file and uses this priority order to determine the reference species of that group.
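One way to read this rule is that the reference species of a core group is the highest-priority genome ID that actually occurs in the group's FASTA file. The sketch below assumes FASTA headers (without the leading ">") of the form groupID|genomeID|sequenceID; both that layout and the helper name pickRefSpec are hypothetical.

```r
# Hypothetical helper (not part of fCAT): pick the reference species of one
# core group from a priority-ordered vector of genome IDs.
# Assumed header layout: "groupID|genomeID|sequenceID".
pickRefSpec <- function(headers, priority) {
  # Genome IDs present in this core group (second field of each header)
  present <- vapply(strsplit(headers, "|", fixed = TRUE), `[`, character(1), 2)
  # Keep the priority order, drop IDs absent from the group
  hit <- priority[priority %in% present]
  if (length(hit) == 0) NA_character_ else hit[1]
}

headers <- c("1001|MOUSE@10090@1|P1", "1001|HUMAN@9606@1|Q2")
pickRefSpec(headers, c("HUMAN@9606@1", "MOUSE@10090@1"))
```

Here HUMAN@9606@1 is chosen because it comes first in the priority vector and is present in the group; a group containing none of the listed genomes yields NA.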

cpu

Number of CPU cores that fDOG and fdogFAS use to run in parallel. Defaults to 4.

blastDir

Path to a folder that replaces the blast_dir folder of the core directory. Defaults to NULL, in which case blast_dir in the core directory is used.

weightDir

Path to a folder that replaces the weight_dir folder of the core directory. Defaults to NULL, in which case weight_dir in the core directory is used.
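The override logic shared by blastDir and weightDir amounts to a NULL-default fallback. A minimal sketch, assuming blast_dir and weight_dir sit directly inside the core directory; the helper resolveDir and the example paths are hypothetical.

```r
# Hypothetical helper (not part of fCAT): use the user-supplied folder if
# given, otherwise fall back to the default folder inside the core directory.
resolveDir <- function(userDir, coreDir, defaultName) {
  if (is.null(userDir)) file.path(coreDir, defaultName) else userDir
}

coreDir <- "/path/to/coreDir"
resolveDir(NULL, coreDir, "blast_dir")            # "/path/to/coreDir/blast_dir"
resolveDir("/my/weights", coreDir, "weight_dir")  # "/my/weights"
```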

cleanup

fDOG produces a phylogenetic profile of each core group against the genome of interest, stored in a folder within the core set. The function merges these small profiles and calculates the FAS score or the length to build the complete phylogenetic profile of the genome of interest against the core set. This fDOG output can be reused for all score modes. If cleanup = TRUE, the fDOG output is removed instead of being kept for reuse.
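The merge-then-optionally-delete step can be sketched in base R. The folder name fdogOutputSketch, the file naming <groupID>.phyloprofile and the two toy columns are assumptions for illustration; the real per-group files and the FAS or length calculation are not reproduced here.

```r
# Toy per-group profiles written to a temporary folder (illustrative data only)
profDir <- file.path(tempdir(), "fdogOutputSketch")
dir.create(profDir, showWarnings = FALSE)
write.table(data.frame(geneID = "1001", orthoID = "1001|HUMAN@9606@3|a"),
            file.path(profDir, "1001.phyloprofile"),
            sep = "\t", quote = FALSE, row.names = FALSE)
write.table(data.frame(geneID = "1002", orthoID = "1002|HUMAN@9606@3|b"),
            file.path(profDir, "1002.phyloprofile"),
            sep = "\t", quote = FALSE, row.names = FALSE)

# Merge all per-group profiles into one table
files <- list.files(profDir, pattern = "\\.phyloprofile$", full.names = TRUE)
merged <- do.call(rbind, lapply(files, read.delim, colClasses = "character"))

# With cleanup = TRUE the per-group output is deleted instead of kept for reuse
cleanup <- TRUE
if (cleanup) unlink(profDir, recursive = TRUE)
```

Keeping the per-group files (cleanup = FALSE) trades disk space for not having to rerun fDOG when another score mode is requested.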

redo

If redo = TRUE, all existing data for the genome of interest, including fdogOutput, phyloprofileOutput, completenessOutput and the extended phylogenetic profile, is removed and fCAT checks this genome again from scratch.

outDir

Path to the folder where the completeness report of the genome of interest is saved. Defaults to NULL.

Value

A list of two data frames. The first is the completeness report of the genome of interest, with detailed information about how each found ortholog was classified and which genes are missing. The second is a frequency table comparing the genome of interest with the other genomes already present in the existing phylogenetic profile; it gives an overview of how many "dissimilar", "similar", "duplicated" and "missing" genes were found in the genome of interest.
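A frequency table of this kind can be built from per-gene classifications with table(). The sketch below uses made-up genome IDs and statuses and an assumed two-column layout (genomeID, status); it is not the actual structure of fCAT's report.

```r
# Illustrative classification results (not real fCAT output): one row per
# core group and genome, with the status assigned to the ortholog found.
report <- data.frame(
  genomeID = c("NEW@1@1", "NEW@1@1", "NEW@1@1", "OLD@2@1", "OLD@2@1", "OLD@2@1"),
  status   = c("similar", "missing", "duplicated", "similar", "similar", "dissimilar")
)

# Fix the column order and keep categories with zero counts
statusLevels <- c("dissimilar", "similar", "duplicated", "missing")
freq <- as.data.frame.matrix(
  table(report$genomeID, factor(report$status, levels = statusLevels))
)
freq
```

Using a factor with explicit levels ensures that a category absent from a genome still appears as a column with a count of 0.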

Examples

coreFolder <- system.file("extdata", "sample", package = "fCAT")
genome <- system.file("extdata", "HUMAN@9606@3.fa", package = "fCAT")
fasAnno <- system.file("extdata", "HUMAN@9606@3.json", package = "fCAT")
checkCompleteness(
    genome[1], fasAnno[1], coreFolder[1], "test",
    scoreMode = 2, refSpecList = c("HUMAN@9606@1"), extend = TRUE
)

giangnguyen0709/fCAT documentation built on Feb. 10, 2021, 4:31 a.m.