Builds an ensemble operon-pair classifier that combines genomic and transcriptomic features. The ensemble consists of three machine-learning models trained on a small set of confirmed operon pairs (OPs) and non-operon pairs (NOPs). The OPs and NOPs are identified by cross-checking the DOOR annotation against consecutive, active coding-sequence regions (CDSs) and intergenic regions (IGRs). The trained ensemble is then used to predict the operon status of all gene pairs, including DOOR-based operon pairs (DOPs) and putative operon pairs (POPs). Finally, a linkage step combines consecutive gene pairs classified as OP to build the map of condition-dependent operons.
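The last two steps described above, voting across the three models and linking consecutive operon pairs, can be sketched in base R. This is only an illustration under stated assumptions, not CONDOP's actual implementation: the helper names `vote_pairs` and `link_pairs`, the gene identifiers, and the vote matrix are hypothetical, and the threshold of two positive votes simply mirrors the documented default of the `cons` argument.

```r
# Hypothetical sketch; not CONDOP's internal code.
# votes: logical matrix, one row per consecutive gene pair, one column per model.
# A pair is called OP when at least `cons` models vote for it.
vote_pairs <- function(votes, cons = 2) {
  rowSums(votes) >= cons
}

# Link consecutive gene pairs classified as OP into operons.
# genes: character vector of genes in genomic order (length n).
# is_op: logical vector of length n - 1; is_op[i] refers to (genes[i], genes[i + 1]).
link_pairs <- function(genes, is_op) {
  stopifnot(length(is_op) == length(genes) - 1)
  # A new group starts at the first gene and after every pair not classified as OP.
  group <- cumsum(c(TRUE, !is_op))
  operons <- split(genes, group)
  # Assumption: a group holding a single gene is not reported as an operon.
  unname(operons[lengths(operons) > 1])
}

# Made-up example: five genes, four consecutive pairs, three model votes each.
genes <- c("g1", "g2", "g3", "g4", "g5")
votes <- matrix(c(TRUE,  TRUE,  FALSE,
                  TRUE,  TRUE,  TRUE,
                  FALSE, FALSE, TRUE,
                  TRUE,  FALSE, TRUE),
                ncol = 3, byrow = TRUE)
is_op <- vote_pairs(votes, cons = 2)
operons <- link_pairs(genes, is_op)
```

In this toy input the third pair receives a single vote, so the chain breaks between `g3` and `g4` and two separate operons are produced.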
data.in |
The output of the |
bkgExprCDS |
A threshold used to find active coding-sequence regions. Default value is 0.1. |
bkgExprIGR |
A threshold used to find active (transcribed) intergenic regions. Default value is 0.25. |
maxLenIGR |
Maximum length of the intergenic regions. Default value is 150. |
win.start.trp |
Specify the maximum and minimum distance from the beginning of a coding region. This window is used to validate transcription start points. Default values are 100 (max) and 10 (min). |
win.end.trp |
Specify the minimum and maximum distance from the end of a coding region. This window is used to validate transcription end points. Default values are 10 (min) and 100 (max). |
norm.type |
Character vector indicating the method to use for the normalization step. Default value is "n1". n0 - without normalization; n1 - standardization ((x-mean)/sd); n2 - positional standardization ((x-median)/mad); n3 - unitization ((x-mean)/range); n4 - unitization with zero minimum ((x-min)/range); n5 - normalization in range <-1,1> ((x-mean)/max(abs(x-mean))). |
cl.run |
Number of training/validation runs. Default value is 30. |
nfolds |
Number of folds to use in the cross-validation step. Default value is 5. |
cons |
Minimum number of positive votes required to classify a gene pair as an operon pair. Default value is 2. |
find.ext |
Logical value indicating whether putative operon pairs classified as OP are added to the condition-dependent operon map. Defaults to FALSE. |
save.TAB.file |
Character string naming a file. The final condition-dependent operon map is saved in a tab-delimited text file. Default value is NULL, in which case the operon map is not saved. |
save.BED.file |
Character string naming a file. The final condition-dependent operon map is saved in a BED-like file. Default value is NULL, in which case the operon map is not saved. |
return.all |
Logical value indicating if extra data must be provided in output. |
verbose |
Indicate whether information about the process should be reported. Defaults to TRUE. |
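The formulas listed for `norm.type` can be written out in base R as follows. This is a sketch for illustration only: the function name `normalize_feature` is hypothetical, `range` is taken here to mean `max(x) - min(x)`, and `mad` uses R's default `stats::mad()` (which applies a scaling constant of 1.4826). CONDOP's own implementation may differ in these details.

```r
# Hypothetical helper illustrating the n0 to n5 formulas; not part of CONDOP.
normalize_feature <- function(x, type = "n1") {
  rng <- max(x) - min(x)  # assumption: "range" means max minus min
  switch(type,
    n0 = x,                                       # without normalization
    n1 = (x - mean(x)) / sd(x),                   # standardization
    n2 = (x - median(x)) / mad(x),                # positional standardization
    n3 = (x - mean(x)) / rng,                     # unitization
    n4 = (x - min(x)) / rng,                      # unitization with zero minimum
    n5 = (x - mean(x)) / max(abs(x - mean(x))),   # normalization in range <-1,1>
    stop("unknown normalization type: ", type)
  )
}

# Made-up feature values for demonstration.
x <- c(2, 4, 6, 8, 10)
z1 <- normalize_feature(x, "n1")
z4 <- normalize_feature(x, "n4")
z5 <- normalize_feature(x, "n5")
```

With `"n1"` the result has mean 0 and standard deviation 1, `"n4"` maps the values onto the interval from 0 to 1, and `"n5"` maps them onto the interval from -1 to 1.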
List of data structures built by CONDOP.
If return.all is FALSE:
ndata |
A list of dataframes containing the OPs and NOPs used for the training/validation process. One for each count table. |
cls |
A list of OP classifiers for each count table. |
ev.cls |
A data.frame containing the evaluation results for the trained classifiers. One for each count table. |
pred.TS |
A list of dataframes containing the classification results on the training set. One for each count table. |
pred.POPs |
A list of dataframes containing the prediction results on the POPs. One for each count table. |
pred.DOPs |
A list of dataframes containing the prediction results on the DOPs. One for each count table. |
comap |
A list of condition-dependent operon maps (comaps). One for each count table. |
info |
A list of generic information on the confirmed DOOR based operons. One for each count table. |
If return.all is TRUE, the run.CONDOP() function also provides:
osp |
A list of dataframes containing confirmed operon start points. One for each count table. |
oep |
A list of dataframes containing confirmed operon end points. One for each count table. |
cops |
A list of dataframes containing confirmed operons. One for each count table. |
OPs |
A list of dataframes containing OPs. One for each count table. |
NOPs |
A list of dataframes containing NOPs. One for each count table. |
POPs |
A list of dataframes containing POPs. One for each count table. |
DOPs |
A list of dataframes containing DOPs. One for each count table. |
Vittorio Fortino
## Not run:
file_operon_annot <- system.file("extdata", "1944.opr", package="CONDOP")
file_genome_seq <- system.file("extdata", "EC-k12-MG1655.fasta", package="CONDOP")
data(ct1)
# NOTE: file_genome_annot is not defined in this example; set it to the path
# of the genome annotation file (presumably) before calling pre.proc().
data.in <- pre.proc(file_genome_annot, file_operon_annot, "NC_000913",
                    list.cov.dat = list(ct1 = ct1))
res.comap <- run.CONDOP(data.in = data.in, bkgExprCDS = 0.2, bkgExprIGR = 0.2,
                        maxLenIGR = 150, find.ext = TRUE)
## End(Not run)