run.CONDOP: Build condition-dependent operon maps.
In CONDOP: Condition-Dependent Operon Predictions


run.CONDOP builds an ensemble operon-pair classifier that combines genomic and transcriptomic features. The ensemble consists of three machine-learning models trained on a small set of confirmed operon pairs (OPs) and non-operon pairs (NOPs). These OPs and NOPs are identified by cross-checking the DOOR annotation against consecutive, active coding-sequence regions (CDSs) and intergenic regions (IGRs). The trained ensemble is then used to predict the operon status of all gene pairs, including DOOR-based operon pairs (DOPs) and putative operon pairs (POPs). Finally, a linkage step merges consecutive gene pairs classified as OP to build the map of condition-dependent operons.
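The voting and linkage steps described above can be pictured with a small base-R sketch. This is not CONDOP's internal code: the data frame `pairs`, its columns, and the helper `link_operon_pairs` are hypothetical, and the three vote columns merely stand in for the three trained models. A gene pair is called an operon pair when at least `cons` models vote for it, and runs of consecutive operon pairs are merged into one operon.

```r
# Hypothetical toy input: adjacent gene pairs in genomic order, with one
# 0/1 vote per model (m1..m3 stand in for the three trained classifiers).
pairs <- data.frame(
  gene1 = c("g1", "g2", "g3", "g4", "g5"),
  gene2 = c("g2", "g3", "g4", "g5", "g6"),
  m1    = c(1, 1, 0, 1, 0),
  m2    = c(1, 0, 0, 1, 1),
  m3    = c(1, 1, 0, 0, 0),
  stringsAsFactors = FALSE
)

cons <- 2  # minimum number of positive votes, mirroring the `cons` argument

# A pair is classified as OP when at least `cons` models vote for it.
pairs$is.op <- unname(rowSums(pairs[, c("m1", "m2", "m3")]) >= cons)

# Link consecutive OP pairs into operons (assumes rows are ordered so that
# gene2 of row i equals gene1 of row i + 1).
link_operon_pairs <- function(gene1, gene2, is.op) {
  operons <- list()
  current <- character(0)
  for (i in seq_along(is.op)) {
    if (is.op[i]) {
      # Start a new operon or extend the one being built.
      if (length(current) == 0) current <- gene1[i]
      current <- c(current, gene2[i])
    } else if (length(current) > 0) {
      # A non-operon pair closes the operon being built.
      operons[[length(operons) + 1]] <- current
      current <- character(0)
    }
  }
  # Close an operon that runs to the end of the pair list.
  if (length(current) > 0) operons[[length(operons) + 1]] <- current
  operons
}

operons <- link_operon_pairs(pairs$gene1, pairs$gene2, pairs$is.op)
```

With this toy input the pairs g1-g2 and g2-g3 are linked into one operon, the g3-g4 pair breaks the chain, and g4-g5 forms a second operon. Raising `cons` to 3 would leave only the g1-g2 pair classified as OP.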

run.CONDOP(data.in, bkgExprCDS = 0.1, bkgExprIGR = 0.25, maxLenIGR = 150,
  win.start.trp = c(100, 10), win.end.trp = c(10, 100), norm.type = "n1",
  cl.run = 30, nfolds = 5, cons = 2, find.ext = FALSE,
  save.TAB.file = NULL, save.BED.file = NULL, return.all = FALSE,
  verbose = TRUE)

`data.in`	The output of the `pre.proc` function.
`bkgExprCDS`	Threshold used to find active coding-sequence regions. Default value is 0.1.
`bkgExprIGR`	Threshold used to find active (transcribed) intergenic regions. Default value is 0.25.
`maxLenIGR`	Maximum length of the intergenic regions. Default value is 150.
`win.start.trp`	Maximum and minimum distance from the start of a coding region, used to validate transcription start points. Default values are 100 (max) and 10 (min).
`win.end.trp`	Minimum and maximum distance from the end of a coding region, used to validate transcription end points. Default values are 10 (min) and 100 (max).
`norm.type`	Character string indicating the method to use for the normalization step. Default value is "n1". Options: n0 - no normalization; n1 - standardization ((x-mean)/sd); n2 - positional standardization ((x-median)/mad); n3 - unitization ((x-mean)/range); n4 - unitization with zero minimum ((x-min)/range); n5 - normalization to the range [-1, 1] ((x-mean)/max(abs(x-mean))).
`cl.run`	Number of training/validation runs. Default value is 30.
`nfolds`	Number of folds to use in the cross-validation step. Default value is 5.
`cons`	Minimum number of positive votes needed to classify a gene pair as an operon pair. Default value is 2.
`find.ext`	Logical value indicating whether putative operon pairs classified as OP are added to the condition-dependent operon map. Defaults to FALSE.
`save.TAB.file`	Character string naming a file. The final condition-dependent operon map is saved to this file as tab-delimited text. Default value is NULL, in which case the operon map is not saved.
`save.BED.file`	Character string naming a file. The final condition-dependent operon map is saved to this file in a BED-like format. Default value is NULL, in which case the operon map is not saved.
`return.all`	Logical value indicating whether extra data should be included in the output. Defaults to FALSE.
`verbose`	Logical value indicating whether progress information should be reported. Defaults to TRUE.
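
The formulas listed for `norm.type` can be written out directly in base R. The helper below, `normalize_feature`, is a hypothetical illustration and not part of CONDOP; it assumes each feature is a numeric vector and that `mad` means base R's `mad()`, which applies a consistency constant of 1.4826 by default. Whether CONDOP uses the same constant is not stated here.

```r
# Hypothetical helper mirroring the formulas listed for `norm.type`.
normalize_feature <- function(x, type = "n1") {
  rng <- max(x) - min(x)
  switch(type,
    n0 = x,                                      # no normalization
    n1 = (x - mean(x)) / sd(x),                  # standardization
    n2 = (x - median(x)) / mad(x),               # positional standardization
    n3 = (x - mean(x)) / rng,                    # unitization
    n4 = (x - min(x)) / rng,                     # unitization, zero minimum
    n5 = (x - mean(x)) / max(abs(x - mean(x))),  # scaled to [-1, 1]
    stop("unknown normalization type: ", type)
  )
}

x <- c(2, 4, 6, 8, 10)
normalize_feature(x, "n4")  # 0.00 0.25 0.50 0.75 1.00
normalize_feature(x, "n5")  # -1.0 -0.5  0.0  0.5  1.0
```

Note that every option except n0 divides by a spread measure, so a constant feature (zero sd, mad, or range) would produce NaN or Inf in this sketch.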

A list of data structures built by CONDOP. If return.all is FALSE, the list contains:

`ndata`	A list of dataframes containing the OPs and NOPs used for the training/validation process. One for each count table.
`cls`	A list of OP classifiers for each count table.
`ev.cls`	A data.frame containing the evaluation results for the trained classifiers. One for each count table.
`pred.TS`	A list of dataframes containing the classification results on the training set. One for each count table.
`pred.POPs`	A list of dataframes containing the prediction results on the POPs. One for each count table.
`pred.DOPs`	A list of dataframes containing the prediction results on the DOPs. One for each count table.
`comap`	A list of condition-dependent operon maps (comaps). One for each count table.
`info`	A list of generic information on the confirmed DOOR based operons. One for each count table.

If return.all is TRUE, run.CONDOP() also returns:

`osp`	A list of dataframes containing confirmed operon start points. One for each count table.
`oep`	A list of dataframes containing confirmed operon end points. One for each count table.
`cops`	A list of dataframes containing confirmed operons. One for each count table.
`OPs`	A list of dataframes containing OPs. One for each count table.
`NOPs`	A list of dataframes containing NOPs. One for each count table.
`POPs`	A list of dataframes containing POPs. One for each count table.
`DOPs`	A list of dataframes containing DOPs. One for each count table.

Vittorio Fortino

## Not run: 
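    file_operon_annot <- system.file("extdata", "1944.opr", package="CONDOP")
    file_genome_seq   <- system.file("extdata", "EC-k12-MG1655.fasta", package="CONDOP")
    # NOTE: file_genome_annot (the first argument passed to pre.proc below) is
    # not defined in this example; set it before running.
    data(ct1)
    data.in   <- pre.proc(file_genome_annot, file_operon_annot, "NC_000913",
                          list.cov.dat = list(ct1 = ct1))
    # Build the operon map for ct1, adding putative operon pairs classified as OP
    res.comap <- run.CONDOP(data.in = data.in, bkgExprCDS = 0.2, bkgExprIGR = 0.2,
                            maxLenIGR = 150, find.ext = TRUE)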

## End(Not run)