run.CONDOP: Build condition-dependent operon maps.

Description

run.CONDOP builds an ensemble operon-pair classifier that combines genomic and transcriptomic features. The ensemble consists of three machine-learning models trained on a small set of confirmed operon pairs (OPs) and non-operon pairs (NOPs). These OPs and NOPs are identified by cross-checking the DOOR annotation against consecutive, active coding sequences (CDSs) and intergenic regions (IGRs). The trained ensemble is then used to predict the operon status of all gene pairs, including DOOR-based operon pairs (DOPs) and putative operon pairs (POPs). Finally, a linkage step joins consecutive gene pairs classified as OP to build the map of condition-dependent operons.
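The linkage step described above can be pictured as merging runs of consecutive gene pairs predicted as OP. The following is a minimal sketch in base R, not CONDOP's actual implementation: the helper name link_pairs and the input layout (an ordered gene vector plus one logical prediction per adjacent pair) are assumptions made for illustration, and strand changes are ignored.

```r
# Hypothetical helper: group genes into operons from adjacent-pair predictions.
# genes : character vector of genes ordered along one strand
# is.op : logical vector of length(genes) - 1; is.op[i] is TRUE when the
#         pair (genes[i], genes[i + 1]) was classified as an operon pair
link_pairs <- function(genes, is.op) {
  stopifnot(length(is.op) == length(genes) - 1)
  # A new group starts at the first gene and after every pair that is not an OP.
  group <- cumsum(c(TRUE, !is.op))
  operons <- split(genes, group)
  # Keep only groups with at least two genes; single genes are dropped here.
  unname(operons[lengths(operons) >= 2])
}

genes <- c("g1", "g2", "g3", "g4", "g5", "g6")
is.op <- c(TRUE, TRUE, FALSE, FALSE, TRUE)
operons <- link_pairs(genes, is.op)
# g1-g2-g3 form one operon, g4 stays alone, g5-g6 form a second operon.
```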

Usage

run.CONDOP(data.in, bkgExprCDS = 0.1, bkgExprIGR = 0.25, maxLenIGR = 150,
  win.start.trp = c(100, 10), win.end.trp = c(10, 100), norm.type = "n1",
  cl.run = 30, nfolds = 5, cons = 2, find.ext = FALSE,
  save.TAB.file = NULL, save.BED.file = NULL, return.all = FALSE,
  verbose = TRUE)

Arguments

data.in

The output of the pre.proc function.

bkgExprCDS

Threshold used to find active coding-sequence regions. Default value is 0.1.

bkgExprIGR

Threshold used to find active (transcribed) intergenic regions. Default value is 0.25.

maxLenIGR

Maximum length of the intergenic regions. Default value is 150.
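To illustrate how the three thresholds above might interact, here is a minimal sketch in base R. It is not CONDOP's code: the table layout, the column names (id, expr, len), and the direction of the comparisons (strictly above background, length at most maxLenIGR) are all assumptions.

```r
# Hypothetical tables; column names and values are illustrative only.
cds <- data.frame(id = c("c1", "c2", "c3"),
                  expr = c(0.05, 0.40, 0.10),
                  stringsAsFactors = FALSE)
igr <- data.frame(id = c("i1", "i2", "i3"),
                  expr = c(0.30, 0.50, 0.10),
                  len  = c(120, 300, 80),
                  stringsAsFactors = FALSE)

bkgExprCDS <- 0.1
bkgExprIGR <- 0.25
maxLenIGR  <- 150

# Assumption: a region is "active" when its expression exceeds the background.
active.cds <- cds[cds$expr > bkgExprCDS, ]
# Assumption: an IGR must be both expressed and no longer than maxLenIGR.
active.igr <- igr[igr$expr > bkgExprIGR & igr$len <= maxLenIGR, ]
```

With these toy values only c2 and i1 pass: i2 is expressed but too long, and i3 is short but below the background.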

win.start.trp

Maximum and minimum distance from the beginning of a coding region, used to validate transcription start points. Default values are 100 (max) and 10 (min).

win.end.trp

Minimum and maximum distance from the end of a coding region, used to validate transcription end points. Default values are 10 (min) and 100 (max).

norm.type

Character string indicating the method used for the normalization step. Default value is "n1". Available methods:
n0 - no normalization;
n1 - standardization ((x-mean)/sd);
n2 - positional standardization ((x-median)/mad);
n3 - unitization ((x-mean)/range);
n4 - unitization with zero minimum ((x-min)/range);
n5 - normalization to the range [-1, 1] ((x-mean)/max(abs(x-mean))).
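The formulas listed for norm.type can be written out directly in base R. The helper below, normalize_vec, is a hypothetical name used only for illustration; it is a sketch of the formulas as documented, applied to a single numeric vector, and is not taken from the CONDOP source. Note that R's mad() multiplies by a consistency constant (1.4826) by default; whether CONDOP does the same is not stated here.

```r
# Hypothetical helper: apply one of the documented normalization formulas.
normalize_vec <- function(x, type = "n1") {
  switch(type,
    n0 = x,                                      # no normalization
    n1 = (x - mean(x)) / sd(x),                  # standardization
    n2 = (x - median(x)) / mad(x),               # positional standardization
    n3 = (x - mean(x)) / diff(range(x)),         # unitization
    n4 = (x - min(x)) / diff(range(x)),          # unitization, zero minimum
    n5 = (x - mean(x)) / max(abs(x - mean(x))),  # scaled into [-1, 1]
    stop("unknown normalization type: ", type)
  )
}

x  <- c(2, 4, 6, 8, 10)
z1 <- normalize_vec(x, "n1")  # mean 0, standard deviation 1
z4 <- normalize_vec(x, "n4")  # 0.00 0.25 0.50 0.75 1.00
z5 <- normalize_vec(x, "n5")  # -1.0 -0.5 0.0 0.5 1.0
```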

cl.run

Number of training/validation runs. Default value is 30.

nfolds

Number of folds used in the cross-validation step. Default value is 5.
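One common way to combine a number of runs with a number of folds is repeated k-fold cross-validation. The sketch below shows that generic scheme in base R; how CONDOP actually partitions its training set (for example, whether folds are stratified by class) is not documented here, so treat this purely as an assumption-laden illustration. The sample size n is hypothetical.

```r
set.seed(1)    # fixed seed so this illustration is reproducible
n      <- 20   # hypothetical number of labelled gene pairs
cl.run <- 3    # kept small here; the documented default is 30
nfolds <- 5

# One row per run; each row assigns every sample to one of nfolds folds.
fold.id <- t(replicate(cl.run, sample(rep_len(seq_len(nfolds), n))))

# Samples held out for validation in run 1, fold 2, and the remaining
# samples used for training in that split.
val.idx   <- which(fold.id[1, ] == 2)
train.idx <- setdiff(seq_len(n), val.idx)
```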

cons

Minimum number of positive votes required to classify a gene pair as an operon pair. Default value is 2.
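Since the ensemble consists of three models, cons = 2 amounts to a majority vote. A minimal sketch of such a consensus rule in base R follows; the vote matrix and the model names m1 to m3 are invented for illustration and do not reflect CONDOP's internal data structures.

```r
# Hypothetical votes from three classifiers (1 = operon pair, 0 = not),
# one row per gene pair.
votes <- cbind(m1 = c(1, 1, 0, 0),
               m2 = c(1, 0, 0, 1),
               m3 = c(1, 1, 0, 0))
cons <- 2

# A pair is called OP when at least `cons` models vote for it.
pred <- ifelse(rowSums(votes) >= cons, "OP", "NOP")
```

Raising cons to 3 would require all three models to agree, which makes OP calls more conservative.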

find.ext

Logical value indicating whether putative operon pairs classified as OP are added to the condition-dependent operon map. Defaults to FALSE.

save.TAB.file

Character string naming a file. If given, the final condition-dependent operon map is saved as a tab-delimited text file. Default value is NULL (the map is not saved).

save.BED.file

Character string naming a file. If given, the final condition-dependent operon map is saved as a BED-like file. Default value is NULL (the map is not saved).

return.all

Logical value indicating whether extra data should be included in the output. Defaults to FALSE.

verbose

Logical value indicating whether progress information should be reported. Defaults to TRUE.

Value

List of data structures built by CONDOP. If return.all is FALSE:

ndata

A list of data frames containing the OPs and NOPs used for the training/validation process. One for each count table.

cls

A list of OP classifiers for each count table.

ev.cls

A data.frame containing the evaluation results for the trained classifiers. One for each count table.

pred.TS

A list of dataframes containing the classification results on the training set. One for each count table.

pred.POPs

A list of dataframes containing the prediction results on the POPs. One for each count table.

pred.DOPs

A list of data frames containing the prediction results on the DOPs. One for each count table.

comap

A list of condition-dependent operon maps (comaps). One for each count table.

info

A list of general information on the confirmed DOOR-based operons. One for each count table.

If return.all is TRUE, run.CONDOP() also returns:

osp

A list of dataframes containing confirmed operon start points. One for each count table.

oep

A list of dataframes containing confirmed operon end points. One for each count table.

cops

A list of dataframes containing confirmed operons. One for each count table.

OPs

A list of dataframes containing OPs. One for each count table.

NOPs

A list of dataframes containing NOPs. One for each count table.

POPs

A list of dataframes containing POPs. One for each count table.

DOPs

A list of dataframes containing DOPs. One for each count table.

Author(s)

Vittorio Fortino

Examples

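## Not run: 
    # Paths to the example files shipped with the package
    file_operon_annot <- system.file("extdata", "1944.opr", package="CONDOP")
    file_genome_seq   <- system.file("extdata", "EC-k12-MG1655.fasta", package="CONDOP")
    # Example coverage/count table
    data(ct1)
    # NOTE: file_genome_annot (the genome annotation file) is not defined in
    # this example and must be set before running.
    data.in   <- pre.proc(file_genome_annot, file_operon_annot, "NC_000913",
                          list.cov.dat = list(ct1 = ct1))
    # Build the condition-dependent operon map, including extended operons
    res.comap <- run.CONDOP(data.in = data.in, bkgExprCDS = 0.2, bkgExprIGR = 0.2,
                            maxLenIGR = 150, find.ext = TRUE)

## End(Not run)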

CONDOP documentation built on May 2, 2019, 1:26 p.m.