pre.proc: Prepare data inputs for the main function 'run.CONDOP()'.
In CONDOP: Condition-Dependent Operon Predictions


Load the annotation files and a list of count tables (or coverage vectors). Each count table corresponds to a specific experimental condition and must contain two columns: fwd (coverage depth on the forward strand) and rev (coverage depth on the reverse strand). The annotation files are:

- GFF-like file: can be downloaded from the NCBI genomes FTP directory, ftp://ftp.ncbi.nih.gov/genomes.

- DOOR-like file: can be downloaded from http://csbl.bmb.uga.edu/DOOR/displayspecies.php.

- FASTA-like file: can be downloaded from www.ncbi.nlm.nih.gov.
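
As a minimal sketch of the input shape described above, a list of count tables could be assembled as follows. It assumes one row per genome position and uses simulated depths. The names `genome.len`, `make.count.table`, `ct.a`, and `ct.b` are hypothetical and not part of CONDOP.

```r
# Hypothetical example: build two count tables with the required
# 'fwd' and 'rev' columns (assumption: one row per genome position).
set.seed(1)
genome.len <- 1000  # toy genome length, for illustration only

make.count.table <- function(n) {
  data.frame(
    fwd = rpois(n, lambda = 5),  # simulated coverage depth, forward strand
    rev = rpois(n, lambda = 3)   # simulated coverage depth, reverse strand
  )
}

# One named entry per experimental condition
list.cov.dat <- list(ct.a = make.count.table(genome.len),
                     ct.b = make.count.table(genome.len))

# Basic sanity check before handing the list to pre.proc()
ok <- all(vapply(list.cov.dat,
                 function(ct) all(c("fwd", "rev") %in% colnames(ct)),
                 logical(1)))
```

A check like `ok` catches a missing or misnamed strand column before the longer preprocessing step starts.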


pre.proc(gff.file, door.op.file, fasta.file, list.cov.dat,
  remove.cov = list("rRNA"), log2.expr = TRUE, sw = 100,
  save.data.file = NULL, verbose = TRUE)

`gff.file`	A full local path indicating the GFF-like file to load (gene annotations).
`door.op.file`	A full local path indicating the DOOR-like file to load (DOOR-operon annotations).
`fasta.file`	A full local path indicating the FASTA-like file to load or a character string representing the accession number of the genome sequence to download.
`list.cov.dat`	List of count tables.
`remove.cov`	List of character values. Each character value names a type of annotated feature whose coverage depth will be removed. The default list contains "rRNA", so the coverage depth of "rRNA" features is removed.
`log2.expr`	Logical value indicating whether CONDOP uses log2-transformed expression values. Expression values are computed as RPKM. Default value is TRUE.
`sw`	Numeric value specifying the sliding window size. Default value is 100.
`save.data.file`	Character string naming a file. The file will contain the input for the CONDOP main process.
`verbose`	Indicate whether information about the process should be reported. Defaults to TRUE.
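
To illustrate what `log2.expr` refers to, the following sketch applies the standard RPKM definition (reads per kilobase of feature per million mapped reads) followed by a log2 transform. It is not taken from the CONDOP source: the function `rpkm`, the input values, and the pseudocount of 1 are all assumptions made for illustration, and CONDOP may compute expression differently from coverage depth.

```r
# Standard RPKM definition (not necessarily CONDOP's exact implementation):
# reads per kilobase of feature per million mapped reads.
rpkm <- function(reads, feature.len, total.reads) {
  reads * 1e9 / (feature.len * total.reads)
}

reads       <- c(500, 20, 0)       # hypothetical read counts per feature
feature.len <- c(1000, 2000, 500)  # hypothetical feature lengths in bp
total.reads <- 1e6                 # hypothetical number of mapped reads

expr <- rpkm(reads, feature.len, total.reads)

# The pseudocount of 1 is an assumption made here to avoid log2(0)
log.expr <- log2(expr + 1)
```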

A list of data inputs for the main process run.CONDOP.

`genes.and.ops`	A merged dataframe containing information about genes/features and operons.
`gseq`	A character vector representing the genome sequence of the target organism.
`igr.pos`	A dataframe containing information about intergenic regions (IGRs) - forward (+) strand.
`igr.neg`	A dataframe containing information about intergenic regions (IGRs) - reverse (-) strand.
`tl.cds`	A list of dataframes containing the expression levels of annotated coding sequences (CDS regions). One dataframe for each count table.
`tl.igr.pos`	A list of dataframes containing the expression levels of intergenic sequences (IGR regions) - forward (+) strand. One dataframe for each count table.
`tl.igr.neg`	A list of dataframes containing the expression levels of intergenic sequences (IGR regions) - reverse (-) strand. One dataframe for each count table.
`sid.points`	A list of dataframes containing information about boundaries of transcriptionally active regions.
`cut.lhe`	A list of numeric vectors indicating the cut-off values that separate low-expression from high-expression RNA-seq data on the forward and reverse strands. One vector for each count table.
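
The element names listed above can be used to sanity-check an object before passing it on to run.CONDOP. The sketch below is only an illustration: `expected.names`, `missing.inputs`, `mock.in`, and `mock.partial` are hypothetical helpers and placeholder objects, not part of CONDOP, and the mock lists stand in for a real `pre.proc` result.

```r
# Element names documented for the list returned by pre.proc()
expected.names <- c("genes.and.ops", "gseq", "igr.pos", "igr.neg",
                    "tl.cds", "tl.igr.pos", "tl.igr.neg",
                    "sid.points", "cut.lhe")

# Return the documented element names that are absent from 'x'
missing.inputs <- function(x) setdiff(expected.names, names(x))

# Mock objects standing in for a pre.proc() result; contents are placeholders
mock.in      <- setNames(vector("list", length(expected.names)), expected.names)
mock.partial <- mock.in[c("gseq", "tl.cds")]

complete.missing <- missing.inputs(mock.in)       # nothing missing
partial.missing  <- missing.inputs(mock.partial)  # seven names missing
```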

Use the pre.proc function before running run.CONDOP. It builds the input data structures that run.CONDOP requires, so you do not need to construct them yourself.

Vittorio Fortino

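## Not run: 
    # Placeholder: set this to the full local path of your GFF-like annotation file
    file_genome_annot <- "path/to/genome_annotation.gff"
    file_operon_annot <- system.file("extdata", "1944.opr", package="CONDOP")
    file_genome_seq   <- system.file("extdata", "EC-k12-MG1655.fasta", package="CONDOP")
    data(ct1)
    # The genome sequence is given here by accession number; the local
    # FASTA path in file_genome_seq could be passed instead.
    data.in <- pre.proc(file_genome_annot, file_operon_annot, "NC_000913",
                        list.cov.dat = list(ct1 = ct1))

## End(Not run)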