checkInfile: Check input files


View source: R/checkInfile.R

Description

Checks input data to ensure that the file format is correct.

Usage

checkInfile(myobject, myclass, checkorder = NULL)

Arguments

myobject

Data object to be tested.

myclass

Name of data class. Can be one of the strings "focalgenome", "compgenome", "SYNT", or "BLOCKS".

checkorder

Logical. If TRUE, the order of markers in class "focalgenome" or "compgenome" is checked; if FALSE, only the file format is verified. Ignored for classes "SYNT" and "BLOCKS".

Details

Objects of the class "focalgenome" must contain the column $marker, a vector of either characters or integers giving unique IDs for orthologs. Values can be NA for markers that have no ortholog. $scaff must be a character vector giving the name of the focal genome segment (i.e., chromosome or scaffold) of origin of each marker. $start and $end must be numeric vectors giving the location of each marker on its focal genome segment. $strand must be a vector of "+" and "-" characters giving the reading direction of each marker. Additional columns are ignored and may store custom information, such as marker names. See Examples below for the focalgenome format.
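The column requirements above can be illustrated with a small hand-built object. This is only a sketch: it assumes the object is a data frame, the toy values are invented, and looks_like_focalgenome is a hypothetical helper that mirrors the documented constraints. It is not part of the package and does not reproduce what checkInfile does internally.

```r
# Invented toy data; column names follow the description above.
toy_focal <- data.frame(
  marker = c(1L, 2L, NA, 3L),                 # NA: marker without ortholog
  scaff  = c("chr1", "chr1", "chr1", "chr2"),
  start  = c(100, 500, 900, 50),
  end    = c(200, 650, 980, 120),
  strand = c("+", "-", "+", "+"),
  name   = c("geneA", "geneB", "geneC", "geneD"),  # extra column, ignored
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of the package): TRUE if the documented
# column constraints hold, FALSE otherwise.
looks_like_focalgenome <- function(x) {
  required <- c("marker", "scaff", "start", "end", "strand")
  if (!is.data.frame(x) || !all(required %in% names(x))) return(FALSE)
  ids <- x$marker[!is.na(x$marker)]           # NA markers are allowed
  (is.character(x$marker) || is.integer(x$marker)) &&
    anyDuplicated(ids) == 0 &&                # non-NA IDs must be unique
    is.character(x$scaff) &&
    is.numeric(x$start) && is.numeric(x$end) &&
    all(x$strand %in% c("+", "-"))
}

looks_like_focalgenome(toy_focal)
```

For real data, checkInfile(myobject, "focalgenome") remains the authoritative test; the helper only makes the listed requirements concrete.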

Objects of the class "compgenome" must contain the column $marker, a vector of either characters or integers giving unique IDs for orthologs. $orientation must be a vector of "+" and "-" characters giving the reading direction of each marker in the compared genome. $car must be an integer vector giving the location of each marker on its compared genome segment (i.e., Contiguous Ancestral Region, or CAR), analogous to contiguous sets of genetic markers on a chromosome, scaffold, or contig. Each CAR is represented by a PQ-tree (Booth & Lueker 1976; Chauve & Tannier 2008). The PQ structure of each CAR is defined by at least two additional columns that must alternate between character vectors of node type ("P", "Q", or NA) in even columns and integer vectors of node elements in odd columns (missing values are permitted past the fifth column). Each pair of node type and node element columns represents one level of the PQ-tree hierarchy, with the rightmost columns representing the lowest level. P-nodes contain contiguous markers and/or nodes in arbitrary order, while Q-nodes contain contiguous markers and/or nodes in fixed order (including their reversal). For additional details on PQ-trees, see Booth & Lueker 1976, Chauve & Tannier 2008, or the package vignette. See Examples below for the compgenome format.
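The alternating column layout can be sketched as follows. The column names type1 and elem1, the toy values, and the helper pq_columns_alternate are assumptions made for illustration; the values are not meant to encode a meaningful PQ-tree, and the helper only checks the column pattern described above (node type in even columns, node elements in odd columns, starting at column 4).

```r
# Invented toy data: three fixed columns plus one node type / node element pair.
toy_comp <- data.frame(
  marker      = 1:4,
  orientation = c("+", "+", "-", "+"),
  car         = c(1L, 1L, 1L, 2L),
  type1       = c("Q", "Q", "Q", "Q"),   # column 4 (even): node type
  elem1       = c(1L, 2L, 3L, 1L),       # column 5 (odd): node element
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of the package): TRUE if columns 4 onward
# come in pairs of character node types and integer node elements.
pq_columns_alternate <- function(x) {
  if (ncol(x) < 5 || (ncol(x) - 3) %% 2 != 0) return(FALSE)
  idx <- seq(4, ncol(x))
  is_type <- idx %% 2 == 0
  type_ok <- vapply(
    x[idx[is_type]],
    function(col) is.character(col) && all(col %in% c("P", "Q", NA)),
    logical(1)
  )
  elem_ok <- vapply(x[idx[!is_type]], is.integer, logical(1))
  all(type_ok) && all(elem_ok)
}

pq_columns_alternate(toy_comp)
```

The TOY24_compgenome data set shown in the Examples is the reference for the actual format.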

Objects of the class "SYNT" must be a list of matrices generated by the computeRearrs function. The list stores data on different classes of rearrangements and additional information.

Objects of the class "BLOCKS" must be a list of lists generated by the summarizeBlocks function. The top-level list elements of the "BLOCKS" object are focal genome segments, and the lower-level list elements contain information on synteny blocks and rearrangements for each focal genome segment.

Value

Returns an error message when a problem has been detected, or nothing otherwise.
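In a script that should continue after a failed check, the call can be wrapped in tryCatch. The sketch below assumes that a detected problem is signaled as a regular R error (as with stop()), which this page does not state explicitly. To stay self-contained, it uses fake_check, a hypothetical stand-in for checkInfile, and run_check, a hypothetical wrapper.

```r
# Hypothetical stand-in for checkInfile(): signals an error on a problem,
# returns nothing (invisibly) otherwise.
fake_check <- function(ok) {
  if (!ok) stop("markers are not ordered")
  invisible(NULL)
}

# Hypothetical wrapper: returns "ok" if the check passes, otherwise the
# error message as a character string.
run_check <- function(expr) {
  tryCatch(
    {
      expr   # the lazily evaluated call runs here, inside tryCatch
      "ok"
    },
    error = function(e) conditionMessage(e)
  )
}

run_check(fake_check(TRUE))
run_check(fake_check(FALSE))
```

With the package loaded, the same pattern would read run_check(checkInfile(myobject, "focalgenome", checkorder = TRUE)), provided the assumption about error signaling holds.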

References

Booth, K.S. & Lueker, G.S. (1976). Testing for the consecutive ones property, interval graphs, and graph planarity using PQ-Tree algorithms. Journal of Computer and System Sciences, 13, 335–379. doi: 10.1016/S0022-0000(76)80045-1.

Chauve, C. & Tannier, E. (2008). A methodological framework for the reconstruction of contiguous regions of ancestral genomes and its application to mammalian genomes. PLOS Computational Biology, 4, e1000234. doi: 10.1371/journal.pcbi.1000234.

See Also

computeRearrs, summarizeBlocks.

Examples

checkInfile(TOY24_focalgenome, "focalgenome", checkorder = TRUE)

## focalgenome format:
TOY24_focalgenome

## compgenome format:
TOY24_compgenome

## Not run: 

## markers not ordered:
myorder <- sample(seq_len(nrow(TOY24_focalgenome)))
checkInfile(TOY24_focalgenome[myorder, ], "focalgenome", checkorder = TRUE)

## End(Not run)

dorolin/rearrvisr documentation built on Aug. 6, 2020, 1:32 a.m.