genome2PQtree: Convert a one-dimensional genome map into a two-dimensional...

Description

Convert a one-dimensional genome map into a two-dimensional PQ-structure that can be used as compgenome input for the functions computeRearrs, summarizeBlocks, and genomeRearrPlot.

Usage

genome2PQtree(genomemap)

Arguments

genomemap

Data frame representing the genome map to be converted, containing the mandatory columns $marker, $scaff, $start, $end, and $strand, and optionally further columns. Markers must be ordered by their map position within each genome segment.

Details

genomemap must contain the mandatory columns $marker (a character or integer vector that gives the IDs of markers), $scaff (a character vector that gives the ID of the genome segment of origin of each marker), $start and $end (numeric vectors that specify the location of each marker on its genome segment), and $strand (a vector of "+" and "-" characters that indicate the reading direction of each marker). Additional columns are ignored and may store custom information. Markers need to be ordered by their map position within each genome segment, for example by running the orderGenomeMap function.
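To illustrate the expected input layout, the following minimal sketch builds a toy genome map in base R and runs two sanity checks on it. All marker IDs, segment names, and coordinates are made up for illustration, and the ordering check assumes that "map position" can be approximated by the $start coordinate; it is not part of the package.

```r
# Toy genome map with the five mandatory columns (all values are made up).
genomemap <- data.frame(
  marker = c("m1", "m2", "m3", "m4", "m5"),
  scaff  = c("chrA", "chrA", "chrA", "chrB", "chrB"),
  start  = c(100, 2500, 7000, 50, 900),
  end    = c(400, 3100, 7600, 300, 1500),
  strand = c("+", "-", "+", "+", "-"),
  stringsAsFactors = FALSE
)

# Check that all mandatory columns are present.
mandatory <- c("marker", "scaff", "start", "end", "strand")
has_columns <- all(mandatory %in% names(genomemap))

# Check that markers are ordered by start position within each genome
# segment (assumption: $start is a reasonable proxy for map position).
is_ordered <- all(tapply(genomemap$start, genomemap$scaff,
                         function(x) !is.unsorted(x)))
```

If is_ordered is FALSE for real data, the map can be reordered first, for example with the orderGenomeMap function mentioned above.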

Important: If the converted genome map is used as compgenome input for the function computeRearrs, it is crucial that all genome segments in the $scaff column of genomemap represent contiguous sets of genetic markers. Genome segments that are (potentially) overlapping, such as minor scaffolds or contigs that were not assembled into chromosomes and might in fact be part of assembled chromosomes or enclosed in other scaffolds, need to be excluded from genomemap prior to its conversion.

Value

A data frame encoding the marker order in genomemap as a two-dimensional PQ-structure (i.e., in PQ-tree format).

IDs in the $car column of the output are assigned according to the order of genome segments as they appear in the $scaff column of genomemap. Markers that are NA in the genome map are excluded from the output.

For additional details on the output format see the description of the "compgenome" class in the Details section of the checkInfile function, or the package vignette.

The unambiguously ordered genome segments in the one-dimensional genome map genomemap can be seen as a subclass of PQ-trees, where each genome segment is encoded by a single Q-node that only contains leaves as children. Accordingly, the returned PQ-structure has exactly five columns: $marker, $orientation, $car, one column for node type (always "Q"), and one for node element (ranging from 1 to the number of non-NA markers within a genome segment).
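The structure described above can be sketched in base R without calling the package. This is a hedged illustration of the described layout, not the implementation of genome2PQtree: the column names nodetype and nodeelement are hypothetical (the text does not name these two columns), $orientation is assumed to carry the values of $strand, and "markers that are NA" is assumed to mean an NA marker ID.

```r
# Toy genome map (made-up values); the third marker has an NA ID.
genomemap <- data.frame(
  marker = c("m1", "m2", NA, "m4", "m5"),
  scaff  = c("chrA", "chrA", "chrA", "chrB", "chrB"),
  strand = c("+", "-", "+", "+", "-"),
  stringsAsFactors = FALSE
)

# Drop markers that are NA (assumption: NA refers to the marker ID).
gm <- genomemap[!is.na(genomemap$marker), ]

# CAR IDs follow the order in which genome segments appear in $scaff.
car <- match(gm$scaff, unique(genomemap$scaff))

# Node element: running index of non-NA markers within each genome segment.
nodeelement <- ave(seq_along(car), car, FUN = seq_along)

pq_sketch <- data.frame(
  marker      = gm$marker,
  orientation = gm$strand,    # assumption: orientation mirrors $strand
  car         = car,
  nodetype    = "Q",          # hypothetical column name
  nodeelement = nodeelement,  # hypothetical column name
  stringsAsFactors = FALSE
)
```

Each genome segment becomes one Q-node whose children are all leaves, which is why a single node type column and a single node element column are enough in this sketch.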

See Also

orderGenomeMap, checkInfile, computeRearrs, summarizeBlocks, genomeRearrPlot.

Examples

## Not run: 

## Exclude potentially overlapping minor scaffolds from genome map:
SIM_markers_chr <- SIM_markers[is.element(SIM_markers$scaff,
                                          c("2L", "2R", "3L", "3R", "4", "X")), ]

## Convert genome map into PQ-structure:
SIM_compgenome <- genome2PQtree(SIM_markers_chr)

## Print a translation between names of genome segments and CAR IDs:
head(data.frame(chr = unique(SIM_markers_chr$scaff),
                car = seq_along(unique(SIM_markers_chr$scaff)),
                stringsAsFactors = FALSE))

## End(Not run)

dorolin/rearrvisr documentation built on Aug. 6, 2020, 1:32 a.m.