View source: R/summarizeBlocks.R
For each synteny block, summarize rearrangements and information on the alignment between the focal genome and the compared genome
summarizeBlocks(SYNT, focalgenome, compgenome, ordfocal)
SYNT: A list of matrices that store data on different classes of rearrangements and additional information.
focalgenome: Data frame representing the focal genome, containing the mandatory columns $marker, $scaff, $start, $end, and $strand.
compgenome: Data frame representing the compared genome (e.g., an ancestral genome reconstruction, or an extant genome), with the first three columns
ordfocal: Character vector with the IDs of the focal genome segments that will be summarized. Must match (a subset of) IDs in
focalgenome must contain the column $marker, a vector of either characters or integers with unique ortholog IDs that can be matched to the values in the rownames of SYNT and the $marker column of compgenome. Values can be NA for markers that have no ortholog. $scaff must be a character vector giving the name of the focal genome segment (i.e., chromosome or scaffold) of origin of each marker. $start and $end must be numeric vectors giving the location of each marker on its focal genome segment. $strand must be a vector of "+" and "-" characters giving the reading direction of each marker. Additional columns are ignored and may store custom information, such as marker names. Markers need to be ordered by their map position within each focal genome segment, for example by running the orderGenomeMap function. focalgenome may contain additional rows that were absent when running the computeRearrs function. However, all markers present in SYNT need to be contained in focalgenome, with the subset of shared markers being in the same order.
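The column requirements above can be checked before calling summarizeBlocks. The following is a minimal sketch in base R, not part of the package: the helper name check_focalgenome and the toy values are hypothetical, and "ordered by map position" is assumed here to mean non-decreasing $start within each segment.

```r
# Toy focal genome with the five mandatory columns (all values are made up)
focal <- data.frame(
  marker = c(1L, 2L, 3L, 4L, NA),
  scaff  = c("1", "1", "1", "2", "2"),
  start  = c(100, 500, 900, 50, 400),
  end    = c(200, 600, 1000, 150, 450),
  strand = c("+", "-", "+", "+", "-"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns TRUE if the documented requirements hold
check_focalgenome <- function(fg) {
  required <- c("marker", "scaff", "start", "end", "strand")
  if (!all(required %in% colnames(fg))) return(FALSE)
  ids <- fg$marker[!is.na(fg$marker)]                 # NA marks "no ortholog"
  if (anyDuplicated(ids) > 0) return(FALSE)           # ortholog IDs must be unique
  if (!is.character(fg$scaff)) return(FALSE)
  if (!is.numeric(fg$start) || !is.numeric(fg$end)) return(FALSE)
  if (!all(fg$strand %in% c("+", "-"))) return(FALSE)
  # Assumption: ordering is judged by $start within each focal segment
  sorted <- tapply(fg$start, fg$scaff, function(x) !is.unsorted(x))
  all(sorted)
}

ok <- check_focalgenome(focal)
```

A data frame that fails the ordering check would first need to be sorted, for example with the orderGenomeMap function mentioned above.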
A list of lists that summarizes the alignment between the focal genome and each PQ-tree, and records whether synteny blocks are part of different classes of rearrangements. The top-level list elements are focal genome segments, and the lower-level list elements contain information on synteny blocks and rearrangements for each focal genome segment. For details on PQ-trees see the description of the "compgenome" class in the Details section of the checkInfile function, Booth & Lueker 1976, Chauve & Tannier 2008, or the package vignette.
The names of the top-level list elements correspond to the strings in ordfocal. Each list element is itself a list containing the data frame $blocks and five numeric matrices $NM1, $NM2, $SM, $IV, and $IVsm, described below. In all six list elements, each synteny block is represented by a row. Note that separate blocks are also generated when the hierarchical structure of the underlying PQ-tree changes, therefore not all independent rows are caused by a rearrangement.
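To make this layout concrete, the sketch below builds a mock object by hand with the same shape (one list element per focal segment, each holding $blocks and the five matrices) and confirms that all six elements have one row per block. The object mock and all of its values are invented for illustration; real output comes from summarizeBlocks and contains many more columns in $blocks.

```r
n_blocks <- 3

# Mock per-segment element; only the shape follows the description above
mock_segment <- list(
  blocks = data.frame(start = c(1, 4, 6), end = c(3, 5, 9)),
  NM1    = matrix(0, nrow = n_blocks, ncol = 0),        # no rearrangements: zero columns
  NM2    = matrix(0, nrow = n_blocks, ncol = 0),
  SM     = matrix(c(0, 1, 1), nrow = n_blocks, ncol = 1),
  IV     = matrix(c(1, 1, 0), nrow = n_blocks, ncol = 1),
  IVsm   = matrix(0, nrow = n_blocks, ncol = 0)
)

# Top-level names correspond to the strings passed as ordfocal
mock <- list("1" = mock_segment)

# Every element of a segment should have one row per synteny block
rows_per_element <- vapply(mock[["1"]], nrow, integer(1))
same_rows <- all(rows_per_element == n_blocks)
```

Accessing an element by segment name, as in mock[["1"]]$blocks, mirrors how the strings in ordfocal would index the real return value.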
$blocks contains information on the alignment and structure of each PQ-tree. The columns $blocks$start and $blocks$end give the start and end positions of the synteny block in SYNT (positions start at 1 separately for each focal genome segment). $blocks$markerS and $blocks$markerE give the marker IDs of the first and last marker per block. $blocks$car gives the ID of the CAR. Nine columns per hierarchy level describe the structure of each PQ-tree and its alignment to the focal genome. Hierarchy levels of the PQ-trees are indicated by suffixes {1, 2, ...}. $blocks$type gives the node type. $blocks$elemS and $blocks$elemE give the first and last ID of the node elements per block. They correspond to the IDs in the odd columns of compgenome (note that some IDs within blocks or in-between might be missing when markers in the compared genome are absent from the focal genome). $blocks$node indicates whether the block contains PQ-tree nodes (value is 1) or only leaf elements (value is 0). The columns $blocks$nodeori, $blocks$subnode, $blocks$blockid, $blocks$blockori, and $blocks$premask summarize for each block the values in the list elements of SYNT with the corresponding names (described in the Value section in the documentation of the computeRearrs function). The column $blocks$nodeori1, for example, summarizes for each block the values in the second column (i.e., the first node level) of SYNT$nodeori.
The numeric matrices $NM1, $NM2, $SM, $IV, and $IVsm indicate whether blocks are part of different classes of rearrangements. $NM1 stores TransLocations between CARs Between focal Segments; $NM2 stores TransLocations between CARs Within focal Segments; $SM stores TransLocations within CARs Within focal Segments; $IV and $IVsm store InVersions within CARs within focal segments. In $IV, blocks that are part of a multi-marker inversion are tagged with 1, while in $IVsm, integers >0 indicate the positions of single-marker inversions (i.e., markers with switched orientation) within their blocks. Each rearrangement is represented by a separate column, and blocks that are part of a rearrangement have a tag value of >0. Note that some columns in $NM2 or $SM may be duplicated due to the functioning of the underlying algorithm in computeRearrs; although corresponding to the same rearrangement, these duplicated columns are nevertheless included for completeness. By default these columns will not be visualized with the genomeRearrPlot function. If no rearrangements were detected for a certain class, the matrix has zero columns. See the package vignette or the Value section in the documentation of the computeRearrs function for details on the meaning of different tag values in these matrices. Note that if SYNT has been filtered with the filterRearrs function, only the above matrices will be affected, while the information in $blocks will remain unchanged.
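The conventions above (one column per rearrangement, tag values >0 for participating blocks, zero columns when a class is empty, possible duplicated columns) can be queried with base R. The sketch below uses hand-made mock matrices with arbitrary tag values, not real summarizeBlocks output; comparing columns with duplicated() is only one plausible way to spot repeats and is not necessarily how genomeRearrPlot handles them.

```r
# Mock matrix: rows are blocks, columns are rearrangements (values are arbitrary)
SM <- cbind(c(0, 1, 1, 0),
            c(1, 0, 0, 0),
            c(0, 1, 1, 0))   # third column repeats the first

# Row indices of the blocks that take part in each rearrangement (tag > 0)
blocks_per_rearr <- lapply(seq_len(ncol(SM)), function(j) which(SM[, j] > 0))

# Flag columns that are identical to an earlier column
dup_cols <- duplicated(t(SM))
n_unique <- sum(!dup_cols)

# A class without detected rearrangements is a matrix with zero columns
IV <- matrix(0, nrow = 4, ncol = 0)
has_iv <- ncol(IV) > 0
```

Here blocks_per_rearr is a list with one integer vector per rearrangement column, which can be used to look up the matching rows in $blocks.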
The returned data can be visualized with the genomeRearrPlot function.
Booth, K.S. & Lueker, G.S. (1976). Testing for the consecutive ones property, interval graphs, and graph planarity using PQ-Tree algorithms. Journal of Computer and System Sciences, 13, 335–379. doi: 10.1016/S0022-0000(76)80045-1.
Chauve, C. & Tannier, E. (2008). A methodological framework for the reconstruction of contiguous regions of ancestral genomes and its application to mammalian genomes. PLOS Computational Biology, 4, e1000234. doi: 10.1371/journal.pcbi.1000234.
checkInfile, computeRearrs, filterRearrs, genomeRearrPlot.
SYNT <- computeRearrs(TOY24_focalgenome, TOY24_compgenome, doubled = TRUE)
BLOCKS <- summarizeBlocks(SYNT, TOY24_focalgenome, TOY24_compgenome,
                          c("1","2","3"))

## Not run:
## show summary for first focal genome segment
BLOCKS[[1]]
## End(Not run)