orderGenomeMap: Order genome map


View source: R/orderGenomeMap.R

Description

Order a genome map by genome segment and, within each genome segment, by the position of markers.

Usage

orderGenomeMap(genomemap, ordnames, partial = 0, ordpfx = "",
  ordsfx = "", sortby = "size")

Arguments

genomemap

Data frame representing the genome map to be ordered, containing the mandatory columns $marker, $scaff, $start, $end, and $strand, and optional further columns.

ordnames

Character vector with the names of genome segments (i.e., chromosomes or scaffolds) by which genomemap will be sorted. The IDs in the column $scaff of genomemap are matched to the names in ordnames and sorted according to their order of appearance in ordnames.

partial

Integer of value 0 or 1. Indicates whether IDs in the column $scaff of genomemap have to match the names in ordnames exactly (partial = 0) or only partially (partial = 1).

ordpfx

String that is prepended to the names in ordnames, allowing additional matches.

ordsfx

String that is appended to the names in ordnames, thereby restricting matches. Only relevant when partial = 1.

sortby

String indicating whether genome segments that do not have a unique match to the names in ordnames will be sorted by their size (sortby = "size") or by their name (sortby = "name").

Details

genomemap must contain the mandatory columns $marker (a character or integer vector that gives the IDs of markers), $scaff (a character vector that gives the ID of the genome segment of origin of each marker), $start and $end (numeric vectors that specify the location of each marker on its genome segment), and $strand (a vector of "+" and "-" characters that indicate the reading direction of each marker). Additional columns are ignored and may store custom information.
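As an illustration of the expected input layout, the following minimal sketch builds a toy genome map with the mandatory columns and checks that none is missing. The marker names, scaffold names, coordinates, and the extra column note are made up for this example and are not part of the package.

```r
## Hypothetical toy genome map; all names and values are made up.
toy_map <- data.frame(
  marker = c("m1", "m2", "m3", "m4"),
  scaff  = c("chr2", "chr2", "chrX", "scaffold_17"),
  start  = c(5000, 100, 250, 40),
  end    = c(5900, 800, 900, 400),
  strand = c("+", "-", "+", "-"),
  note   = c("a", "b", "c", "d"),  # optional extra column
  stringsAsFactors = FALSE
)

## Check that all mandatory columns are present
required <- c("marker", "scaff", "start", "end", "strand")
missing_cols <- setdiff(required, names(toy_map))
stopifnot(length(missing_cols) == 0)
```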

If partial = 0, only those IDs in the column $scaff of genomemap are considered that exactly match either a name in ordnames or the combined string of ordpfx and that name. ordsfx is ignored.

If partial = 1, all IDs in the column $scaff of genomemap are considered that start with either the combined string of a name in ordnames and ordsfx, or the combined string of ordpfx, that name, and ordsfx.
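The two matching modes can be sketched in base R as follows. This is only a literal reading of the rules described here, not the package's actual implementation; the helper match_scaff and the scaffold IDs are hypothetical.

```r
## Hypothetical helper (not part of rearrvisr): which IDs in 'scaff'
## match a single segment name under the rules described above?
match_scaff <- function(scaff, name, partial = 0, ordpfx = "", ordsfx = "") {
  if (partial == 0) {
    ## exact match to name, or to prefix + name; ordsfx is ignored
    scaff %in% c(name, paste0(ordpfx, name))
  } else {
    ## ID starts with name + suffix, or with prefix + name + suffix
    startsWith(scaff, paste0(name, ordsfx)) |
      startsWith(scaff, paste0(ordpfx, name, ordsfx))
  }
}

ids <- c("2", "chr2", "chr2_random", "chr20", "chrX")

match_scaff(ids, "2", partial = 0, ordpfx = "chr")
# TRUE TRUE FALSE FALSE FALSE
match_scaff(ids, "2", partial = 1, ordpfx = "chr")
# TRUE TRUE TRUE TRUE FALSE
match_scaff(ids, "2", partial = 1, ordpfx = "chr", ordsfx = "_")
# FALSE FALSE TRUE FALSE FALSE
```

In this sketch, partial matching without a suffix also picks up "chr20" for the name "2"; supplying a suffix such as "_" is one way to exclude it, at the cost of no longer matching the bare "chr2".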

If more than one ID in the column $scaff of genomemap matches (partially) to a specified genome segment name, matching genome segments will be sorted by their size (if sortby = "size") or by their name (if sortby = "name"). All genome segments without any match will similarly be sorted by their size or name (as specified by sortby) and appended to the end of the ordered genome map.

Value

A data frame containing an ordered version of genomemap. The IDs in column $scaff in genomemap are sorted according to their appearance in ordnames. Markers within each genome segment are sorted by their map position (i.e., the midpoint between positions given by the columns $start and $end in genomemap).
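The ordering described above can be approximated in base R for the simple case in which every segment ID matches a name in ordnames exactly. This is a sketch under that assumption, not the package's code; the toy data are made up.

```r
## Hypothetical toy data; every scaffold ID matches a name exactly.
toy <- data.frame(
  marker = c("m1", "m2", "m3", "m4"),
  scaff  = c("chr2", "chr2", "chrX", "chr2"),
  start  = c(5000, 100, 250, 2000),
  end    = c(5900, 800, 900, 2100),
  strand = c("+", "-", "+", "+"),
  stringsAsFactors = FALSE
)
ordnames <- c("chrX", "chr2")

## Map position of each marker: midpoint between start and end
mid <- (toy$start + toy$end) / 2

## Rank of each marker's segment by its appearance in ordnames
seg_rank <- match(toy$scaff, ordnames)

## Sort by segment rank first, then by midpoint within segment
sorted <- toy[order(seg_rank, mid), ]
sorted$marker
# "m3" "m2" "m4" "m1"
```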

Examples

## Not run: 

## specify genome segment names that should appear at the top of
## the genome map, and sort remaining genome segments by their size:
SIM_ord1 <- orderGenomeMap(SIM_markers, ordnames = c("2", "3", "X"),
                           ordpfx = "chr", partial = 1, sortby = "size")
head(unique(SIM_ord1$scaff), n = 20L)

## sort all genome segments by name;
## ordnames = "all" is used as a non-matching dummy name:
SIM_ord2 <- orderGenomeMap(SIM_markers, ordnames = "all", sortby = "name")
head(unique(SIM_ord2$scaff), n = 20L)

## only sort map positions, keeping original order of genome segments:
SIM_ord3 <- orderGenomeMap(SIM_markers, ordnames = unique(SIM_markers$scaff))
head(unique(SIM_ord3$scaff), n = 20L)

## End(Not run)

dorolin/rearrvisr documentation built on Aug. 6, 2020, 1:32 a.m.