filterRearrs: Filter Rearrangements


View source: R/filterRearrs.R

Description

Remove rearrangements that comprise fewer than a minimum or more than a maximum number of markers.

Usage

filterRearrs(SYNT, focalgenome, filterMin = c(NA, NA, NA, NA),
  filterMax = c(NA, NA, NA, NA))

Arguments

SYNT

A list of matrices that store data on different classes of rearrangements and additional information. SYNT must have been generated with the computeRearrs function.

focalgenome

Data frame representing the focal genome, containing the mandatory columns $marker, $scaff, $start, $end, and $strand, and optional further columns. Markers need to be ordered by their map position.

filterMin

A numeric vector of the form c(nm1, nm2, sm, iv) giving the minimum number of markers a rearrangement must comprise to be retained: nm1 applies to SYNT$NM1, nm2 to SYNT$NM2, sm to SYNT$SM, and iv to SYNT$IV.

filterMax

A numeric vector of the form c(nm1, nm2, sm, iv) giving the maximum number of markers a rearrangement may comprise to be retained: nm1 applies to SYNT$NM1, nm2 to SYNT$NM2, sm to SYNT$SM, and iv to SYNT$IV.
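The retention rule described by filterMin and filterMax can be illustrated with a small base R sketch. The helper keep_by_size() is hypothetical and not part of rearrvisr; it assumes that rearrangement sizes are already available as marker counts, and that an NA bound disables filtering on that side (an assumption inferred from the NA defaults, not a documented guarantee).

```r
# Hypothetical helper, NOT part of rearrvisr: decide which rearrangements
# of one class are retained, given their sizes in number of markers.
# Assumption: an NA bound means "no filter" on that side.
keep_by_size <- function(sizes, min = NA, max = NA) {
  keep <- rep(TRUE, length(sizes))
  if (!is.na(min)) keep <- keep & sizes >= min  # must comprise at least min markers
  if (!is.na(max)) keep <- keep & sizes <= max  # may comprise at most max markers
  keep
}

# Per-class bounds in the order c(nm1, nm2, sm, iv), mirroring filterMin/filterMax
classes   <- c("NM1", "NM2", "SM", "IV")
filterMin <- c(NA, NA, NA, 2)
filterMax <- c(NA, NA, NA, NA)

# Toy sizes (number of markers) of four inversions
iv_sizes <- c(1, 2, 5, 1)
iv_idx   <- match("IV", classes)
keep_by_size(iv_sizes, filterMin[iv_idx], filterMax[iv_idx])
# returns c(FALSE, TRUE, TRUE, FALSE)
```

With these bounds, single-marker inversions are dropped, while the other classes would be left untouched because their bounds are NA.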

Details

The parameters SYNT and focalgenome must be specified.

focalgenome must contain the column $marker, a vector of either characters or integers holding unique ortholog IDs that can be matched to the row names of SYNT. Values can be NA for markers that have no ortholog. $scaff must be a character vector giving the name of the focal genome segment (e.g., chromosome or scaffold) on which each marker is located. $start and $end must be numeric vectors giving the position of each marker on its focal genome segment. $strand must be a vector of "+" and "-" characters giving the reading direction of each marker. Additional columns are ignored and may store custom information, such as marker names. Markers must be ordered by their map position within each focal genome segment, for example by running the orderGenomeMap function. focalgenome may contain additional rows that were absent when computeRearrs was run; however, all markers present in SYNT must be contained in focalgenome, and the shared markers must appear in the same order.
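The column requirements above can be sketched as a toy data frame plus a simple validity check. Both the data and the helper check_focalgenome() are hypothetical and not part of rearrvisr; the check assumes that map position is given by $start, and it only covers the requirements stated in this section.

```r
# A minimal focal genome in the documented layout (toy values, not real data)
focalgenome <- data.frame(
  marker = c(1L, 2L, NA, 3L),          # unique ortholog IDs; NA = no ortholog
  scaff  = c("chr1", "chr1", "chr1", "chr2"),
  start  = c(100, 500, 900, 50),
  end    = c(300, 700, 1100, 250),
  strand = c("+", "-", "+", "+"),
  stringsAsFactors = FALSE
)

# Hypothetical validator, NOT part of rearrvisr
check_focalgenome <- function(fg) {
  required <- c("marker", "scaff", "start", "end", "strand")
  missing <- setdiff(required, names(fg))
  if (length(missing) > 0) stop("missing columns: ", paste(missing, collapse = ", "))
  stopifnot(is.character(fg$scaff),
            is.numeric(fg$start), is.numeric(fg$end),
            all(fg$strand %in% c("+", "-")),
            !anyDuplicated(fg$marker[!is.na(fg$marker)]))  # IDs unique, NAs allowed
  # Assumption: markers are ordered by $start within each segment
  ordered <- tapply(fg$start, fg$scaff, function(s) !is.unsorted(s))
  if (!all(ordered)) stop("markers are not ordered by position within each segment")
  invisible(TRUE)
}

check_focalgenome(focalgenome)
```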

SYNT stores four classes of rearrangements: NM1 (class I nonsyntenic moves), NM2 (class II nonsyntenic moves), SM (syntenic moves), and IV (inversions).

Value

A filtered version of SYNT, with an additional list element $filter that records the applied filter.

Note that for rearrangements that have more than one component, only the component that falls within the specified filter range is removed; the other components of the same rearrangement are kept. This may result in an overestimation of the number of breakpoints when a filtered version of SYNT is used as input for the summarizeRearrs function.

See Also

computeRearrs, genomeImagePlot, summarizeRearrs.

Examples

SYNT <- computeRearrs(TOY24_focalgenome, TOY24_compgenome, doubled = TRUE)

## only retain inversions comprising at least two markers
SYNT_filt <- filterRearrs(SYNT, TOY24_focalgenome, filterMin = c(0, 0, 0, 2))

dorolin/rearrvisr documentation built on Aug. 6, 2020, 1:32 a.m.