# Extremely fast linkage map construction for R/qtl objects using MSTmap.

### Description

Extremely fast linkage map construction for R/qtl objects utilizing the source code for MSTmap (see Wu et al., 2008). The construction includes linkage group clustering, marker ordering and genetic distance calculations.

### Usage

1 2 3 4 5 6 | ```
## S3 method for class 'cross'
mstmap(object, chr, id = "Genotype", bychr = TRUE,
suffix = "numeric", anchor = FALSE, dist.fun = "kosambi",
objective.fun = "COUNT", p.value = 1e-06, noMap.dist = 15,
noMap.size = 0, miss.thresh = 1, mvest.bc = FALSE, detectBadData =
FALSE, return.imputed = FALSE, trace = FALSE, ...)
``` |

### Arguments

`object` |
A |

`chr` |
A character string of linkage group names that require re-construction and/or optimal ordering of the markers they contain. (see Details). |

`id` |
The name of the column in |

`bychr` |
Logical value. For a given set of linkage groups defined by |

`suffix` |
Character string either |

`anchor` |
Logical value. The MSTmap algorithm does not respect the inputted marker
order of the linkage map required for construction. For a given set of
linkage groups defined by |

`dist.fun` |
Character string defining the distance function used for calculation of genetic distances. Options are "kosambi" and "haldane". Default is "kosambi". |

`objective.fun` |
Character string defining the objective function to be used when
constructing the map. Options are |

`p.value` |
Numerical value to specify the threshold to use when constructing
linkage groups. Defaults to |

`noMap.dist` |
Numerical value to specify the smallest genetic distance a set of
isolated markers can appear distinct from other linked markers. Isolated
markers will appear in their own linkage groups and will be of size
specified by |

`noMap.size` |
Numerical value to specify the maximum size of isolated marker linkage
groups that have been identified using |

`miss.thresh` |
Numerical value to specify the threshold proportion of missing marker scores allowable in each of the markers. Markers above this threshold will not be included in the linkage map. Default is 1. |

`mvest.bc` |
Logical value. If |

`detectBadData` |
Logical value. If |

`return.imputed` |
Logical value. If |

`trace` |
An automatic tracing facility. If |

`...` |
Currently ignored. |

### Details

The qtl cross object needs to inherit one of the allowable classes
`"bc","dh","riself","bcsft"`

. This provides a safeguard against
attempts to construct a map for more complex populations that can
exist in qtl. Users should be aware when doubled haploid
populations are read in using `read.cross()`

from the qtl
package they inherit the class `"bc"`

. Users can apply the class
`"dh"`

by simply changing the class of the object. For the purpose
of linkage map construction the classes `"bc"`

and `"dh"`

will
provide equivalent results.

MSTmap supports `"RILn"`

populations, where n is the number of generations
of selfing. Markers in these populations are required to be fully
informative i.e. contain 3 distinct allele types such as AA, BB for
parental homozygotes and AB for phase unknown heterozygotes.
If `read.cross`

is used to import the `"RILn"`

population the resultant
object will initially be given a class `"f2"`

. The level of selfing
would then have to be encoded into the object by applying one of the two conversion
functions available in the qtl package. For a
population that has been generated by selfing n times the conversion
function `convert2bcsft`

can be used by setting the arguments
`F.gen = n`

and `BC.gen = 0`

. Populations that are genuine
advanced RILs can be converted using the `convert2riself`

function.

This method function is designed to be an "all-in-one" function that
will allow you to construct linkage maps extremely fast in multiple
different ways from the supplied cross `object`

. Initially, the map
can be kept complete or a subset of selected linkage groups can be chosen
using the `chr`

argument. Setting `bychr = FALSE`

will
bulk the marker information for the selected linkage groups and, if
necessary, form new linkage groups and optimise the marker order within
each. Setting `bychr = TRUE`

will ensure that markers
are optimally ordered within each linkage group. This will also break
linkage groups depending on the p-value given in the call (see
below for details of the use of `p.value`

). If the
linkage map was initially subsetted, the linkage groups not involved in
the subset are returned to ensure the map is complete.

The algorithm allows an adjustment of the `p.value`

threshold for
clustering of markers to distinct linkage groups (see Wu et al.,
2008) and is highly dependent on the number of individuals in
the population. As the number of individuals increases the
`p.value`

threshold should be decreased accordingly. This may
require some trial and error to achieve desired results.
When `bychr = TRUE`

, established linkage groups may also split
depending on the `p.value`

given. To prevent this the `p.value`

threshold
may be increased to a desired value or the splitting may be prevented
altogether by supplying a value greater than one to this argument.

If `mvest.bc = TRUE`

and the population type is `"bc","dh","riself"`

then missing values are imputed before markers are clustered into
linkage groups. This is only a simple imputation that places a 0.5
probability of the missing observation being one allele or the other and
is used to assist the clustering algorithm when there is known to be high numbers of
missing observations between pairs of markers.

It should be highlighted that for population types
`"bc","dh","riself"`

, imputation of missing values occurs
regardless of the value of `mvest.bc`

. This is achieved using an EM algorithm that is
tightly coupled with marker ordering (see Wu et al., 2008). Initially
a marker order is obtained omitting missing marker scores and then
imputation is performed based on the underlying recombinant probabilities
of the flanking markers with the markers containing the missing
value. The recombinant probabilities are then recomputed and an update of
the pairwise distances are calculated. The ordering algorithm is then
run again and the complete process is repeated until
convergence. Note, the imputed probability matrix for the linkage map
being constructed is returned if `return.imputed = TRUE`

.

For populations `"bc","dh","riself"`

, if ```
detectBadData =
TRUE
```

the marker ordering algorithm also
includes the detection of genotyping errors. For any individual
genotype, the detection method is based on a weighted Euclidean metric
(see Wu et al., 2008) that is a function of the
recombination probabilities of all the markers with the marker containing
the suspicious observation. Any genotyping errors detected are set to
missing and the missing values are then imputed as part of the marker
ordering algorithm. Note, the detection of these errors and their
amendment can be returned in the imputed probability matrix if
`return.imputed = TRUE`

.

If `return.imputed = TRUE`

and the object has class
`"bc","dh","riself"`

then the marker probability matrix is
returned for the linkage groups that have been constructed using the
algorithm. Each linkage group is named identically to the linkage groups
of the map and contains an ordered `"map"`

element and a `"data"`

element consisting of marker probabilities of the A allele being
present (i.e. P(A) = 1, P(B) = 0). Both elements contain a
possibly reduced version of the marker set that includes all
non-colocating markers as well as the first marker of any set of
co-locating markers.

### Value

The function returns a cross object with an identical class
structure to the cross `object`

inputted. The object is a list
with usual components `"pheno"`

and `"geno"`

. If markers were
omitted for any reason during the construction, the object will have an
`"omit"`

component with all omitted markers in a collated
matrix. If `return.imputed = TRUE`

then the object will also
contain an `"imputed.geno"`

element.

### Author(s)

Julian Taylor, Dave Butler, Timothy Close, Yonghui Wu, Stefano Lonardi

### References

Y. Wu, P. Bhat, T.J. Close, S. Lonardi, Efficient and Accurate Construction of Genetic Linkage Maps from Minimum Spanning Tree of a Graph Plos Genetics, Volume 4, Issue 10, 2008.

### See Also

`mstmap.data.frame`

and `breakCross`

### Examples

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 | ```
data(mapDH, package = "ASMap")
## bulking linkage groups and reconstructing entire linkage map
test1 <- mstmap(mapDH, bychr = FALSE, dist.fun = "kosambi", trace = FALSE)
pull.map(test1)
## one linkage group at a time (possibly break established linkage
## groups)
test2 <- mstmap(mapDH, bychr = TRUE, dist.fun = "kosambi", trace = FALSE)
pull.map(test2)
## one linkage group at a time (do not break established linkage groups)
test3 <- mstmap(mapDH, bychr = TRUE, dist.fun = "kosambi", p.value = 2, trace = FALSE)
pull.map(test3)
## impute before clustering and detect genotyping errors, pipe output to file
test4 <- mstmap(mapDH, bychr = FALSE, dist.fun = "kosambi", trace = TRUE,
mvest.bc = TRUE, detectBadData = TRUE)
pull.map(test4)
``` |