smooth_map: Smooth Mapping

View source: R/smooth_wrapper.R

smooth_map    R Documentation

Smooth Mapping

Description

This function applies a single iteration of the Smooth Descent algorithm, including a re-mapping step that uses the new genotypes. Optionally, the preliminary map can be estimated rather than provided. Both mapping procedures are performed with the packages 'polymapR' and 'MDSMap', which perform multi-dimensional scaling mapping. To use the Smooth Descent algorithm with a different mapping algorithm, see 'smooth_descent()', which performs the genotype correction without re-mapping.

Usage

smooth_map(
  geno,
  homologue,
  map = NULL,
  ploidy = 2,
  p1name = NULL,
  p2name = NULL,
  prediction_interval = 10,
  prediction_threshold = 0.8,
  prediction_points = NULL,
  error_threshold = 0.8,
  ncores = 1,
  mapping_ndim = 2,
  estimate_premap = F,
  max_distance = 10,
  non_inf = c(0.3, 0.7),
  verbose = T,
  obs.method = "naive",
  pred.method = "prediction",
  hmm.error = 0.01
)

Arguments

geno

matrix with markers on the rows and individuals on the columns. Row names and column names are expected.

homologue

matrix with markers on the rows and homologue names on the columns. Row names and column names are expected.
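
As a rough illustration of the expected shapes, the sketch below builds toy "geno" and "homologue" matrices for a diploid cross. All values, the marker and individual names, the homologue column names ("h1" to "h4") and the 0/1 encoding of the homologue matrix are assumptions made for this example, not data or conventions taken from the package.

```r
# Toy inputs (made-up values; names and 0/1 homologue encoding are assumptions)
ploidy <- 2
markers <- paste0("M", 1:4)
individuals <- c("P1", "P2", paste0("F1_", 1:3))

# geno: markers on rows, individuals on columns, dosages between 0 and ploidy
geno <- matrix(
  c(1, 0, 1, 0, 1,
    1, 1, 2, 1, 0,
    0, 1, 0, 1, 1,
    1, 1, 1, 2, 0),
  nrow = length(markers), byrow = TRUE,
  dimnames = list(markers, individuals)
)

# hom: markers on rows, one column per parental homologue (2 * ploidy columns)
hom <- matrix(
  c(1, 0, 0, 0,
    1, 0, 1, 0,
    0, 0, 0, 1,
    0, 1, 1, 0),
  nrow = length(markers), byrow = TRUE,
  dimnames = list(markers, paste0("h", 1:(2 * ploidy)))
)

# Basic shape checks before passing the objects on
stopifnot(identical(rownames(geno), rownames(hom)))
stopifnot(ncol(hom) == 2 * ploidy)
```

In this toy data the parental dosages equal the sum of the corresponding homologue columns (for example, P1 equals h1 + h2), which is a quick sanity check that the two matrices agree.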

map

optionally, data.frame with at least columns "marker" and "position". If it is not specified, a map will be estimated from the uncorrected genotype data with polymapR.
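
A minimal sketch of a map in the described layout follows. The marker names and positions are invented for illustration; only the two column names come from the description above.

```r
# Toy map with the two required columns (values are made up)
map <- data.frame(
  marker = paste0("M", 1:4),
  position = c(0, 2.5, 7.1, 12.4),
  stringsAsFactors = FALSE
)

# Check that the required columns are present and positions are numeric
stopifnot(all(c("marker", "position") %in% names(map)))
stopifnot(is.numeric(map$position))
```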

ploidy

numeric indicating the ploidy. Both parents must be of the same ploidy, and it is assumed that "homologue" has 2*ploidy columns.

p1name

character, name of the first parent. Must be present in the geno column names. If it is not specified, the name of the first column is used.

p2name

character, name of the second parent. Must be present in the geno column names. If it is not specified, the name of the second column is used.

prediction_interval

numeric, interval to be used during the IBD prediction step. It should be specified in the same units as the "position" in the map.

prediction_threshold

float, probability threshold for imputing new genotypes. All new genotypes with a probability under this threshold will be considered uncertain. Defaults to 0.8.
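
The effect of the threshold can be pictured with a small sketch. The probabilities and genotypes below are hypothetical, and marking uncertain calls as NA is an assumption of this example rather than a statement about how the package stores them.

```r
prediction_threshold <- 0.8

# Hypothetical probabilities attached to four newly predicted genotypes
prob <- c(0.95, 0.62, 0.81, 0.40)
new_geno <- c(1, 0, 1, 1)

# Calls below the threshold are treated as uncertain (set to NA here, by assumption)
new_geno[prob < prediction_threshold] <- NA
print(new_geno)
```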

prediction_points

numeric, number of points to use for IBD prediction. If NULL, all points in map$position are used, otherwise n equally spaced points are used. Greatly improves efficiency if the number of markers is very large.
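
To give an idea of what "equally spaced points" means here, the sketch below places a grid over the range of the map positions with base R's seq(). The positions are made up, and whether the package builds its grid in exactly this way is an assumption.

```r
# Hypothetical marker positions, unevenly spread along the map
position <- c(0, 1.2, 1.3, 4.8, 9.9, 10, 15.5, 20)
prediction_points <- 5

# Equally spaced grid covering the same range as the markers
grid <- seq(min(position), max(position), length.out = prediction_points)
print(grid)
```

With many markers, predicting at a handful of grid points instead of at every marker position is what reduces the amount of computation.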

error_threshold

numeric, threshold over which a marker is considered erroneous. A value of 0.8 is usually sensitive enough while remaining stringent (avoiding false positives).

ncores

number of cores to use for linkage estimation.

mapping_ndim

2 or 3, number of dimensions to use for multi-dimensional mapping

estimate_premap

logical, whether to use polymapR to estimate a preliminary map.

max_distance

numeric, maximum average neighbour distance allowed. Markers without near neighbours, that is, markers whose average neighbour distance exceeds this value, will be eliminated. A warning is issued if any markers are eliminated.

non_inf

numeric, lower and upper probability boundaries used to consider an IBD probability non-informative (probabilities that fall between the two boundaries are ignored during prediction). Defaults to c(0.3, 0.7). Symmetrical boundaries are recommended but not required.
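
The following sketch shows one way to read these boundaries. The IBD probabilities are invented, and treating the boundary values themselves as non-informative is an assumption of this example.

```r
non_inf <- c(0.3, 0.7)

# Hypothetical IBD probabilities for one homologue at five markers
ibd <- c(0.05, 0.30, 0.50, 0.70, 0.92)

# Only probabilities outside the boundaries are kept as informative
informative <- ibd < non_inf[1] | ibd > non_inf[2]
print(ibd[informative])
```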

verbose

logical, should smooth descent report the steps it takes?

obs.method

character, either "naive" or "heuristic" (or substrings). This parameter switches the calculation of observed IBDs between the method described in the Smooth Descent paper and the heuristic method from 'polyqtlR'. Our research has shown better results with the naive method.

pred.method

character, either "prediction" or "hmm" (or substrings). This parameter switches the calculation of predicted IBDs between the weighted average method and the Hidden Markov model implemented in 'polyqtlR'. Our research shows better results in polyploids with the HMM, although for high marker densities the weighted average method is faster (especially if 'prediction_points' is used).
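
Accepting substrings for "obs.method" and "pred.method" resembles the partial matching done by base R's match.arg(). The helper below, pick_method(), is a hypothetical name used only for this sketch, and it is an assumption that the package resolves the arguments in a comparable way.

```r
# Hypothetical helper: resolve a (possibly abbreviated) method name
pick_method <- function(x, choices) {
  match.arg(x, choices)
}

obs <- pick_method("heur", c("naive", "heuristic"))
pred <- pick_method("h", c("prediction", "hmm"))
print(c(obs, pred))
```

An abbreviation that matches none of the choices raises an error, so misspelled method names are caught early.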

Value

list containing the following items:

* obsIBD: list of observed IBD matrices (marker x individual) for each parental homologue
* predIBD: list of predicted IBD matrices (marker x individual) for each parental homologue
* oldmap: data.frame containing the original map
* error: list of error matrices (marker x individual) for each parental homologue
* newIBD: list of observed IBD matrices of the corrected genotypes.
* newmap: data.frame containing the new updated map.
* newgeno: matrix containing the new genotypes.
* rec: list containing the recombination counts using the observed IBD ('obs'), the predicted IBD ('pred') and the updated IBD ('new'). 'obs' and 'pred' use the old map order while 'new' uses the re-estimated map. For more information see 'rec_count()'.
* recdist: data.frame containing pair-wise recombination and distance between markers, useful to plot using 'recdist_plot()'
* r2: R-squared parameter of pair-wise recombination and final map distance. A higher value indicates a better newmap.
* tau: reordering parameter tau (Kendall's rank correlation). Obtained with 'reorder_tau()'
* eliminated: markers eliminated due to a large average neighbour distance, they tend to be problematic when re-mapping.
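
To make the 'r2' and 'tau' summaries more concrete, the sketch below computes an R-squared and a Kendall rank correlation on made-up data using base R only. It does not call 'rec_count()' or 'reorder_tau()'; the column names "rec" and "dist", the use of a linear model for the R-squared, and all values are assumptions of this illustration.

```r
# Hypothetical pair-wise recombination frequencies and map distances
recdist <- data.frame(
  rec = c(0.01, 0.03, 0.06, 0.10, 0.14),
  dist = c(1, 3, 6, 11, 15)
)

# R-squared of a linear fit of recombination on distance (assumed definition)
r2 <- summary(lm(rec ~ dist, data = recdist))$r.squared

# Kendall's tau between an old and a new marker order (one adjacent swap)
old_order <- 1:6
new_order <- c(1, 2, 4, 3, 5, 6)
tau <- cor(old_order, new_order, method = "kendall")

print(c(r2 = r2, tau = tau))
```

A tau close to 1 means the new order largely agrees with the old one; here a single swapped pair out of 15 possible pairs lowers it slightly below 1.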

Examples

## Not run: 
data("genotype")
data("homologue")
data("map")

res <- smooth_map(geno, hom, map, ploidy = 2, p1name = "P1", p2name = "P2",
   estimate_premap = FALSE, mapping_ndim = 3, ncores = 1)

## End(Not run)

Alethere/SmoothDescent documentation built on Oct. 21, 2023, 7:11 a.m.