smooth_descent: Smooth Descent wrapper

View source: R/smooth_wrapper.R

smooth_descent    R Documentation

Smooth Descent wrapper

Description

This function applies the IBD calculation, IBD prediction, error estimation and genotype prediction functions according to the Smooth Descent algorithm. Given a genotype matrix, a parental homologue assignment matrix and a genetic map, it estimates putative genotyping errors and recombination counts, and returns imputed genotypes with fewer errors.

Usage

smooth_descent(
  geno,
  homologue,
  map,
  ploidy = 2,
  p1name = NULL,
  p2name = NULL,
  prediction_interval = 10,
  prediction_threshold = 0.8,
  prediction_points = NULL,
  error_threshold = 0.8,
  non_inf = c(0.3, 0.7),
  verbose = T,
  obs.method = "naive",
  pred.method = "prediction",
  hmm.error = 0.01
)

Arguments

geno

matrix with markers on the rows, individuals on the columns. Row names are expected.

homologue

matrix with markers on the rows, homologue names on the columns. Row names and column names are expected.

map

data.frame with at least columns "marker" and "position". If it is not specified, a map will be estimated from the uncorrected genotype data with polymapR.

ploidy

numeric indicating the ploidy. Both parents must be of the same ploidy, and it is assumed that "homologue" has 2*ploidy columns.
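The input shapes described above can be illustrated with a small sketch in base R. The marker names, homologue names ("h1" to "h4"), dosage values and the helper check_inputs() are all hypothetical and are not part of the package; they only show the expected layout of "geno", "homologue" and "map" for a diploid cross.

```r
# Hypothetical toy inputs; the values only illustrate the expected layout.
ploidy <- 2
markers <- paste0("M", 1:4)

# geno: markers on the rows, individuals (parents included) on the columns
geno <- matrix(
  c(1, 1, 0, 2,
    0, 1, 1, 0,
    1, 2, 1, 1,
    2, 1, 2, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(markers, c("P1", "P2", "F1_1", "F1_2"))
)

# homologue: one column per parental homologue, 2 * ploidy columns in total
homologue <- matrix(
  c(1, 0, 1, 0,
    0, 0, 1, 0,
    1, 0, 1, 1,
    1, 1, 0, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(markers, paste0("h", 1:(2 * ploidy)))
)

# map: data.frame with at least the columns "marker" and "position"
map <- data.frame(marker = markers, position = c(0, 2.5, 7.1, 12.4))

# check_inputs() is an illustrative helper, not a SmoothDescent function.
check_inputs <- function(geno, homologue, map, ploidy) {
  stopifnot(
    is.matrix(geno), !is.null(rownames(geno)),
    is.matrix(homologue),
    !is.null(rownames(homologue)), !is.null(colnames(homologue)),
    ncol(homologue) == 2 * ploidy,
    all(c("marker", "position") %in% colnames(map))
  )
  invisible(TRUE)
}

check_inputs(geno, homologue, map, ploidy)
```

Running such a check before calling smooth_descent() may help catch a transposed matrix or a ploidy that does not match the number of homologue columns.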

p1name

character, name of the first parent. Must be present in the column names of geno. If not specified, the name of the first column is used.

p2name

character, name of the second parent. Must be present in the column names of geno. If not specified, the name of the second column is used.

prediction_interval

numeric, interval to be used during the IBD prediction step. It should be specified in the same units as the "position" in the map.

prediction_threshold

float, probability threshold for imputing new genotypes. All new genotypes with a probability under this threshold will be considered uncertain. Defaults to 0.8.

prediction_points

numeric, number of points to use for IBD prediction. If NULL, all points in map$position are used; otherwise, that number of equally spaced points is used. Greatly improves efficiency if the number of markers is very large.
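As a rough illustration of what equally spaced prediction points look like, the following base R sketch builds such a grid. It assumes the grid spans the full range of map positions, which is an assumption and not confirmed by this page; the positions are made up.

```r
# Hypothetical map positions, unevenly spread along the chromosome
positions <- c(0, 1.2, 1.3, 4.8, 9.5, 10)
prediction_points <- 5

# NULL keeps every map position; otherwise build an equally spaced grid
# (assumption: the grid runs from the smallest to the largest position)
grid <- if (is.null(prediction_points)) {
  positions
} else {
  seq(min(positions), max(positions), length.out = prediction_points)
}

print(grid)
```

With many thousands of markers, a grid of a few hundred points means far fewer positions to evaluate, which is where the efficiency gain comes from.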

error_threshold

numeric, threshold above which a marker is considered erroneous. A value of 0.8 is usually sensitive enough while remaining stringent enough to avoid false positives.

non_inf

numeric, lower and upper probability boundaries within which an IBD probability is considered non-informative (probabilities that fall between these boundaries are ignored during prediction). Defaults to c(0.3, 0.7). Symmetrical boundaries are recommended but not required.
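A minimal sketch of how such boundaries could be applied to a vector of IBD probabilities is given below. The probabilities are invented, and treating the boundaries as exclusive is an assumption; the package may handle values exactly on a boundary differently.

```r
non_inf <- c(0.3, 0.7)

# Hypothetical IBD probabilities for one homologue in one individual
ibd <- c(0.05, 0.30, 0.50, 0.70, 0.95)

# Assumption: values strictly between the boundaries are non-informative
is_non_informative <- ibd > non_inf[1] & ibd < non_inf[2]

# Keep only the informative probabilities for the prediction step
informative_ibd <- ibd[!is_non_informative]
print(informative_ibd)
```

Here only the value 0.50 is dropped, since it gives little evidence for or against inheritance of the homologue.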

verbose

logical, should smooth descent report the steps it takes?

obs.method

character, either "naive" or "heuristic" (or substrings). This parameter switches the calculation of observed IBDs between the method described in the Smooth Descent paper ("naive") and the heuristic method from 'polyqtlR'. Our research has shown better results with the naive method.

pred.method

character, either "prediction" or "hmm" (or substrings). This parameter switches the calculation of predicted IBDs between the weighted average method and the Hidden Markov Model implemented in 'polyqtlR'. Our research shows better results in polyploids with the HMM, although for high marker densities the weighted average method is faster (especially if 'prediction_points' is used).

Value

A list containing the following items:

* obsIBD: list of observed IBD matrices (marker x individual) for each parental homologue
* predIBD: list of predicted IBD matrices (marker x individual) for each parental homologue
* oldmap: data.frame containing the original map
* error: list of error matrices (marker x individual) for each parental homologue
* newIBD: list of observed IBD matrices of the corrected genotypes
* newgeno: matrix containing the new genotypes
* rec: list containing the recombination counts using the observed IBD ('obs'), the predicted IBD ('pred') and the updated IBD ('new'), using the given map order. For more information see 'rec_count()'.

Examples


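library(SmoothDescent)
# Load the example datasets shipped with the package
data("genotype")
data("homologue")
data("map")

# Run Smooth Descent on a diploid cross with parents "P1" and "P2"
res <- smooth_descent(geno, hom, map, ploidy = 2, p1name = "P1", p2name = "P2")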



Alethere/SmoothDescent documentation built on Oct. 21, 2023, 7:11 a.m.