clean_genoprob: Clean genotype probabilities
In qtl2: Quantitative Trait Locus Mapping in Experimental Crosses

Description

Clean up genotype probabilities in two ways: set small values to 0, and, for any genotype column whose maximum value is small, set all values in that column to 0.

Usage

clean_genoprob(
  object,
  value_threshold = 0.000001,
  column_threshold = 0.01,
  ind = NULL,
  cores = 1,
  ...
)

## S3 method for class 'calc_genoprob'
clean(
  object,
  value_threshold = 0.000001,
  column_threshold = 0.01,
  ind = NULL,
  cores = 1,
  ...
)

Arguments

`object`	Genotype probabilities as calculated by `calc_genoprob()`.
`value_threshold`	Probabilities below this value will be set to 0.
`column_threshold`	For genotype columns where the maximum value is below this threshold, all values will be set to 0. This must be less than `1/k` where `k` is the number of genotypes.
`ind`	Optional vector of individuals (logical, numeric, or character). If provided, only the genotype probabilities for these individuals will be cleaned, though the full set will be returned.
`cores`	Number of CPU cores to use, for parallel calculations. (If `0`, use `parallel::detectCores()`.) Alternatively, this can be a set of cluster sockets, as produced by `parallel::makeCluster()`.
`...`	Ignored at this point.

Details

In cases where a particular genotype is largely absent, `scan1coef()` and `fit1()` can give unstable estimates of the genotype effects. Cleaning up the genotype probabilities by setting small values to 0 helps to ensure that such effects get set to `NA`.

At each position and for each genotype column, we find the maximum probability across individuals. If that maximum is less than `column_threshold`, all values in that genotype column at that position are set to 0.

In addition, any individual genotype probabilities that are less than `value_threshold` are set to 0. (`value_threshold` is generally smaller than `column_threshold`.)

The probabilities are then re-scaled so that the probabilities for each individual at each position sum to 1.

If `ind` is provided, the function is applied only to the designated subset of individuals. This may be useful when only a subset of individuals have been phenotyped, as you may want to zero out genotype columns where that subset of individuals has only negligible probability values.
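
The steps described above can be illustrated with a minimal base-R sketch. This is not the qtl2 implementation. It assumes a single chromosome's probabilities stored as a 3-dimensional array (individuals x genotypes x positions); the helper name `clean_array_sketch` and the toy data are hypothetical, and the explicit check that `column_threshold` is less than `1/k` is an assumption added for illustration.

```r
# Hypothetical helper (not part of qtl2): clean one chromosome's probabilities,
# stored as an array with dimensions individuals x genotypes x positions.
clean_array_sketch <- function(pr, value_threshold = 1e-6,
                               column_threshold = 0.01, ind = NULL) {
    n_gen <- dim(pr)[2]
    # assumption: enforce the documented constraint column_threshold < 1/k
    stopifnot(column_threshold < 1 / n_gen)

    # by default, clean all individuals
    if (is.null(ind)) ind <- seq_len(dim(pr)[1])
    sub <- pr[ind, , , drop = FALSE]

    # step 1: maximum across individuals for each genotype column and position
    col_max <- apply(sub, c(2, 3), max)        # genotypes x positions
    drop_col <- col_max < column_threshold
    # individuals vary fastest in the array, so repeat each flag once per individual
    sub[rep(drop_col, each = dim(sub)[1])] <- 0

    # step 2: zero out any remaining small individual values
    sub[sub < value_threshold] <- 0

    # step 3: re-scale so each individual's probabilities sum to 1 at each position
    tot <- apply(sub, c(1, 3), sum)            # individuals x positions
    sub <- sweep(sub, c(1, 3), tot, "/")

    # only the selected individuals are changed; the full array is returned
    pr[ind, , ] <- sub
    pr
}

# toy data: 4 individuals, 3 genotypes, 2 positions
pr <- array(0, dim = c(4, 3, 2),
            dimnames = list(paste0("ind", 1:4), c("AA", "AB", "BB"), c("m1", "m2")))
pr[, , "m1"] <- rbind(c(0.995, 0.005, 0.000),
                      c(0.600, 0.396, 0.004),
                      c(0.500, 0.498, 0.002),
                      c(0.300, 0.698, 0.002))
pr[, , "m2"] <- rbind(c(0.25, 0.50, 0.25),
                      c(0.25, 0.50, 0.25),
                      c(0.9999995, 0.0000005, 0),
                      c(0.00, 0.50, 0.50))

cleaned <- clean_array_sketch(pr)                   # clean everyone
cleaned_sub <- clean_array_sketch(pr, ind = "ind1") # clean only ind1
```

In the toy data, the `BB` column at `m1` never exceeds 0.004, so it is zeroed for everyone, while the single tiny `AB` value for `ind3` at `m2` is removed by the value threshold alone. When only `ind1` is selected, the `AB` column at `m1` is also zeroed for that individual, because the maximum is taken over the selected subset only.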

Value

A cleaned version of the input genotype probabilities object, `object`.

Examples

library(qtl2)

# load the example iron dataset that ships with qtl2
iron <- read_cross2(system.file("extdata", "iron.zip", package="qtl2"))

# calculate genotype probabilities
probs <- calc_genoprob(iron, error_prob=0.002)

# clean the genotype probabilities
# (doesn't really do anything in this case, because there are no small but non-zero values)
probs_clean <- clean(probs)

# clean only the females' genotype probabilities
probs_cleanf <- clean(probs, ind=names(iron$is_female)[iron$is_female])

qtl2 documentation built on June 8, 2025, 10:25 a.m.