clean_genoprob: Clean genotype probabilities


clean_genoprob    R Documentation

Clean genotype probabilities

Description

Clean up genotype probabilities by setting small values to 0 and, for any genotype column whose maximum value is small, setting all values in that column to 0.

Usage

clean_genoprob(
  object,
  value_threshold = 0.000001,
  column_threshold = 0.01,
  ind = NULL,
  cores = 1,
  ...
)

## S3 method for class 'calc_genoprob'
clean(
  object,
  value_threshold = 0.000001,
  column_threshold = 0.01,
  ind = NULL,
  cores = 1,
  ...
)

Arguments

object

Genotype probabilities as calculated by calc_genoprob().

value_threshold

Probabilities below this value will be set to 0.

column_threshold

For genotype columns where the maximum value is below this threshold, all values will be set to 0. This must be less than 1/k, where k is the number of genotypes.

ind

Optional vector of individuals (logical, numeric, or character). If provided, only the genotype probabilities for these individuals will be cleaned, though the full set will be returned.

cores

Number of CPU cores to use for parallel calculations. (If 0, use parallel::detectCores().) Alternatively, this can be a cluster object linking to a set of cluster sockets, as produced by parallel::makeCluster().

...

Ignored at this point.

Details

In cases where a particular genotype is largely absent, scan1coef() and fit1() can give unstable estimates of the genotype effects. Cleaning up the genotype probabilities by setting small values to 0 helps to ensure that such effects get set to NA.
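The effect of a near-absent genotype can be illustrated with a small base-R sketch. Here lm() is only a stand-in for scan1coef() and fit1(), and the genotype labels, sample size, and simulated values are all assumptions made for illustration. A genotype column that is entirely 0 yields an NA coefficient, while a column containing a single tiny non-zero value yields an estimate with a very large standard error.

```r
set.seed(1)
n <- 20

# Toy genotype probabilities at one position; genotype BB is absent (all zeros)
X <- cbind(AA = rep(c(1, 0), each = n / 2),
           AB = rep(c(0, 1), each = n / 2),
           BB = 0)
y <- 10 + 2 * X[, "AB"] + rnorm(n)

# One coefficient per genotype column, no intercept
fit <- lm(y ~ 0 + X)
coef(fit)                      # the BB coefficient is NA

# Same data, but one individual carries a tiny non-zero BB probability
X2 <- X
X2[1, c("AA", "BB")] <- c(0.996, 0.004)
fit2 <- lm(y ~ 0 + X2)

# Compare standard errors: the BB effect is very poorly determined
se <- summary(fit2)$coefficients[, "Std. Error"]
se
```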

At each position and for each genotype column, we find the maximum probability across individuals. If that maximum is < column_threshold, all values in that genotype column at that position are set to 0.

In addition, any genotype probabilities that are < value_threshold are set to 0. (value_threshold is generally smaller than column_threshold.)

The probabilities are then re-scaled so that the probabilities for each individual at each position sum to 1.
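The three steps above can be sketched in base R. This is a minimal illustration and not the package's implementation: the helper name clean_sketch is hypothetical, and the input is assumed to be a single array with dimensions individuals x genotypes x positions.

```r
# Hypothetical helper, not part of qtl2.
# `pr` is assumed to be an array: individuals x genotypes x positions.
clean_sketch <- function(pr, value_threshold = 1e-6, column_threshold = 0.01) {
    for (pos in seq_len(dim(pr)[3])) {
        p <- pr[, , pos, drop = FALSE]
        dim(p) <- dim(pr)[1:2]              # individuals x genotypes matrix

        # step 1: zero out genotype columns whose maximum is below column_threshold
        col_max <- apply(p, 2, max)
        p[, col_max < column_threshold] <- 0

        # step 2: zero out individual values below value_threshold
        p[p < value_threshold] <- 0

        # step 3: re-scale so each individual's probabilities sum to 1
        p <- p / rowSums(p)

        pr[, , pos] <- p
    }
    pr
}

# Toy data: 3 individuals, 3 genotypes, 1 position (values filled column by column)
pr <- array(c(0.600, 0.300, 0.9999995,      # genotype 1
              0.395, 0.695, 0.0000005,      # genotype 2
              0.005, 0.005, 0),             # genotype 3: maximum is 0.005
            dim = c(3, 3, 1))
cleaned <- clean_sketch(pr)
cleaned[, , 1]
```

Because each individual's probabilities sum to 1 over k genotypes, the largest of them is at least 1/k. With column_threshold below 1/k, that genotype column is never zeroed, so the row sum used for re-scaling stays positive (provided value_threshold is also below 1/k).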

If ind is provided, the function is applied only to the designated subset of individuals. This may be useful when only a subset of individuals have been phenotyped, as you may want to zero out genotype columns where that subset of individuals has only negligible probability values.
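As a rough illustration of why the subset matters, the base-R sketch below works on a single position with made-up values. The individual and genotype names are hypothetical, and only the column-maximum step is shown. Across all four individuals no genotype column falls below column_threshold, but within the phenotyped subset the BB column does, so it is zeroed for those individuals only.

```r
# One position, four individuals (rows) x three genotypes (columns); toy values
p <- rbind(ind1 = c(0.700, 0.295, 0.005),
           ind2 = c(0.400, 0.598, 0.002),
           ind3 = c(0.100, 0.300, 0.600),
           ind4 = c(0.500, 0.497, 0.003))
colnames(p) <- c("AA", "AB", "BB")
column_threshold <- 0.01

# Hypothetical subset of phenotyped individuals, playing the role of `ind`
phenotyped <- c("ind1", "ind2", "ind4")

# Across all individuals, ind3 keeps the BB maximum at 0.6
max_all <- apply(p, 2, max)

# Within the subset, the BB maximum is 0.005, below column_threshold
sub <- p[phenotyped, , drop = FALSE]
max_sub <- apply(sub, 2, max)
sub[, max_sub < column_threshold] <- 0
sub <- sub / rowSums(sub)                 # re-scale rows to sum to 1

# Write the cleaned rows back; the unselected individual is left untouched
p_clean <- p
p_clean[phenotyped, ] <- sub
p_clean
```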

Value

A cleaned version of the input genotype probabilities object, object.

Examples

iron <- read_cross2(system.file("extdata", "iron.zip", package="qtl2"))


# calculate genotype probabilities
probs <- calc_genoprob(iron, error_prob=0.002)

# clean the genotype probabilities
# (doesn't really do anything in this case, because there are no small but non-zero values)
probs_clean <- clean(probs)

# clean only the females' genotype probabilities
probs_cleanf <- clean(probs, ind=names(iron$is_female)[iron$is_female])

rqtl/qtl2 documentation built on Nov. 28, 2024, 4:57 a.m.