QC.mppData: Quality control for 'mppData' objects

View source: R/QC.mppData.R

QC.mppData    R Documentation

Quality control for mppData objects

Description

Perform different operations of quality control (QC) on the marker data of an mppData object.

Usage

QC.mppData(
  mppData,
  mk.miss = 0.1,
  gen.miss = 0.25,
  n.lim = 15,
  MAF.pop.lim = 0.05,
  MAF.cr.lim = NULL,
  MAF.cr.miss = TRUE,
  MAF.cr.lim2 = NULL,
  verbose = TRUE,
  n.cores = 1
)

Arguments

mppData

An object of class mppData formed with create.mppData.

mk.miss

Numeric value between 0 and 1 specifying the maximum marker missing rate at the whole population level. Default = 0.1.

gen.miss

Numeric value between 0 and 1 specifying the maximum genotype missing rate at the whole population level. Default = 0.25.

n.lim

Numeric value specifying the minimum cross size. Default = 15.

MAF.pop.lim

Numeric minimum marker minor allele frequency at the population level. Default = 0.05.

MAF.cr.lim

Numeric vector specifying the critical within cross MAF. A marker with a problematic segregation rate in at least one cross is either set as missing within the problematic cross (MAF.cr.miss = TRUE) or removed from the marker matrix (MAF.cr.miss = FALSE). For the default values, see Details.

MAF.cr.miss

Logical value specifying whether markers with a too low segregation rate within a cross (below MAF.cr.lim) should be set as missing in that cross (MAF.cr.miss = TRUE) or discarded (MAF.cr.miss = FALSE). Default = TRUE.

MAF.cr.lim2

Numeric. Alternative option for marker MAF filtering. Only markers segregating with a MAF larger than MAF.cr.lim2 in at least one cross will be kept for the analysis. Default = NULL.

verbose

Logical value indicating if the steps of the QC should be printed. Default = TRUE.

n.cores

Numeric. Number of cores to use. Default = 1.
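
To make the roles of mk.miss and gen.miss concrete, here is a minimal base R sketch of missing-rate filtering on a toy marker matrix. It does not reproduce the internal code of QC.mppData: the object names (geno, mk_miss, gen_miss, geno_qc) and the thresholds are hypothetical, and the assumption that genotype missing rates are computed after the marker filter only follows the order of the steps listed in Details.

```r
# Toy marker matrix (hypothetical data): rows = genotypes, columns = markers.
# NA denotes a missing marker score.
geno <- matrix(
  c("A", "A", NA,  "T",
    "A", NA,  NA,  "T",
    "G", "A", NA,  "C",
    "G", "A", "C", NA),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("g", 1:4), paste0("mk", 1:4))
)

# Illustrative thresholds (not the function defaults).
mk_miss  <- 0.5   # maximum marker missing rate
gen_miss <- 0.25  # maximum genotype missing rate

# Marker missing rate: proportion of NA per column.
mk_rate <- colMeans(is.na(geno))
geno_qc <- geno[, mk_rate <= mk_miss, drop = FALSE]

# Genotype missing rate: proportion of NA per row, computed on the
# markers that passed the previous filter (assumption, see lead-in).
gen_rate <- rowMeans(is.na(geno_qc))
geno_qc  <- geno_qc[gen_rate <= gen_miss, , drop = FALSE]

print(dim(geno_qc))
```

In this toy case the marker with three missing scores out of four is dropped first, and the genotypes that still carry a missing score among the three remaining markers are dropped afterwards.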

Details

The different operations of the quality control are the following:

  1. Remove markers with more than two alleles.

  2. Remove markers that are monomorphic or fully missing in the parents.

  3. Remove markers with a missing rate higher than mk.miss.

  4. Remove genotypes with more missing markers than gen.miss.

  5. Remove crosses with fewer than n.lim genotypes.

  6. Keep only the most polymorphic marker when multiple markers map to the same position.

  7. Check marker minor allele frequency (MAF). Different strategies can be used to control marker MAF:

    A) A first possibility is to filter markers based on MAF at the whole population level using MAF.pop.lim, and/or on MAF within crosses using MAF.cr.lim.

    The user can provide their own vector of critical values for the within cross MAF using MAF.cr.lim. By default, the within cross MAF limits are defined by the following function of the cross size n.ci: MAF(n.ci) = 0.5 if n.ci is in [0, 10] and MAF(n.ci) = (4.5/n.ci) + 0.05 if n.ci > 10. This means that up to 10 genotypes, the critical within cross MAF is set to 50%. It then decreases as the number of genotypes increases, approaching a lower bound of 5%.

    If the within cross MAF is below the limit in at least one cross, then the marker scores of the problematic cross are either set as missing (MAF.cr.miss = TRUE) or the whole marker is discarded (MAF.cr.miss = FALSE). By default, MAF.cr.miss = TRUE, which allows a larger number of markers to be included and a wider genetic diversity to be covered.

    B) An alternative is to select only markers that segregate in at least one cross with a MAF larger than MAF.cr.lim2.
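
The default within cross MAF limit described in option A can be written as a short function. The following is a sketch based only on the formula given above, not the package's own code; the names maf_cr_default, cross_size, maf_lim, maf_obs and problematic are hypothetical, as are the toy values.

```r
# Default critical within cross MAF as a function of cross size n.ci,
# following the formula in the Details: 0.5 up to 10 genotypes,
# then 4.5 / n.ci + 0.05 for larger crosses.
maf_cr_default <- function(n.ci) {
  ifelse(n.ci <= 10, 0.5, 4.5 / n.ci + 0.05)
}

# Hypothetical cross sizes and the corresponding limits.
cross_size <- c(8, 10, 15, 45, 90)
maf_lim <- maf_cr_default(cross_size)

# Hypothetical observed within cross MAF of one marker in each cross.
maf_obs <- c(0.50, 0.40, 0.40, 0.10, 0.20)

# Crosses in which the marker falls below the limit. With
# MAF.cr.miss = TRUE the scores of these crosses would be set as missing;
# with MAF.cr.miss = FALSE the whole marker would be discarded.
problematic <- maf_obs < maf_lim

print(data.frame(cross_size, maf_lim, maf_obs, problematic))
```

The limit is continuous at n.ci = 10 (4.5/10 + 0.05 = 0.5) and tends towards 0.05 for very large crosses, which matches the 50% to 5% range stated above.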

Value

A filtered mppData object containing the same elements as the output of create.mppData after filtering. It also contains the following new elements:

geno.id

Character vector of genotype identifiers.

ped.mat

Four-column data.frame: 1) the type of genotype: "offspring" for the last generation and "founder" for the genotypes above the offspring in the pedigree; 2) the genotype indicator; 3-4) parent 1 and parent 2 of each line.

geno.par.clu

Parent marker matrix without monomorphic or completely missing markers.

haplo.map

Genetic map corresponding to the list of markers of the geno.par.clu object.

parents

List of parents.

n.cr

Number of crosses.

n.par

Number of parents.

rem.mk

Vector of markers that have been removed.

rem.geno

Vector of genotypes that have been removed.

Author(s)

Vincent Garin

See Also

create.mppData

Examples


data(mppData_init)

mppData <- QC.mppData(mppData = mppData_init, n.lim = 15, MAF.pop.lim = 0.05,
                      MAF.cr.miss = TRUE, mk.miss = 0.1,
                      gen.miss = 0.25, verbose = TRUE)      


vincentgarin/mppR documentation built on March 13, 2024, 7:30 p.m.