read_geno: Read marker genotype data

View source: R/read_geno.R

read_geno    R Documentation

Description

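Reads marker allele dosage data from a CSV file and computes the genomic relationship matrix, optionally blended with a pedigree relationship matrix (see Details).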

Usage

read_geno(
  filename,
  ploidy,
  map,
  min.minor.allele = 5,
  w = 1e-05,
  ped = NULL,
  dominance = FALSE,
  pop.file = NULL
)

Arguments

filename

Name of CSV file with marker allele dosage

ploidy

ploidy of the individuals: 2,4,6,etc. (even numbers); a named vector when populations differ in ploidy (see Details)

map

TRUE/FALSE whether the file contains chrom and position columns after the marker column (see Details)

min.minor.allele

threshold for marker filtering (see Details)

w

blending parameter (see Details)

ped

optional, pedigree data frame with 3 or 4 columns (see Details)

dominance

TRUE/FALSE whether to include dominance covariance (see Details)

pop.file

optional, name of CSV file defining populations (see Details)

Details

When map=TRUE, the first three columns of the file are marker, chrom, position. When map=FALSE, the first column is marker. Subsequent columns contain the allele dosage for individuals/clones, coded 0,1,2,...,ploidy (fractional values are allowed). The input file for diploids can also be coded using -1,0,1 (fractional values allowed). Missing genotype data are replaced with the population mean.

Additive coefficients are computed by subtracting the population mean from each marker, and the additive (genomic) relationship matrix is computed as G = tcrossprod(coeff)/scale. The scale parameter ensures the mean of the diagonal elements of G equals 1 under panmictic equilibrium.
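The computation of G described above can be sketched in base R. This is an illustration only, not the package source: the toy dosage matrix is made up, and the scale ploidy * sum(p * (1 - p)) is an assumption (the usual choice that gives a mean diagonal of 1 under panmictic equilibrium).

```r
# Toy dosage matrix (hypothetical data): rows = markers, columns = individuals
ploidy <- 4
geno <- matrix(c(0, 1, 2, 4,
                 2, NA, 3, 1,
                 4, 4, 2, 0),
               nrow = 3, byrow = TRUE,
               dimnames = list(paste0("m", 1:3), paste0("id", 1:4)))

# Population mean dosage per marker, ignoring missing values
marker_mean <- rowMeans(geno, na.rm = TRUE)

# Replace missing genotypes with the marker mean
missing <- which(is.na(geno), arr.ind = TRUE)
geno[missing] <- marker_mean[missing[, "row"]]

# Additive coefficients: individuals in rows, centered dosages in columns
coeff <- t(geno - marker_mean)

# ASSUMPTION: scale = ploidy * sum(p * (1 - p)), with p the allele frequency
p <- marker_mean / ploidy
scale <- ploidy * sum(p * (1 - p))

# Additive (genomic) relationship matrix
G <- tcrossprod(coeff) / scale
```

Because the coefficients are centered, the rows and columns of G sum to zero in this sketch.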

G can be blended with the pedigree relationship matrix (A) by providing a pedigree data frame in ped and a blending parameter w. The blended relationship matrix is H = (1-w)G + wA. The first three columns of ped are id, parent1, parent2. Missing parents must be coded NA. An optional fourth column in binary (0/1) format can be used to indicate which ungenotyped individuals should be included in the H matrix, but this option cannot be combined with dominance. If there is no fourth column, only genotyped individuals are included. If a vector of w values is provided, the function returns a list of class_geno objects, one for each value of w.
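As a minimal sketch of the blending formula for genotyped individuals only, the following base R code builds a pedigree data frame in the three-column layout and applies H = (1-w)G + wA. The G and A values are made-up numbers for illustration; in practice A would be computed from the pedigree, and the fourth-column option for ungenotyped individuals is not covered here.

```r
# Pedigree in the three-column layout: id, parent1, parent2 (NA = unknown parent)
ped <- data.frame(id = c("id1", "id2", "id3"),
                  parent1 = c(NA, "id1", NA),
                  parent2 = c(NA, NA, NA))

ids <- ped$id

# Hypothetical genomic (G) and pedigree (A) relationship matrices
G <- matrix(c(1.02, 0.48, 0.05,
              0.48, 0.97, 0.10,
              0.05, 0.10, 1.01),
            nrow = 3, byrow = TRUE, dimnames = list(ids, ids))
A <- matrix(c(1.0, 0.5, 0.0,
              0.5, 1.0, 0.0,
              0.0, 0.0, 1.0),
            nrow = 3, byrow = TRUE, dimnames = list(ids, ids))

# Blend for a single value of w
w <- 0.1
H <- (1 - w) * G + w * A

# Blend for a vector of w values: one H matrix per value
w_values <- c(0, 0.1, 0.5)
H_list <- lapply(w_values, function(w) (1 - w) * G + w * A)
```

With w = 0 the blend reduces to G, and with w = 1 it reduces to A.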

If the A matrix is not used, then G is blended with the identity matrix (times the mean diagonal of G) to improve numerical conditioning for matrix inversion. The default for w is 1e-5, which is somewhat arbitrary and based on tests with the vignette dataset. The D matrix is also blended with the identity matrix using 1e-5 for numerical conditioning.
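The effect of blending with the identity matrix can be illustrated with a deliberately singular G. This is a sketch under the assumption that the blend takes the form (1-w)G + w * mean(diag(G)) * I, which follows the description above; the coefficient matrix is toy data.

```r
# Three individuals, two centered markers: G has rank 2 and is singular
coeff <- matrix(c(-1, 0, 1,
                   1, -1, 0),
                nrow = 3)
G <- tcrossprod(coeff) / 2

# ASSUMED form of the blend: identity scaled by the mean diagonal of G
w <- 1e-5
n <- nrow(G)
G_blend <- (1 - w) * G + w * mean(diag(G)) * diag(n)

# Smallest eigenvalue before and after blending
min_eig_G <- min(eigen(G, symmetric = TRUE)$values)
min_eig_blend <- min(eigen(G_blend, symmetric = TRUE)$values)

# The blended matrix is invertible
G_inv <- solve(G_blend)
```

Scaling the identity by the mean diagonal of G leaves the mean diagonal of the blended matrix unchanged.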

When dominance=FALSE, non-additive effects are captured using a residual genetic effect, with zero covariance between individuals. If dominance=TRUE, a (digenic) dominance covariance matrix is used instead.

The argument min.minor.allele specifies the minimum number of individuals that must contain the minor allele. Markers that do not meet this threshold are discarded.
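The filter can be sketched as follows. This is not the package code: it assumes the minor allele is identified from the mean dosage of each marker, and that an individual "contains" the minor allele when its dosage of that allele is greater than zero. The data are made up.

```r
# Toy diploid dosage matrix: rows = markers, columns = individuals
ploidy <- 2
geno <- matrix(c(0, 0, 0, 1, 0, 0,
                 0, 1, 2, 1, 0, 2,
                 2, 2, 2, 2, 1, 1,
                 2, 2, 2, 2, 2, 2),
               nrow = 4, byrow = TRUE,
               dimnames = list(paste0("m", 1:4), paste0("id", 1:6)))
min.minor.allele <- 2

# Frequency of the counted allele for each marker
p <- rowMeans(geno, na.rm = TRUE) / ploidy

# ASSUMPTION: the counted allele is minor when p <= 0.5, otherwise the other
# allele is minor. Count individuals carrying at least some of the minor allele.
n_minor <- sapply(seq_len(nrow(geno)), function(i) {
  x <- geno[i, ]
  if (p[i] <= 0.5) sum(x > 0, na.rm = TRUE) else sum(x < ploidy, na.rm = TRUE)
})

# Keep markers that meet the threshold
keep <- n_minor >= min.minor.allele
geno_filtered <- geno[keep, , drop = FALSE]
```

In this toy example, m1 (one carrier) and m4 (monomorphic) are discarded, while m2 and m3 are kept.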

Optional argument pop.file gives the name of a CSV file with two columns: id,pop. If the populations have different ploidy, this is indicated using a named vector for ploidy.
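The input file layouts can be sketched by writing small example files with base R. The file contents, population names, and marker names below are hypothetical. The read_geno call is shown only as a comment, and naming the ploidy vector by population is an assumption based on the description above.

```r
# Genotype file for map=TRUE: marker, chrom, position, then one column per individual
geno <- data.frame(marker = c("m1", "m2"),
                   chrom = c("chr01", "chr01"),
                   position = c(1000, 2500),
                   id1 = c(0, 2),
                   id2 = c(1, 1),
                   id3 = c(4, 3),
                   id4 = c(2, 0))
geno_file <- tempfile(fileext = ".csv")
write.csv(geno, geno_file, row.names = FALSE)

# Population file: two columns, id and pop
pops <- data.frame(id = c("id1", "id2", "id3", "id4"),
                   pop = c("dip", "dip", "tet", "tet"))
pop_file <- tempfile(fileext = ".csv")
write.csv(pops, pop_file, row.names = FALSE)

# ASSUMPTION: names of the ploidy vector match the values in the pop column
ploidy <- c(dip = 2, tet = 4)

# Hypothetical call (requires the StageWise package and a realistic dataset):
# geno_obj <- read_geno(filename = geno_file, ploidy = ploidy, map = TRUE,
#                       pop.file = pop_file)

# Read the files back to confirm the layout
geno_check <- read.csv(geno_file)
pop_check <- read.csv(pop_file)
```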

Value

An object of class class_geno, or a list of class_geno objects if a vector of w values is provided.


jendelman/StageWise documentation built on Feb. 23, 2025, 11 a.m.