impute_L2H: Impute from low to high density markers by Random Forest
In jendelman/polyBreedR: Genomics-assisted breeding for polyploids (and diploids)

impute_L2H

R Documentation

Impute from low to high density markers by Random Forest

Description

Imputes marker genotypes from low to high density using Random Forest.

Usage

impute_L2H(
  high.file,
  low.file,
  out.file = NULL,
  params = list(),
  exclude = NULL,
  n.core = 1
)

Arguments

`high.file`	name of high density file (VCF or CSV)
`low.file`	name of low density file (VCF or CSV)
`out.file`	name of CSV output file for imputed data
`params`	list of parameters (see Details)
`exclude`	optional, vector of high density samples to exclude
`n.core`	number of cores for multicore processing (default = 1)

Details

The params argument is a list with the following options: format, model, n.tree, and n.mark. format can be "GT" (integer dosage) or "DS" (real numbers between 0 and ploidy). With "GT", model can be "class" for classification or "regress" for regression; with "DS", only regression is permitted. n.tree is the number of trees (default = 100). n.mark is the number of markers used as predictors (default = 100), chosen as those closest to the target marker.
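As a minimal sketch of how these options might be supplied, the code below builds a params list and passes it to impute_L2H. The file names are hypothetical placeholders, so the call is guarded and runs only if polyBreedR is installed and the files exist. The helper nearest_markers is also hypothetical: it illustrates one plausible reading of "minimum distance to the target" (absolute difference in position on the same chromosome) and is not the package's internal code.

```r
# Options named in Details; n.tree and n.mark are set to their stated defaults.
params <- list(format = "GT",     # integer dosage
               model  = "class",  # classification; "regress" is the alternative for GT
               n.tree = 100,
               n.mark = 100)

# Hypothetical file names (assumptions, not shipped with the package)
high.file <- "high_density.vcf"
low.file  <- "low_density.csv"

if (requireNamespace("polyBreedR", quietly = TRUE) &&
    file.exists(high.file) && file.exists(low.file)) {
  polyBreedR::impute_L2H(high.file = high.file, low.file = low.file,
                         out.file = "imputed.csv", params = params, n.core = 1)
}

# Hypothetical helper: indices of the n.mark predictor markers nearest to a target.
# Assumes distance is the absolute difference in bp position.
nearest_markers <- function(low.pos, target.pos, n.mark) {
  d <- abs(low.pos - target.pos)
  order(d)[seq_len(min(n.mark, length(d)))]
}

# Four candidate predictors; keep the two closest to position 800
nearest <- nearest_markers(low.pos = c(100, 500, 900, 1500),
                           target.pos = 800, n.mark = 2)
```

Here nearest holds the indices of the markers at positions 900 and 500, in order of increasing distance from the target.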

The exclude argument is useful for cross-validation.
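One way to use exclude for cross-validation is sketched below: the high density samples are split into folds, and each fold in turn is excluded before imputation. The sample IDs, file names, and fold count are hypothetical, and the sketch assumes the excluded samples are also present in the low density file so that their imputed values can later be compared with the observed high density data. The call is guarded and runs only if polyBreedR is installed and the files exist.

```r
set.seed(1)

# Hypothetical sample IDs from the high density file
ids <- paste0("S", 1:10)

# Assign each sample to one of k folds at random
k <- 5
fold <- sample(rep(seq_len(k), length.out = length(ids)))
held.out <- split(ids, fold)

# Hypothetical file names (assumptions)
high.file <- "high_density.vcf"
low.file  <- "low_density.csv"

for (i in seq_len(k)) {
  if (requireNamespace("polyBreedR", quietly = TRUE) &&
      file.exists(high.file) && file.exists(low.file)) {
    # Exclude fold i from the high density data, then impute
    polyBreedR::impute_L2H(high.file = high.file, low.file = low.file,
                           out.file = sprintf("imputed_fold%d.csv", i),
                           exclude = held.out[[i]])
  }
}
```

Each output file could then be compared against the observed high density genotypes of the samples in held.out[[i]] to estimate imputation accuracy.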

Both VCF and CSV are allowable input file formats; they are recognized by file extension. For CSV, the first three columns should be marker, chrom, pos. The output file is CSV.
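For illustration, a small CSV input with the required leading columns could be created as follows. The marker names, positions, and sample columns are made up; that the remaining columns hold one sample each is an assumption, since only the first three columns are specified above.

```r
# Toy genotype table: marker, chrom, pos come first, as required.
# Columns S1 and S2 are hypothetical samples holding dosage values (assumption).
geno <- data.frame(
  marker = c("m1", "m2", "m3"),
  chrom  = c("chr01", "chr01", "chr02"),
  pos    = c(1200L, 58000L, 4300L),
  S1     = c(0, 2, 4),
  S2     = c(1, NA, 3)
)

# The .csv extension is what identifies the file format
csv.file <- tempfile(fileext = ".csv")
write.csv(geno, csv.file, row.names = FALSE)

# Read it back to check the header
header <- names(read.csv(csv.file))
```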

Any missing data are imputed separately for each input file at the outset, using the population mean (regress) or mode (class) for each marker.
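The initial fill-in step can be sketched in base R as below. This is an illustration of mean and mode imputation per marker, not the package's own code; the helper names impute_mean and impute_mode are hypothetical, and the toy matrix has markers in rows and samples in columns.

```r
# Toy dosage matrix with missing values (markers x samples)
geno <- matrix(c(0, 1, NA, 1,
                 2, NA, 2, 4),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("m1", "m2"), paste0("S", 1:4)))

# Replace missing values with the marker mean (as for model = "regress")
impute_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# Replace missing values with the most frequent dosage (as for model = "class").
# table() drops NA by default; ties go to the smallest dosage.
impute_mode <- function(x) {
  tab <- table(x)
  x[is.na(x)] <- as.numeric(names(tab)[which.max(tab)])
  x
}

# apply() over rows returns samples x markers, so transpose back
by.mean <- t(apply(geno, 1, impute_mean))
by.mode <- t(apply(geno, 1, impute_mode))
```

In this toy example the missing value for m1 becomes 2/3 under the mean and 1 under the mode.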