impute_L2H: Impute from low to high density markers by Random Forest

View source: R/impute_L2H.R

impute_L2H R Documentation

Description

Impute from low to high density markers by Random Forest

Usage

impute_L2H(
  high.file,
  low.file,
  out.file = NULL,
  params = list(),
  exclude = NULL,
  n.core = 1
)

Arguments

high.file

name of high density file

low.file

name of low density file

out.file

name of CSV output file for imputed data

params

list of parameters (see Details)

exclude

optional, vector of high density samples to exclude

n.core

number of cores to use for multicore processing (default = 1)

Details

Argument params is a list with the following options: format, model, n.tree, n.mark. format can have values "GT" (integer dosage) or "DS" (real numbers between 0 and ploidy). model can be "class" for classification or "regress" for regression when "GT" is used; for "DS" format, only regression is permitted. n.tree is the number of trees (default = 100). n.mark is the number of markers to use as predictors (default = 100), chosen based on minimum distance to the target.
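As an illustration of the options above, a params list might be assembled as in the sketch below. The helper check_params is hypothetical and not part of polyBreedR; it only encodes the combinations stated in this section. The file names in the commented call are placeholders.

```r
# Hypothetical helper (not part of polyBreedR): checks a params list
# against the format/model combinations described in Details.
check_params <- function(params) {
  stopifnot(isTRUE(params$format %in% c("GT", "DS")),
            isTRUE(params$model %in% c("class", "regress")))
  # For "DS" format, only regression is permitted
  if (params$format == "DS" && params$model != "regress") {
    stop("format 'DS' requires model 'regress'")
  }
  invisible(TRUE)
}

# Integer dosage, classification, with the documented default sizes
params <- list(format = "GT", model = "class", n.tree = 100, n.mark = 100)
check_params(params)

# Placeholder file names; uncomment once polyBreedR and the files are available.
# library(polyBreedR)
# oob <- impute_L2H(high.file = "high.vcf", low.file = "low.csv",
#                   out.file = "imputed.csv", params = params, n.core = 2)
```

Options left out of the list presumably fall back to the defaults given above, although only n.tree and n.mark have documented defaults here.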

The exclude argument is useful for cross-validation.
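A possible cross-validation setup is sketched below. It assumes that samples listed in exclude are withheld from the high density data, so that their imputed genotypes can later be compared with the observed high density values. That comparison step and the overall workflow are assumptions, not something this page specifies. Sample IDs and file names are placeholders.

```r
set.seed(1)

# Hypothetical sample IDs; in practice these would be the sample names
# found in the high density file.
ids <- paste0("sample", 1:20)
k <- 5

# Randomly assign each sample to one of k folds of equal size
fold <- sample(rep(1:k, length.out = length(ids)))
folds <- split(ids, fold)

# Each fold could then be withheld in turn (placeholder file names):
# for (i in seq_along(folds)) {
#   impute_L2H(high.file = "high.vcf", low.file = "low.vcf",
#              out.file = paste0("fold", i, ".csv"),
#              exclude = folds[[i]])
# }
```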

Both VCF and CSV are allowable input file formats; they are recognized based on the file extension. For CSV, the first three columns should be marker, chrom, pos. The output file is CSV.
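The CSV layout described above can be sketched with base R. Only the first three column names come from this page. The assumption that the remaining columns hold one sample each, along with the sample names and all values, is made up for illustration.

```r
# Toy genotype table: the first three columns are marker, chrom, pos.
# The sample columns (S1, S2, S3) and all values are made up.
geno <- data.frame(
  marker = c("m1", "m2", "m3"),
  chrom = c("chr01", "chr01", "chr02"),
  pos = c(1500, 82000, 4300),
  S1 = c(0, 2, 4),
  S2 = c(1, NA, 3),
  S3 = c(2, 2, 0)
)

# Write to a temporary file with a .csv extension
csv.file <- tempfile(fileext = ".csv")
write.csv(geno, csv.file, row.names = FALSE)

# Read it back to confirm the layout
chk <- read.csv(csv.file)
```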

Any missing data are imputed separately for each input file at the outset, using the population mean (regress) or mode (class) for each marker.
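The idea of per-marker mean or mode imputation can be illustrated with base R. This is a minimal sketch and not the package's own code. The function names impute_mean and impute_mode are hypothetical, the data are made up, and the tie-breaking rule for the mode is an assumption.

```r
# Toy dosage matrix (markers x samples) with missing values; values are made up.
geno <- matrix(c(0, 2, NA, 2,
                 4, NA, 3, 3),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("m1", "m2"), paste0("S", 1:4)))

# Replace NA with the marker mean (regression-style)
impute_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# Replace NA with the most frequent value (classification-style).
# Ties go to the smallest value here, which is an assumption.
impute_mode <- function(x) {
  tab <- table(x)
  x[is.na(x)] <- as.numeric(names(tab)[which.max(tab)])
  x
}

# apply() over rows returns samples x markers, so transpose back
by.mean <- t(apply(geno, 1, impute_mean))
by.mode <- t(apply(geno, 1, impute_mode))
```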

Value

Matrix of out-of-bag (OOB) error with dimensions markers x trees. For the regression model, the error is the mean squared error (MSE).
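A returned matrix of this shape could be summarized as sketched below. The matrix here is simulated, not real output. Reading the last column as the error obtained with all trees is an assumption that this page does not confirm.

```r
set.seed(1)

# Simulated stand-in for the returned matrix: 3 markers x 100 trees.
oob <- matrix(runif(300, 0.1, 0.5), nrow = 3,
              dimnames = list(paste0("m", 1:3), NULL))

# Error in the last column, per marker (assumed: error with all trees)
final.err <- oob[, ncol(oob)]

# Mean across markers for each column, e.g. to plot against tree number
mean.err <- colMeans(oob)
# plot(mean.err, type = "l", xlab = "trees", ylab = "OOB error")
```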


jendelman/polyBreedR documentation built on Jan. 5, 2025, 12:13 a.m.