GenoPop_Impute: GenoPop-Impute
In GenoPop: Genotype Imputation and Population Genomics Efficiently from Variant Call Formatted (VCF) Files

GenoPop_Impute

R Documentation

GenoPop-Impute

Description

Performs imputation of missing genomic data in batches using the missForest algorithm (Stekhoven & Bühlmann, 2012). This function reads a VCF file, divides it into batches of a fixed number of SNPs, applies the missForest algorithm to each batch, and writes the results to a new VCF file, which is returned bgzipped and tabix-indexed. The choice of batch size is critical for balancing accuracy and computational demand. We found that a batch size of 500 SNPs is the most accurate for recombination rates typical of mammals. For higher average recombination rates (> 5 cM/Mb), we recommend a batch size of 100 SNPs.
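The batching idea and the batch-size recommendation can be sketched in a few lines of base R. The helpers `choose_batch_size()` and `make_batches()` are hypothetical names introduced here for illustration; they are not part of GenoPop, and the assumption that batches are formed from consecutive SNPs is ours, not something this page confirms.

```r
# Illustrative helpers only; these are NOT GenoPop functions.

# Pick a batch size from the mean recombination rate (cM/Mb),
# following the recommendation in the Description.
choose_batch_size <- function(recomb_rate_cM_per_Mb) {
  if (recomb_rate_cM_per_Mb > 5) 100L else 500L
}

# Split SNP indices 1..n_snps into consecutive batches of at most batch_size.
# Assumption: batches are consecutive runs of SNPs in file order.
make_batches <- function(n_snps, batch_size) {
  idx <- seq_len(n_snps)
  split(idx, ceiling(idx / batch_size))
}

batches <- make_batches(n_snps = 1250, batch_size = choose_batch_size(1))
lengths(batches)  # two full batches of 500 SNPs and a final batch of 250
```

Each element of `batches` holds the SNP indices that would be imputed together; in the real function, missForest is applied to each such batch in turn.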

Usage

GenoPop_Impute(
  vcf_path,
  output_vcf,
  batch_size = 1000,
  maxiter = 10,
  ntree = 100,
  threads = 1,
  write_log = FALSE,
  logfile = "log.txt"
)

Arguments

`vcf_path`	Path to the input VCF file.
`output_vcf`	Path for the output VCF file with imputed data.
`batch_size`	Number of SNPs to process per batch (default: 1000).
`maxiter`	Maximum number of improvement iterations for the random forest algorithm (default: 10).
`ntree`	Number of decision trees in the random forest (default: 100).
`threads`	Number of threads used for computation (default: 1).
`write_log`	If TRUE, writes a log file of the process (advised for large datasets).
`logfile`	Path to the log file, used if `write_log` is TRUE.
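To show how the arguments fit together, here is a sketch of a call with every argument spelled out, using the smaller batch size recommended for high recombination rates and with logging switched on. The file names and the thread count are placeholders chosen for illustration, and the call is guarded so that it runs only when GenoPop is installed and the input file exists.

```r
# Arguments are collected in a list so they can be inspected before the run.
# The file names below are placeholders, not files shipped with GenoPop.
impute_args <- list(
  vcf_path   = "input.vcf.gz",    # the example on this page uses a bgzipped, indexed VCF
  output_vcf = "imputed.vcf",
  batch_size = 100,               # smaller batches for recombination rates > 5 cM/Mb
  maxiter    = 10,
  ntree      = 100,
  threads    = 4,                 # placeholder; adjust to the available cores
  write_log  = TRUE,              # advised for large datasets
  logfile    = "impute_log.txt"
)

# Run only if GenoPop is installed and the placeholder input exists.
if (requireNamespace("GenoPop", quietly = TRUE) &&
    file.exists(impute_args$vcf_path)) {
  out_path <- do.call(GenoPop::GenoPop_Impute, impute_args)
  print(out_path)  # path to the output VCF file with imputed data
}
```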

Value

Path to the output VCF file with imputed data.

Examples

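 # Locate the example VCF and its tabix index shipped with the package
 vcf_file <- system.file("tests/testthat/sim_miss.vcf.gz", package = "GenoPop")
 index_file <- system.file("tests/testthat/sim_miss.vcf.gz.tbi", package = "GenoPop")
 # Write the imputed genotypes to a temporary file
 output_file <- tempfile(fileext = ".vcf")
 # Impute missing genotypes in batches of 500 SNPs
 GenoPop_Impute(vcf_file, output_vcf = output_file, batch_size = 500)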

GenoPop documentation built on April 3, 2025, 9:51 p.m.
