variantSites: Identify Variant Sites in Genotype Files

View source: R/variantSites.r

variantSites    R Documentation

Identify Variant Sites in Genotype Files

Description

This function processes genotype data from multiple files to identify variant markers based on homozygosity thresholds for a set of individuals. It supports parallel processing to improve performance when handling large datasets.

Usage

variantSites(
  files,
  filename = "variantSites.txt",
  ChosenInds = "all",
  requireHomozygous = TRUE,
  nCores = 1
)

Arguments

files

A character vector of paths to files containing genotypes in diem format.

filename

A character string giving the path of the file to which the variant-site results are written.

ChosenInds

A numeric or logical vector of indices of the individuals to include in the analysis. The default, "all", includes every individual.

requireHomozygous

A logical or numeric value giving the minimum number of individuals that must be homozygous for each allele. TRUE (the default) requires at least one homozygous individual for each allele; a number such as 2 requires at least that many.

nCores

The number of cores to use for parallelisation. Must be nCores = 1 on Windows.

Details

The results are written to the output file given by filename and are also returned as a logical vector.

A marker is considered a variant if at least requireHomozygous individuals are homozygous for each of the two alleles encoded in the diem-formatted input files.
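The variant criterion can be illustrated with a short base-R sketch. The helper is_variant_site() below is hypothetical and is not the diemr implementation. It assumes each marker is a string with one genotype character per individual, where "0" and "2" are the two homozygous states, "1" is heterozygous and "_" is missing, and it mimics ChosenInds and requireHomozygous with the hypothetical parameters chosen_inds and require_homozygous.

```r
# Hypothetical helper, NOT the diemr implementation: a base-R sketch of the
# variant-site criterion. Assumed encoding, one character per individual:
# "0" and "2" = homozygous for either allele, "1" = heterozygous, "_" = missing.
is_variant_site <- function(genotypes, chosen_inds = "all",
                            require_homozygous = TRUE) {
  # TRUE is coerced to 1, i.e. at least one homozygote per allele
  k <- as.numeric(require_homozygous)
  vapply(genotypes, function(g) {
    calls <- strsplit(g, "", fixed = TRUE)[[1]]
    # Keep only the chosen individuals (numeric or logical indices)
    if (!identical(chosen_inds, "all")) calls <- calls[chosen_inds]
    # Variant if both homozygous states reach the threshold
    sum(calls == "0") >= k && sum(calls == "2") >= k
  }, logical(1), USE.NAMES = FALSE)
}

markers <- c("0022_", "00111", "02111")
is_variant_site(markers)                          # TRUE FALSE TRUE
is_variant_site(markers, require_homozygous = 2)  # TRUE FALSE FALSE
is_variant_site(markers, chosen_inds = c(1, 3))   # TRUE FALSE FALSE
```

The third marker has exactly one homozygote for each allele, so it passes the default threshold but fails when two homozygotes per allele are required.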

Parallel processing with nCores > 1 is available only on non-Windows operating systems; Windows computers must use nCores = 1.
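A common reason for this restriction in R is fork-based parallelism. The sketch below uses parallel::mclapply(), which forks worker processes and therefore cannot use more than one core on Windows. Whether diemr uses mclapply() internally is an assumption, and variant_sites_parallel() is a hypothetical helper. It also assumes each file line holds one marker as "S" followed by one genotype character per individual.

```r
library(parallel)

# Hypothetical sketch, NOT the diemr implementation. Assumes each line of a
# genotype file is one marker: "S" followed by one character per individual
# ("0"/"2" = homozygous, "1" = heterozygous, "_" = missing).
variant_sites_parallel <- function(files, k = 1, n_cores = 1) {
  # mclapply() forks worker processes; forking is unavailable on Windows,
  # so more than one core cannot be used there.
  if (.Platform$OS.type == "windows" && n_cores > 1) {
    stop("n_cores must be 1 on Windows")
  }
  per_file <- mclapply(files, function(path) {
    # Strip the assumed leading "S" from every marker line
    lines <- sub("^S", "", readLines(path))
    vapply(strsplit(lines, "", fixed = TRUE), function(calls) {
      sum(calls == "0") >= k && sum(calls == "2") >= k
    }, logical(1))
  }, mc.cores = n_cores)
  # One logical value per marker, in file order
  unlist(per_file, use.names = FALSE)
}

# Usage with two small temporary files
f1 <- tempfile(); writeLines(c("S0022", "S0011"), f1)
f2 <- tempfile(); writeLines("S0212", f2)
variant_sites_parallel(c(f1, f2))  # TRUE FALSE TRUE
```

Because mclapply() returns its results in the order of the input files, concatenating them yields one value per marker across all files, matching the shape of the documented return value.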

Value

A logical vector indicating whether each marker in the dataset is a variant site (TRUE) or not (FALSE). The same results are also written to the specified output file.

Examples

# Run this example in a folder with write permission
files <- c(
  system.file("extdata", "data7x3.txt", package = "diemr"),
  system.file("extdata", "data7x10.txt", package = "diemr")
)
## Not run: 

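# Default: at least one homozygous individual for each allele
variant1 <- variantSites(files, filename = "v1.txt")
# Stricter: at least two homozygous individuals for each allele
variant2 <- variantSites(files, filename = "v2.txt", requireHomozygous = 2)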

## End(Not run)


diemr documentation built on Dec. 11, 2025, 5:07 p.m.