het_reencode_bed: Reencode a Plink BED file to (twice) heterozygote indicators


het_reencode_bed    R Documentation

Reencode a Plink BED file to (twice) heterozygote indicators

Description

Given an existing Plink-formatted BED (binary) file, this function reads it, transforms genotypes on the fly, and writes a new BED file in which heterozygotes are encoded as 2 and homozygotes as 0. In other words, it maps the numerical genotype values c( 0, 1, 2, NA ) to c( 0, 2, 0, NA ). Heterozygotes are encoded as 2, rather than 1, so that existing code for calculating allele frequencies and related quantities, such as kinship estimates, works on this data as intended. This function is intended for extremely large files that should not be loaded entirely into memory at once.
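
The mapping described above can be sketched on a small in-memory vector. This is only an illustration of the recoding rule and of why heterozygotes are encoded as 2; `het_recode` is a hypothetical helper that is not part of genio, and the actual function operates on the BED file rather than on R vectors.

```r
# Hypothetical helper for illustration only; not part of genio.
het_recode <- function(x) {
  # x == 1 is NA wherever x is NA, and ifelse() propagates that NA,
  # so missing genotypes stay missing
  ifelse(x == 1, 2, 0)
}

# toy genotype vector with one missing value
x <- c(0, 1, 2, NA, 1, 1)
x_het <- het_recode(x)

# a common allele-frequency estimate: mean genotype divided by 2
p_het <- mean(x_het, na.rm = TRUE) / 2
# proportion of heterozygotes among non-missing genotypes
prop_het <- mean(x == 1, na.rm = TRUE)
```

Because heterozygotes are coded as 2, `p_het` and `prop_het` agree here, so code written for ordinary genotypes yields the heterozygote proportion without modification.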

Usage

het_reencode_bed(
  file_in,
  file_out,
  m_loci = NA,
  n_ind = NA,
  make_bim_fam = TRUE,
  verbose = TRUE
)

Arguments

file_in

Input file path. *.bed extension may be omitted (will be added automatically if file doesn't exist but file.bed does).

file_out

Output file path. *.bed extension may be omitted (will be added automatically if it is missing).

m_loci

Number of loci in the input genotype table. If NA, it is deduced from the paired *.bim file.

n_ind

Number of individuals in the input genotype table. If NA, it is deduced from the paired *.fam file.

make_bim_fam

If TRUE, create symbolic links (using symlink()) for the output file's *.bim and *.fam that link to the corresponding input files. Otherwise only the *.bed file is created.

verbose

If TRUE (default), the function reports the paths of the files being read and written (after autocompleting the extensions).
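
The effect of make_bim_fam = TRUE can be pictured with base R's file.symlink(). The sketch below is an assumption about the general approach, not the package's actual code; `link_bim_fam` and the toy file contents are hypothetical, and creating symbolic links may fail on platforms or file systems that do not permit them.

```r
# Hypothetical helper for illustration only; not part of genio.
# Takes input and output base paths (no extension) and links
# the output *.bim and *.fam to the corresponding input files.
link_bim_fam <- function(base_in, base_out) {
  ok <- vapply(c("bim", "fam"), function(ext) {
    from <- normalizePath(paste0(base_in, ".", ext), mustWork = TRUE)
    to <- paste0(base_out, ".", ext)
    # absolute target so the link resolves from any working directory
    file.symlink(from, to)
  }, logical(1))
  invisible(ok)
}

# toy input tables in a temporary location
base_in <- tempfile("toy-in")
base_out <- tempfile("toy-out")
writeLines("toy bim line", paste0(base_in, ".bim"))
writeLines("toy fam line", paste0(base_in, ".fam"))

ok <- link_bim_fam(base_in, base_out)

# reading through the links returns the input contents
bim_via_link <- readLines(paste0(base_out, ".bim"))
fam_via_link <- readLines(paste0(base_out, ".fam"))
```

Linking rather than copying avoids duplicating the tables, since the reencoding changes only the genotypes and leaves the locus and individual annotations untouched.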

See Also

read_bed() and write_bed(), from which much of this function's code is derived, and whose documentation explains additional BED format requirements.

Examples

# define input and output, both of which will also work without extension
# read an existing Plink *.bed file
file_in <- system.file("extdata", 'sample.bed', package = "genio", mustWork = TRUE)
# write to a *temporary* location for this example
file_out <- tempfile('delete-me-example')

# in default mode, deduces dimensions from paired *.bim and *.fam tables
het_reencode_bed( file_in, file_out )

# delete output when done
delete_files_plink( file_out )


OchoaLab/genio documentation built on Feb. 22, 2025, 4:13 a.m.