allele_freqs: Compute locus allele frequencies

View source: R/allele_freqs.R

Description

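On a regular matrix, this is essentially a wrapper for colMeans() or rowMeans() (depending on loci_on_cols), with missing values removed and the means halved so that genotypes coded 0, 1, 2 become allele frequencies. On a BEDMatrix object, locus allele frequencies are computed in chunks of loci (see m_chunk_max) to keep memory usage low.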
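For a regular matrix, the computation can be sketched in base R as follows. This is an illustrative reimplementation under stated assumptions, not the package's code: `allele_freqs_sketch` is a hypothetical name, and genotypes are assumed to be coded 0, 1, 2 (copies of the counted allele), as in the examples on this page, so mean dosage is divided by 2.

```r
# Hypothetical base-R sketch of allele_freqs() for a regular matrix.
# Assumes genotypes are coded 0/1/2 (copies of the counted allele).
allele_freqs_sketch <- function(X, loci_on_cols = FALSE) {
  # Average over individuals, ignoring missing values, then halve
  # to turn mean allele dosage into an allele frequency.
  if (loci_on_cols) {
    colMeans(X, na.rm = TRUE) / 2
  } else {
    rowMeans(X, na.rm = TRUE) / 2
  }
}

# 2 loci (rows) by 3 individuals (columns)
X <- matrix(c(2, 2, 0,
              1, NA, 1),
            nrow = 2, byrow = TRUE)
allele_freqs_sketch(X)                          # equals c(2/3, 1/2)
allele_freqs_sketch(t(X), loci_on_cols = TRUE)  # same values, transposed input
```

rowMeans() and colMeans() carry row or column names through to the result, which is one way locus names can end up on the returned vector.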

Usage

allele_freqs(
  X,
  loci_on_cols = FALSE,
  fold = FALSE,
  m_chunk_max = 1000,
  subset_ind = NULL,
  want_counts = FALSE
)

Arguments

X

The genotype matrix (regular R matrix or BEDMatrix object). Missing values are ignored in averages.

loci_on_cols

If TRUE, X has loci on columns and individuals on rows; if FALSE (the default), loci are on rows and individuals on columns. If X is a BEDMatrix object, loci are assumed to be on columns and loci_on_cols is ignored.

fold

If TRUE, allele frequencies are converted to minor allele frequencies. By default, frequencies are returned for the allele counted in X, regardless of whether it is the minor or major allele.

m_chunk_max

BEDMatrix-specific: the maximum number of loci to process at a time. If memory usage is excessive, set this lower than the default (expected to be necessary only for extremely large numbers of individuals).

subset_ind

Optionally subset individuals by providing their indexes (negative indexes to exclude) or a logical vector (the usual ways to subset matrices). Most useful for BEDMatrix inputs, where each chunk is subset as it is processed, keeping memory usage low.

want_counts

If TRUE (default FALSE), raw allele counts are also returned. Note that the fold option has no effect on these counts.
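The chunked, low-memory strategy described for m_chunk_max and subset_ind can be illustrated with an ordinary matrix standing in for a BEDMatrix, with loci on columns. This is a sketch of the general idea, not the package's implementation; `allele_freqs_chunked` is a hypothetical name, and 0/1/2 genotype coding is assumed. Loci are taken in blocks of at most `m_chunk_max`, and individuals are subset within each block, so the full subset matrix is never built at once.

```r
# Hypothetical sketch: chunked allele frequencies with an optional
# subset of individuals. X has loci on columns (as for BEDMatrix).
allele_freqs_chunked <- function(X, m_chunk_max = 1000, subset_ind = NULL) {
  m <- ncol(X)
  p <- numeric(m)
  start <- 1
  while (start <= m) {
    end <- min(start + m_chunk_max - 1, m)
    # Extract only this block of loci
    X_chunk <- X[, start:end, drop = FALSE]
    # Subset individuals (rows) within the block; accepts positive,
    # negative, or logical indexes, like ordinary matrix subsetting
    if (!is.null(subset_ind)) {
      X_chunk <- X_chunk[subset_ind, , drop = FALSE]
    }
    p[start:end] <- colMeans(X_chunk, na.rm = TRUE) / 2
    start <- end + 1
  }
  names(p) <- colnames(X)
  p
}

set.seed(1)
X <- matrix(sample(0:2, 10 * 7, replace = TRUE), nrow = 10)
X[3, 4] <- NA  # one missing genotype, ignored in the average
keep <- c(1, 3, 5, 7)

# Chunked result matches one-shot column means on the same individuals
p_chunked <- allele_freqs_chunked(X, m_chunk_max = 3, subset_ind = keep)
p_direct <- colMeans(X[keep, , drop = FALSE], na.rm = TRUE) / 2
all.equal(p_chunked, p_direct)
```

Smaller chunks trade more loop iterations for a smaller peak memory footprint; the result does not depend on the chunk size.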

Value

If want_counts = FALSE, a vector of estimated ancestral allele frequencies, one per locus, named by locus if X has locus names. If want_counts = TRUE, a named list containing the estimated ancestral allele frequencies (p_anc_est) and the allele counts matrix, which has loci along the rows (also named if names are present) and alleles along the columns.
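The counts matrix is described only as loci by alleles. The sketch below assumes 0/1/2 genotype coding and two alleles per locus, with the counted allele (the one summed in X) in the first column and the other allele in the second. `allele_counts_sketch` and the column names `allele_1`/`allele_2` are hypothetical, and the package's actual column order may differ.

```r
# Hypothetical sketch of per-locus allele counts, loci on rows.
allele_counts_sketch <- function(X) {
  # Copies of the counted allele per locus (missing genotypes ignored);
  # rowSums() carries rownames of X (locus names) through as names
  n_allele <- rowSums(X, na.rm = TRUE)
  # Total alleles observed: 2 per non-missing genotype
  n_total <- 2 * rowSums(!is.na(X))
  cbind(allele_1 = n_allele, allele_2 = n_total - n_allele)
}

X <- matrix(c(0, 1, 2,
              1, NA, 2),
            nrow = 2, byrow = TRUE)
counts <- allele_counts_sketch(X)
counts
# Unfolded frequencies recovered from the counts: c(1/2, 3/4)
counts[, 1] / rowSums(counts)
```

Because the counts are raw tallies of each allele, they are the same whether or not frequencies are later folded.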

Examples

# Construct toy data
X <- matrix(
    c(0, 1, 2,
      1, 0, 1,
      1, NA, 2),
    nrow = 3,
    byrow = TRUE
)

# Loci on rows (default): halved row means
allele_freqs(X)
# expected: c(1/2, 1/3, 3/4)

# Same, as minor allele frequencies
allele_freqs(X, fold = TRUE)
# expected: c(1/2, 1/3, 1/4)

# Loci on columns: halved column means
allele_freqs(X, loci_on_cols = TRUE)
# expected: c(1/3, 1/4, 5/6)
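The fold = TRUE result in the examples amounts to replacing each frequency p by min(p, 1 - p). A base-R sketch of that step (`fold_freqs` is a hypothetical helper, not part of the package):

```r
# Hypothetical helper: map allele frequencies to minor allele frequencies.
fold_freqs <- function(p) {
  # pmin() works element-wise and copies names from its first argument
  pmin(p, 1 - p)
}

# Unfolded frequencies from the first example
p <- c(1/2, 1/3, 3/4)
fold_freqs(p)  # c(1/2, 1/3, 1/4), matching fold = TRUE
```

A frequency of exactly 1/2 is left unchanged, since both alleles are equally common there.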


OchoaLab/simtrait documentation built on July 4, 2025, 3:48 a.m.