kinship_std: Standard kinship estimator
In OchoaLab/popkinsuppl: Supplement to popkin package


kinship_std

R Documentation

Standard kinship estimator

Description

This function computes the standard kinship estimates for a given genotype matrix. It handles very large data, passed as a BEDMatrix object or as a regular R matrix, and it handles missing values correctly.

Usage

kinship_std(
  X,
  n = NA,
  mean_of_ratios = FALSE,
  loci_on_cols = FALSE,
  mem_factor = 0.7,
  mem_lim = NA,
  m_chunk_max = 1000,
  want_M = FALSE
)

Arguments

`X`	The genotype matrix (BEDMatrix, regular R matrix, or function, same as `popkin`).
`n`	The number of individuals. Required if `X` is a function, ignored otherwise.
`mean_of_ratios`	The estimator can be computed in two broad forms. If `FALSE` (default) the ratio-of-means (ROM) version is computed, which behaves more favorably and has a known asymptotic bias. If `TRUE`, the mean-of-ratios (MOR) version is computed, which is more variable and has an uncharacterized bias, but is most common in the literature.
`loci_on_cols`	Determines the orientation of the genotype matrix (by default, `FALSE`, loci are along the rows). If `X` is a BEDMatrix object, the input value is ignored (set automatically to `TRUE` internally).
`mem_factor`	Proportion of available memory to use when loading and processing genotypes. Ignored if `mem_lim` is not `NA`.
`mem_lim`	Memory limit in GB, used to break up genotype data into chunks for very large datasets. Note that memory usage is somewhat underestimated and is not strictly controlled. The default on Linux and Windows is `mem_factor` times the free system memory; on other systems (including OSX) it is 1 GB.
`m_chunk_max`	Sets the maximum number of loci to process at a time. The actual number of loci loaded may be lower if memory is limiting.
`want_M`	If `TRUE`, includes the matrix `M` of non-missing pair counts in the return value, which are sample sizes that can be useful in modeling the variance of estimates. Default `FALSE` is to return the kinship matrix only.
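
The two forms selected by `mean_of_ratios` can be illustrated with a short base R sketch on complete data. This assumes the usual textbook definitions of the standard estimator (genotypes centered at twice the sample allele frequency, normalized by `4 * p * (1 - p)`); it is not the package's code, it ignores missing values and chunking, and the names `kinship_rom_sketch` and `kinship_mor_sketch` are hypothetical.

```r
# Base R sketch of the ROM and MOR forms on complete data (no missing values).
# Assumption: textbook definitions, not the actual popkinsuppl implementation.
set.seed(1)
n_ind <- 20
m_loci <- 500
p_anc <- runif(m_loci, 0.05, 0.95)
# loci along rows, individuals along columns (the default orientation)
X <- matrix(rbinom(n_ind * m_loci, 2, p_anc), nrow = m_loci, ncol = n_ind)

# sample allele frequency per locus
p_hat <- rowMeans(X) / 2
# drop fixed loci, whose variance term 4 * p * (1 - p) is zero
keep <- p_hat > 0 & p_hat < 1
X <- X[keep, , drop = FALSE]
p_hat <- p_hat[keep]

# center each locus (row) at twice its sample allele frequency
Xc <- X - 2 * p_hat
# per-locus variance term
v <- 4 * p_hat * (1 - p_hat)

# ratio of means: summed cross products divided by summed variance terms
kinship_rom_sketch <- crossprod(Xc) / sum(v)
# mean of ratios: average over loci of variance-normalized cross products
kinship_mor_sketch <- crossprod(Xc / sqrt(v)) / nrow(X)
```

In both forms each row of the result sums to zero, because every locus is centered at its own sample mean. MOR divides by each locus's variance term separately, so loci with rare alleles get large weights, which is consistent with MOR being the more variable form.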

Value

If `want_M` is `FALSE`, returns the estimated n-by-n kinship matrix only. If `X` has names for the individuals, they will be copied to the rows and columns of this kinship matrix. If `want_M` is `TRUE`, a named list is returned, containing:

`kinship`: the estimated n-by-n kinship matrix.
`M`: the n-by-n matrix of non-missing pair counts (see the `want_M` option).
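
As an illustration of what a matrix of non-missing pair counts looks like, the base R sketch below counts, for every pair of individuals, the loci at which both genotypes are observed. This is one reading of "non-missing pair counts" and is an assumption; the name `M_sketch` is hypothetical and the package may compute `M` differently.

```r
# Base R sketch: count loci observed in both individuals of each pair.
# Assumption: this is what "non-missing pair counts" means; not package code.
set.seed(2)
n_ind <- 5
m_loci <- 50
# loci along rows, individuals along columns
X <- matrix(rbinom(n_ind * m_loci, 2, 0.5), nrow = m_loci, ncol = n_ind)
# set 25 randomly chosen entries to missing
X[sample.int(length(X), 25)] <- NA

# indicator matrix: 1 where the genotype is observed, 0 where it is NA
obs <- (!is.na(X)) * 1
# entry (j, k) is the number of loci observed in both individual j and k
M_sketch <- crossprod(obs)
```

The diagonal of `M_sketch` holds the number of observed loci per individual, and every entry is at most the total number of loci.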

Examples

# dimensions of simulated data
n_ind <- 100
m_loci <- 1000
n_data <- n_ind * m_loci

# missingness rate
miss <- 0.1

# simulate ancestral allele frequencies
# uniform (0,1)
# it'll be ok if some of these are zero
p_anc <- runif(m_loci)

# simulate some binomial data
X <- rbinom(n_data, 2, p_anc)

# sprinkle random missingness
# (sample positions, not genotype values, so the intended fraction is set to NA)
X[ sample.int(n_data, n_data * miss) ] <- NA

# turn into a matrix
X <- matrix(X, nrow = m_loci, ncol = n_ind)

# estimate kinship matrices
# ... ROM version
kinship_rom <- kinship_std(X)
# ... MOR version
kinship_mor <- kinship_std(X, mean_of_ratios = TRUE)
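
The remaining documented options can be exercised in the same way. The sketch below uses only the arguments and return fields described above (`want_M`, `loci_on_cols`, `$kinship`, `$M`); it regenerates a small dataset so that it is self-contained, and it calls the package only if it is installed.

```r
# small simulated genotype matrix, loci along rows
set.seed(3)
n_ind <- 10
m_loci <- 200
X <- matrix(rbinom(n_ind * m_loci, 2, runif(m_loci)), nrow = m_loci, ncol = n_ind)
# set 100 randomly chosen entries to missing
X[sample.int(length(X), 100)] <- NA

# only call the package if it is available
if (requireNamespace("popkinsuppl", quietly = TRUE)) {
  # request the non-missing pair counts along with the kinship estimates
  obj <- popkinsuppl::kinship_std(X, want_M = TRUE)
  kinship <- obj$kinship
  M <- obj$M
  # same data with individuals along rows and loci along columns
  kinship_t <- popkinsuppl::kinship_std(t(X), loci_on_cols = TRUE)
}
```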

OchoaLab/popkinsuppl documentation built on May 17, 2022, 9:50 a.m.