kinship_std: Standard kinship estimator

kinship_std    R Documentation

Standard kinship estimator

Description

This function computes the standard kinship estimates for a given genotype matrix. It handles very large data, passed either as a BEDMatrix object or as a regular R matrix, and it handles missing values correctly.

Usage

kinship_std(
  X,
  n = NA,
  mean_of_ratios = FALSE,
  loci_on_cols = FALSE,
  mem_factor = 0.7,
  mem_lim = NA,
  m_chunk_max = 1000,
  want_M = FALSE
)

Arguments

X

The genotype matrix, given as a BEDMatrix object, a regular R matrix, or a function (the same inputs as popkin).

n

The number of individuals. Required if X is a function, ignored otherwise.

mean_of_ratios

The estimator can be computed in two broad forms. If FALSE (default) the ratio-of-means (ROM) version is computed, which behaves more favorably and has a known asymptotic bias. If TRUE, the mean-of-ratios (MOR) version is computed, which is more variable and has an uncharacterized bias, but is most common in the literature.

loci_on_cols

Determines the orientation of the genotype matrix. If FALSE (default), loci are along the rows; if TRUE, loci are along the columns. If X is a BEDMatrix object, the input value is ignored (it is set automatically to TRUE internally).

mem_factor

Proportion of available memory to use when loading and processing genotypes. Ignored if mem_lim is not NA.

mem_lim

Memory limit in GB, used to break up genotype data into chunks for very large datasets. Note that memory usage is somewhat underestimated and is not controlled strictly. The default on Linux and Windows is mem_factor times the free system memory; on OSX and other systems it is 1 GB.

m_chunk_max

Sets the maximum number of loci to process at a time. The actual number of loci loaded may be lower if memory is limiting.

want_M

If TRUE, includes the matrix M of non-missing pair counts in the return value, which are sample sizes that can be useful in modeling the variance of estimates. Default FALSE is to return the kinship matrix only.
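The two forms selected by mean_of_ratios can be illustrated with a small base R sketch. This is not the package's implementation: kinship_std_sketch is a hypothetical helper, it assumes the usual textbook definitions of the ROM and MOR estimators, it assumes loci along the rows, and its handling of missing values (sums restricted to loci observed in both individuals) is one plausible choice rather than a description of what kinship_std does internally. It also ignores chunking and memory limits.

```r
# Hypothetical helper for illustration only (not part of popkinsuppl).
# X: small numeric genotype matrix (values 0, 1, 2 or NA), loci along rows.
kinship_std_sketch <- function(X, mean_of_ratios = FALSE) {
  # estimated allele frequency per locus, ignoring missing genotypes
  p_hat <- rowMeans(X, na.rm = TRUE) / 2
  # drop loci that are fixed or entirely missing (zero or undefined variance)
  keep <- !is.na(p_hat) & p_hat > 0 & p_hat < 1
  X <- X[keep, , drop = FALSE]
  p_hat <- p_hat[keep]
  # per-locus variance term 4 p (1 - p)
  v <- 4 * p_hat * (1 - p_hat)
  # indicator of observed genotypes (1 = observed, 0 = missing)
  obs <- 1 - is.na(X)
  # center each locus by twice its allele frequency;
  # missing entries are set to zero so they drop out of the sums
  Z <- X - 2 * p_hat
  Z[is.na(Z)] <- 0
  if (mean_of_ratios) {
    # MOR: average of per-locus ratios over loci observed in both individuals
    crossprod(Z / sqrt(v)) / crossprod(obs)
  } else {
    # ROM: ratio of summed numerators to summed variance terms,
    # both restricted to loci observed in both individuals
    crossprod(Z) / crossprod(obs * v, obs)
  }
}

# toy data: 200 loci, 10 individuals, a few missing genotypes
set.seed(1)
X <- matrix(rbinom(200 * 10, 2, 0.3), nrow = 200, ncol = 10)
X[sample(length(X), 100)] <- NA
kin_rom <- kinship_std_sketch(X)
kin_mor <- kinship_std_sketch(X, mean_of_ratios = TRUE)
```

The only difference between the two branches is where the division by the variance term happens: before averaging over loci (MOR) or after summing over loci (ROM).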

Value

If want_M is FALSE, returns the estimated n-by-n kinship matrix only. If X has names for the individuals, they will be copied to the rows and columns of this kinship matrix. If want_M is TRUE, a named list is returned, containing:

  • kinship: the estimated n-by-n kinship matrix

  • M: the n-by-n matrix of non-missing pair counts (see want_M option).
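As an illustration of what a matrix of non-missing pair counts looks like, the sketch below counts, for every pair of individuals, the loci at which both genotypes are observed. It assumes a regular R matrix with loci along the rows and is written in base R only; it is meant to convey the meaning of M, not to reproduce how the package computes it.

```r
# toy data: 50 loci along rows, 4 individuals along columns
set.seed(7)
X <- matrix(rbinom(50 * 4, 2, 0.5), nrow = 50, ncol = 4)
X[sample(length(X), 20)] <- NA

# 1 where a genotype is observed, 0 where it is missing
obs <- 1 - is.na(X)

# entry (j, k) counts loci observed in both individual j and individual k;
# the diagonal counts observed loci per individual
M <- crossprod(obs)
```

Pairs with smaller counts are estimated from fewer loci, which is why these values can serve as sample sizes when modeling the variance of the kinship estimates.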

See Also

The popkin package.

Examples

# dimensions of simulated data
n_ind <- 100
m_loci <- 1000
n_data <- n_ind * m_loci

# missingness rate
miss <- 0.1

# simulate ancestral allele frequencies
# uniform (0,1)
# it'll be ok if some of these are zero
p_anc <- runif(m_loci)

# simulate some binomial data
X <- rbinom(n_data, 2, p_anc)

# sprinkle random missingness
X[ sample(n_data, n_data * miss) ] <- NA

# turn into a matrix
X <- matrix(X, nrow = m_loci, ncol = n_ind)

# estimate kinship matrices
# ... ROM version
kinship_rom <- kinship_std(X)
# ... MOR version
kinship_mor <- kinship_std(X, mean_of_ratios = TRUE)
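The want_M and loci_on_cols options described in Arguments can be exercised in the same way. The following is a minimal sketch that uses only the arguments and return fields documented on this page; the toy data, the individual names, and the requireNamespace guard (so the code still runs where popkinsuppl is not installed) are additions for illustration.

```r
# small simulated genotype matrix, loci along rows
set.seed(42)
X <- matrix(rbinom(1000 * 20, 2, 0.5), nrow = 1000, ncol = 20)
X[sample(length(X), 500)] <- NA
# names for the individuals (columns, since loci are along the rows)
colnames(X) <- paste0("ind", 1:20)

if (requireNamespace("popkinsuppl", quietly = TRUE)) {
  # request the non-missing pair counts along with the kinship estimates
  obj <- popkinsuppl::kinship_std(X, want_M = TRUE)
  kinship <- obj$kinship
  M <- obj$M

  # the same data passed with individuals along the rows
  kinship_t <- popkinsuppl::kinship_std(t(X), loci_on_cols = TRUE)
}
```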


OchoaLab/popkinsuppl documentation built on May 17, 2022, 9:50 a.m.