rescale_popkin: Rescale kinship matrix to set a given kinship value to zero.

View source: R/rescale_popkin.R

rescale_popkinR Documentation

Rescale kinship matrix to set a given kinship value to zero.


If you already have a population kinship matrix, and you desire to estimate the kinship matrix in a subset of the individuals, you could do it the slow way (reestimating starting from the genotypes of the subset of individuals) or you can do it the fast way: first subset the kinship matrix to only contain the individuals of interest, then use this function to rescale this kinship matrix so that the minimum kinship is zero. This rescaling is required when subsetting results in a more recent Most Recent Common Ancestor (MRCA) population compared to the original dataset (for example, if the original data had individuals from across the world but the subset only contains individuals from a single continent).


rescale_popkin(kinship, subpops = NULL, min_kinship = NA)



An n-by-n kinship matrix.


The length-n vector of subpopulation assignments for each individual.


A scalar kinship value to define the new zero kinship.


This function rescales the input kinship matrix so that the value min_kinship in the original kinship matrix becomes zero, using the formula kinship_rescaled = ( kinship - min_kinship ) / ( 1 - min_kinship ). This is equivalent to changing the ancestral population of the data. If subpopulation labels subpops are provided (recommended), they are used to estimate min_kinship using the function popkin_A_min_subpops(), which is the recommended way to set the MRCA population correctly. If both subpops and min_kinship are provided, only min_kinship is used. If both subpops and min_kinship are omitted, the function sets min_kinship = min( kinship ).


The rescaled n-by-n kinship matrix, with the desired level of relatedness set to zero.


# Construct toy data
X <- matrix(c(0,1,2,1,0,1,1,0,2), nrow=3, byrow=TRUE) # genotype matrix
subpops <- c(1,1,2) # subpopulation assignments for individuals
subpops2 <- 1:3 # alternate labels treat every individual as a different subpop

# NOTE: for BED-formatted input, use BEDMatrix!
# "file" is path to BED file (excluding .bed extension)
## library(BEDMatrix)
## X <- BEDMatrix(file) # load genotype matrix object

# suppose we first estimate kinship without subpopulations, which will be more biased
kinship <- popkin(X) # calculate kinship from genotypes, WITHOUT subpops
# then we visualize this matrix, figure out a reasonable subpopulation partition

# now we can adjust the kinship matrix!
kinship2 <- rescale_popkin(kinship, subpops)
# prev is faster but otherwise equivalent to re-estimating kinship from scratch with subpops:
# kinship2 <- popkin(X, subpops) 

# can also manually set the level of relatedness min_kinship we want to be zero:
min_kinship <- min(kinship) # a naive choice for example
kinship2 <- rescale_popkin(kinship, min_kinship = min_kinship)

# lastly, omiting both subpops and min_kinship sets the minimum value in kinship to zero
kinship3 <- rescale_popkin(kinship2)
# equivalent to both of:
# kinship3 <- popkin(X)
# kinship3 <- rescale_popkin(kinship2, min_kinship = min(kinship))

popkin documentation built on Jan. 7, 2023, 1:26 a.m.