pairwise_geno_id: Return every pair of individuals that mismatch at no more...

View source: R/RcppExports.R

pairwise_geno_id R Documentation

Return every pair of individuals that mismatch at no more than max_miss loci

Description

This is used for identifying duplicate individuals/genotypes in large data sets. I've specified this in terms of the max number of mismatching loci because I think everyone should already have tossed out individuals with a lot of missing data. That makes it easy to toss out a pair as soon as it exceeds max_miss mismatches, without even looking at all the loci, so it is faster for all the comparisons.
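The early-exit idea can be sketched in plain R. This is only an illustration, not the package's compiled code: the helper name count_mismatch_early_exit is hypothetical, and it assumes that a locus missing in either individual is skipped rather than counted as a mismatch.

```r
# Hypothetical helper, not part of CKMRsim: count mismatches between two
# genotype vectors, giving up as soon as the count exceeds max_miss.
count_mismatch_early_exit <- function(g1, g2, max_miss) {
  n_mismatch <- 0L
  n_loc <- 0L
  for (i in seq_along(g1)) {
    # Assumption: a locus missing (< 0) in either individual is skipped.
    if (g1[i] < 0L || g2[i] < 0L) next
    n_loc <- n_loc + 1L
    if (g1[i] != g2[i]) {
      n_mismatch <- n_mismatch + 1L
      # Early exit: this pair can no longer qualify, so stop scanning loci.
      if (n_mismatch > max_miss) return(NULL)
    }
  }
  c(num_mismatch = n_mismatch, num_loc = n_loc)
}

a <- c(0L, 1L, 2L, -1L, 1L)
b <- c(0L, 1L, 1L,  2L, 1L)
res_ok <- count_mismatch_early_exit(a, b, max_miss = 1L)        # pair is kept
res_rejected <- count_mismatch_early_exit(a, b, max_miss = 0L)  # NULL, pair is dropped
```

With a small max_miss, most non-duplicate pairs are rejected after only a few loci, which is where the speed-up would come from.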

Usage

pairwise_geno_id(S, max_miss)

Arguments

S

"source", an integer matrix with NumInd-source rows and NumLoci columns, with each entry being a base-0 representation of the genotype of the r-th individual at the c-th locus. These are the individuals you can think of as parents if there is directionality to the comparisons. Missing data is denoted by -1 (or any integer < 0).

max_miss

maximum allowable number of mismatching genotypes between the two individuals of a pair.

Value

a data frame with columns:

ind1

the base-1 index in S of the first individual of the pair

ind2

the base-1 index in S of the second individual of the pair

num_mismatch

the number of loci at which the pair have mismatching genotypes

num_loc

the number of loci at which neither individual has missing data
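
To make the shape of the inputs and the returned data frame concrete, here is a slow base-R reference sketch. It is not the package's implementation: pairwise_geno_id_ref is a hypothetical name, and the sketch assumes that each pair is reported once with ind1 < ind2 and that loci missing in either individual count toward neither num_mismatch nor num_loc.

```r
# Hypothetical base-R reference, not the package's compiled code.
pairwise_geno_id_ref <- function(S, max_miss) {
  pairs <- utils::combn(nrow(S), 2)  # each column is one pair with ind1 < ind2
  rows <- lapply(seq_len(ncol(pairs)), function(k) {
    i <- pairs[1, k]
    j <- pairs[2, k]
    # Loci at which neither individual is missing (missing is any value < 0).
    both <- S[i, ] >= 0L & S[j, ] >= 0L
    data.frame(ind1 = i, ind2 = j,
               num_mismatch = sum(S[i, both] != S[j, both]),
               num_loc = sum(both))
  })
  out <- do.call(rbind, rows)
  # Keep only pairs that mismatch at no more than max_miss loci.
  out <- out[out$num_mismatch <= max_miss, , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Three individuals, four loci; individual 2 is missing the last locus.
S <- rbind(
  c(0L, 1L, 2L,  0L),
  c(0L, 1L, 2L, -1L),
  c(2L, 0L, 1L,  1L)
)
dups <- pairwise_geno_id_ref(S, max_miss = 0L)
```

Here only individuals 1 and 2 are returned: they agree at all three loci where both are typed, so num_mismatch is 0 and num_loc is 3.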


eriqande/CKMRsim documentation built on Aug. 2, 2024, 7:23 a.m.