dups: Find duplicate samples

View source: R/snp-match.R

dups    R Documentation

Find duplicate samples

Description

Find indices of possible sample duplications between two aSnpStats objects

Usage

dups(x, y = NULL, tol = ncol(x)/50, type = c("hethom", "all"),
  stopatone = TRUE)

Arguments

x

aSnpStats object

y

aSnpStats object

tol

maximum number of mismatched genotypes allowed for duplicate samples

type

by default ("hethom"), dups compares only homozygous versus heterozygous calls, so that differently labelled alleles do not produce spurious mismatches. Set type="all" so that the two kinds of homozygous genotype also count as a mismatch with each other.

stopatone

if TRUE (the default), assume each sample in x can have at most one match in y, and vice versa. This speeds up the search and is safe provided x and y themselves contain no internal duplicates. Set it to FALSE to catch multiple matches.
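
The effect of type can be illustrated with a small base R sketch. It assumes genotypes are coded 0/1/2 (1 being the heterozygote) with NA for missing calls; that coding, and the variable names, are illustrative only and are not taken from the package. Swapping the allele labels turns each homozygote into the other one but leaves heterozygotes unchanged, so only a comparison of all three genotype classes reports mismatches.

```r
# Illustrative coding (an assumption): 0 and 2 are the two homozygotes,
# 1 is the heterozygote, NA is a missing call.
g <- c(0, 1, 2, 2, 1, 0, NA)

# The same sample with the two alleles labelled the other way round.
flipped <- 2 - g

# Comparing all genotype classes: every homozygote now disagrees.
n_all <- sum(g != flipped, na.rm = TRUE)

# Comparing het vs hom status only: the relabelling is invisible.
n_hethom <- sum((g == 1) != (flipped == 1), na.rm = TRUE)

print(c(all = n_all, hethom = n_hethom))
```

Here the four non-missing homozygous calls disagree under the full comparison, while the het-versus-hom comparison finds no mismatches.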

Details

Each pair of samples from x and y is compared in turn. If the number of mismatches among non-missing genotypes exceeds tol, the pair is assumed not to be duplicates, and counting proceeds to the next pair. If the total number of mismatches among non-missing genotypes is less than tol, the indices of the sample pair are stored and returned together with the number of mismatches and the number of non-missing genotypes compared.
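
The comparison described above can be sketched in base R. This is a minimal illustration and not the package's implementation: dups_sketch is a hypothetical name, the inputs are assumed to be plain integer matrices (samples in rows, SNPs in columns, coded 0/1/2 with NA for missing) rather than aSnpStats objects, and the mismatches for each pair are counted in one vectorised step instead of stopping early once tol is exceeded.

```r
# Hypothetical sketch of the pairwise duplicate search; not the package code.
# gx, gy: integer matrices, samples in rows, SNPs in columns,
# coded 0/1/2 for the three genotypes and NA for missing calls.
dups_sketch <- function(gx, gy, tol = ncol(gx) / 50,
                        type = c("hethom", "all"), stopatone = TRUE) {
  type <- match.arg(type)
  if (type == "hethom") {
    # Collapse both homozygotes into one class so allele labels do not matter.
    gx <- gx == 1
    gy <- gy == 1
  }
  hits <- list()
  used_y <- logical(nrow(gy))  # y samples that already have a match
  for (i in seq_len(nrow(gx))) {
    for (j in seq_len(nrow(gy))) {
      if (stopatone && used_y[j]) next
      ok <- !is.na(gx[i, ]) & !is.na(gy[j, ])  # SNPs called in both samples
      mismatches <- sum(gx[i, ok] != gy[j, ok])
      if (mismatches < tol) {
        hits[[length(hits) + 1L]] <- c(i, j, mismatches, sum(ok))
        used_y[j] <- TRUE
        if (stopatone) break  # at most one match per sample in x
      }
    }
  }
  out <- do.call(rbind, hits)
  if (is.null(out)) out <- matrix(numeric(0), ncol = 4)
  colnames(out) <- c("index.x", "index.y", "mismatches", "comparisons")
  out
}

# Simulated genotypes: rows 6:10 of gx are the same samples as rows 1:5 of gy.
set.seed(1)
g <- matrix(sample(0:2, 15 * 500, replace = TRUE), nrow = 15)
gx <- g[1:10, ]
gy <- g[6:15, ]
res <- dups_sketch(gx, gy)
print(res)
```

With 500 SNPs the default tol is 10, so only the five shared samples fall below the threshold; unrelated simulated samples differ at far more positions.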

Value

a matrix, with four columns: index of dup in x, index of dup in y, number of mismatches, number of comparisons

Author(s)

Chris Wallace

Examples

## example data where samples 6:10 in x are the same as 1:5 in y
library(annotSnpStats)
x <- example.data(1:10, 1:500)
y <- example.data(6:15, 1:500)
dups(x, y)

chr1swallace/annotSnpStats documentation built on April 18, 2023, 11:22 a.m.