DuplicateCheck: Check Data for Duplicates.

View source: R/DuplicateCheck.R

DuplicateCheck    R Documentation

Check Data for Duplicates.

Description

Check the genotype and life history data for duplicate IDs (not permitted) and duplicated genotypes (not advised), and count how many individuals in the genotype data are not included in the life history data (permitted). The order of IDs in the genotype and life history data is not required to be identical.
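The ID checks described above can be illustrated in base R. This is only a minimal sketch, not the package's implementation: it assumes that IDs are stored as the rownames of the genotype matrix and in a column named ID in the life history data, and the toy objects GenoM and LifeHist are made up for illustration.

```r
# Toy genotype matrix: 4 individuals (rows) x 3 SNPs, IDs as rownames.
# Assumption: the ID "A" was entered twice by mistake.
GenoM <- matrix(c(0, 1, 2,
                  1, 1, 0,
                  0, 1, 2,
                  2, 0, 1),
                nrow = 4, byrow = TRUE,
                dimnames = list(c("A", "B", "A", "C"), NULL))

# Hypothetical life history table; its row order differs from GenoM.
LifeHist <- data.frame(ID = c("B", "A"),
                       Sex = c(1, 2),
                       BirthYear = c(2001, 2000))

# Row numbers of every ID that occurs more than once in the genotype data.
ids <- rownames(GenoM)
dup_rows <- which(ids %in% ids[duplicated(ids)])

# Number of genotyped IDs without a life history record.
# Matching is by ID, so the order of the two data sets does not matter.
n_no_lh <- length(setdiff(ids, LifeHist$ID))
```

Here dup_rows holds the rows to remove or relabel, and n_no_lh is the count of genotyped individuals that have no life history entry.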

Usage

DuplicateCheck(GenoM = NULL, FortPARAM.dup, quiet)

Arguments

GenoM

matrix with genotype data, size nInd x nSnp.

FortPARAM.dup

list with Fortran-ready parameter values, as generated by MkFortParams.

quiet

if TRUE, suppress messages.

Value

A list with one or more of the following elements:

DupGenoID

Data frame with the row numbers of duplicated IDs in the genotype data. Remove or relabel these individuals to avoid downstream confusion.

DupGenotype

Data frame with duplicated genotypes (with or without identical IDs). Pairs are reported if they differ at no more than the specified maximum number of mismatches, so this data frame may also include pairs of closely related individuals. Mismatch = number of SNPs at which the two genotypes differ, LLR = log-likelihood ratio between 'self' and the most likely non-self relationship.
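To make the Mismatch column concrete, the sketch below counts differing SNPs for every pair of individuals in base R. It is an illustration under stated assumptions, not the package's Fortran routine: genotypes are assumed to be coded 0/1/2 with negative values marking missing calls, SNPs missing in either individual are skipped, and count_mismatch and MaxMismatch are hypothetical names. The LLR is not computed here.

```r
# Count the SNPs at which two genotype vectors differ.
# Assumption: negative values are missing calls and are not compared.
count_mismatch <- function(g1, g2) {
  scored <- g1 >= 0 & g2 >= 0
  sum(g1[scored] != g2[scored])
}

# Toy data: "A_rep" is a re-genotyped sample of "A" with one missing call.
GenoM <- matrix(c(0, 1, 2, 1,
                  0, 1, 2, -9,
                  2, 0, 2, 1),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("A", "A_rep", "B"), NULL))

MaxMismatch <- 1  # hypothetical threshold for calling a pair duplicated

# All unordered pairs of rows, one pair per row of 'pairs'.
pairs <- t(combn(nrow(GenoM), 2))
mism <- apply(pairs, 1, function(p) {
  count_mismatch(GenoM[p[1], ], GenoM[p[2], ])
})

candidates <- data.frame(ID1 = rownames(GenoM)[pairs[, 1]],
                         ID2 = rownames(GenoM)[pairs[, 2]],
                         Mismatch = mism)

# Keep only pairs within the allowed number of mismatches.
candidates <- candidates[candidates$Mismatch <= MaxMismatch, ]
```

With a threshold above zero, pairs that pass the filter are not necessarily the same individual, which is why close relatives can appear in the output.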

See Also

CheckLH, which performs the check for duplicated IDs in the life history data, as well as for IDs (in genotype data) for which no life history data is provided.


sequoia documentation built on Sept. 8, 2023, 5:29 p.m.