duplicateDiscordance: Duplicate discordance
In smgogarten/GWASTools: Tools for Genome Wide Association Studies

Description

A function to compute pair-wise genotype discordances between multiple genotyping instances of the same subject.

Usage

duplicateDiscordance(genoData, subjName.col,
                     one.pair.per.subj=TRUE, corr.by.snp=FALSE,
                     minor.allele.only=FALSE, allele.freq=NULL,
                     scan.exclude=NULL, snp.exclude=NULL,
                     snp.block.size=5000, verbose=TRUE)

Arguments

`genoData`	A `GenotypeData` object.
`subjName.col`	A character string indicating the name of the annotation variable that will be identical for duplicate scans.
`one.pair.per.subj`	A logical indicating whether a single pair of scans should be randomly selected for each subject with more than two scans.
`corr.by.snp`	A logical indicating whether correlation by SNP should be computed (may significantly increase run time).
`minor.allele.only`	A logical indicating whether discordance should be calculated only between pairs of scans in which at least one scan has a genotype with the minor allele (i.e., exclude pairs in which both scans are major allele homozygotes).
`allele.freq`	A numeric vector with the frequency of the A allele for each SNP in `genoData`. Required if `minor.allele.only=TRUE`.
`scan.exclude`	An integer vector containing the IDs of scans to be excluded.
`snp.exclude`	An integer vector containing the IDs of SNPs to be excluded.
`snp.block.size`	Integer block size for SNPs if `corr.by.snp=TRUE`.
`verbose`	Logical value specifying whether to show progress information.
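
The minor.allele.only filter can be illustrated for a single SNP with a short base-R sketch. It assumes genotypes are coded as the number of A alleles (0, 1, 2, with NA for missing), that allele.freq gives the A allele frequency, and that A is the minor allele when its frequency is below 0.5. The helper `keep_pair_minor_allele` is hypothetical and is not part of GWASTools.

```r
# Hypothetical helper (not part of GWASTools): which duplicate pairs at one SNP
# would count toward discordance when minor.allele.only=TRUE.
# Assumption: genotypes are A-allele dosages (0, 1, 2), NA = missing,
# and freqA (the A allele frequency) is non-missing.
keep_pair_minor_allele <- function(g1, g2, freqA) {
  # Major-allele homozygote: BB (0) if A is the minor allele, otherwise AA (2).
  # Assumption: at freqA == 0.5, A is treated as the major allele.
  major.hom <- if (freqA < 0.5) 0 else 2
  both.called <- !is.na(g1) & !is.na(g2)
  # Keep a pair only if both genotypes are called and at least one scan
  # carries the minor allele.
  both.called & !(g1 == major.hom & g2 == major.hom)
}

g1 <- c(0, 0, 1, 2, NA)
g2 <- c(0, 1, 1, 2, 0)
keep_pair_minor_allele(g1, g2, freqA = 0.2)  # FALSE TRUE TRUE TRUE FALSE
```

Here the first pair is dropped because both scans are BB homozygotes at a SNP where B is the major allele, and the last pair is dropped because one genotype is missing.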

Details

duplicateDiscordance calculates discordance metrics both by scan and by SNP. If one.pair.per.subj=TRUE (the default), each subject with more than two duplicate genotyping instances will have two scans randomly selected for computing discordance. If one.pair.per.subj=FALSE, discordances will be calculated pair-wise for all possible pairs for each subject.
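
The effect of one.pair.per.subj on which pairs are compared can be sketched in base R. This is an illustration under assumptions, not the GWASTools source: `select_pairs` is a hypothetical helper that groups scans by subject, drops subjects with a single scan, and returns a 2 x npairs matrix of scan IDs per subject.

```r
# Hypothetical sketch (not the GWASTools source): choose which scan pairs
# to compare for each subject, mirroring the one.pair.per.subj option.
select_pairs <- function(scanID, subjID, one.pair.per.subj = TRUE) {
  scans.by.subj <- split(scanID, subjID)
  # Only subjects with at least two scans have duplicates to compare.
  scans.by.subj <- scans.by.subj[lengths(scans.by.subj) >= 2]
  lapply(scans.by.subj, function(s) {
    if (one.pair.per.subj && length(s) > 2) {
      # Randomly keep two scans (sample.int avoids sample()'s length-1 pitfall).
      s <- s[sample.int(length(s), 2)]
    }
    combn(s, 2)  # each column is one pair of scan IDs
  })
}

scanID <- 1:7
subjID <- c("a", "a", "b", "b", "b", "c", "d")
all.pairs <- select_pairs(scanID, subjID, one.pair.per.subj = FALSE)
all.pairs$b  # columns (3,4), (3,5), (4,5); subjects c and d are dropped
```

With one.pair.per.subj=TRUE, subject b would contribute a single randomly chosen pair instead of all three.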

Value

A list with the following components:

`discordance.by.snp`	data frame with 5 columns: 1. snpID, 2. discordant (number of discordant pairs), 3. npair (number of pairs examined), 4. n.disc.subj (number of subjects with at least one discordance), 5. discord.rate (discordance rate, i.e., discordant/npair)
`discordance.by.subject`	a list of matrices (one for each subject) with the pair-wise discordance between the different genotyping instances of the subject
`correlation.by.subject`	a list of matrices (one for each subject) with the pair-wise correlation between the different genotyping instances of the subject
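
How the columns of discordance.by.snp relate to one another can be shown with a small base-R sketch. It is not the GWASTools implementation. It assumes a SNP x scan matrix of A-allele dosages whose column names are scan IDs, a list of 2 x npairs matrices of scan IDs (one per subject), that a pair counts toward npair only when both genotypes are called, and that SNP IDs follow row order. `discordance_by_snp` is a hypothetical name.

```r
# Hypothetical sketch of the per-SNP summary (not the GWASTools source).
# geno:  SNP x scan matrix of A-allele dosages (0/1/2, NA = missing),
#        with column names equal to scan IDs.
# pairs: list (one element per subject) of 2 x npairs matrices of scan IDs.
# Assumption: a pair is counted only when both genotypes are called.
discordance_by_snp <- function(geno, pairs) {
  nsnp <- nrow(geno)
  discordant <- npair <- n.disc.subj <- integer(nsnp)
  for (p in pairs) {
    subj.disc <- logical(nsnp)  # any discordance for this subject, per SNP
    for (k in seq_len(ncol(p))) {
      g1 <- geno[, as.character(p[1, k])]
      g2 <- geno[, as.character(p[2, k])]
      called <- !is.na(g1) & !is.na(g2)
      disc <- called & g1 != g2
      npair <- npair + called
      discordant <- discordant + disc
      subj.disc <- subj.disc | disc
    }
    n.disc.subj <- n.disc.subj + subj.disc
  }
  # Assumption: SNP IDs are the row positions of geno.
  data.frame(snpID = seq_len(nsnp), discordant = discordant, npair = npair,
             n.disc.subj = n.disc.subj, discord.rate = discordant / npair)
}

geno <- matrix(c(0, 1, 2,    # scan 1
                 0, 2, 2,    # scan 2
                 1, 1, NA,   # scan 3
                 1, 1, 0),   # scan 4
               nrow = 3, dimnames = list(NULL, c("1", "2", "3", "4")))
pairs <- list(a = matrix(c(1, 2), 2), b = matrix(c(3, 4), 2))
res <- discordance_by_snp(geno, pairs)
res$discord.rate  # 0 0.5 0
```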

If corr.by.snp=TRUE, discordance.by.snp will also have a column "correlation" with the correlation between the genotypes of duplicate scans. For this calculation, the first two scans of each subject are used.
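
One plausible reading of the per-SNP correlation, sketched in base R: for each SNP, correlate the genotypes of each subject's first scan with those of its second scan, across subjects. This interpretation is an assumption, not confirmed by the text above, and `correlation_by_snp` is a hypothetical helper, not the GWASTools code.

```r
# Hypothetical sketch: per-SNP correlation between the first and second
# scans of each subject, computed across subjects (an assumed reading).
# geno: SNP x scan matrix of A-allele dosages with scan IDs as column names.
# first, second: scan IDs of each subject's first and second scan, aligned.
correlation_by_snp <- function(geno, first, second) {
  g1 <- geno[, first, drop = FALSE]
  g2 <- geno[, second, drop = FALSE]
  sapply(seq_len(nrow(geno)), function(i) {
    # Use pairs where both genotypes are called; a SNP with no variation
    # in either set gives NA (the "standard deviation is zero" warning is muted).
    suppressWarnings(cor(g1[i, ], g2[i, ], use = "pairwise.complete.obs"))
  })
}

geno <- rbind(c(0, 0, 1, 1, 2, 2),
              c(0, 2, 1, 1, 2, 0))
colnames(geno) <- c("a1", "a2", "b1", "b2", "c1", "c2")
r <- correlation_by_snp(geno, c("a1", "b1", "c1"), c("a2", "b2", "c2"))
r  # 1 -1
```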

Author(s)

Tushar Bhangale, Cathy Laurie, Stephanie Gogarten, Sarah Nelson

Examples

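library(GWASTools)
library(GWASdata)
# open the example Illumina genotype file and attach the scan annotation
file <- system.file("extdata", "illumina_geno.gds", package="GWASdata")
gds <- GdsGenotypeReader(file)
data(illuminaScanADF)
genoData <- GenotypeData(gds, scanAnnot=illuminaScanADF)

# discordance between duplicate scans, matched on subjectID
disc <- duplicateDiscordance(genoData, subjName.col="subjectID")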

# minor allele discordance
afreq <- alleleFrequency(genoData)
minor.disc <- duplicateDiscordance(genoData, subjName.col="subjectID",
  minor.allele.only=TRUE, allele.freq=afreq[,"all"])

close(genoData)

smgogarten/GWASTools documentation built on June 10, 2025, 3:53 a.m.