duplicateDiscordance: Duplicate discordance

duplicateDiscordance    R Documentation

Duplicate discordance

Description

A function to compute pair-wise genotype discordances between multiple genotyping instances of the same subject.

Usage

duplicateDiscordance(genoData, subjName.col,
                     one.pair.per.subj=TRUE, corr.by.snp=FALSE,
                     minor.allele.only=FALSE, allele.freq=NULL,
                     scan.exclude=NULL, snp.exclude=NULL,
                     snp.block.size=5000, verbose=TRUE)

Arguments

genoData

GenotypeData object

subjName.col

A character string indicating the name of the annotation variable that will be identical for duplicate scans.

one.pair.per.subj

A logical indicating whether a single pair of scans should be randomly selected for each subject with more than 2 scans.

corr.by.snp

A logical indicating whether correlation by SNP should be computed (may significantly increase run time).

minor.allele.only

A logical indicating whether discordance should be calculated only between pairs of scans in which at least one scan has a genotype with the minor allele (i.e., exclude major allele homozygotes).

allele.freq

A numeric vector with the frequency of the A allele for each SNP in genoData. Required if minor.allele.only=TRUE.

scan.exclude

An integer vector containing the ids of scans to be excluded.

snp.exclude

An integer vector containing the ids of SNPs to be excluded.

snp.block.size

Integer block size for SNPs if corr.by.snp=TRUE.

verbose

Logical value specifying whether to show progress information.

Details

duplicateDiscordance calculates discordance metrics both by scan and by SNP. If one.pair.per.subj=TRUE (the default), each subject with more than two duplicate genotyping instances will have two scans randomly selected for computing discordance. If one.pair.per.subj=FALSE, discordances will be calculated pair-wise for all possible pairs for each subject.
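The two pairing modes can be illustrated with a minimal base R sketch on toy data. It is not the package's implementation: the helper pair_discordance and the toy matrix are hypothetical, and the sketch assumes that genotypes are coded as the number of A alleles and that SNPs missing in either scan are left out of the comparison.

```r
# Toy genotypes for one subject: rows are SNPs, columns are scans.
# Values count copies of the A allele (0, 1, 2); NA marks a missing call.
geno <- cbind(scan1 = c(0, 1, 2, 1, NA),
              scan2 = c(0, 1, 1, 1, 2),
              scan3 = c(0, 2, 2, 1, 2))

# Hypothetical helper: fraction of SNPs called in both scans that disagree.
# Assumption: SNPs missing in either scan do not count toward the total.
pair_discordance <- function(g1, g2) {
  both_called <- !is.na(g1) & !is.na(g2)
  sum(g1[both_called] != g2[both_called]) / sum(both_called)
}

# All possible pairs for the subject, as with one.pair.per.subj=FALSE.
pairs <- combn(colnames(geno), 2)
all_pairs <- apply(pairs, 2, function(p) {
  pair_discordance(geno[, p[1]], geno[, p[2]])
})
names(all_pairs) <- apply(pairs, 2, paste, collapse = "-")

# A single randomly selected pair, as with one.pair.per.subj=TRUE.
set.seed(1)
chosen <- sample(colnames(geno), 2)
one_pair <- pair_discordance(geno[, chosen[1]], geno[, chosen[2]])
```

With three scans there are three possible pairs, so all_pairs has three entries, while one_pair holds the value for whichever two scans were sampled.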

Value

A list with the following components:

discordance.by.snp

a data frame with 5 columns: snpID; discordant (number of discordant pairs); npair (number of pairs examined); n.disc.subj (number of subjects with at least one discordance); and discord.rate (discordance rate, i.e., discordant/npair)

discordance.by.subject

a list of matrices (one for each subject) with the pair-wise discordance between the different genotyping instances of the subject

correlation.by.subject

a list of matrices (one for each subject) with the pair-wise correlation between the different genotyping instances of the subject

If corr.by.snp=TRUE, discordance.by.snp will also have a column "correlation" with the correlation between duplicate subjects. For this calculation, the first two samples per subject are selected.
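As a rough illustration of how the per-SNP columns relate to each other, the base R sketch below computes discordant, npair and discord.rate for toy data with one duplicate pair per subject. It is a sketch under stated assumptions, not the package's code: the function snp_discordance and the toy matrices are hypothetical, genotypes are assumed to be coded as the number of A alleles, and the minor-allele filter follows one reading of minor.allele.only (a pair is dropped for a SNP when both scans are major allele homozygotes). The n.disc.subj column is omitted.

```r
# Toy data: one duplicate pair per subject; rows are SNPs, columns are subjects.
first  <- cbind(subjA = c(0, 2, 1), subjB = c(0, 2, 2), subjC = c(1, 2, NA))
second <- cbind(subjA = c(0, 2, 2), subjB = c(0, 2, 2), subjC = c(0, 1, 1))
afreq  <- c(0.1, 0.9, 0.5)  # hypothetical A allele frequency for each SNP

# Hypothetical helper; supplying afreq switches on the minor-allele filter.
snp_discordance <- function(first, second, afreq = NULL) {
  # Assumption: a pair counts for a SNP only if both scans have a call.
  keep <- !is.na(first) & !is.na(second)
  if (!is.null(afreq)) {
    # Genotype of the major allele homozygote for each SNP
    # (a frequency of exactly 0.5 is treated here as A being minor).
    major_hom <- ifelse(afreq > 0.5, 2, 0)
    # The length-nrow vector recycles down each column, i.e. per SNP.
    both_major <- first == major_hom & second == major_hom
    keep <- keep & !both_major
  }
  discordant <- rowSums(keep & first != second, na.rm = TRUE)
  npair <- rowSums(keep)
  data.frame(snpID = seq_len(nrow(first)),
             discordant = discordant,
             npair = npair,
             discord.rate = discordant / npair)
}

res   <- snp_discordance(first, second)
minor <- snp_discordance(first, second, afreq = afreq)
```

Under this reading, the filter shrinks npair for SNPs where most pairs are concordant major allele homozygotes, so discord.rate in minor is at least as large as in res for every SNP in the toy data.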

Author(s)

Tushar Bhangale, Cathy Laurie, Stephanie Gogarten, Sarah Nelson

See Also

GenotypeData, duplicateDiscordanceAcrossDatasets, duplicateDiscordanceProbability, alleleFrequency

Examples

library(GWASdata)
file <- system.file("extdata", "illumina_geno.gds", package="GWASdata")
gds <- GdsGenotypeReader(file)
data(illuminaScanADF)
genoData <- GenotypeData(gds, scanAnnot=illuminaScanADF)

disc <- duplicateDiscordance(genoData, subjName.col="subjectID")

# minor allele discordance
afreq <- alleleFrequency(genoData)
minor.disc <- duplicateDiscordance(genoData, subjName.col="subjectID",
  minor.allele.only=TRUE, allele.freq=afreq[,"all"])

close(genoData)

smgogarten/GWASTools documentation built on May 18, 2024, 1:19 a.m.