DPAplot: Create DPA plots.

View source: R/DPAplot.R

Description

A function to generate Disequilibrium Pattern Analysis (DPA) plots for haplotype frequency data.

Usage

DPAplot(dat, y.threshold = 0.005, r2.threshold = 0.75,
  tolerance = 0.01)

Arguments

dat

A data.frame with 5 required variables (having the names listed below):

haplo.freq A numeric vector of haplotype frequencies.
locus1 A character vector identifying the first locus.
locus2 A character vector identifying the second locus.
allele1 A character vector identifying the allele at locus 1.
allele2 A character vector identifying the allele at locus 2.

y.threshold

A threshold for plotting based on the maximum expected frequency. If the maximum expected frequency is less than y.threshold, no plot is created. The default is 0.005.

r2.threshold

A threshold for plotting based on the fit of the regression line. If the R-squared value is less than r2.threshold, no plot is created. The default is 0.75.

tolerance

A threshold for the sum of the haplotype frequencies. If the sum of the haplotype frequencies is greater than 1+tolerance or less than 1-tolerance an error is returned. The default is 0.01.
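
As an illustration of the input described above, the following is a minimal sketch of a data frame with the five required columns, together with a check that mirrors the documented tolerance rule. The allele names and frequencies are made up for illustration, and the helper variables (required, missing.cols, sum.ok) are hypothetical names that are not part of the package.

```r
# Hypothetical toy input: two alleles at each of two loci (values are made up).
dat <- data.frame(
  haplo.freq = c(0.40, 0.10, 0.15, 0.35),
  locus1 = "A",
  locus2 = "B",
  allele1 = c("01", "01", "02", "02"),
  allele2 = c("07", "08", "07", "08"),
  stringsAsFactors = FALSE
)

# Check that all five required columns are present.
required <- c("haplo.freq", "locus1", "locus2", "allele1", "allele2")
missing.cols <- setdiff(required, names(dat))

# Mirror the documented rule: the sum of the haplotype frequencies
# should lie between 1 - tolerance and 1 + tolerance.
tolerance <- 0.01
freq.sum <- sum(dat$haplo.freq)
sum.ok <- abs(freq.sum - 1) <= tolerance
```

A data frame that passes both checks has the shape that the dat argument expects. Extra columns should be harmless, since the Examples note that they are ignored.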

Value

A series of plots is created. The return value is a data frame with the following components:

focal the focal allele at the 1st locus.
select the potentially selected allele at the 2nd locus.
r2.lt0 the R^2 value in the negative D-space.
maxdij the maximum d_ij value.
exp.frq.max.d the expected frequency corresponding to the point with the maximum d_ij value.
prop.gt0 the proportion of points with d_ij > 0.
n.gt.halfmax.d the number of points with d_ij > 0.5*maxdij.
fold.inc the fold increase in frequency for the potentially selected haplotype: (hapfreq - expfreq)/expfreq.
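
The fold.inc formula above can be worked through on a small table. This sketch assumes, as is conventional for linkage disequilibrium, that the expected frequency is the product of the two marginal allele frequencies and that d_ij is the observed minus the expected haplotype frequency. The page does not state how the function computes these internally, so treat both as assumptions. The numbers are made up.

```r
# Hypothetical 2 x 2 table of haplotype frequencies (rows: allele1, cols: allele2).
hap <- matrix(c(0.40, 0.10,
                0.15, 0.35),
              nrow = 2, byrow = TRUE,
              dimnames = list(allele1 = c("01", "02"),
                              allele2 = c("07", "08")))

# Marginal allele frequencies at each locus.
p1 <- rowSums(hap)
p2 <- colSums(hap)

# Assumed expected haplotype frequencies under linkage equilibrium.
expfreq <- outer(p1, p2)

# Assumed disequilibrium coefficients: observed minus expected.
dij <- hap - expfreq

# Fold increase as given in the Value section: (hapfreq - expfreq)/expfreq.
fold.inc <- (hap - expfreq) / expfreq
```

For the haplotype 01-07 in this toy table, the expected frequency is 0.5 * 0.55 = 0.275 and the observed frequency is 0.40, so the fold increase is 0.125 / 0.275, or about 0.45.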

Details

A warning message is given if the sum of the haplotype frequencies is greater than 1.01 or less than 0.99 (regardless of the tolerance setting). The haplotype frequencies that are passed to the function are normalized within the function to sum to 1.0 by dividing each frequency by the sum of the passed frequencies.
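
The normalization described above amounts to a single division. The following is a minimal sketch with made-up frequencies. The variable names are hypothetical, and this is not the package's own code.

```r
# Hypothetical frequencies that sum to slightly more than 1.
raw.freq <- c(0.42, 0.10, 0.15, 0.35)
freq.sum <- sum(raw.freq)

# Flag sums outside the fixed 0.99 to 1.01 band mentioned in the Details.
outside.band <- freq.sum > 1.01 || freq.sum < 0.99

# Normalize so the frequencies sum to 1.0.
norm.freq <- raw.freq / freq.sum
```

Each normalized frequency keeps its proportion relative to the others. Only the overall scale changes.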

Examples

library(LDtools)

# An example using the Northern Ireland data from Williams et al.(2004)
data(NIreland.freqs)
ni.dat <- NIreland.freqs
loc1 <- "A"
loc2 <- "B"
temp.dat <- ni.dat[ni.dat$locus1==loc1 & ni.dat$locus2==loc2,]
DPAplot(dat=temp.dat, y.threshold=.005, r2.threshold=.70)
# Create a PostScript file with several DPA plots (2 x 2 per page) for the chosen loci
postscript(file="Irish_A-B.ps", horizontal=TRUE)
par(mfrow=c(2,2))
DPAplot(dat=temp.dat, y.threshold=.005, r2.threshold=.70)
dev.off()

# An example using haplotype frequencies from Wilson (2010)
require(asymLD)
data(hla.freqs)
hla.a_b <- hla.freqs[hla.freqs$locus1=="A" & hla.freqs$locus2=="B",]
compute.ALD(hla.a_b)
hla.freqs$locus <- paste(hla.freqs$locus1, hla.freqs$locus2, sep="-")
compute.ALD(hla.freqs[hla.freqs$locus=="C-B",])
# Note: additional columns on the input data frame (e.g., "locus" above) are allowed,
# but are ignored by the function.

# An example using genotype data from the haplo.stats package
require(haplo.stats)
data(hla.demo)
geno <- hla.demo[,5:8]  #DPB-DPA 
label <- unique(gsub(".a(1|2)", "", colnames(geno)))
label <- paste("HLA*",label,sep="")
keep <- !apply(is.na(geno) | geno==0, 1, any)
em.keep  <- haplo.em(geno=geno[keep,], locus.label=label)
hapfreqs.df <- cbind(em.keep$haplotype, em.keep$hap.prob) 
#format dataframe for ALD function
names(hapfreqs.df)[dim(hapfreqs.df)[2]] <- "haplo.freq"
names(hapfreqs.df)[1] <- "allele1"
names(hapfreqs.df)[2] <- "allele2"
hapfreqs.df$locus1 <- label[1]
hapfreqs.df$locus2 <- label[2]
head(hapfreqs.df)
compute.ALD(hapfreqs.df)
# Note that there is substantially less variability (higher ALD) for HLA*DPA1
# conditional on HLA*DPB1 than for HLA*DPB1 conditional on HLA*DPA1, indicating
# that the overall variation for DPA1 is relatively low given specific DPB1 alleles.

rsingle/LDtools documentation built on May 28, 2019, 3:32 a.m.