View source: R/Index_calculations.r
Calculate the Index of Association and Standardized Index of Association.
ia() calculates the index of association over all loci in the data set.
pair.ia() calculates the index of association in a pairwise manner among all loci.
resample.ia() calculates the index of association on a reduced data set multiple times to create a distribution, showing the variation of values observed at a given sample size (previously jack.ia).
gid 
a 
sample 
an integer indicating the number of permutations desired (e.g., 999).
method 
an integer from 1 to 4 indicating the sampling method desired.
see 
quiet 
Should the function print anything to the screen while it is performing calculations?

missing 
a character string. see 
plot 
When 
hist 

index 

valuereturn 

low 
(for pair.ia) a color to use for low values when 
high 
(for pair.ia) a color to use for high values when 
limits 
(for pair.ia) the limits to be used for the color scale.
Defaults to 
n 
an integer specifying the number of samples to be drawn. Defaults to

reps 
an integer specifying the number of replicates to perform. Defaults to 999. 
... 
arguments to be passed on to resample.ia 
The index of association was originally developed by A.H.D. Brown while analyzing the population structure of wild barley (Brown, 1980). It has been widely used as a tool to detect clonal reproduction within populations. Populations whose members are undergoing sexual reproduction, whether by selfing or outcrossing, produce gametes via meiosis and thus have a chance to shuffle alleles in the next generation. Populations whose members are undergoing clonal reproduction, however, generally do so via mitosis, so the most likely mechanism for a change in genotype is mutation. The rate of mutation varies from species to species, but it is rarely high enough to approximate a random shuffling of alleles. The index of association is based on the ratio of the variance of the raw number of differences between individuals to the sum of those variances over each locus. You can also think of it as the observed variance over the expected variance. If they are the same, the index is zero after subtracting one (from Maynard Smith, 1993):
Ia = (Vo/Ve) - 1
Since the distance is more or less a binary distance, any sort of marker can be used for this analysis. In the calculation, phase is not considered, and any difference increases the distance between two individuals. Remember that each column represents a different allele and that each entry in the table represents the fraction of the genotype made up by that allele at that locus. Notice also that, within a locus, the entries in each row sum to one. Poppr uses this to calculate distances by simply taking the sum of the absolute values of the differences between rows.
The calculation for the distance between two individuals at a single locus with a allelic states and a ploidy of k is as follows (except for Presence/Absence data):
d(A,B) = (k/2)*sum(abs(Ai - Bi))
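As a minimal sketch of the single-locus distance just described, the following base R function applies the formula to two vectors of allele fractions. The function name `locus_dist` and the example genotypes are hypothetical and used only for illustration; they are not part of poppr.

```r
# Distance between two individuals at one locus.
# A and B hold the fraction of each genotype made up by each allele,
# so each vector sums to one; k is the ploidy.
locus_dist <- function(A, B, k = 2) {
  (k / 2) * sum(abs(A - B))
}

# Diploid example with three allelic states (hypothetical data)
het_12 <- c(0.5, 0.5, 0)  # heterozygote carrying alleles 1 and 2
hom_1  <- c(1, 0, 0)      # homozygote for allele 1
hom_3  <- c(0, 0, 1)      # homozygote for allele 3

locus_dist(het_12, hom_1)  # genotypes differ by one allele
locus_dist(hom_1, hom_3)   # genotypes differ by both alleles
locus_dist(hom_1, hom_1)   # identical genotypes
```

Because phase is ignored, the result depends only on how many allele copies differ between the two genotypes.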
To find the total number of differences between two individuals over all loci, you sum d over all m loci, a value we'll call D:
D = sum(di)
These values are calculated over all possible combinations of individuals in the data set, choose(n, 2), after which you end up with choose(n, 2) * m values of d and choose(n, 2) values of D. Calculating the observed variance is fairly straightforward (modified from Agapow and Burt, 2001):
Vo = var(D)
The expected variance is the sum of the variances at the individual loci. The calculation at a single locus, j, is the same as the previous equation, substituting values of d for D:
Varj = var(dj)
The expected variance is then the sum of all the variances over all m loci:
Ve = sum(var(dj))
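The steps so far can be sketched in base R on a toy data set. This is an illustration under stated assumptions, not poppr's implementation: the list `loci` is made-up data holding one allele-fraction matrix per locus (rows are individuals), and `dist()` with the Manhattan method is used here simply because it returns the sum of absolute differences between every pair of rows.

```r
k <- 2  # ploidy

# Hypothetical allele-fraction tables: one matrix per locus, rows are individuals
loci <- list(
  L1 = rbind(c(1, 0), c(1, 0), c(0, 1), c(0.5, 0.5)),
  L2 = rbind(c(1, 0), c(1, 0), c(0, 1), c(0, 1)),
  L3 = rbind(c(0.5, 0.5), c(1, 0), c(0, 1), c(0.5, 0.5))
)

# d: one row per pair of individuals, choose(n, 2), and one column per locus, m
d <- sapply(loci, function(tab) {
  (k / 2) * as.vector(dist(tab, method = "manhattan"))
})

D  <- rowSums(d)             # total differences per pair over all loci
Vo <- var(D)                 # observed variance
Ve <- sum(apply(d, 2, var))  # expected variance: sum of per-locus variances
Ia <- Vo / Ve - 1
Ia
```

Since Ia is a ratio of variances, it does not matter whether the sample variance from `var()` or a population variance is used, as long as the same one is used for both Vo and Ve.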
Agapow and Burt showed that Ia increases steadily with the number of loci, so they came up with an approximation that is widely used, rbarD. For the derivation, see the manual for multilocus.
rbarD = (Vo - Ve)/(2*sum(sum(sqrt(var(dj)*var(dk)))))
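A short base R sketch of this formula follows, assuming the double sum runs over every pair of distinct loci j < k. The matrix `d` is hypothetical: it holds per-locus distances for six pairs of individuals at three loci and stands in for values that would normally be computed from real genotypes.

```r
# Hypothetical per-locus distances: choose(4, 2) = 6 pairs (rows), m = 3 loci (columns)
d <- cbind(
  L1 = c(0, 2, 1, 2, 1, 1),
  L2 = c(0, 2, 2, 2, 2, 0),
  L3 = c(1, 1, 0, 2, 1, 1)
)

D  <- rowSums(d)         # total distance for each pair of individuals
Vo <- var(D)             # observed variance
v  <- apply(d, 2, var)   # var(dj) for each locus j
Ve <- sum(v)             # expected variance

# Sum of sqrt(var(dj) * var(dk)) over every pair of loci j < k
locus_pairs <- combn(ncol(d), 2)
denom <- 2 * sum(sqrt(v[locus_pairs[1, ]] * v[locus_pairs[2, ]]))

rbarD <- (Vo - Ve) / denom
rbarD
```

The numerator Vo - Ve equals twice the sum of the covariances between loci, so dividing by twice the sum of the products of their standard deviations keeps rbarD between -1 and 1 regardless of the number of loci.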
pair.ia
A matrix with two columns and choose(nLoc(gid), 2) rows representing the values for Ia and rbarD per locus pair.
A named number vector of length 2 giving the Index of Association, "Ia"; and the Standardized Index of Association, "rbarD"
A named number vector of length 4 with the following values:
Ia - numeric. The index of association.
p.Ia - A number indicating the p-value resulting from a one-sided permutation test based on the number of samples indicated in the original call.
rbarD - numeric. The standardized index of association.
p.rD - A number indicating the p-value resulting from a one-sided permutation test based on the number of samples indicated in the original call.
A list with the following elements:
index - The above vector
samples - A data frame with s rows and 2 columns, where s is the number of samples defined. The columns are for the values of Ia and rbarD, respectively.
a data frame with the index of association and standardized index of association in columns. Number of rows represents the number of reps.
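The p-values listed above come from a one-sided permutation test. As a rough sketch of how such a p-value can be derived from a set of permuted values: the counting rule below, which includes the observed value among the draws, is a common convention and an assumption about the details, not a quotation of poppr's internals. The names `observed` and `permuted` and their values are made up for illustration.

```r
# Hypothetical values: one observed index and nine values from permuted data
observed <- 0.3
permuted <- c(0.1, -0.05, 0.3, 0.02, 0.5, -0.1, 0.2, 0, 0.15)

# One-sided test: how often is a permuted value at least as large as the observed one?
# The observed value is counted as one of the draws, so p can never be exactly zero.
p_value <- (sum(permuted >= observed) + 1) / (length(permuted) + 1)
p_value
```

With this convention the smallest attainable p-value is 1 / (number of permutations + 1), which is one reason values such as 999 are popular for the number of permutations.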
jack.ia() is deprecated as the name was misleading. Please use resample.ia().
Zhian N. Kamvar
Paul-Michael Agapow and Austin Burt. Indices of multilocus linkage disequilibrium. Molecular Ecology Notes, 1(1-2):101-102, 2001.
A.H.D. Brown, M.W. Feldman, and E. Nevo. Multilocus structure of natural populations of Hordeum spontaneum. Genetics, 96(2):523-536, 1980.
J M Smith, N H Smith, M O'Rourke, and B G Spratt. How clonal are bacteria? Proceedings of the National Academy of Sciences, 90(10):4384-4388, 1993.
poppr, missingno, import2genind, read.genalex, clonecorrect, win.ia, samp.ia
data(nancycats)
ia(nancycats)
# Pairwise over all loci:
data(partial_clone)
res <- pair.ia(partial_clone)
plot(res, low = "black", high = "green", index = "Ia")
# Resampling
data(Pinf)
resample.ia(Pinf, reps = 99)
## Not run:
# Plot the results of resampling rbarD.
library("ggplot2")
Pinf.resamp <- resample.ia(Pinf, reps = 999)
ggplot(Pinf.resamp[2], aes(x = rbarD)) +
  geom_histogram() +
  geom_vline(xintercept = ia(Pinf)[2]) +
  geom_vline(xintercept = ia(clonecorrect(Pinf))[2], linetype = 2) +
  xlab(expression(bar(r)[d]))
# Get the indices back and plot the distributions.
nansamp <- ia(nancycats, sample = 999, valuereturn = TRUE)
plot(nansamp, index = "Ia")
plot(nansamp, index = "rbarD")
# You can also adjust the parameters for how large to display the text
# so that it's easier to export it for publication/presentations.
library("ggplot2")
plot(nansamp, labsize = 5, linesize = 2) +
  theme_bw() + # adding a theme
  theme(text = element_text(size = rel(5))) + # changing text size
  theme(plot.title = element_text(size = rel(4))) + # changing title size
  ggtitle("Index of Association of nancycats") # adding a new title
# Get the index for each population.
lapply(seppop(nancycats), ia)
# With sampling
lapply(seppop(nancycats), ia, sample = 999)
# Plot pairwise ia for all populations in a grid with cowplot
# Set up the library and data
library("cowplot")
data(monpop)
splitStrata(monpop) <- ~Tree/Year/Symptom
setPop(monpop) <- ~Tree
# Need to set up a list in which to store the plots.
plotlist <- vector(mode = "list", length = nPop(monpop))
names(plotlist) <- popNames(monpop)
# Loop through the populations, calculate pairwise ia, plot, and then
# capture the plot in the list
for (i in popNames(monpop)){
  x <- pair.ia(monpop[pop = i], limits = c(-0.15, 1)) # subset, calculate, and plot
  plotlist[[i]] <- ggplot2::last_plot() # save the last plot
}
# Use the plot_grid function to plot.
plot_grid(plotlist = plotlist, labels = paste("Tree", popNames(monpop)))
## End(Not run)
