# Qst - Fst calculations for unbalanced breeding designs
# by Kimberly J. Gilbert (kgilbert@zoology.ubc.ca) and Michael C. Whitlock
# Based on Whitlock and Guillaume (2009)
# Code edited and expanded to function with AFLP data and with additional breeding designs
#'
#' Compares the QST of a single phenotypic trait to the mean FST of series of marker loci.
#' It calculates the distribution of QST - FST under a model assuming neutrality of both
#' the phenotypic trait and the genetic markers from which FST is estimated.
#' Returns the simulated estimates of Qst - Fst under neutrality following the procedure
#' described in Gilbert and Whitlock (2014) and Whitlock & Guillaume (2009). Also returns
#' the simulated estimates of Fst and Qst used to compute the null distribution.
#'
#' @title Compare Qst to Fst
#'
#' @param fst.dat A data frame with the first column indicating population of origin and
#' the following columns representing genotypes at loci; see the
#' README \url{https://github.com/kjgilbert/QstFstComp/blob/master/README.md} for further description.
#' If using AFLPs, this is a data frame of q-hat values, with pops in columns, loci in rows
#' and the corresponding q-hat variances in the following columns, and \code{AFLP=TRUE} must be designated.
#'
#' @param qst.dat the input table of the breeding data
#' \itemize{
#' \item For \code{breeding.design = "half.sib.sire"}: qst.dat should have four
#' columns in this order: population, sire, dam, and the trait value of
#' the individual. Each population, sire, and dam should have unique
#' names or numbers.
#' \item For \code{breeding.design = "half.sib.dam"}: qst.dat should have three
#' columns in this order: population, dam, and the trait value of
#' the individual. Each population and dam should have unique
#' names or numbers.
#' }
#'
#' @param numpops number of populations in the sample
#'
#' @param nsim number of simulation replicates to perform to create the null
#' distributions and bootstraps
#'
#' @param AFLP whether or not to use AFLP data
#'
#' @param breeding.design the breeding design used when collecting the trait data
#' There are two options for breeding design:
#' \enumerate{
#' \item "half.sib.sire" is a half sib design with dam nested within sire nested within
#' population which works for either balanced or unbalanced sampling designs
#' \item "half.sib.dam" is a half sib design with dam nested within population
#' which works for either balanced or unbalanced sampling designs
#' }
#'
#' @param dam.offspring.relatedness relatedness between offspring in the dam model, default is 1/4, i.e. half-sib
#'
#' @param output whether to output full, concise, or without writing out vector of resampled Q-F values, see details below
#'
#' @return
#'
#' Returns either a concise list of a subset of results or a full list with all possible results. Both output
#' options write the vector of Qst-Fst values to a text file unless "_nowrite" is appended to the option, e.g.
#' "concise_nowrite" returns only the concise output without writing output to a text file.
#'
#' Concise list returns (default)
#' \itemize{
#' \item the calculated difference between Qst and Fst with 95\% critical values,
#' \item one- and two- tailed p-values for this difference,
#' \item the Fst value estimated from the data with 95\% confidence intervals,
#' \item the Qst value estimated from the data with 95\% confidence intervals, and
#' \item the additive genetic variance for the trait with 95\% confidence intervals
#' }
#' Full list returns
#' \itemize{
#' \item the calculated difference between Qst and Fst with 95\% critical values,
#' \item one- and two- tailed p-values for this difference,
#' \item a list of all Qst-Fst values for plotting the distribution of Qst-Fst,
#' \item the Fst value estimated from the data with 95\% confidence intervals,
#' \item the resampled Fst as calculated from bootstrapping across simulations, with standard deviation and 95\% confidence intervals,
#' \item a list of all resampled Fst values for plotting the distribution of Fst,
#' \item the Qst value estimated from the data with 95\% confidence intervals,
#' \item the resampled neutral Qst as calculated from bootstrapping across simulations, with standard deviation and 95\% critical values,
#' \item a list of all resampled Qst values for plotting the distribution of the neutral Qst,
#' \item the ANOVA table for the calculated means squared, n coefficients, and degrees of freedom,
#' \item the additive genetic variance for the trait with 95\% confidence intervals, and
#' \item the coefficient of additive genetic variance for the trait with 95\% confidence intervals
#' }
#' Be mindful of the fact that with a small number of loci, bootstrapped confidence intervals of Fst can be less accurate.
#'
#' @author Kimberly J Gilbert & Michael C Whitlock
#'
#' @import hierfstat
#'
#' @references Gilbert KJ and MC Whitlock (2015) \href{http://onlinelibrary.wiley.com/doi/10.1111/1755-0998.12303/abstract}{\emph{Qst} \emph{Fst} comparisons with unbalanced half-sib designs.} \emph{Molecular Ecology Resources}, 15(2), 262-267.
#'
#' @references Whitlock MC and F Guillaume (2009) \href{http://www.genetics.org/content/183/3/1055}{Testing for spatially divergent selection: Comparing \emph{Qst} to \emph{Fst}.} \emph{Genetics}, 183:1055-1063.
#'
#'
#'
#' @examples
#'
#' ## using balanced half-sib sire trait data and biallelic marker data
#' data(hssire) # trait data
#' data(biallelic) # marker data
#' QstFstComp(biallelic, hssire, numpops=15, nsim=100, breeding.design="half.sib.sire", output="full")
#'
#' data(hsdam)
#' data(aflp)
#' QstFstComp(aflp, hsdam, numpops=15, nsim=100, AFLP=TRUE, breeding.design="half.sib.dam", output="concise")
#'
#'
#' @export
QstFstComp <- function(fst.dat, qst.dat, numpops, nsim=1000, AFLP=FALSE, breeding.design, dam.offspring.relatedness=0.25, output="concise")
{
if(missing(fst.dat)) stop("Genotypic data must be provided.")
if(missing(qst.dat)) stop("Phenotypic data must be provided.")
if(missing(numpops)) stop("Number of populations must be defined.")
if(missing(breeding.design)) stop("Breeding design must be defined.")
if(breeding.design=="half.sib.sire"){
qst.MS <- MeanSq.unbalanced.sire(qst.dat)
qst.temp <- QSTfromSireModel(qst.MS$MSpops, qst.MS$MSsires, qst.MS$MSdams, qst.MS$MSwithin,
qst.MS$n0primeprime, qst.MS$n0prime, qst.MS$n0, qst.MS$nc0prime, qst.MS$nc0, qst.MS$ncb0)
qst.obs <- qst.temp$Qst
mean.trait.value <- mean(qst.dat[,4], na.rm=TRUE) # this takes the mean of all trait values across all ind.s, no bootstrapping
Va <- qst.temp$Va
if(Va < 0){Va <- 0} # if Va includes negative values, the distribution is truncated to zero
CVa <- sqrt(Va)/mean.trait.value * 100
}
if(breeding.design=="half.sib.dam"){
qst.MS <- MeanSq.unbalanced.dam(qst.dat)
qst.obs <- QSTfromDamModel(qst.MS$MSpops,qst.MS$MSdams,qst.MS$MSwithin,qst.MS$n0prime,qst.MS$n0,qst.MS$nb0,
dam.offspring.relatedness)
mean.trait.value <- mean(qst.dat[,3], na.rm=TRUE) # this takes the mean of all trait values across all ind.s, no bootstrapping
Va <- 1/dam.offspring.relatedness*(qst.MS$MSdams-qst.MS$MSwithin)/qst.MS$n0
if(Va < 0){Va <- 0} # if Va includes negative values, the distribution is truncated to zero
CVa <- sqrt(Va)/mean.trait.value * 100
}
# for non-AFLP markers, i.e. non-dominant markers
if(AFLP==FALSE){
nloci <- ncol(fst.dat)-1
abc.mat <- wc.calc(fst.dat, diploid=TRUE)
nalleles <- nrow(abc.mat)
#the observed Fst:
fst.obs <- sum(abc.mat[,1])/sum(abc.mat[,1] + abc.mat[,2] + abc.mat[,3])
}
# for AFLP markers, i.e. dominant markers
if(AFLP==TRUE){
# get the data for the Fst calculation and resampling
full.fst.dat <- read.fst.input(fst.dat, num.pops=numpops)
fst.data <- as.matrix(full.fst.dat [[1]]) # this is a matrix of the q_hat values
var.dat <- as.matrix(full.fst.dat [[2]]) # this is a matrix of the variances
nloci <- nrow(fst.data)
#the observed Fst:
fst.obs <- mean.fst(dat=fst.data, vardat=var.dat, num.loci=nloci, num.pops=numpops)
}
# values to be filled in across replicate simulations
sim.est <- vector(length = nsim)
qst.neut <- vector(length = nsim)
fst.est <- vector(length = nsim)
qstForCI.est <- vector(length = nsim)
Va.est <- vector(length = nsim)
CVa.est <- vector(length = nsim)
#simulating
for(i in 1:nsim) {
#1. Fst simulation replicate; sample nloci from the neutral markers, with replacement
if(AFLP==FALSE){ fst.repl <- fst.sample(abc.mat, nalleles) }
if(AFLP==TRUE){ fst.repl <- fst.sample.aflp(fst.data, var.dat, nloci) }
#2. get a simulated replicate of Qst by sampling the null distribution
if(breeding.design=="half.sib.sire") qst.repl <- qst.parboot.siremodel(qst.MS, fst.obs)[[1]] # this function now spits out Qst and Va, just need the Qst here
if(breeding.design=="half.sib.dam") qst.repl <- qst.parboot.dammodel(qst.MS, fst.obs, dam.offspring.relatedness)
#3. get the simulated replicate of Qst - Fst, and store it
sim.est[i] <- qst.repl - fst.repl
#store values
fst.est[i] <- fst.repl
qst.neut[i] <- qst.repl
#bootstraps for Qst and Va
if(breeding.design=="half.sib.sire") { temp <- qstVa.parbootForCI.siremodel(qst.MS) }
if(breeding.design=="half.sib.dam") { temp <- qstVa.parbootForCI.dammodel(qst.MS, dam.offspring.relatedness) }
qstForCI.est[i] <- temp[1]
Va.est[i] <- temp[2]
if(Va.est[i] < 0){Va.est[i] <- 0} # if Va includes negative values, the distribution is truncated to zero
CVa.est[i] <- sqrt(Va.est[i])/mean.trait.value * 100
}
# output the vector of resampled Q-F values; can be used for plotting the distribution
# file is output to working directory and named from the timestamp
time.stamp <- Sys.time()
formatted.time.stamp <- gsub(" ", "_", time.stamp)
formatted.time.stamp <- gsub(":", "-", formatted.time.stamp)
if(output == "concise_nowrite" || output == "full_nowrite"){
print("No output file of Q minus F values written.")
}else{
writeLines(as.character(sim.est), paste("QminusFvalues_", formatted.time.stamp, ".txt", sep=""))
}
# calculate the p-value for the difference between the neutral qst and fst
diff.repl <- qst.neut - fst.est
Q.obsMinusF.obs <- qst.obs-fst.obs
right.one.tailed.p <- sum(Q.obsMinusF.obs < diff.repl)/length(diff.repl) # right hand 1-tailed p value
left.one.tailed.p <- sum(Q.obsMinusF.obs > diff.repl)/length(diff.repl) # left hand 1-tailed p value
two.tailed.p <- 2*min(right.one.tailed.p, left.one.tailed.p)
## Create return items for either the concise or the full list of outputs, as specified from input parameters
if(output=="concise" || output=="concise_nowrite"){return(list(
QminusF <- c(
"Calculated Qst-Fst" = Q.obsMinusF.obs,
"Lower Bound crit. value" = quantile(sim.est,0.025,na.rm=TRUE),
"Upper bound crit. value" = quantile(sim.est,0.975,na.rm=TRUE)),
Distribution <- paste("Qst-Fst values output to file:",
paste(getwd(), "/QminusFvalues_", formatted.time.stamp, ".txt", sep="")),
QminusF.p.values <- c(
"Lower one-tailed p value" = left.one.tailed.p,
"Upper one-tailed p value" = right.one.tailed.p,
"Two-tailed p value" = two.tailed.p),
Fst <- c("Estimated Fst" = fst.obs,
"Lower Bound CI" = quantile(fst.est,0.025,na.rm=TRUE),
"Upper bound CI" = quantile(fst.est,0.975,na.rm=TRUE)),
Qst <- c("Estimated Qst" = qst.obs,
"Lower Bound CI" = quantile(qstForCI.est,0.025,na.rm=TRUE),
"Upper bound CI" = quantile(qstForCI.est,0.975,na.rm=TRUE)),
Va <- c("Va"= Va,
"Lower bound CI" = quantile(Va.est,0.025,na.rm=TRUE),
"Upper bound CI" = quantile(Va.est,0.975,na.rm=TRUE))
))
} # end return of concise list
if(output=="full" || output=="full_nowrite"){
if(breeding.design=="half.sib.sire"){ # this full output has the ANOVA table for the half-sib sire breeding design
return(list(
QminusF <- c(
"Calculated Qst-Fst" = Q.obsMinusF.obs,
"Lower Bound crit. value" = quantile(sim.est,0.025,na.rm=TRUE),
"Upper bound crit. value" = quantile(sim.est,0.975,na.rm=TRUE)),
Distribution <- paste("Qst-Fst values output to file:",
paste(getwd(), "/QminusFvalues_", formatted.time.stamp, ".txt", sep="")),
# Distribution.QminusF <- sim.est,
# uncomment the above line if you want to output a vector of the distribution of Q-F values to the console
QminusF.p.values <- c(
"Lower one-tailed p value" = left.one.tailed.p,
"Upper one-tailed p value" = right.one.tailed.p,
"Two-tailed p value" = two.tailed.p),
Fst <- c("Estimated Fst" = fst.obs,
"Lower Bound CI" = quantile(fst.est,0.025,na.rm=TRUE),
"Upper bound CI" = quantile(fst.est,0.975,na.rm=TRUE)),
Fst.resampled <- c(
"Fst Resampled" = mean(fst.est, na.rm=TRUE),
"Fst std. dev." = sd(fst.est, na.rm=TRUE),
"Fst 95% CI" = quantile(fst.est,c(0.025,0.975), na.rm=TRUE),
"Fst 99% CI" = quantile(fst.est,c(0.005,0.995), na.rm=TRUE)),
# Distribution.Fst.resampled <- fst.est,
# uncomment the above line if you want to output a vector of the distribution of resampled Fst values
Qst <- c("Estimated Qst" = qst.obs,
"Lower Bound CI" = quantile(qstForCI.est,0.025,na.rm=TRUE),
"Upper bound CI" = quantile(qstForCI.est,0.975,na.rm=TRUE)),
Qst.neutral <- c(
"Qst Resampled" = mean(qst.neut, na.rm=TRUE),
"Qst std. dev." = sd(qst.neut, na.rm=TRUE),
"Qst 95% crit. values" = quantile(qst.neut,c(0.025,0.975), na.rm=TRUE),
"Qst 99% crit. values" = quantile(qst.neut,c(0.005,0.995), na.rm=TRUE)),
# Distribution.Qst.neutral <- qst.neut,
# uncomment the above line if you want to output a vector of the distribution of resampled Qst values
ANOVA.table <- c(
"MS Pop" = qst.MS[[1]],
"MS Sire" = qst.MS[[2]],
"MS Dam" = qst.MS[[3]],
"MS Within" = qst.MS[[4]],
"n0primeprime" = qst.MS[[5]],
"n0prime" = qst.MS[[6]],
"n0" = qst.MS[[7]],
"nc0prime" = qst.MS[[8]],
"nc0" = qst.MS[[9]],
"ncb0" = qst.MS[[10]],
"df Pop" = qst.MS[[11]],
"df Sire" = qst.MS[[12]],
"df Dam" = qst.MS[[13]],
"df Within" = qst.MS[[14]],
"Sigma^2 sires" = qst.MS[[15]]),
Va <- c("Va" = Va,
"Lower bound CI" = quantile(Va.est,0.025,na.rm=TRUE),
"Upper bound CI" = quantile(Va.est,0.975,na.rm=TRUE)),
CVa <- c("CVa" <- CVa,
"Lower bound CI" = quantile(CVa.est,0.025,na.rm=TRUE),
"Upper bound CI" = quantile(CVa.est,0.975,na.rm=TRUE))
))} # end return of half.sib.sire full output
if(breeding.design=="half.sib.dam"){ # this full output has the ANOVA table for the half-sib dam breeding design
return(list(
QminusF <- c(
"Calculated Qst-Fst" = Q.obsMinusF.obs,
"Lower Bound crit. value" = quantile(sim.est,0.025,na.rm=TRUE),
"Upper bound crit. value" = quantile(sim.est,0.975,na.rm=TRUE)),
Distribution <- paste("Qst-Fst values output to file:",
paste(getwd(), "/QminusFvalues_", formatted.time.stamp, ".txt", sep="")),
# Distribution.QminusF <- sim.est,
# uncomment the above line if you want to output a vector of the distribution of Q-F values to the console
QminusF.p.values <- c(
"Lower one-tailed p value" = left.one.tailed.p,
"Upper one-tailed p value" = right.one.tailed.p,
"Two-tailed p value" = two.tailed.p),
Fst <- c("Estimated Fst" = fst.obs,
"Lower Bound CI" = quantile(fst.est,0.025,na.rm=TRUE),
"Upper bound CI" = quantile(fst.est,0.975,na.rm=TRUE)),
Fst.resampled <- c(
"Fst Resampled" = mean(fst.est, na.rm=TRUE),
"Fst std. dev." = sd(fst.est, na.rm=TRUE),
"Fst 95% CI" = quantile(fst.est,c(0.025,0.975), na.rm=TRUE),
"Fst 99% CI" = quantile(fst.est,c(0.005,0.995), na.rm=TRUE)),
# Distribution.Fst.resampled <- fst.est,
# uncomment the above line if you want to output a vector of the distribution of resampled Fst values
Qst <- c("Estimated Qst" = qst.obs,
"Lower Bound CI" = quantile(qstForCI.est,0.025,na.rm=TRUE),
"Upper bound CI" = quantile(qstForCI.est,0.975,na.rm=TRUE)),
Qst.neutral <- c(
"Qst Resampled" = mean(qst.neut, na.rm=TRUE),
"Qst std. dev." = sd(qst.neut, na.rm=TRUE),
"Qst 95% crit. values" = quantile(qst.neut,c(0.025,0.975), na.rm=TRUE),
"Qst 99% crit. values" = quantile(qst.neut,c(0.005,0.995), na.rm=TRUE)),
# Distribution.Qst.neutral <- qst.neut,
# uncomment the above line if you want to output a vector of the distribution of resampled Qst values
ANOVA.table <- c(
"MS Pop" = qst.MS[[1]],
"MS Dam" = qst.MS[[2]],
"MS Within" = qst.MS[[3]],
"n0prime" = qst.MS[[4]],
"n0" = qst.MS[[5]],
"nb0" = qst.MS[[6]],
"df Pop" = qst.MS[[7]],
"df Dam" = qst.MS[[8]],
"df Within" = qst.MS[[9]],
"Sigma^2 dams" = qst.MS[[10]]),
Va <- c("Va" = Va,
"Lower bound CI" = quantile(Va.est,0.025,na.rm=TRUE),
"Upper bound CI" = quantile(Va.est,0.975,na.rm=TRUE)),
CVa <- c("CVa" = CVa,
"Lower bound CI" = quantile(CVa.est,0.025,na.rm=TRUE),
"Upper bound CI" = quantile(CVa.est,0.975,na.rm=TRUE))
))} # end return of half.sib.dam full output
} # end return of full list
} # end main function
###########################################################################
###########################################################################
####### #######
####### additional functions required for main function above #######
####### #######
###########################################################################
################ MeanSq unbalanced sire model #####################
#
# Finding no and noprime values, as per Sokal and Rohlf p297, and also returning
# key elements of the anova, including MS.
# This will be expecting to receive a data frame with three columns of numbers:
# the first column labelled 'Population',
# the second giving dam names,
# the third giving sire names, and
# the fourth column is the trait values, with NA for any missing values.
# ** Sires and dams should be labelled distinctly (i.e. the same name refers to the same
# individual dam, without forcing reference to the population name)
MeanSq.unbalanced.sire <- function(dat){
poplist <- dat[,1]
sirelist <- dat[,2]
damlist <- dat[,3]
traitlist <- dat[,4]
#Remove all lines for which there is no data for the trait, to simplify calculations later.
pops <- poplist[!is.na(traitlist)]
sires <- sirelist[!is.na(traitlist)]
dams <- damlist[!is.na(traitlist)]
sires <- sirelist[!is.na(traitlist)]
nTable <- table(as.data.frame(cbind(pops,sires,dams)))
traits <- as.numeric(traitlist[!is.na(traitlist)])
dataTable <- cbind(as.data.frame(cbind(pops,sires,dams)),traits=I(traits))
nPops <- length(table(poplist))
nSiresTot <-length(table(sirelist))
nDamsTot <- length(table(damlist))
dfsires <- (nSiresTot-nPops)
dfpops <- nPops-1
dfdams <- nDamsTot-nSiresTot
dfwithin <- length(traits)-dfsires-dfdams-dfpops-1
##Calculating the n values (based on Sokal and Rohlf pp302ff)
#Quantity 1
q1 <- sum(nTable^2)
#Quantity 2
nTablePopSire <- table(as.data.frame(cbind(pops,sires)))
q2 <- sum(nTablePopSire^2)
#Quantity 3
nTablePop <- table(as.data.frame(cbind(pops)))
q3 <- sum(nTablePop^2)
#Quantity 4
nTot <- length(traits)
q4 <- nTot
#Quantity 5
nTableSireDams <- table(as.data.frame(cbind(sires,dams)))
q5 <-sum(rowSums(nTableSireDams^2)/rowSums(nTableSireDams))
#Quantity 6
nTablePopsDams <- table(as.data.frame(cbind(pops,dams)))
q6 <-sum(rowSums(nTablePopsDams^2)/rowSums(nTablePopsDams))
#Quantity 7
q7 <- sum(rowSums(nTablePopSire^2)/rowSums(nTablePopSire))
###Calculating n's
n0primeprime <- (q6-q1/q4)/dfpops
n0prime <- (q5-q6)/dfsires
n0 <- (q4-q5)/dfdams
nc0prime <- (q7 - (q2/q4))/dfpops
nc0 <- (q4-q7)/dfsires
ncb0 <- (q4-(q3/q4))/dfpops
#####Calculating SS
grandTotal <- sum(traits)
sumSqObs <- sum(traits^2)
#Sires
siresq <- tapply(dataTable$traits,dataTable$sires,FUN=sum)^2
siren <- table(dataTable$sires)
qsire <- sum(siresq/siren)
#Dams
damsq <- tapply(dataTable$traits,dataTable$dams,FUN=sum)^2
damn <- table(dataTable$dams)
qdam <- sum(damsq/damn)
#Pops
popsq <- tapply(dataTable$traits,dataTable$pops,sum)^2
popn <- table(dataTable$pops)
qpop <- sum(popsq/popn)
#CT
CT <- grandTotal^2/length(dataTable[,1])
#SS
SStotal <- sumSqObs-CT
SSpops <- qpop-CT
SSsires <- qsire-qpop
SSdams <- qdam-qsire
SSwithin <- sumSqObs-qdam
#MS
MSpops <- SSpops/dfpops
MSsires <- SSsires/dfsires
MSdams <- SSdams/dfdams
MSwithin <- SSwithin/dfwithin
sigma2dams <- (MSdams-MSwithin)/n0
sigma2sires <- (MSsires - MSwithin - n0prime*sigma2dams)/nc0
return(list(
MSpops = MSpops,
MSsires = MSsires,
MSdams = MSdams,
MSwithin = MSwithin,
n0primeprime = n0primeprime,
n0prime = n0prime,
n0 = n0,
nc0prime = nc0prime,
nc0 = nc0,
ncb0 = ncb0,
dfpops = dfpops,
dfsires = dfsires,
dfdams = dfdams,
dfwithin = dfwithin,
sigma2sires = sigma2sires,
sigma2dams = sigma2dams
))
}
###########################################################################
################ MeanSq unbalanced dam model #####################
#
# Finding n0 and n0prime values, as per Sokal and Rohlf p297, and also returning
# key elements of the anova, including MS.
# This will be expecting to receive a data frame with three columns of numbers:
# the first column labelled 'Population',
# the second giving dam names, and
# the third column is the trait values, with NA for any missing values.
# ** Dams should be labelled distinctly (i.e. the same name refers to the same
# individual dam, without forcing reference to the population name)
MeanSq.unbalanced.dam <- function(dat){
poplist <- dat[,1]
damlist <- dat[,2]
traitlist <- dat[,3]
#Remove all lines for which there is no data for the trait, to simplify calculations later.
pops <- poplist[!is.na(traitlist)]
dams <- damlist[!is.na(traitlist)]
nTable <- table(as.data.frame(cbind(pops,dams)))
traits <- as.numeric(traitlist[!is.na(traitlist)])
dataTable <- cbind(as.data.frame(cbind(pops,dams)),traits=I(traits))
nPops <- length(table(poplist))
nDamsTot <- length(table(damlist))
dfdam <- (nDamsTot-nPops)
dfpops <- nPops-1
dfwithin <- length(traits)-dfdam-dfpops-1
##Calculating the n values
#Quantity 1
nTot <- length(traits)
#Quantity 2
nsqTot <- sum(nTable^2)
#Quantity 3
sumnsq <- sum(rowSums(nTable)^2)
#Quantity 4
eq4 <- sum(rowSums(nTable^2)/rowSums(nTable))
n0prime <- (eq4-nsqTot/nTot)/(nPops-1)
n0 <- (nTot - eq4)/dfdam
nb0 <- (nTot-sumnsq/nTot)/(nPops-1)
#Calculating SS
grandTotal <- sum(traits)
sumSqObs <- sum(traits^2)
#Quantity 3 for SS
damsq <- tapply(dataTable$traits,dataTable$dams,FUN=sum)^2
damn <- table(dataTable$dams)
q3 <- sum(damsq/damn)
#Quantity 4 for SS
popsq <- tapply(dataTable$traits,dataTable$pops,sum)^2
popn <- table(dataTable$pops)
q4 <- sum(popsq/popn)
#CT
CT <- grandTotal^2/length(dataTable[,1])
#SS
SStotal <- sumSqObs-CT
SSpops <- q4-CT
SSdams <- q3-q4
SSwithin <- sumSqObs-q3
#MS
MSpops <- SSpops/dfpops
MSdams <- SSdams/dfdam
MSwithin <- SSwithin/dfwithin
return(list(
MSpops = MSpops,
MSdams = MSdams,
MSwithin = MSwithin,
n0prime = n0prime,
n0 = n0,
nb0 = nb0,
dfpops = dfpops,
dfdam = dfdam,
dfwithin = dfwithin,
sigma2dams = (MSdams-MSwithin)/n0
))
}
###########################################################################
######################## Qst.sireModel #############################
#
# Calculating Qst from the sire halfsib model, with arbitrary numbers of dams per sire and offspring per dam
QSTfromSireModel <- function(MSpops, MSsires, MSdams, MSwithin, n0primeprime, n0prime, n0, nc0prime, nc0, ncb0){
sigma2dams <- (MSdams-MSwithin)/n0
sigma2sires <- (MSsires - MSwithin - n0prime*sigma2dams)/nc0
sigma2pops <- (MSpops - MSwithin - n0primeprime*sigma2dams - nc0prime*sigma2sires)/ncb0
Va <- 4*sigma2sires
Qst <- sigma2pops/(sigma2pops+2*Va)
return(list(Qst = Qst, Va = Va))
}
###########################################################################
######################## Qst.damModel #############################
#
# Calculating Qst from the dam halfsib model, assuming that the offpsring of dams are all by unique fathers
QSTfromDamModel <- function(MSpops,MSdams,MSwithin,n0prime,n0,nb0, dam.offspring.relatedness){
sigma2dams <- (MSdams-MSwithin)/n0
sigma2pops <- (MSpops-sigma2dams*n0prime-MSwithin)/nb0
VA <- 1/dam.offspring.relatedness*sigma2dams
Qst <- sigma2pops/(sigma2pops+2*VA)
return(Qst)
}
###########################################################################
############################## Qst parboot sire model ######################################
#
# New qst.parboot for unbalanced model (collapses to balanced model as well) returns a Qst pseudovalue for that simulated null (neutral)
# distribution based on half-sib sire model. This requires MS for sires, dams and residuals,
# along with the various n's from the nested ANOVA and dfs for each level (These are all encapsulated in the results list from MeanSq.unbalanced.sire.)
qst.parboot.siremodel <- function(MSdflist, meanFst){
MSsiresResample <- MSdflist$MSsires * rchisq(1,MSdflist$dfsires) / MSdflist$dfsires
MSdamsResample <- MSdflist$MSdams * rchisq(1,MSdflist$dfdam) / MSdflist$dfdam
MSwithinResample <- MSdflist$MSwithin * rchisq(1,MSdflist$dfwithin) / MSdflist$dfwithin
VpopNeut <- 8*meanFst / (1-meanFst) * MSdflist$sigma2sires
MSpopNeut <- MSdflist$MSwithin + MSdflist$n0primeprime * MSdflist$sigma2dams + MSdflist$nc0prime * MSdflist$sigma2sires + MSdflist$ncb0 * VpopNeut
MSpopNeutResample <- MSpopNeut * rchisq(1,MSdflist$dfpops) / MSdflist$dfpops
sim.qst <- QSTfromSireModel(MSpopNeutResample,MSsiresResample,MSdamsResample,MSwithinResample,MSdflist$n0primeprime,MSdflist$n0prime,MSdflist$n0,MSdflist$nc0prime,MSdflist$nc0,MSdflist$ncb0)
return(sim.qst)
}
###########################################################################
############################## Qst parboot dam model ######################################
# New qst.parboot for dam model returns a Qst pseudovalue for that simulated null
# distribution based on half-sib dam model. This requires MS for dams and residuals,
# along with the n0 value from the nested ANOVA and dfs for each level
qst.parboot.dammodel <- function(MSdflist, meanFst, dam.offspring.relatedness){
MSdamsResample <- MSdflist$MSdams * rchisq(1,MSdflist$dfdam) / MSdflist$dfdam
MSwithinResample <- MSdflist$MSwithin * rchisq(1,MSdflist$dfwithin) / MSdflist$dfwithin
VpopNeut <- 2/dam.offspring.relatedness*meanFst / (1-meanFst) * MSdflist$sigma2dams
MSpopNeut <- MSdflist$MSwithin + MSdflist$n0prime * MSdflist$sigma2dams + MSdflist$nb0 * VpopNeut
MSpopNeutResample <- MSpopNeut * rchisq(1,MSdflist$dfpops) / MSdflist$dfpops
sim.qst <- QSTfromDamModel(MSpopNeutResample,MSdamsResample,MSwithinResample,MSdflist$n0prime,MSdflist$n0,MSdflist$nb0, dam.offspring.relatedness)
return(sim.qst)
}
###########################################################################
######################### QstVa Parboot for sire model ###############################
#
#New qstVa.parbootForCI.siremodel for sire model returns a Qst pseudovalue
#resampled by a paremetric bootstrap from the sire model. This follows the logic
#of O'Hara and Merila. It also returns a parametric bootrap value for Va. This
#requires MS for sires, dams and residuals, along with the n's and dfs for each level
qstVa.parbootForCI.siremodel <- function(MSdflist){
MSsiresResample <- MSdflist$MSsires * rchisq(1,MSdflist$dfsires) / MSdflist$dfsires
MSdamsResample <- MSdflist$MSdams * rchisq(1,MSdflist$dfdam) / MSdflist$dfdam
MSwithinResample <- MSdflist$MSwithin * rchisq(1,MSdflist$dfwithin) / MSdflist$dfwithin
MSpopResample <- MSdflist$MSpops * rchisq(1,MSdflist$dfpops) / MSdflist$dfpops
sim.qstForCI <- QSTfromSireModel(MSpopResample,MSsiresResample,MSdamsResample,MSwithinResample,MSdflist$n0primeprime,MSdflist$n0prime,MSdflist$n0,MSdflist$nc0prime,MSdflist$nc0,MSdflist$ncb0)
sim.qstForCI <- sim.qstForCI[[1]] # only want to return the Qst here, not also the Va
sigma2damsResample <- (MSdamsResample-MSwithinResample)/MSdflist$n0
sigma2siresResample <- (MSsiresResample - MSwithinResample - MSdflist$n0prime*sigma2damsResample)/MSdflist$nc0
sim.Va <- 4*sigma2siresResample
return(c(sim.qstForCI, sim.Va))
}
###########################################################################
######################### QstVa Parboot for dam model ###############################
#
# New qstVa.parbootForCI.dammodel for dam model returns a Qst pseudovalue
# resampled by a paremetric bootstrap from the dam model. This follows the logic
# of O'Hara and Merila. It also returns a parametric bootrap value for Va. This
# requires MS for dams and residuals, along with the n0 value from the nested
# ANOVA and dfs for each level
qstVa.parbootForCI.dammodel <- function(MSdflist, dam.offspring.relatedness){
MSdamsResample <- MSdflist$MSdams * rchisq(1,MSdflist$dfdam) / MSdflist$dfdam
MSwithinResample <- MSdflist$MSwithin * rchisq(1,MSdflist$dfwithin) / MSdflist$dfwithin
MSpopResample <- MSdflist$MSpops * rchisq(1,MSdflist$dfpops) / MSdflist$dfpops
sim.qstForCI <- QSTfromDamModel(MSpopResample,MSdamsResample,MSwithinResample,MSdflist$n0prime,MSdflist$n0,MSdflist$nb0, dam.offspring.relatedness)
sim.Va <- 1/dam.offspring.relatedness*(MSdamsResample-MSwithinResample)/MSdflist$n0
return(c(sim.qstForCI, sim.Va))
}
###########################################################################
######################### wc.calc #########################
#
# returns the a, b, and c values required to compute Fst according to Weir and Cockerham 1984
# modified from code previously implemented in hierfstat package
# (Jerome Goudet, http://cran.r-project.org/web/packages/hierfstat/index.html)
#
# ndat data frame with first column indicating population of origin and following representing loci
# diploid Whether data are diploid
wc.calc <- function(ndat, diploid = TRUE){
if (!diploid){
dum <- ndat[, -1]
nd <- max(dum, na.rm = TRUE)
modu <- 1000
if(nd < 10) modu <- 10
if(nd < 100) modu <- 100
dum <- dum * modu + dum
ndat <- data.frame(ndat[, 1], dum)
}
pop <- ndat[, 1]
ni <- length(pop)
dat <- ndat
loc.names <- names(dat)[-1]
n <- t(ind.count(dat)) ## NEED in.count to give number of inds genotyped per locus and per pop
nt <- apply(n, 1, sum, na.rm = TRUE)
untyped.loc <- which(nt == 0)
typed.loc <- which(nt != 0)
if(length(untyped.loc) > 0){
dat <- dat[, -(untyped.loc + 1)]
n <- t(ind.count(dat))
nt <- apply(n, 1, sum, na.rm = TRUE)
}
alploc <- nb.alleles(cbind(rep(1, ni), dat[, -1]))
np <- dim(n)[2]
npl <- apply(n, 1, tempfun <- function(x) sum(!is.na(x)))
nl <- dim(n)[1]
p <- pop.freq(dat, diploid)
pb <- pop.freq(cbind(rep(1, length(pop)), dat[, -1]), diploid)
n <- matrix(unlist(n), ncol = np)
nal <- n[rep(1:nl, alploc), ]
nc <- (nt - apply(n^2, 1, sum, na.rm = TRUE)/nt)/(npl - 1)
ntal <- rep(nt, alploc)
ncal <- rep(nc, alploc)
p <- matrix(unlist(lapply(p, t)), ncol = np, byrow = TRUE)
pb <- matrix(unlist(pb), ncol = 1)
if(diploid){
dum <- getal.b(dat[, -1])
all.loc <- apply(dum, 2, tempfun1 <- function(y) as.numeric(dimnames(table(y))[[1]]))
hetpl <- apply(dum, 2, fun <- function(z) {
lapply(as.numeric(dimnames(table(z))[[1]]), who.is.het <- function(y) apply(z ==
y, 1, ind.is.het <- function(x) xor(x[1], x[2])))
})
mho <- lapply(hetpl, tempfun2 <- function(x) matrix(unlist(lapply(x,
tempfun3 <- function(y) tapply(y, pop, sum, na.rm = TRUE))),
ncol = np))
mho <- matrix(unlist(mho), ncol = np, byrow = TRUE)
mhom <- (2 * nal * p - mho)/2
}else{mhom <- nal * p}
SSG <- apply(nal * p - mhom, 1, sum, na.rm = TRUE)
dum <- nal * (p - 2 * p^2) + mhom
SSi <- apply(dum, 1, sum, na.rm = TRUE)
dum1 <- nal * (sweep(p, 1, pb))^2
SSP <- 2 * apply(dum1, 1, sum, na.rm = TRUE)
ntalb <- rep(npl, alploc)
MSG <- SSG/ntal
MSP <- SSP/(ntalb - 1)
MSI <- SSi/(ntal - ntalb)
sigw <- MSG
sigb <- 0.5 * (MSI - MSG)
siga <- 1/2/ncal * (MSP - MSI)
abc.mat <- cbind(siga, sigb, sigw)
# this returns a matrix of a, b, and c values in columns with one allele per row
return(abc.mat)
}
###########################################################################
######################### fst.sample #########################
#
# returns the Fst value computed on a sample of the alleles in the data
# @param:
# - obs: a table containing the components of variance for each locus
# the table must have one line per allele and at least 3 columns corresponding
# to the three coefficient a, b, and c as defined in Weir&Cockerham 1984
#
# - nalleles: the size of the sample (i.e. num of alleles to draw from the table)
fst.sample <- function(obs, nalleles) {
# choose the alleles to randomly sample:
allele.smpl <- sample(1: nalleles,size= nalleles,replace=TRUE)
#select the sampled alleles from the input table:
dat <- obs[allele.smpl,]
# Fst = a/(a+b+c); from Weir & Cockerham 1984
return( sum(dat[,1])/sum(dat[,1]+dat[,2]+dat[,3]) )
}
###########################################################################
######################### read.fst.input #########################
#
#function takes from user: filename, number of pops, the number of any
# extra columns in front (default is 1), and if there is a header row (default is yes)
#data must be in the form: n columns, columns of all q_hat values by pop and in order, columns
# of all q_hat variances by pop in order, any additional columns all saved in a .csv
read.fst.input <- function(q_hat.dat, num.pops, num.extra.columns=1){
num.pops <- num.pops
# which columns are the q_hats per population
q.columns <- num.extra.columns + (1:num.pops)
# which columns are the corresponding variances in q_hat per population
last.q.column <- tail(q.columns, n=1)
first.var.column <- last.q.column+1
var.columns <- first.var.column:(first.var.column+num.pops-1)
# extract the data from input file
dat <- q_hat.dat
q_hat.matrix <- dat[,q.columns[1]:tail(q.columns, n=1)]
var.q_hat.matrix <- dat[,var.columns[1]:tail(var.columns, n=1)]
return(list(
q_hat.matrix,
var.q_hat.matrix))
# this returns a list containing the 2 matrices
# assign results of the function to an object, then query [1] or [2]
# for the q_hat.matrix and variance.matrix respectively
}
###########################################################################
######################### mean.fst #########################
#
# calculate the mean fst from q_hat values for one sample
# this function will then be used in the fst.sample function
# necessary calculations come from Lynch and Milligan 1994, as implemented in AFLP-SURV (Vekemans et al. 2002)
#
# - dat: a MATRIX of q_hat values per population in columns, per locus in rows (can be produced with AFLP-SURV)
#
# - vardat: a MATRIX of the variance in q_hat, corresponding to the values in dat (can be produced with AFLP-SURV)
#
# - num.loci: number of loci individuals are genotyped at
#
# - num.pops: number of populations sampled
mean.fst <- function(dat, vardat, num.loci, num.pops){
##### calculate H_j(i)
H_j_i.matrix <- matrix(NA, nrow=num.loci, ncol=num.pops)
# H_j(i) = 2q_j(i)[1-q_j(i)] + 2Var[q_j(i)]
# j is one population, i is one locus
for(j in 1:num.pops){ # loop through population columns
for(i in 1:num.loci){ # loop through loci rows
q <- dat[i,j] # q value of one locus at one population
#x <- xdat[i,j] # x value of one locus at one population
#N <- samps[i,j] # sample size for one locus at one population
var.q <- vardat[i,j] # variance in q for that locus across all populations
H_j_i <- 2*q*(1-q) + 2*var.q # measure of gene diversity at one locus
H_j_i.matrix[i,j] <- H_j_i # put it into a matrix
}
}
##### calculate H_j
# H_j = average of H_j(i) across all loci for that population
H_j.matrix <- matrix(NA, nrow=1, ncol=num.pops)
for(j in 1:num.pops){ # get the H_j for each pop
H_j.temp <- mean(H_j_i.matrix[,j]) # sum over all loci for one pop and divide by number of loci
# done by taking mean per pop
H_j.matrix[1,j] <- H_j.temp
}
H_j <- H_j.matrix # this is the mean observed gene diversity in each pop, j
##### calculate H_jk
H_jk <- matrix(0, nrow=num.pops, ncol=num.pops)
# make a matrix of num.pops x num.pops (j by k)
for(j in 1:num.pops){
for(k in 1:num.pops){
sum.H_jki <- 0
for(i in 1:num.loci){
# first need H'_jk(i) where j and k are a pair of populations, eqn 9a
Hprime_jki <- dat[i,j] + dat[i,k] - 2*dat[i,j]*dat[i,k]
# then calculate H_jk(i) (i.e. not 'prime'), eqn 10a
H_jki <- Hprime_jki - (H_j_i.matrix[i,j] + H_j_i.matrix[i,k])/2
# sum this across all loci in this loop
sum.H_jki <- sum.H_jki + H_jki
}
# then divide by number of loci to get the H_jk for that pair of pops (eqn 12), and store it in a matrix
H_jk[j,k] <- sum.H_jki/num.loci
}
}
##### calculate H_B, mean between-pop gene diversity
# H_B = 2/(n(n-1)) * sum(H_jk) where n is the number of populations
first.half.distinct.list <- NULL
second.half.distinct.list <- NULL
H_jk.distinct <- NULL
for(a in 1:(num.pops-1)){
b <- a+1
pair <- expand.grid(a, b:num.pops)
list5 <- pair[,1]
list6 <- pair[,2]
first.half.distinct.list <- c(first.half.distinct.list, list5)
second.half.distinct.list <- c(second.half.distinct.list, list6)
} # this made 2 lists that when lined up are all possible distinct pairs of populations
# extract all the H_jk values of distinct population pairs using these lists
for(z in 1:length(first.half.distinct.list)){
H_jk.distinct.temp <- H_jk[first.half.distinct.list[z], second.half.distinct.list[z]]
# print(c(first.half.distinct.list[z], second.half.distinct.list[z])) # check only getting distinct pop pairs, yes.
H_jk.distinct <- c(H_jk.distinct, H_jk.distinct.temp)
} # now do eqn 13a
H_B <- (2 / (num.pops*(num.pops - 1))) * sum(H_jk.distinct)
##### calculate H_W, mean within-pop diversity
# H_W = 1/n * sum(H_j) where n is the number of populations
H_W <- (1/num.pops)*sum(H_j)
##### calculate H_T
# H_T = H_B + H_W
H_T <- H_B + H_W
##### calculate Var(H_W)
var.H_W <- (1 / (num.pops*(num.pops - 1))) * sum( (H_j - H_W)^2 )
##### calculate V_B
# modified from Xavier Vekemans C code in AFLP-SURV, found in the Isomain.c file
# V_B is the variance among diversity measures in non-overlapping pairs of populations - the variance in H_jk
count <- 0
sum.H_jk <- 0
for(k in 1:(num.pops/2)*2){ # a sequence to a decimal only goes to the rounded down decimal, so if odd num.pops still works
j <- k-1
temp.H_jk <- H_jk[j,k]
# sum the non-overlapping pop pairs H_jk's
sum.H_jk <- sum.H_jk + temp.H_jk
count <- count+1
# print(c(j,k))
}
mean.H_jk <- sum.H_jk / count # mean non-overlapping pop H_jk
sum.diffs.squared <- 0
for(k in 1:(num.pops/2)*2){ # a sequence to a decimal only goes to the rounded down decimal, so if odd num.pops still works
j <- k-1
temp.diff.squared <- (H_jk[j,k] - mean.H_jk)^2
# sum the squared differences to get the variance
sum.diffs.squared <- sum.diffs.squared + temp.diff.squared
}
V_B <- sum.diffs.squared / ((count) * (count-1))
if(num.pops <=3){V_B <- 0} # for 2 and 3 deme cases, V_B and C_B are set to zero
##### calculate C_B
# modified from Xavier Vekemans C code in AFLP-SURV, found in the Isomain.c file
# C_B is the covariance among diversity measures in overlapping population pairs - the covariance in H_jk
# covariance is sum over all x minus x_bar times y minus y_bar all divided by n-1
# overlapping pops are those that share either of the pops in the pair
count <- 0
sum.H_jk <- 0
sum.H_j2k2 <- 0
for(j in 1:(num.pops-1)){ # loop through pop 1 of X pair
for(k in (j+1):num.pops){ # loop through pop 2 of X pair
# get first set of Y pair of populations, those which share pop 1 of X
j2 <- j # this is first pop of Y pair, shared with first pop of X pair
k2 <- k+1 # this is second pair of Y pop
while(k2 <= num.pops){ # loop through first set, second pair of Y pop
X.H_jk <- H_jk[j, k]
sum.H_jk <- sum.H_jk + X.H_jk # get the mean of X pairs
Y.H_jk <- H_jk[j2, k2]
sum.H_j2k2 <- sum.H_j2k2 + Y.H_jk # get the mean of Y pairs
# print(c(j,k," ",j2,k2))
k2 <- k2+1
count <- count + 1
}
# get second set of Y pair of populations, those which share pop 2 of X
j2 <- k # this is first pop of Y pair, shared with second pop of X pair
k2 <- k+1 # this is second pair of Y pop
while(k2 <= num.pops){ # loop through second set, second pair of Y pop
X.H_jk <- H_jk[j, k]
sum.H_jk <- sum.H_jk + X.H_jk
Y.H_jk <- H_jk[j2, k2]
sum.H_j2k2 <- sum.H_j2k2 + Y.H_jk
# print(c(j,k," ",j2,k2))
k2 <- k2+1
count <- count + 1
}
}
}
mean.H_jk <- sum.H_jk/count
mean.H_j2k2 <- sum.H_j2k2/count
cov <- 0
# do the same as above but now get the covariance, i.e subtracting the mean from each and multiplying together
for(j in 1:(num.pops-1)){
for(k in (j+1):num.pops){
j2 <- j
k2 <- k+1
while(k2 <= num.pops){
cov.temp <- (H_jk[j, k] - mean.H_jk) * (H_jk[j2, k2] - mean.H_j2k2)
cov <- cov + cov.temp
k2 <- k2+1
}
j2 <- k
k2 <- k+1
while(k2 <= num.pops){
cov.temp <- (H_jk[j, k] - mean.H_jk) * (H_jk[j2, k2] - mean.H_j2k2)
cov <- cov + cov.temp
k2 <- k2+1
}
}
}
C_B <- cov / ((count) * (count-1))
if(num.pops <=3){C_B <- 0} # for 2 and 3 deme cases, V_B and C_B are set to zero
##### calculate Var(H_B)
var.H_B <- (2 * (V_B + 2*(num.pops-2)*C_B)) / (num.pops * (num.pops - 1))
##### calculate Cov(H_B, H_W)
# need the sum of the sums term in there - sum of all H_jk per population j (except where j=k) times H_j
# multiply the sum of each row of the H_jk matrix by its corresponding H_j from the H_j.matrix excluding pops where j=k in that sum
sum.H_j.H_jk <- 0
for(j in 1:num.pops){
# sum the row of the H_jk matrix, then subtract off the H_jk where j=k
temp.sum.H_jk <- 0
for(k in 1:num.pops){
if(j==k) same.jk <- H_jk[j,k] #this will be the value to subtract off the sum of H_jk for that row
temp.sum.H_jk <- temp.sum.H_jk + H_jk[j,k]
# print(c(j,k))
}
sum.H_jk <- temp.sum.H_jk - same.jk
temp.H_j.H_jk <- H_j[j] * sum.H_jk
sum.H_j.H_jk <- sum.H_j.H_jk + temp.H_j.H_jk
}
cov.H_B.H_W <- ( (sum.H_j.H_jk/(num.pops*(num.pops-1)))-(H_W*H_B) )/num.pops
##### Fst!
mean.Fst <- (H_B / H_T) * (1 + (H_B*var.H_W - H_W*var.H_B + (H_B - H_W)*cov.H_B.H_W) / (H_B*H_T^2 ))^(-1)
return(mean.Fst)
}
###########################################################################
######################### fst.sample.aflp ########################
#
# returns the Fst value computed on a sample of the loci given in input as q_hat values
# @param:
# - obs: a table containing the q_hat values needed to calculate Fst
#
# - qvar: the corresponding variances for given q_hat values
#
# - nloci: the size of the sample (i.e. num of loci to draw from the table)
fst.sample.aflp <- function(obs, qvar, nloci) {
# take a random sample of loci, with replacement, of the specified size:
loc.smpl <- sample(1:nloci,size=nloci,replace=TRUE)
# select the sampled loci from the input table of data:
q_hats <- obs[loc.smpl,]
# using this number of loci,
num.loci <- length(obs[,1])
# select the corresponding variances
vars <- qvar[loc.smpl,]
# and lastly need number of pops
num.pops <- length(obs[1,])
# calculate Fst based on function built in mean.fst
return( mean.fst(q_hats, vars, num.loci, num.pops) )
}
###########################################################################
NULL
#' AFLP marker dataset
#'
#' An example dataset created from simulations of 15 neutral populations under
#' an island model. Individuals are diploid with genotypes at 100 dominant, AFLP loci.
#'
#' \describe{
#' \item{pop}{population of origin}
#' \item{q_hat}{columns 2 through 16 are the q_hat values for each locus in each population}
#' \item{q_var}{columns 17 through 31 are the q_hat variances for each locus in each population}
#' }
#'
#' @docType data
#' @keywords datasets
#' @format A data frame with 100 observations on the following 31 variables.
#' @name aflp
NULL
#' Biallelic marker dataset
#'
#' An example dataset created from simulations of 15 neutral populations under
#' an island model. Individuals are diploid with genotypes at 25 biallelic loci.
#'
#' \describe{
#' \item{pop}{population of origin}
#' \item{loc}{loc1 through loc25 are the 25 biallelic loci for which each diploid individual is genotyped}
#' }
#'
#' @docType data
#' @keywords datasets
#' @format A data frame with 375 observations on the following 26 variables.
#' @name biallelic
NULL
#' Half-sib dam trait dataset
#'
#' An example dataset of a neutral trait created from simulations of 15
#' populations under an island model. Offspring per dam are unbalanced across
#' the breeding design with 6 dams per population, an average of 10 offspring per
#' dam, and 900 individuals total.
#'
#' \describe{
#' \item{pop}{a column of population identifiers}
#' \item{dam}{a column of unique dam identifiers}
#' \item{trait}{column of trait values per individual}
#' }
#'
#' @docType data
#' @keywords datasets
#' @format A data frame with 900 observations on the following 3 variables.
#' @name hsdam
NULL
#' Half-sib sire trait dataset
#'
#' An example dataset of a neutral trait created from simulations of 15 populations
#' under an island model. 900 individuals total come from an average of 4 dams
#' per sire and an average of 12 sires per population for an overall average of 60
#' individuals per population.
#'
#' \describe{
#' \item{pop}{a column of population identifiers}
#' \item{sire}{a column of unique sire identifiers}
#' \item{dam}{a column of unique dam identifiers}
#' \item{trait}{column of trait values per individual}
#' }
#'
#' @docType data
#' @keywords datasets
#' @format A data frame with 900 observations on the following 4 variables.
#' @name hssire
NULL
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.