R/topmatch.R In hiphop: Parentage Assignment using Bi-Allelic Genetic Markers

Documented in topmatch

#' A function selecting the dams and sires with the fewest genetic mismatches to an offspring
#'
#' This function summarizes per offspring the top combinations of dam and sire
#' that have the least genetic mismatches according to the hot (Huisman 2017) and/or hiphop (Cockburn et al. in revision) test criteria.
#' In addition to the top matched combinations the summary always also list the social parent, if not among the top X.
#' The user can choose whether one wants to look for the most likely dam and sire with and without
#' assuming that the social mother (or father) is the genetic parent. Furthermore, one can choose on which
#' test score to rank individuals. For more information and worked examples, see the vignette and Cockburn et al. (in revision).
#' @param x The file with hot and hiphop scores that is generated by the 'hothiphop()' function.
#' @param ranking This sets the mismatch criterion in which dams and sires are ranked,
#' possibilities include ranking="hothiphop.parents", "hiphop", "hot.parents", hot.dam", "hot.sire", "hothiphop.dam", "hothiphop.sire".
#' In some situations it can be useful to supply two ranking criteria, for example to avoid ties (e.g. ranking=c("hot.dam", "hiphop))
#' @param unique When using the ranking criteria hot.dam or hot.sire the best ranked dam or sire will occupy the entire top 3 and
#' this does not allow comparing among dams. By setting unique to "dam" (or "sire") the output shows the
#' top 3 with the best record for each unique dam (or sire).
#' @param condition Whether or not one wants to condition on either the social mother (condition="mother")
#' or social father (condition="father"). The default is condition="none" in which case all
#' possible dams and sires are considered as genetic parents.
#' @param thres Sets a threshold value for the number of mismatches, if parents have less mismatches
#' then the variable below.threshold in the output will have a value of 1.
#' If 0>=thres<1, the number of mismatches will be re-expressed as a proportion of all loci sampled (not NA).
#' If -1>thres<0  the mismatch scores are expressed as the number of mismatched divided by the number of loci at which the offspring is heterozygote (for the HIPHOP test) or divided by  number of loci at which the offspring is homozygote (for the HOT test). This standardization may be sometimes useful for the recalculating the HIPHOP score to account for the fact that some offspring can be scored at more loci than others for this test.
#' By default thres is set to a large integer (99999).
#' @param top Sets the number of top matches that is shown in the output
#' #' @return a dataframe with for each offspring the top X (default top 3, giving 3 rows for each offspring) dam and sire combinations
#' and their mismatch scores according to the HOT and HIPHOP test, the number of loci this was based on,
#' and some additional relevant information about the social parents and potential dam and sires.
#' In addition to the top X offspring-dam-sire with the fewest mismatches, the scores of the social parents are also always listed
#' (if they were not already in the top X ranking).
#'#' \describe{
#'   \item{year}{ the year or cohort that is being considered, adults can be potential dam or sire in some years, but no in others}
#'   \item{brood}{ an identifier of the brood to which the offspring and adults belong/are associated with}
#'   \item{offspring}{ an identifier of the offspring }
#'   \item{rank}{ the ranking among all possible combinations of offspring and potential dams and sires, rank 1 is fewest mismatches according to criterion chosen in the function argument 'ranking' }
#'   \item{dam}{ an identifier of the potential dam }
#'   \item{sire}{ an identifier of the potential sire }
#'   \item{dam.type}{ description of the type of potential dam (social parent, extra-group parent, within-group subordinate) }
#'   \item{sire.type}{ description of the type of potential sire (social parent, extra-group parent, within-group subordinate) }
#'   \item{hothiphop.parents}{ the sum of the hiphop and hot.parents mismatch score }
#'   \item{hiphop}{ the hiphop mismatch score of the offspring with the potential dam and potential sire, expressed as the number of loci giving mismatches }
#'   \item{hot.parents}{ the hot score of the offspring with both the potential dam and sire, expressed as the number of loci giving mismatches}
#'   \item{hot.dam}{ the hot score of the offspring with the potential dam, expressed as the number of loci giving mismatches}
#'   \item{hot.sire}{ the hot mismatch score of the offspring with the potential sire , expressed as the number of loci giving mismatches}
#'   \item{hothiphop.dam}{ the sum of the hot.dam and hiphop mismatch score }
#'   \item{hothiphop.sire}{ the sum of the hot.sire and hiphop mismatch score }
#'   \item{below.threshold}{ if the score chosen in argument 'ranking' is below the chose threshold value, then equals to 1, else 0 }
#'   \item{threshold}{ the chosen threshold value in argument 'threshold'}
#'   \item{loci.dyad.dam}{ the number of loci at which both the offspring and dam were not NA }
#'   \item{loci.dyad.sire}{the number of loci at which both the offspring and sire were not NA }
#'   \item{loci.triad}{ the number of loci at which the offspring, dam and sire were not NA }
#'   \item{offspring.heterozygosity}{ the proportion of loci at which the offspring was heterozygous }
#'   \item{social.mother.sampled}{if the social.mother genotypic data is in the genotypes file then equal to 1, else 0}
#'   \item{social.father.sampled}{if the social.father genotypic data is in the genotypes file then equal to 1, else 0}
#'   \item{social.mother}{ identity of the social mother of the offspring}
#'   \item{social.father}{ identity of the social father of the offspring}
#'   \item{condition}{ the value of the argument 'condition'}
#'   \item{ranking}{ the value of the (first) value in the argument 'ranking'}
#'   \item{ranking2}{ the value of the second value in the argument 'ranking'}
#' }
#' @author Martijn van de Pol, \email{martijn@myscience.eu}
#' @references Cockburn et al. (2020) HIPHOP: improved paternity assignment among close relatives using a simple exclusion method for biallelic markers.
#' Molecular Ecology Resources, in revision.
#' @references Huisman, J. (2017). Pedigree reconstruction from SNP data: parentage assignment, sibship clustering and beyond. Molecular ecology resources, 17(5), 1009-1024.
#' @source Cockburn et al. (2020) HIPHOP: improved paternity assignment among
#' close relatives using a simple exclusion method for biallelic markers.
#' Molecular Ecology Resources, in revision.
#' @examples
#' results<-hothiphop(ind=individuals[1:22,], gen=genotypes)
#' best<-topmatch(x=results, ranking="hothiphop.parents")
#' @export
topmatch<-function(x, ranking, condition="none",  thres=99999, top=3, unique="pair") {
if(length(ranking)==1) { ranking[2]<-"none" }
#some checks on correct values for the arguments
if(length(ranking)>2) {stop("the ranking parameter can only contain 2 criteria")}
condition.types<-c("none", "mother", "father")
ranking.types<-c("hothiphop.parents", "hot.parents", "hot.dam", "hot.sire", "hiphop", "hothiphop.sire", "hothiphop.dam")
unique.types<-c("pair", "dam", "sire")
if(length(which(unique.types==unique))==0) {stop("the unique argument can only have the values", unique.types)}
if(length(which(condition.types==condition))==0) {stop("the condition argument can only have the values", condition.types)}
if(length(which(ranking.types==ranking[1]))==0) {stop("the ranking argument can only have the values: ", ranking.types) }
if(length(which(c(ranking.types,"none")==ranking[2]))==0) {stop("the ranking argument can only have the values: ", ranking.types, "none") }
if(thres<=-1) {stop("thres must be a value larger than -1, see the decription in the help file on parameter thres")}
if(top<1 || top%%1!=0) {stop("top must be a postive integer value")}
if(condition=="mother" & ranking[1]=="hot.dam" ) {stop("it does not make sense to condition on mother and look at hot.dam scores")}
if(condition=="father" & ranking[1]=="hot.sire" ) {stop("it does not make sense to condition on one father and look at hot.sire scores")}
if(condition=="mother" & unique=="dam" ) {stop("it does not make sense to condition on one father and look at unique dams")}
if(condition=="father" & unique=="sire" ) {stop("it does not make sense to condition on one father and look at unique sires")}
# subset the model output for cases where not all possible combinations are relevant
if(condition=="mother") {x<-subset(x,x$is.dam.social==1)} if(condition=="father") {x<-subset(x,x$is.sire.social==1)}
# warnings
warning.value<-0.05
if(((max(x$loci.dyad.dam)-min(x$loci.dyad.dam))/mean(x$loci.dyad.dam))>warning.value) { warning("offspring-dam combinations differ by more than ", warning.value*100, "% in the number of loci available (not NA) for the HOT.DAM test, consider using proportion instead of number of mismatches by setting the thres argument to a value lower than 1 (e.g. thres=0.99)") } if(((max(x$loci.dyad.sire)-min(x$loci.dyad.sire))/mean(x$loci.dyad.sire))>warning.value) {
warning("offspring-sire combinations by differ more than ", warning.value*100, "% in the number of loci available (not NA) for the HOT.SIRE test, consider using proportion instead of number of mismatches by setting the thres argument to a value lower than 1 (e.g. thres=0.99)")
}
if(((max(x$loci.triad)-min(x$loci.triad))/mean(x$loci.triad))>warning.value) { warning("offspring-dam-sire combinations differ by more than ", warning.value*100, "% in the number of loci available (not NA) for the HIPHOP and HOT.parents test, consider using proportion instead of number of mismatches by setting the thres argument to a value lower than 1 (e.g. thres=0.99)") } if(thres>=0 & thres<1) { # recalculate all scores into proportions x$hot.dam<-x$hot.dam/x$loci.dyad.dam
x$hot.sire<-x$hot.sire/x$loci.dyad.sire x$hot.parents<-x$hot.parents/x$loci.triad
x$hiphop<-x$hiphop/x$loci.triad x$hothiphop.parents<-x$hiphop+x$hot.parents
x$hothiphop.dam<-x$hiphop+x$hot.dam x$hothiphop.sire<-x$hiphop+x$hot.sire
}
if(thres<0 & thres>=-1) {
x$hot.dam<-x$hot.dam/(x$loci.dyad.dam*(1-x$offspring.heterozygosity))
x$hot.sire<-x$hot.sire/(x$loci.dyad.sire*(1-x$offspring.heterozygosity))
x$hot.parents<-x$hot.parents/(x$loci.triad*(1-x$offspring.heterozygosity))
x$hiphop<-x$hiphop/(x$loci.triad*x$offspring.heterozygosity)
x$hothiphop.parents<-x$hot.parents + x$hiphop x$hothiphop.dam<-x$hiphop+x$hot.dam
x$hothiphop.sire<-x$hiphop+x$hot.sire } x$ranking<-x[,ranking[1]]
ifelse(ranking[2]=="none", x$ranking2<-x$ranking*NA, x$ranking2<-x[,ranking[2]]) x<-x[order(x$ranking, x$ranking2),] x$rank.all<-seq(1,length(x$ranking),1) offspring<-unique(x$offspring)
triad.best$condition<-condition triad.best$threshold<-thres
if(thres<0) {  triad.best$threshold<-paste0("(-)",abs(thres))} triad.best$ranking<-ranking[1]
triad.best$ranking2<-ranking[2] thres<-abs(thres) top.pairs<-list(assign( paste0("triad.best.top","empty"), triad.best )) for(i in 1:top) { top.pairs[[paste0("triad.best.top",i)]]<-triad.best } for(o in 1:length(offspring)) { if(o%%100==0) { print(paste0(round((100*o/length(offspring)),0),"%, ", o," offspring finished")) } triad.best[o, c("dam.type", "sire.type")]<-"social parent" triad.best[o, c("year","brood", "offspring", "social.mother.sampled", "social.father.sampled", "offspring.heterozygosity")]<-x[which(x$offspring==offspring[o])[1], c("year","brood", "offspring", "social.mother.sampled", "social.father.sampled", "offspring.heterozygosity")]
selection.social.pair<-which(x$offspring==offspring[o] & x$is.dam.social==1  & x$is.sire.social==1) if(length(selection.social.pair)>0) { triad.best[o, c("year","brood", "offspring", "dam", "sire","hothiphop.parents","hiphop", "hot.parents", "hot.dam","hot.sire", "hothiphop.dam","hothiphop.sire", "loci.dyad.dam", "loci.dyad.sire", "loci.triad", "social.mother.sampled", "social.father.sampled", "social.mother", "social.father", "rank2")]<-x[selection.social.pair, c("year","brood", "offspring" ,"potential.dam", "potential.sire", "hothiphop.parents","hiphop", "hot.parents", "hot.dam","hot.sire", "hothiphop.dam","hothiphop.sire", "loci.dyad.dam", "loci.dyad.sire", "loci.triad", "social.mother.sampled", "social.father.sampled", "potential.dam", "potential.sire", "ranking2")] triad.best$rank[o]<-length(which(x[which(x$offspring==offspring[o]),"ranking"]<x$ranking[selection.social.pair]))+1
triad.best$below.threshold[o]<-ifelse(x$ranking[selection.social.pair]<thres,1,0)
} else {
selection.social.mother<-which(x$offspring==offspring[o] & x$is.dam.social==1)
selection.social.father<-which(x$offspring==offspring[o] & x$is.sire.social==1)
if(length(selection.social.mother)>0) {
ifelse(ranking[1]=="hot.dam",triad.best$rank[o]<-length(which(x[which(x$offspring==offspring[o]),"ranking"]<x$ranking[selection.social.mother[1]]))+1, triad.best$rank[o]<-NA)
triad.best$below.threshold[o]<-ifelse(x$ranking[selection.social.mother[1]]<thres,1,0)
}
if(length(selection.social.father)>0) {
ifelse(ranking[1]=="hot.sire",triad.best$rank[o]<-length(which(x[which(x$offspring==offspring[o]),"ranking"]<x$ranking[selection.social.father[1]]))+1, triad.best$rank[o]<-NA)
triad.best$below.threshold[o]<-ifelse(x$ranking[selection.social.father[1]]<thres,1,0)
}
}
x$socials<-x$is.dam.social*x$is.sire.social*-1 # note the *-1, such that social parents have lower score (-1) than other triads (0) list<-which(x$offspring==offspring[o])
best.matches<-list[order(x[list,"ranking"],x[list,"socials"])[1:top]] # sort on ranking and in case of ties, put social parents on top.
if(unique!="pair") {
xx<-x[list[order(x[list,"rank.all"],x[list,"socials"])],]
xx<-xx[!duplicated(xx[paste0("potential.",unique)]),]
xx.list<-as.numeric(rownames(xx[1:top,]))
for(i in 1:top) { best.matches[i]<-which(rownames(x)==xx.list[i]) }
}
for(i in 1:top) {
top.pairs[[paste0("triad.best.top",i)]]$social.mother.sampled[o]<-triad.best$social.mother.sampled[o]
top.pairs[[paste0("triad.best.top",i)]]$social.father.sampled[o]<-triad.best$social.father.sampled[o]
top.pairs[[paste0("triad.best.top",i)]]$rank[o]<-i top.pairs[[paste0("triad.best.top",i)]][o, c("year","brood", "offspring", "dam", "sire", "hothiphop.parents","hiphop", "hot.parents", "hot.dam","hot.sire", "hothiphop.dam","hothiphop.sire", "loci.dyad.dam", "loci.dyad.sire", "loci.triad", "offspring.heterozygosity", "social.mother.sampled", "social.father.sampled", "rank2")]<-x[best.matches[i], c("year","brood", "offspring" ,"potential.dam", "potential.sire","hothiphop.parents","hiphop", "hot.parents", "hot.dam","hot.sire", "hothiphop.dam","hothiphop.sire", "loci.dyad.dam", "loci.dyad.sire", "loci.triad", "offspring.heterozygosity", "social.mother.sampled", "social.father.sampled", "ranking2")] top.pairs[[paste0("triad.best.top",i)]][o, c("social.mother","social.father")]<-triad.best[o, c("social.mother","social.father")] top.pairs[[paste0("triad.best.top",i)]]$below.threshold[o]<-ifelse(x$ranking[best.matches[i]]<thres, 1, 0) if(x$is.dam.social[best.matches[i]]==1) {
} else {
ifelse(x$is.dam.within.group[best.matches[i]]==1, top.pairs[[paste0("triad.best.top",i)]][o,"dam.type"]<-"within-group subordinate", top.pairs[[paste0("triad.best.top",i)]][o,"dam.type"]<-"extra-group parent") } if(x$is.sire.social[best.matches[i]]==1) {
ifelse(x$is.sire.within.group[best.matches[i]]==1,top.pairs[[paste0("triad.best.top",i)]][o,"sire.type"]<-"within-group subordinate", top.pairs[[paste0("triad.best.top",i)]][o,"sire.type"]<-"extra-group parent") } } } for(j in 1:top) { triad.best<-rbind(triad.best, top.pairs[[paste0("triad.best.top",j)]]) } triad.best <-triad.best[order(triad.best$brood, triad.best$offspring, triad.best$rank, triad.best$rank2),] triad.best <- unique( triad.best[ , 1:dim(triad.best)[2] ] ) triad.best$rank2<-NULL