Nothing
#' Forward-Backward algorithm for local admixture inference
#'
#' @param Geno
#' List of genotying matrices of size `M` (number of markers),
#' with each matrix of dimension `N`(number of individuals) x `L`(number of alleles)
#' whose elements consists of discrete or continuous dosages
#' @param ResAdmixGlobal
#' A list generated by the \code{\link{AdmixGlobal}} function
#' @param Ind
#' Name of the individual to be processed
#' @param P
#' Ploidy level of the individual (positive integer superior or equal to 2)
#' @param GeneticMap
#' Dataframe with three columns: `Chromosome`, `Marker`, and `Distance`,
#' with `M` rows. Distances must be in centiMorgan
#' @param SmoothParam
#' Smoothing parameter corresponding to the number of generations since admixture
#' @param MaxRec
#' Maximum number of recombination between adjacent markers,
#' used as an approximation for transition probabilities,
#' in order to speed up calculations and limit memory usage.
#' The approximation is disabled by default.
#' Specify a positive integer between 1 and `P` to activate it,
#' which overrides `DistIntBounds`.
#' @param DistIntBounds
#' Vector of bounds for genetic distance intervals,
#' used as an approximation for transition probabilities,
#' in order to speed up calculations and limit memory usage.
#' To deactivate the approximation, set `DistIntBounds=NULL`.
#' @param MinProp
#' Minimum value for admixture proportion below which the contribution of the
#' group is not considered for the individual
#' (positive numeric value inferior to 1/K where K is the number of groups)
#' @param EmisProbMaxPermut
#' Maximum number of permutations involved in computing exact emission probability,
#' otherwise an approximate emission probability is calculated,
#' used as an approximation to speed up calculations.
#' When continuous dosages are informed, the approximate emission probability
#' is calculated.
#' @param NbThreads
#' Number of threads to be used (positive integer) with a default value of 0
#' setting automatically all threads available
#' @param Verbose
#' A boolean describing if detailed information should be printed
#'
#' @return
#' A list of four items: a list of size `C` (number of chromosomes) with ancestry
#' dosage matrices obtained from the posterior distribution (`Posterior`),
#' a list of size `C` with ancestry dosages obtained from the Viterbi
#' algorithm (`Viterbi`), the log-likelihood value (`LogLik`),
#' and a list of size `C` of vectors indicating what
#' emission probability was computed at each marker (`EmisProbType`)
#'
#' @description
#' This function performs local admixture inference for a selected individual
#'
#' @details
#' The function `AdmixLocal()` performs local admixture inference from:
#' (i) genotyping data (`Geno`) formatted as a list of matrices
#' (one for each marker),
#' (ii) a list (`ResAdmixGlobal`) generated by the global admixture function
#' [AdmixGlobal()],
#' (iii) a genetic map dataframe (`GeneticMap`),
#' (iV) the name of the individual to analyse (`Ind`),
#' (v) the ploidy level of the individual (`P`).
#'
#' The inference of the local admixture hidden Markov model
#' is performed by calculating posterior probabilities and the
#' Viterbi path, both available from the results.
#' By default, all available threads/CPU cores are used but the number
#' can be chosen using `NbThreads`.
#'
#' The rate at which breakpoints between blocks of homogeneous
#' ancestry occur per unit of genetic distance is defined as a smoothing
#' parameter controlling the size of ancestry blocks (`SmoothParam`). The larger
#' the value, the smaller the expected ancestry block size.
#'
#' Two approximations exist for transition probabilities to speedup calculations:
#' (i) the first consists of a simplification of genetic distances between
#' adjacent marker where genetic distances (in cM) comprised within each of
#' these intervals are averaged (`DistIntBounds`),
#' (ii) the second consists of limiting the number of recombination authorized
#' to move from one ancestry state to another between adjacent markers (`MaxRec`).
#' When both arguments are specified as `NULL`, exact transition probabilities
#' are calculated.
#'
#' When allele dosages are discrete, the exact emission probability distribution
#' function is calculated when the number of phased genotypes (i.e. permutations)
#' that can be derived from the unphased allele dosage does not exceed a
#' threshold defined by `EmisProbMaxPermut`.
#' When number of permutations exceeds the threshold or when allele dosages are
#' continuous (e.g. obtained from read depth ratios), an approximation of the
#' emission probability distribution is calculated, which helps
#' speeding up calculations.
#'
#' When an individual has a very low global (genome-wide) admixture proportion
#' for an ancestral group, this contribution can be considered negligible and
#' discarded from the local admixture inference to speedup calculations.
#' The minimum threshold can be set using `MinProp`.
#'
#' @importFrom magrittr %>%
#' @importFrom magrittr set_names
#' @importFrom purrr map_dfr
#' @importFrom utils tail
#' @importFrom Matrix sparseMatrix
#'
#' @seealso
#' * [SimulatePop()] to simulate a polyploid admixed population.
#' * [AdmixGlobal()] to perform global (genome-wide) admixture inference and
#' generate the `ResAdmixGlobal` object for the `AdmixLocal()` function.
#' * [LocalPlot()] to generate a local admixture plot using the results from
#' the `AdmixLocal()` function.
#'
#' @export
#'
#' @examples
#' ## Simulate Simulate a polyploid admixed population
#' DataSim <- SimulatePop(K=3L, N=10L, P=6L, M=50L, C=5L, L=10L, Seed=123, NbThreads=1)
#'
#' ## Perform global admixture inference
#' ResAdmixGlobal <- AdmixGlobal(Geno=DataSim$Geno, K=3, Verbose=FALSE, NbThreads=1)
#'
#' ## Perform local admixture inference for one individual
#' ResAdmixLocal <- AdmixLocal(Geno=DataSim$Geno, ResAdmixGlobal=ResAdmixGlobal,
#' Ind="Ind4", P=6L, DataSim$GeneticMap,
#' Verbose=FALSE, NbThreads=1)
#'
#' ## Posterior ancestry dosages for Chr1
#' head(ResAdmixLocal$Posterior$Chr1)
#'
#' ## Viterbi ancestry dosages for Chr1
#' head(ResAdmixLocal$Viterbi$Chr1)
#'
#' ## Log-likelihood
#' ResAdmixLocal$LogLik
#'
#' ## Type of emission probability for Chr1
#' head(ResAdmixLocal$EmisProbType$Chr1)
AdmixLocal <- function(Geno, ResAdmixGlobal, Ind, P, GeneticMap,SmoothParam=1,
MaxRec=NULL, DistIntBounds=c(0.001,0.005,0.01,0.05,0.1,0.5,1),
MinProp=1e-6, EmisProbMaxPermut=100L, NbThreads=0L,
Verbose=TRUE){
## Initial checks
stopifnot("Geno must be a list of numeric matrices" =
is.list(Geno) && is.matrix(Geno[[1]]) && is.numeric(Geno[[1]]))
stopifnot("Geno must be a named list (marker names)" =
!is.null(names(Geno)))
stopifnot("Geno matrices must have column names (allele names at markers)" =
!is.null(colnames(Geno[[1]])))
stopifnot("Geno matrices must have row names (individual names)" =
!is.null(rownames(Geno[[1]])))
stopifnot("ResAdmixGlobal must be generated by the AdmixGlobal function" =
inherits(ResAdmixGlobal, "AdmixGlobal"))
stopifnot("ResAdmixGlobal must have been generated from Geno" =
all(names(ResAdmixGlobal$Freq)==names(Geno)) &&
all(rownames(ResAdmixGlobal$Prop)==rownames(Geno[[1]])))
stopifnot("Ind must be the name of a single individual included in the dataset" =
Ind%in%rownames(ResAdmixGlobal$Prop) && length(Ind)==1)
stopifnot("P must be a positive integer" =
P>=1L)
stopifnot("GeneticMap must be a dataframe with three named columns
(Marker, Chromosome and Distance)" =
is.data.frame(GeneticMap) &&
all(c("Marker","Chromosome","Distance")%in%colnames(GeneticMap)))
stopifnot("Geno list names and Marker column of GeneticMap must have identical elements" =
all(names(Geno)%in%GeneticMap$Marker))
stopifnot("SmoothParam must be a positive numeric value" =
SmoothParam>0)
stopifnot("MaxRec must range between 1 and P, or be deactivated by specifying NULL" =
is.null(MaxRec) || MaxRec%in%c(1:P))
stopifnot("DistIntBounds must be a vector of positive numeric values,
or be deactivated by specifying NULL" =
is.null(DistIntBounds) || (is.numeric(DistIntBounds) && is.vector(DistIntBounds)))
stopifnot("NbThreads should be a non-negative integer" =
NbThreads>=0)
stopifnot("Verbose should be a boolean" =
is.logical(Verbose))
## Define global variable for tidy operations
. <- NULL
Chromosome <- NULL
Marker_1 <- NULL
Marker_2 <- NULL
Dist <- NULL
## Get quantities
Prop_i <- ResAdmixGlobal$Prop[Ind,]
Freq <- ResAdmixGlobal$Freq
Marker_names <- names(Geno) %>% set_names(.,.)
GeneticMap <- GeneticMap[match(Marker_names,GeneticMap$Marker),]
Chromosome_names <- unique(GeneticMap$Chromosome) %>% set_names(.,.)
K <- length(Prop_i)
M <- length(Geno)
C <- length(Chromosome_names)
Lm <- sapply(Geno, ncol)
## Additional checks
stopifnot("MinProp cannot be negative" =
MinProp>=0)
stopifnot("MinProp is too high relative to number of groups" =
MinProp<1/K)
## Print information
if(Verbose){
cat("#####################################################\n",
"#####################################################\n",
"#### AdmixPoly ####\n",
"#### Inference of local admixture proportions ####\n",
"#####################################################\n",
"#####################################################\n\n",sep="")
cat("Name of the individual = ",Ind,"\n",
"Ploidy level of the individual = ",P,"\n",
"Number of markers = ",M,"\n",
"Number of chromosomes = ",C,"\n",
"Number of alleles = ",ifelse(length(table(Lm))==1,Lm,
paste0("[",min(Lm),",",max(Lm),"]")),"\n",
"Number of groups = ",K,"\n\n",sep = "")
cat("Options:\n",
" - SmoothParam = ",SmoothParam,"\n",
" - MinProp = ",MinProp,"\n",
" - EmisProbMaxPermut = ",EmisProbMaxPermut,"\n",
sep = "")
if(is.null(DistIntBounds)){
cat(" - DistIntBounds (cM) = NULL\n",sep = "")
}else{
cat(" - DistIntBounds (cM) =",DistIntBounds,"\n")
}
if(is.null(MaxRec)){
cat(" - MaxRec = NULL\n",sep = "")
}else{
cat(" - MaxRec =",MaxRec,"\n")
}
}
## Identify active groups
K_active <- which(Prop_i>MinProp)
## Test if more than one active group
if(length(K_active)==1){
if(Verbose) cat(Ind,"is not admixed but considered as 100% from",names(K_active),"\n")
Ancestry <- lapply(Chromosome_names,function(chr){
GeneticMap_c <- GeneticMap %>% filter(Chromosome==chr)
res_c <- matrix(0,nrow = nrow(GeneticMap_c),ncol = K,
dimnames = list(GeneticMap_c$Marker,colnames(ResAdmixGlobal$Prop)))
res_c[,K_active] <- P
return(res_c)
})
res <- list(Posterior=Ancestry, Viterbi=Ancestry)
return(res)
}else{
## Format data
Prop_i <- Prop_i[K_active]
Prop_i <- Prop_i/sum(Prop_i)
K <- length(Prop_i)
ResFormat <- .FormatLocal(Geno = Geno, Freq = Freq, GeneticMap = GeneticMap,
Ind = Ind, K_active = K_active, P = P, NbThreads = NbThreads)
Geno <- ResFormat$Geno
Freq <- ResFormat$Freq
GeneticMap <- ResFormat$GeneticMap
Chromosome_names <- unique(GeneticMap$Chromosome) %>% set_names(.,.)
## Calculate distances between marker pairs
PairwiseDist <- lapply(Chromosome_names,function(chr){
GeneticMap_chr <- GeneticMap %>% filter(Chromosome==chr)
res <- tibble(Marker_1 = GeneticMap_chr$Marker[-nrow(GeneticMap_chr)],
Marker_2 = GeneticMap_chr$Marker[-1],
Chromosome = chr,
Dist = GeneticMap_chr$Distance[-1] -
GeneticMap_chr$Distance[-nrow(GeneticMap_chr)])
return(res)
})
## Function to simplify genetic distances
SimplifyDist = function(Dist,Interval){
MeanDist <- mean(Dist[Dist>=Interval[1]&Dist<Interval[2]])
res <- sapply(Dist,function(d)ifelse(d>=Interval[1]&d<Interval[2],MeanDist,d))
return(res)
}
## Simplify genetic distances between consecutive markers
DistMin <- sapply(PairwiseDist,function(chr)min(chr$Dist))
DistMax <- sapply(PairwiseDist,function(chr)max(chr$Dist))
if(!is.null(DistIntBounds)){
## Checks
if(min(DistIntBounds)>min(DistMin)){
DistIntBounds <- c(0,DistIntBounds)
}
if(max(DistIntBounds)<max(DistMax)){
DistIntBounds <- c(DistIntBounds,Inf)
}
## List of intervals
DistIntervals <- lapply(1:(length(DistIntBounds)-1),function(i)c(DistIntBounds[i],DistIntBounds[i+1]))
## Calculate mean distances per interval
PairwiseDistAll <- map_dfr(PairwiseDist, ~.x) %>%
{ mutate(., Dist = Reduce(function(d, i)
SimplifyDist(Dist = d, Interval = i), x = DistIntervals, init = .$Dist))} %>%
# PairwiseDistAll <- map_dfr(PairwiseDist,~.x) %>%
# mutate(Dist = Reduce(function(d,i)
# SimplifyDist(Dist = d,Interval = i),x = append(list(Dist),DistIntervals))) %>%
mutate(Index = as.numeric(as.factor(Dist)))
PairwiseDist <- lapply(Chromosome_names,function(chr) PairwiseDistAll %>% filter(Chromosome==chr))
## Print information
if(Verbose){
cat("\nAverage genetic distances are considered between consecutive markers:","\n")
for(a in 1:max(PairwiseDistAll$Index)){
cat(" - Interval (cM) = [",DistIntervals[[a]][1],";",DistIntervals[[a]][2],
"[ - Number of marker pairs = ",length(which(PairwiseDistAll$Index==a)),
" - Average distance = ",round(mean(PairwiseDistAll$Dist[PairwiseDistAll$Index==a]),3),
" cM","\n",sep = "")
}
}
}else{
if(Verbose) cat("\nExact genetic distances are considered for the HMM\n")
PairwiseDistAll <- map_dfr(PairwiseDist,~.x) %>%
mutate(Index = as.numeric(1:length(Dist)))
PairwiseDist <- lapply(Chromosome_names,function(chr) PairwiseDistAll %>% filter(Chromosome==chr))
}
## Get unique set of distances
DistVal <- PairwiseDistAll$Dist[match(1:max(PairwiseDistAll$Index),PairwiseDistAll$Index)]
## Local ancestry inference
if(Verbose) cat("\nInferring local admixture\n")
if(!is.null(MaxRec)){
res <- .ForwardBackward(Geno = Geno, Prop = Prop_i, P = P, Freq = Freq,
PairwiseDist = PairwiseDist, DistVal = DistVal,
SmoothParam = SmoothParam,
MaxRec = MaxRec, #DistInt = FALSE,
EmisProbMaxPermut = EmisProbMaxPermut,
NbThreads = NbThreads)
# }else if(!is.null(DistIntBounds)){
# res <- .ForwardBackward(Geno = Geno, Prop = Prop_i, P = P, Freq = Freq,
# PairwiseDist = PairwiseDist, DistVal = DistVal,
# SmoothParam = SmoothParam,
# MaxRec = -1, #DistInt = TRUE,
# EmisProbMaxPermut = EmisProbMaxPermut,
# NbThreads=NbThreads)
}else{
res <- .ForwardBackward(Geno = Geno, Prop = Prop_i, P = P, Freq = Freq,
PairwiseDist = PairwiseDist, DistVal = DistVal,
SmoothParam = SmoothParam,
MaxRec = -1, #DistInt = FALSE,
EmisProbMaxPermut = EmisProbMaxPermut,
NbThreads = NbThreads)
}
## Formatting of results
if(length(K_active)<ncol(ResAdmixGlobal$Prop)){
res$Posterior <- lapply(Chromosome_names,function(chr){
res_c <- matrix(0,nrow = nrow(res$Posterior[[chr]]),ncol = ncol(ResAdmixGlobal$Prop),
dimnames = list(rownames(res$Posterior[[chr]]),colnames(ResAdmixGlobal$Prop)))
res_c[,colnames(res$Posterior[[chr]])] <- res$Posterior[[chr]]
return(res_c)
})
res$Viterbi <- lapply(Chromosome_names,function(chr){
res_c <- matrix(0,nrow = nrow(res$Viterbi[[chr]]),ncol = ncol(ResAdmixGlobal$Prop),
dimnames = list(rownames(res$Viterbi[[chr]]),colnames(ResAdmixGlobal$Prop)))
res_c[,colnames(res$Viterbi[[chr]])] <- res$Viterbi[[chr]]
return(res_c)
})
}
return(res)
}
}
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.