# This file was generated by Rcpp::compileAttributes
# Generator token: 10BE3573-1514-4C36-9D1C-5A225CD40393
#' @title Inferring hidden ancestry states from haploids
#'
#' @description Uses the forward-backward algorithm to estimate ancestry
#' along a given chromosome for a given genotyped tetrad or simulated data.
#'
#' @param snp_locations a numeric vector specifying the locations of each snp (in bps). This
#' vector is assumed to be ordered (sorted from smallest to largest snp).
#'
#' @param p0 a vector specifying the number of reads that mapped to parent 0.
#' p0 is assumed to be in the smae order as snp_locations.
#'
#' @param p1 a vector specifying the number of reads that mapped to parent 1.
#' p1 is assumed to be in the smae order as snp_locations.
#'
#' @param p_assign a value specifying the assignment probabilty (see details).
#'
#' @param scale a numeric specifying the the genome-wide recombination rate (Morgans / bp). \code{scale} is assumed to be
#' between 0 and 1 but in practice it is usually quite small.
#'
#' @details The function \code{fb_haploid} attempts to estimate
#' parental genotypic 'states' along a chromosome given empirical or
#' simulated F2 cross data.
#' Next-generation data inherits both sequencing error and missing data
#' -- especially when sequencing coverage is low. Also, parental populations
#' can be polymorphic with respect to gene frequencies before admixture. In these cases
#' the parental state is ambiguous or unknown. \code{fb_haploid} takes
#' these uncertainties into account and estimates the most likely sequence of
#' ancestry.
#'
#' The three steps taken in \code{fb_haploid} are:
#' \enumerate{
#' \item Calculate forward probabilities for each state (5' to 3')
#' \item Calculate backward probabilities for each state (3' to 5')
#' \item From these two probabilities, calculate the posterior probability
#' that a snp location is of a given parental state.
#' }
#'
#' The two element vector of forward probabilities, \eqn{f_{i}}, for each parental state
#' at each position \eqn{ i } is calculated as:
#' \deqn{ f_{i} = \mathbf{e_{i}}~\mathbf{T_{i}}~\mathbf{f_{i-1}} }
#'
#' where \eqn{e_{i}} is the emission probabilities for each state (see below), \eqn{T_{i}}
#' is a 2-by-2 matrix describing the transition (recombination) probabilities
#' between two states, and \eqn{f_{i-1}} is the forward probability at the
#' previous position along the chromosome (5' of position \eqn{i}). It is assumed
#' that each state is equally likely to occur at the first position.
#'
#'
#' The emmission probabilities are calculated independently for
#' each snp and depends on the sequence reads assigned to parent "0"
#' and parent "1" and \code{p_assign}. The emission probabilities are
#' calculated using the binomial equation. For example, the emission
#' probability for parental state "0" at snp position i is:
#'
#' \deqn{e_{i}^{(0)} = {n \choose{k_{0}}} p^{k_{0}} (1-p)^{n-k_{0}}}
#'
#' where \eqn{n} is the sum of the number of reads from both parents at snp \eqn{i} and \eqn{p}
#' is the assignment probability, \code{p_assign}.
#'
#' Like the emission probabilities, the transition probabilities are also
#' calculated for each snp position. The transition probabilities in matrix \eqn{\mathbf{T_{i}}}
#' are calcuated from the genome-wide recombination rate, \code{scale}, and the physical
#' distance (in bps) between the two snps. Specifically, Haldane's function is used
#' to estimate the probabilty of getting an odd number of crossovers between snp \eqn{i}
#' and snp \eqn{i-1}. For more details, see Hether et al. (in prep).
#'
#' To avoid underflow, we rescale the forward (and backward) probabilities
#' each iteration to that they sum to unity.
#'
#' The backward probabilities are calculated similarly but in the 3' to 5'
#' direction. It is assumed that the backward probability is equally likely
#' in each state. Following Durbin et al. (1998) the backward probability at
#' snp position \eqn{i} is:
#'
#' \deqn{b_{i} = \mathbf{T_{i}}~\mathbf{e_{i+1}}~\mathbf{b_{i+1}}}
#'
#' Again, these probabilities are rescaled to avoid underflow.
#'
#' The posterior probability that the state at position \eqn{i} is \eqn{j}
#' given the observed sequence read counts, \eqn{x_{i}}
#' is calculated by:
#'
#' \deqn{P(\pi_{i} = j~|~x_{i}) = \frac{\mathbf{f}_{i}^{(j)} \mathbf{b}_{i}^{(j)}}{\mathbf{f}_{i}~\mathbf{b}_{i}}}
#'
#' Ties in posterior probabilities are rare with sufficient coverage and/or signal. However, in the event of a tie state 1 is picked.
#'
#' Finally, we can determine the log likelihood of the whole sequence
#' of observations by summing up the log of all the scale factors in the
#' forward probability calculation.
#'
#' @return a dataframe with the following columns:
#' \itemize{
#' \item \code{snp_locations}
#' \item \code{p0}
#' \item \code{p1}
#' \item The emission matrix (2 columns) specifying the emmision probability of belong
#' to the parent 0 (emiss.1) and parent 2 (emiss.2)
#' \item the forward probability matrix (2 columns) giving the scaled
#' forward probabilities of belong
#' to the parent 0 (forward.1) and parent 2 (forward.2)
#' \item the backward probability matrix (2 columns) giving the scaled
#' backward probabilities of belong
#' to the parent 0 (backward.1) and parent 2 (backward.2)
#' \item \code{Fscale} The forward scaling factor for each snp
#' \item \code{Bscale} The backward scaling factor for each snp
#' \item posterior probability matrix (2 columns) posterior
#' probabilities for belong to each parental state
#' \item states_inferred a vector of length snp_locations giving the state with the
#' highest posterior.
#' \item lnL a numeric giving the total likelihood of the data (see details).
#' }
#'
#' @references Hether, T.D., C. G. Wiench1, and P.A. Hohenlohe (in review). 2015. Novel molecular and analytical tools
#' for efficient estimation of rates of meiotic crossover, non-crossover and gene conversion
#'
#' Drubin, R. S. Eddy, A. Krogh, and G. Mitchison. 1998. Biological Sequence Analysis: Probabilistic Models
#' of proteins and nucleic acids. Cambridge University Press, Cambridge CB2 8RU, UK.
#'
#' @seealso \code{\link{fb_diploid}}
#'
#' @author Tyler D. Hether
#'
#' @examples
#' set.seed(1234567) # For reproducibility
#' n_spores <- 1 # number of spores
#' l <- 75 # number of snps to simulate
#' c <- 3.5e-05 # recombination rate between snps (Morgan/bp)
#' snps <- c(1:l)*1.3e4 # snps are evenly spaced 20kbp apart
#' p_a <- 0.95 # assignment probability
#' coverage <- 2.1 # mean coverage
#' # Now simulate
#' sim1 <- sim_en_masse(n.spores=n_spores, scale=c, snps=snps,
#' p.assign=p_a, mu.rate=0, f.cross=0.8, f.convert=0.3,
#' length.conversion=2e3, coverage=coverage)
#' fb_haploid(snp_locations=sim1$Snp, p0=sim1$p0, p1=sim1$p1, p_assign=p_a, scale=c)
#'
#' @export
fb_haploid <- function(snp_locations, p0, p1, p_assign, scale) {
.Call('HMMancestry_fb_haploid', PACKAGE = 'HMMancestry', snp_locations, p0, p1, p_assign, scale)
}
#' @title Inferring hidden ancestry states from diploids
#'
#' @description Uses the forward-backward algorithm to estimate ancestry
#' along a given chromosome for a given genotyped diploid.
#'
#' @param snp_locations a numeric vector specifying the locations of each snp (in bps). This
#' vector is assumed to be ordered (sorted from smallest to largest snp).
#'
#' @param p0 a vector specifying the number of reads that mapped to parent 0.
#' p0 is assumed to be in the smae order as snp_locations.
#'
#' @param p1 a vector specifying the number of reads that mapped to parent 1.
#' p1 is assumed to be in the smae order as snp_locations.
#'
#' @param p_assign a value specifying the assignment probabilty (see details).
#'
#' @param scale a numeric specifying the the genome-wide recombination rate (Morgans / bp). \code{scale} is assumed to be
#' between 0 and 1 but in practice it is usually quite small.
#'
#' @details This is an extension of \code{\link{fb_haploid}} with three hidden states:
#' two homozygous states and a heterozygous state.
#'
#' @return a dataframe with the following columns:
#' \itemize{
#' \item \code{snp_locations}
#' \item \code{p0}
#' \item \code{p1}
#' \item The emission matrix (3 columns) specifying the emmision probability of belong
#' to the only parent 0 (emiss.1), both parents (emiss.1), and only parent 2 (emiss.3)
#' \item the forward probability matrix (3 columns) giving the scaled
#' forward probabilities of belong
#' to the only parent 0 (forward.1), both parents (forward.2), and only parent 2 (forward.3)
#' \item the backward probability matrix (3 columns) giving the scaled
#' backward probabilities of belong
#' to the only parent 0 (backward.1), both parents (backward.2), and only parent 2 (backward.3)
#' \item \code{Fscale} The forward scaling factor for each snp
#' \item \code{Bscale} The backward scaling factor for each snp
#' \item posterior probability matrix (3 columns) posterior
#' probabilities for belong to each parental state
#' \item states_inferred a vector of length snp_locations giving the state with the
#' highest posterior.
#' \item lnL a numeric giving the total likelihood of the data (see \code{\link{fb_haploid}} ).
#' }
#'
#' @references Hether, T.D., C. G. Wiench1, and P.A. Hohenlohe (in review). 2015. Novel molecular and analytical tools
#' for efficient estimation of rates of meiotic crossover, non-crossover and gene conversion
#'
#' Drubin, R. S. Eddy, A. Krogh, and G. Mitchison. 1998. Biological Sequence Analysis: Probabilistic Models
#' of proteins and nucleic acids. Cambridge University Press, Cambridge CB2 8RU, UK.
#'
#' @seealso \code{\link{fb_haploid}}
#'
#' @author Tyler D. Hether
#'
#' @examples
#' set.seed(1234567) # For reproducibility
#' n_spores <- 1 # number of spores
#' l <- 75 # number of snps to simulate
#' c <- 3.5e-05 # recombination rate between snps (Morgan/bp)
#' snps <- c(1:l)*1.3e4 # snps are evenly spaced 20kbp apart
#' p_a <- 0.95 # assignment probability
#' coverage <- 2.1 # mean coverage
#' # Now simulate two haploids
#' sim1 <- sim_en_masse(n.spores=n_spores, scale=c, snps=snps,
#' p.assign=p_a, mu.rate=0, f.cross=0.8, f.convert=0.3,
#' length.conversion=2e3, coverage=coverage)
#' sim2 <- sim_en_masse(n.spores=n_spores, scale=c, snps=snps,
#' p.assign=p_a, mu.rate=0, f.cross=0.8, f.convert=0.3,
#' length.conversion=2e3, coverage=coverage)
#' # Now merge the two haploids to make a diploid
#' p0 <- sim1$p0+sim2$p0
#' p1 <- sim1$p1+sim2$p1
#' res <- fb_diploid(snp_locations=sim1$Snp, p0=p0, p1=p1, p_assign=p_a, scale=c)
#' res
#'@export
fb_diploid <- function(snp_locations, p0, p1, p_assign, scale) {
.Call('HMMancestry_fb_diploid', PACKAGE = 'HMMancestry', snp_locations, p0, p1, p_assign, scale)
}
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.