R/MIM.points.R

Defines functions MIM.points

Documented in MIM.points

#' QTL Short Distance Correction by MIM
#'
#' Expectation-maximization algorithm for QTL multiple interval mapping to
#' find the best QTL position near the designated QTL position. It can handle
#' genotype data which is selective genotyping too.
#'
#' @param QTL matrix. A q*2 matrix contains the QTL information, where the
#' row dimension 'q' represents the number of QTLs in the chromosomes. The
#' first column labels the chromosomes where the QTLs are located, and the
#' second column labels the positions of QTLs (in morgan (M) or centimorgan
#' (cM)).
#' @param marker matrix. A k*2 matrix contains the marker information,
#' where the row dimension 'k' represents the number of markers in the
#' chromosomes. The first column labels the chromosomes where the markers
#' are located, and the second column labels the positions of markers (in
#' morgan (M) or centimorgan (cM)). It's important to note that chromosomes
#' and positions must be sorted in order.
#' @param geno matrix. A n*k matrix contains the genotypes of k markers
#' for n individuals. The marker genotypes of P1 homozygote (MM),
#' heterozygote (Mm), and P2 homozygote (mm) are coded as 2, 1, and 0,
#' respectively, with NA indicating missing values.
#' @param y vector. A vector that contains the phenotype values of
#' individuals with genotypes.
#' @param yu vector. A vector that contains the phenotype values of
#' individuals without genotypes.
#' @param sele.g character. Determines the type of data being analyzed:
#' If sele.g="n", it considers the data as complete genotyping data. If
#' sele.g="f", it treats the data as selective genotyping data and utilizes
#' the proposed corrected frequency model (Lee 2014) for analysis; If
#' sele.g="t", it considers the data as selective genotyping data and uses
#' the truncated model (Lee 2014) for analysis; If sele.g="p", it treats
#' the data as selective genotyping data and uses the population
#' frequency-based model (Lee 2014) for analysis. Note that the 'yu'
#' argument must be provided when sele.g="f" or "p".
#' @param tL numeric. The lower truncation point of phenotype value when
#' sele.g="t". When sele.g="t" and tL=NULL, the 'yu' argument must be
#' provided. In this case, the function will consider the minimum of 'yu'
#' as the lower truncation point.
#' @param tR numeric. The upper truncation point of phenotype value when
#' sele.g="t". When sele.g="t" and tR=NULL, the 'yu' argument must be
#' provided. In this case, the function will consider the maximum of 'yu'
#' as the upper truncation point.
#' @param method character. When method="EM", it indicates that the interval
#' mapping method by Lander and Botstein (1989) is used in the analysis.
#' Conversely, when method="REG", it indicates that the approximate regression
#' interval mapping method by Haley and Knott (1992) is used in the analysis.
#' @param type character. The population type of the dataset. Includes
#' backcross (type="BC"), advanced intercross population (type="AI"), and
#' recombinant inbred population (type="RI"). The default value is "RI".
#' @param D.matrix matrix. The design matrix of QTL effects is a g*p matrix,
#' where g is the number of possible QTL genotypes, and p is the number of
#' effects considered in the MIM model. This design matrix can be easily
#' generated by the function D.make(). If set to NULL, it will automatically
#' generate a design matrix with all additive and dominant effects and without
#' any epistasis effect.
#' @param ng integer. The generation number of the population type. For
#' instance, in a BC1 population where type="BC", ng=1; in an AI F3
#' population where type="AI", ng=3.
#' @param cM logical. Specify the unit of marker position. If cM=TRUE, it
#' denotes centimorgan; if cM=FALSE, it denotes morgan.
#' @param scope numeric vector. During the MIM process, it will search forward
#' and backward for the corresponding centimorgan (cM). Users can assign a
#' numeric number for every QTL or a numeric vector for each QTL. Note that 0
#' denotes that the corresponding QTL position is fixed, and the positions of
#' its surrounding intervals will not be searched.
#' @param speed numeric. The walking speed of the QTL search (in cM).
#' @param crit numeric. The convergence criterion of EM algorithm.
#' The E and M steps will iterate until a convergence criterion is met.
#' It must be a value between 0 and 1.
#' @param console logical. Determines whether the process of the algorithm
#' will be displayed in the R console or not.
#' @param IMresult list. The data list of the output from IM.search(). The
#' required parameters for this function will be extracted from the data list.
#' @param MIMresult list. The data list of the output from MIM.search() or
#' MIM.points(). The required parameters for this function will be extracted
#' from the data list.
#'
#' @return
#' \item{effect}{The estimated effects, log likelihood value, and LRT
#' statistics of all searched positions.}
#' \item{QTL.best}{The positions of the best QTL combination.}
#' \item{effect.best}{The estimated effects and LRT statistics of the best
#' QTL combination.}
#' \item{model}{The model of selective genotyping data in this analysis.}
#' \item{inputdata}{The input data of this analysis. It contains marker, geno,
#' y, yu, sele.g, type, ng, cM, and D.matrix. The parameters not provided by
#' the user will be output with default values.}
#'
#' @note
#'
#' When IMresult and MIMresult are entered simultaneously, only IMresult will
#' be processed.
#'
#' If an error occurs, please check whether the original output data of
#' IM.search(), MIM.result(), or MIM.points() is used.
#'
#' @export
#'
#' @references
#'
#' KAO, C.-H. and Z.-B. ZENG 1997 General formulas for obtaining the maximum
#' likelihood estimates and the asymptotic variance-covariance matrix in QTL
#' mapping when using the EM algorithm. Biometrics 53, 653-665. <doi: 10.2307/2533965.>
#'
#' KAO, C.-H., Z.-B. ZENG and R. D. TEASDALE 1999 Multiple interval mapping
#' for Quantitative Trait Loci. Genetics 152: 1203-1216. <doi: 10.1093/genetics/152.3.1203>
#'
#' H.-I LEE, H.-A. HO and C.-H. KAO 2014 A new simple method for improving
#' QTL mapping under selective genotyping. Genetics 198: 1685-1698. <doi: 10.1534/genetics.114.168385.>
#'
#' @seealso
#' \code{\link[QTLEMM]{EM.MIM}}
#' \code{\link[QTLEMM]{IM.search}}
#' \code{\link[QTLEMM]{MIM.search}}
#'
#' @examples
#'
#' # load the example data
#' load(system.file("extdata", "exampledata.RDATA", package = "QTLEMM"))
#'
#' # run and result
#' result <- MIM.points(QTL, marker, geno, y, type = "RI", ng = 2, scope = c(0,1,2), speed = 2)
#' result$QTL.best
#' result$effect.best
#'
#' \dontrun{
#' # Example for selective genotyping data
#' # load the example data
#' load(system.file("extdata", "exampledata.RDATA", package = "QTLEMM"))
#'
#' # make the seletive genotyping data
#' ys <- y[y > quantile(y)[4] | y < quantile(y)[2]]
#' yu <- y[y >= quantile(y)[2] & y <= quantile(y)[4]]
#' geno.s <- geno[y > quantile(y)[4] | y < quantile(y)[2],]
#'
#' # run and result
#' result <- MIM.points(QTL, marker, geno.s, ys, yu, sele.g = "f",
#'  type = "RI", ng = 2, scope = c(0,1,2), speed = 2)
#' result$QTL.best
#' result$effect.best
#' }
#'
MIM.points <- function(QTL = NULL, marker = NULL, geno = NULL, y = NULL, yu = NULL, sele.g = "n", tL = NULL,
                       tR = NULL, method = "EM", type = "RI", D.matrix = NULL, ng = 2, cM = TRUE, scope = 5,
                       speed = 1, crit = 10^-3, console = interactive(), IMresult = NULL, MIMresult = NULL){

  if(!is.null(IMresult)){
    check1 <- names(IMresult) == c("effect", "LRT.threshold", "detect.QTL", "model", "inputdata")
    if(!is.list(IMresult) | sum(check1) != 5){
      stop("IMresult data error, please input all of the original output data of IM.search().", call. = FALSE)
    } else {
      check2 <- names(IMresult$inputdata) == c("marker", "geno", "y", "yu", "sele.g", "type", "ng", "cM", "d.eff" )
      if(!is.list(IMresult$inputdata) | sum(check2) != 9){
        stop("IMresult data error, please input all of the original output data of IM.search().", call. = FALSE)
      } else {
        if(is.data.frame(IMresult$detect.QTL) | is.matrix(IMresult$detect.QTL)){
          QTL <- as.matrix(IMresult$detect.QTL[,1:2])
        } else {stop("IMresult data error, please input all of the original output data of IM.search().", call. = FALSE)}
        marker <- IMresult$inputdata$marker
        geno <- IMresult$inputdata$geno
        y <- IMresult$inputdata$y
        yu <- IMresult$inputdata$yu
        sele.g <- IMresult$inputdata$sele.g
        type <- IMresult$inputdata$type
        ng <- IMresult$inputdata$ng
        cM <- IMresult$inputdata$cM
        if(is.null(D.matrix)){
          D.matrix <- MIMresult$inputdata$D.matrix
        }
      }
    }
  } else if(!is.null(MIMresult)){
    check1 <- names(MIMresult) == c("effect", "QTL.best", "effect.best", "model", "inputdata")
    if(!is.list(MIMresult) | sum(check1) != 5){
      stop("MIMresult data error, please input all of the original output data of MIM.search()/MIM.points().", call. = FALSE)
    } else {
      check2 <- names(MIMresult$inputdata) == c("marker", "geno", "y", "yu", "sele.g", "type", "ng", "cM", "D.matrix" )
      if(!is.list(MIMresult$inputdata) | sum(check2) != 9){
        stop("MIMresult data error, please input all of the original output data of MIM.search()/MIM.points().", call. = FALSE)
      } else {
        if(is.data.frame(MIMresult$QTL.best) | is.matrix(MIMresult$QTL.best)){
          QTL <- as.matrix(MIMresult$QTL.best[,1:2])
        } else {stop("MIMresult data error, please input all of the original output data of MIM.search()/MIM.points().", call. = FALSE)}
        marker <- MIMresult$inputdata$marker
        geno <- MIMresult$inputdata$geno
        y <- MIMresult$inputdata$y
        yu <- MIMresult$inputdata$yu
        sele.g <- MIMresult$inputdata$sele.g
        type <- MIMresult$inputdata$type
        ng <- MIMresult$inputdata$ng
        cM <- MIMresult$inputdata$cM
        if(is.null(D.matrix)){
          D.matrix <- MIMresult$inputdata$D.matrix
        }
      }
    }
  }

  if(is.null(QTL) | is.null(marker) | is.null(geno) | is.null(y)){
    stop("Input data is missing. The argument QTL, marker, geno, and y must be input.", call. = FALSE)
  }

  genotest <- table(c(geno))
  datatry <- try(geno*geno, silent=TRUE)
  if(class(datatry)[1] == "try-error" | FALSE %in% (names(genotest) %in% c(0, 1, 2))  | length(dim(geno)) != 2){
    stop("Genotype data error, please cheak your genotype data.", call. = FALSE)
  }

  marker <- as.matrix(marker)
  markertest <- c(ncol(marker) != 2, NA %in% marker, marker[,1] != sort(marker[,1]), nrow(marker) != ncol(geno))
  datatry <- try(marker*marker, silent=TRUE)
  if(class(datatry)[1] == "try-error" | TRUE %in% markertest){
    stop("Marker data error, or the number of marker does not match the genetype data.", call. = FALSE)
  }

  QTL <- as.matrix(QTL)
  datatry <- try(QTL*QTL, silent=TRUE)
  if(class(datatry)[1] == "try-error" | ncol(QTL) != 2 | NA %in% QTL | max(QTL[, 1]) > max(marker[, 1])){
    stop("QTL data error, please cheak your QTL data.", call. = FALSE)
  }

  for(i in 1:nrow(QTL)){
    ch0 <- QTL[i, 1]
    q0 <- QTL[i, 2]
    if(!ch0 %in% marker[, 1]){
      stop("The specified QTL is not in the position that can estimate by the marker data.", call. = FALSE)
    }
    if(q0 > max(marker[marker[, 1] == ch0, 2]) | q0 < min(marker[marker[, 1] == ch0, 2])){
      stop("The specified QTL is not in the position that can estimate by the marker data.", call. = FALSE)
    }
  }

  y[is.na(y)] <- mean(y,na.rm = TRUE)
  y <- c(y)
  datatry <- try(y%*%geno, silent=TRUE)
  if(class(datatry)[1] == "try-error"){
    stop("Phenotype data error, or the number of individual does not match the genotype data.", call. = FALSE)
  }

  if(!is.null(yu)){
    yu <- yu[!is.na(yu)]
    datatry <- try(yu%*%yu, silent=TRUE)
    if(class(datatry)[1] == "try-error"){
      stop("yu data error, please check your yu data.", call. = FALSE)
    }
  }

  if(!sele.g[1] %in% c("n", "t", "p", "f") | length(sele.g)!=1){
    stop("Parameter sele.g error, please check and fix.", call. = FALSE)
  }

  if(sele.g == "t"){
    lrtest <- c(tL[1] < min(c(y,yu)), tR[1] > max(c(y,yu)), tR[1] < tL[1],
                length(tL) > 1, length(tR) > 1)
    datatry <- try(tL[1]*tR[1], silent=TRUE)
    if(class(datatry)[1] == "try-error" | TRUE %in% lrtest){
      stop("Parameter tL or tR error, please check and fix.", call. = FALSE)
    }
    if(is.null(tL) | is.null(tR)){
      if(is.null(yu)){
        stop("yu data error, the yu data must be input for truncated model when parameter tL or tR is not set.", call. = FALSE)
      }
    }
  }

  nq <- nrow(QTL)
  ch0 <- QTL[, 1]

  if(!type[1] %in% c("AI","RI","BC") | length(type) > 1){
    stop("Parameter type error, please input AI, RI, or BC.", call. = FALSE)
  }

  if(!is.null(D.matrix)){
    D.matrix <- as.matrix(D.matrix)
    datatry <- try(D.matrix*D.matrix, silent=TRUE)
    if(type == "BC"){dn0 <- 2
    } else {dn0 <- 3}
    if(class(datatry)[1] == "try-error" | NA %in% D.matrix | nrow(D.matrix) != dn0^nq){
      stop("Parameter D.matrix error, or the combination of genotypes in design matrix is error.",
           call. = FALSE)
    }
  }

  if(!is.numeric(ng) | length(ng) > 1 | min(ng) < 1){
    stop("Parameter ng error, please input a positive integer.", call. = FALSE)
  }
  ng <- round(ng)

  if(!cM[1] %in% c(0,1) | length(cM > 1)){cM <- TRUE}

  if(!is.numeric(speed) | length(speed) > 1 | min(speed) < 0){
    stop("Parameter speed error, please input a positive number.", call. = FALSE)
  }

  scope <- c(scope)
  datatry <- try(scope*scope, silent=TRUE)
  if(class(datatry)[1] == "try-error" | NA %in% scope | length(scope) > nq | min(scope) < 0){
    stop("Parameter scope error, please input the positive integers or 0.", call. = FALSE)
  }

  if(length(scope) < nq){
    scope <- rep(scope[1],nq)
  }

  if(!is.numeric(crit) | length(crit) > 1 | min(crit) <= 0 | max(crit) >= 1){
    stop("Parameter crit error, please input a positive number between 0 and 1.", call. = FALSE)
  }

  if(!console[1] %in% c(0,1) | length(console) > 1){console <- TRUE}

  if(!cM){
    QTL <- QTL*100
    marker[, 2] <- marker[, 2]*100
  }

  sr <- list()
  q0 <- QTL[1, 2]
  r0 <- union(seq(q0-scope[1], q0, speed), seq(q0, q0+scope[1], speed))
  r0 <- r0[r0 > min(marker[marker[, 1] == ch0[1], 2]) & r0 < max(marker[marker[, 1] == ch0[1], 2])]
  sr[[1]] <- r0
  if(nq > 1){
    for(i in 2:nq){
      q0 <- QTL[i, 2]
      r0 <- union(seq(q0-scope[i], q0, speed), seq(q0, q0+scope[i], speed))
      r0 <- r0[r0 > min(marker[marker[, 1] == ch0[i], 2]) & r0 < max(marker[marker[, 1] == ch0[i], 2])]
      sr[[i]] <- r0
    }
  }

  sl <- unlist(lapply(sr,length))
  sm <- matrix(NA, prod(sl), length(sl))
  sl2 <- c(1,sl,1)
  for(i in 1:length(sl)){
    sm[, i] <- rep(sr[[i]], each = prod(sl2[(i+2):length(sl2)]), time = prod(sl2[1:i]))
  }

  if(is.null(D.matrix)){
    D.matrix <- D.make(as.numeric(nq), type = type)
  }

  name0 <- cbind(paste("QTL", 1:nq, ".ch", sep = ""), paste("QTL", 1:nq, ".cM", sep = ""))
  cat(paste("#", paste(t(name0), collapse = "\t"), "LRT", "log.likelihood", "\n", sep = "\t")[console])

  if(method == "EM"){
    meth <- methEM
    if(sele.g == "p" | sele.g == "f"){
      ya <- c(y, yu)
    } else {
      ya <- y
    }
  } else if (method=="REG"){
    if(sele.g == "p" | sele.g == "f"){
      ya <- c(y, yu)
    } else {
      ya <- y
    }

    meth <- methSG
  }

  effect <- matrix(NA, nrow(sm), 2*nq+ncol(D.matrix)+3)
  for(i in 1:nrow(sm)){
    QTL0 <- cbind(ch0, sm[i,])
    fit0 <- meth(QTL0, marker, geno, D.matrix, y, yu, tL, tR, type, ng, sele.g, crit, cM, ya)
    effect0 <- c(t(QTL0), fit0[[1]], fit0[[4]], fit0[[5]], fit0[[6]])
    model <- fit0[[7]]

    LRT <- round(effect0[length(effect0)-2], 3)
    like <- round(effect0[length(effect0)-1], 3)
    cat(paste(paste(i, "/", nrow(sm), sep = ""), paste(t(QTL0), collapse = "\t"), LRT, like, "\n", sep = "\t")[console])
    effect[i,] <- effect0
  }
  colnames(effect) <-  c(t(name0), colnames(D.matrix), "LRT", "log.likelihood", "R2")
  row.names(effect) <- 1:nrow(effect)

  best <- effect[effect[,ncol(effect)-1] == max(effect[,ncol(effect)-1]),]
  QTL.best <- matrix(best[1:(2*nq)], nq, 2, byrow = TRUE)
  colnames(QTL.best) <- c("chromosome", "position(cM)")

  effect.best <- best[-(1:(2*nq))]

  inputdata <- list(marker = marker, geno = geno, y = y, yu = yu, sele.g = sele.g, type = type, ng = ng, cM = cM,
                    D.matrix = D.matrix)

  result <- list(effect = effect, QTL.best = QTL.best, effect.best = effect.best, model = model, inputdata = inputdata)
  return(result)
}

Try the QTLEMM package in your browser

Any scripts or data that you put into this service are public.

QTLEMM documentation built on Sept. 12, 2025, 10:19 a.m.