R/iter_matrix.R

Defines functions iter_matrix

Documented in iter_matrix

#' Starts the genetic algorithm based on a start matrix with
#' specified marginal probabilities.
#'
#' In each step, the genetic algorithm swaps two randomly selected
#' entries in each column of X0. Thus it can be guaranteed that the
#' marginal probabilities do not change. If the correlation matrix is closer to R than that of x0(t-1), X0(t) replaces X0(t-1).
#' @title Genetic algorithm for generating correlated binary data
#' @param X0 Start matrix with specified marginal probabilities. Can
#'   be generated by \code{\link{start_matrix}}.
#' @param R Desired correlation matrix the data should have after
#'   running the genetic algorithm.
#' @param T Maximum number of iterations after which the genetic
#'   algorithm stops.
#' @param e.min Minimum error (RMSE) between the correlation of the
#'   iterated data matrix and R.
#' @param plt Boolean parameter that indicates whether to plot
#'   e.min versus the iteration step.
#' @param perc Boolean parameter that indicates whether to print the
#'   percentage of iteration steps relativ to T.
#' @return A list with four entries:
#'  \describe{
#' \item{\emph{Xt}}{Final representativ data matrix with specified marginal probabilities and a correlation as close as possible to R}
#' \item{\emph{t}}{Number of performed iteration steps (t <= T)}
#' \item{\emph{Rt}}{Empirical correlation matrix of Xt}
#' \item{\emph{RMSE}}{Final RSME error between desired and achieved correlation matrix }
#' }
#' @author Jochen Kruppa, Klaus Jung
#' @references
#' Kruppa, J., Lepenies, B., & Jung, K. (2018). A genetic algorithm for simulating correlated binary data from biomedical research. \emph{Computers in biology and medicine}, \strong{92}, 1-8. \doi{10.1016/j.compbiomed.2017.10.023}
#' @importFrom stats cor
#' @export
#' @examples
#' ### Generation of the representive matrix Xt
#' X0 <- start_matrix(p = c(0.5, 0.6), k = 1000)
#' Xt <- iter_matrix(X0, R = diag(2), T = 10000, e.min = 0.00001)$Xt
#'
#' ### Drawing of a random sample S of size n = 10
#' S <- Xt[sample(1:1000, 10, replace = TRUE),]
iter_matrix <- function(X0, R, T=1000, e.min=.0001, plt=TRUE, perc=TRUE) {
  n = dim(X0)[1]
  m = dim(X0)[2]
  R0 = cor(X0)
  R0[which(is.na(R0))] = 0
  e0 = sqrt(sum((R - R0)^2) / (n * n))
  et = e0
  maxe = e0
  for (t in 1:T) { ### START ITERATION
    if (t%%(T/10)==0 & perc==TRUE) print(paste(100 * t/T, "%"))
    R0 = cor(X0)
    R0[which(is.na(R0))] = 0
    e0 = sqrt(sum((R - R0)^2) / (n * n))
    Xt = X0
    for (j in 1:m) {
      k = sample(1:n, 2, replace=FALSE)
      a = X0[k[1],j]
      b = X0[k[2],j]
      Xt[k[1],j] = b
      Xt[k[2],j] = a
    }
    Rt = cor(Xt)
    Rt[which(is.na(Rt))] = 0
    e1 = sqrt(sum((R - Rt)^2) / (n * n))
    if (e1 <= e0) {
      X0 = Xt
      et = c(et, e1)
    }
    else {
      et = c(et, e0)
      Rt = R0
    }
    if (t%%(T/10)==0 & plt==TRUE) {
      plot(1:(t+1), et, type="l", cex.lab=1.5, cex.axis=1.5, xlab="Number of translocations", ylab="Root-mean-square error", xlim=c(0, T), ylim=c(0, maxe), lwd=2)
      grid()
    }
    if (et[t]<e.min & plt==TRUE) {
      plot(1:(t+1), et, type="l", cex.lab=1.5, cex.axis=1.5, xlab="Number of translocations", ylab="RMSE", xlim=c(0, T), ylim=c(0, maxe), lwd=2)
      grid()
      break
    }
    if (et[t]<e.min) break
  } ### END ITERATION
  RMSE = rep(NA, T)
  RMSE[1:t] = et[1:t]
  out = list(Xt=Xt, t=t, Rt=Rt, RMSE=RMSE)
}

Try the RepeatedHighDim package in your browser

Any scripts or data that you put into this service are public.

RepeatedHighDim documentation built on July 9, 2023, 6:33 p.m.