Nothing
#' Genetic algorithm
#'
#' Perform the genetic algorithm for changepoint detection.
#' This involves the minimization of an objective function using a genetic algorithm (GA).
#' The algorithm can be run sequentially or with explicit parallelization.
#'
#' For any pre-specified time series model with a specified set of changepoint locations,
#' model fit is evaluated using a fitness function \eqn{Q(\boldsymbol{\theta})},
#' where \eqn{\boldsymbol{\theta}=(\boldsymbol{s},\boldsymbol{\tau},\boldsymbol{\beta})'}
#' denotes the full parameter vector. Here, \eqn{\boldsymbol{s}} denotes the set
#' of model hyperparameters, potentially including the AR or MA orders, the degree
#' of ARIMA differencing, or the periodic AR order for PAR models. The vector
#' \eqn{\boldsymbol{\tau}=\{\tau_{1}, \ldots, \tau_{m}\}} specifies the
#' changepoint locations, with the number of changepoints \eqn{m} inferred as
#' part of the estimation. Each individual chromosome representation is specified as a vector,
#' \deqn{C = (m, \boldsymbol{s}, \boldsymbol{\tau}, N+1)',}
#' where \eqn{m} represents the number of changepoints and is also equivalent to
#' the length of vector \eqn{\boldsymbol{\tau}} containing all the changepoint
#' locations. \eqn{\boldsymbol{s}} contains the integer-valued parameter for time
#' series model structure specification, such as AR, MA, or PAR orders.
#' If no model order selection is desired, then \eqn{\boldsymbol{s}} is omitted
#' and the GA detects changepoints only. The changepoint locations in
#' \eqn{\boldsymbol{\tau}} are encoded as interger values between 2 and \eqn{N},
#' allowing the length of \eqn{\boldsymbol{\tau}} to vary dynamically with \eqn{m}.
#' The value \eqn{N+1} is appended at the end of \eqn{C} to serve as a delimiter
#' marking the boundary of the chromosome.
#'
#' @param ObjFunc The fitness function to be minimized. Users can specify any R or Rcpp
#' function as the fitness function, setting the input as the potential solution to
#' the optimization problem and returning a numerical value as the output/fitness.
#' Depending on the user-specified chromosome representation, the optimization task
#' can be changepoint detection only or changepoint detection plus model order selection,
#' which can be specified via the \code{option} parameter. When
#' \code{option="both"}, the list \code{prange} must be specified to give the range
#' of model orders.
#' @param N The sample size of the time series.
#' @param prange The default value is \code{NULL} for changepoint detection only task.
#' If both model order selection and changepoint detection are required, the \code{prange}
#' argument should be provided as a list. Each element in this list must specify the range
#' for a corresponding model order parameter, and the length of the list object of
#' \code{prange} must match the number of order parameters to be estimated.
#' @param popSize An integer represents the number of individuals/chromosomes in each population.
#' @param pcrossover The probability that the crossover operator will apply to
#' the two selected parents' chromosomes to produce the offspring. The typical
#' value is close to 1, with the default setting in this package being 0.95.
#' @param pmutation The probability that the mutation operator applies on one
#' individual chromosome. Similar to the natural mutation process, new genetic
#' information is introduced to the offspring chromosome with a relatively small
#' probability (close to 0), with a default value of 0.3.
#' @param pchangepoint The probability that a changepoint has occurred with a default value of 0.01. User
#' could change this probability based on domain knowledge and the time series
#' length. This probability is used during population initialization and in the
#' creation of new chromosomes by the mutation operator. By default, the mutation
#' operator function generates a new individual as the mutated offspring.
#' @param minDist The minimum length between two adjacent changepoints. Default value equals to one.
#' @param mmax The maximum number of changepoints allowed in the time series data
#' corresponds to the maximum possible length of \eqn{\boldsymbol{\tau}}.
#' For a time series of length 1000 and we only want to detect the changepoint
#' (\code{option="cp"}), the default value is 499. The suggested value should be
#' based on the length of the time series. For instance, if a time series has
#' length \eqn{N}, the recommended \code{mmax} should be \eqn{N/2-1}. It is suggested to
#' add the number of model hyperparameters if both changepoint detection and
#' model order selection tasks are of-interested simultaneously
#' (\code{option="both"}).
#' @param lmax The maximum possible length of the chromosome representation.
#' For a time series of length 1000 and we only want to detect the changepoint
#' (\code{option="cp"}), the default value is 501. The suggested value should be
#' based on the length of the time series. For instance, if a time series has
#' length \eqn{N}, the recommended \code{lmax} should be \eqn{2+N/2-1}. It is
#' suggested to add the number of model hyperparameters if both changepoint detection and
#' model order selection tasks are of-interested simultaneously (\code{option="both"}).
#' @param maxgen The maximum number of generations the GA can run before the
#' search is forcibly terminated.
#' @param maxconv If the overall best fitted value doesn't change after
#' \code{maxconv} consecutive migrations, the GA algorithm stops.
#' @param option A character string controls the optimization task. \code{"cp"}
#' indicates the task is changepoint detection only. \code{"both"} indicates the
#' task will include both changepoint detection and model order selection.
#' @param monitoring A logical value with either \code{TRUE} or \code{FALSE},
#' indicating whether to print out summarized results (current best fitness
#' function value and its corresponding $C$) for each generation from the GA.
#' @param parallel A logical value with either \code{TRUE} or \code{FALSE}, indicating
#' whether to use multiple cores for parallel computation of the fitness function
#' values for individuals after population initialization.
#' @param nCore An integer with the default value of \code{NULL}. It
#' represents the number of cores used in parallel computing. The value of
#' \code{nCore} must be less than the number of physical cores available.
#' @param tol An numerical value with the default value of \code{1e-05}. The
#' tolerance level that helps evaluate whether the two iterations have the same
#' fitness value, which aids in determining GA termination.
#' @param seed An integer with the default value of \code{NULL}. An single
#' integer allows function produce reproducible results.
#' @param popInitialize A function or sourced function name character string.
#' It should be designed for initializing a population. The default population
#' initialization is random initialization with some imposed constraints. See
#' \code{\link{random_population}} for example. The function returned object is
#' a matrix, \code{pop}. The users can specified their own \code{population}
#' function. It could also be a matrix object, which contain the user specified
#' chromosome. By default, each column represents one individual chromosome.
#' See \code{\link{random_population}} for details.
#' @param suggestions A list object. Default value is \code{NULL}. Each element
#' includes better or more reasonable guessed changepoints locations, which will
#' resulting in one chromosome. If the number of provided suggestions equals to
#' \code{popSize}, all the suggestions will be used as population. If the number
#' of provided suggestions is less than \code{popSize}, the function from
#' \code{popInitialize} will generate the remaining individual chromosomes.
#' The number of provided suggestions cannot be greater than \code{popSize}.
#' Having better \code{suggestions} can help GA converges faster.
#' @param selection A function or sourced function name character string. This
#' GA operator can help select \code{mom} and \code{dad} from current generation
#' population, where \code{dad} is set to have better fit (smaller fitness
#' function values). The default for selection uses the linear rank selection
#' method. See \code{\link{selection_linearrank}} for example. The function
#' returned object is a list contain the chromosomes for \code{mom} and
#' \code{dad}.
#' @param crossover A function or sourced function name character string. This
#' GA operator can apply crossover to the chosen parents to produce child for
#' next generation with specified probability. The default for crossover uses
#' the uniform crossover method. See \code{\link{uniformcrossover}} for details
#' in the default crossover operator. The function returned object is a vector
#' contain the chromosomes for \code{child}.
#' @param mutation A function or sourced function name character string. This
#' GA operator can apply mutation to the produced \code{child} with the
#' specified probability \code{pmutation}. See \code{\link{mutation}} for
#' details in the default mutation operator. The function returned object
#' is a vector contain \code{child} chromosome representation.
#' @param ... additional arguments that will be passed to the fitness function.
#' @return Return an object class \code{cptga-class}. See
#' \code{\link{cptga-class}} for a more detailed description.
#' @import stats
#' @import Rcpp
#' @import foreach
#' @import doParallel
#' @import parallel
#' @importFrom methods new
#' @importFrom utils str
#' @useDynLib changepointGA
#' @export
#' @examples
#' \donttest{
#'
#' N <- 1000
#' XMatT <- matrix(1, nrow = N, ncol = 1)
#' Xt <- ts.sim(
#' beta = 0.5, XMat = XMatT, sigma = 1, phi = 0.5, theta = NULL,
#' Delta = c(2, -2), CpLoc = c(250, 750), seed = 1234
#' )
#'
#' ## Multiple changepoint detection without model order selection
#'
#' # without suggestions
#' GA.res <- cptga(ObjFunc = ARIMA.BIC, N = N, XMat = XMatT, Xt = Xt)
#' summary(GA.res)
#' plot(GA.res, data = Xt)
#'
#' # with suggestions
#' suggestions <- list(NULL, 250, c(250, 500), c(250, 625), c(250, 500, 750))
#' GA.res <- cptga(ObjFunc = ARIMA.BIC, N = N, suggestions = suggestions, XMat = XMatT, Xt = Xt)
#' summary(GA.res)
#' plot(GA.res, data = Xt)
#'
#'
#' ## Multiple changepoint detection with model order selection
#'
#' prange <- list(ar = c(0, 3), ma = c(0, 3))
#'
#' # without suggestions
#' GA.res <- cptga(
#' ObjFunc = ARIMA.BIC.Order, N = N, prange = prange,
#' option = "both", XMat = XMatT, Xt = Xt
#' )
#' summary(GA.res)
#' plot(GA.res, data = Xt)
#'
#' # with suggestions
#' suggestions <- list(NULL, 250, c(250, 500), c(250, 625), c(250, 500, 750))
#' GA.res <- cptga(
#' ObjFunc = ARIMA.BIC.Order, N = N, prange = prange,
#' suggestions = suggestions, option = "both", XMat = XMatT, Xt = Xt
#' )
#' summary(GA.res)
#' plot(GA.res, data = Xt)
#' }
cptga <- function(ObjFunc,
N,
prange = NULL,
popSize = 200,
pcrossover = 0.95,
pmutation = 0.3,
pchangepoint = 0.01,
minDist = 1,
mmax = NULL,
lmax = NULL,
maxgen = 50000,
maxconv = 5000,
option = "cp",
monitoring = FALSE,
parallel = FALSE,
nCore = NULL,
tol = 1e-05,
seed = NULL,
popInitialize = "random_population",
suggestions = NULL,
selection = "selection_linearrank",
crossover = "uniformcrossover",
mutation = "mutation",
...) {
call <- match.call()
dots <- list(...)
if (missing(N)) {
stop("The sample size must be provided")
}
if (is.null(mmax)) {
mmax <- floor(N / 2 - 1)
}
if (is.null(lmax)) {
lmax <- floor(2 + N / 2 - 1)
}
if (missing(ObjFunc)) {
stop("A fitness function must be provided")
}
if (!is.function(ObjFunc)) {
stop("A fitness function must be provided")
}
i <- NULL # add for global variables declare
plen <- length(prange)
if (pcrossover < 0 | pcrossover > 1) {
stop("Probability of crossover must be between 0 and 1.")
}
if (pmutation < 0 | pmutation > 1) {
stop("Probability of mutation must be between 0 and 1.")
}
if (pchangepoint < 0 | pchangepoint > 1) {
stop("Probability of changepoint must be between 0 and 1.")
}
if (minDist >= N | minDist < 1) {
stop("Minimum number of locations between two changepoints invalid.")
}
if (lmax < mmax + 2) {
stop("Maximum length of chromosome needs to be larger than (maximum number of changepoints+2).")
}
if (option == "both" & plen == 0) {
stop("Opt for changepoint and order search, prange must be provided.")
}
if (option == "cp" & plen != 0) {
stop("Opt for changepoint search, prange must be NULL.")
}
# set seed for reproducibility
if (!is.null(seed)) {
set.seed(seed)
}
if (!is.function(popInitialize)) {
popInitialize <- get(popInitialize)
}
if (!is.function(selection)) {
selection <- get(selection)
}
if (!is.function(crossover)) {
crossover <- get(crossover)
}
if (!is.function(mutation)) {
mutation <- get(mutation)
}
object <- new("cptga",
call = call,
N = N,
prange = prange,
popSize = popSize,
pcrossover = pcrossover,
pmutation = pmutation,
pchangepoint = pchangepoint,
minDist = minDist,
mmax = mmax,
lmax = lmax,
maxgen = maxgen,
maxconv = maxconv,
option = option,
monitoring = monitoring,
parallel = parallel,
nCore = nCore,
tol = tol,
seed = seed,
suggestions = suggestions,
population = matrix(),
fitness = vector(),
overbestchrom = vector(),
overbestfit = double(),
bestfit = vector(),
count = integer(),
convg = integer()
)
###### step 1: Initialize population
if (!is.null(suggestions)) {
if (!is.list(suggestions)) {
stop("Suggested changepoints must be a List.")
}
# 1. with suggestions
if (any(sapply(suggestions, function(x) any(is.na(x))))) {
stop("NA provided in suggestions list")
}
n.suggs <- length(suggestions)
suggestions.mat <- matrix(0L, nrow = lmax, ncol = n.suggs)
for (i in seq_along(suggestions)) {
idx <- suggestions[[i]]
if (any(idx <= 1 | idx > N)) {
stop(paste0("\n No. ", i, "th suggestion is invalid."))
}
if (object@option == "cp") {
if (is.null(idx)) {
suggestions.mat[1:2, i] <- c(0, N + 1)
} else {
idx <- idx[idx > 1 & idx <= N] # filter valid indices
suggestions.mat[1:(length(idx) + 2), i] <- c(length(idx), idx, N + 1)
}
} else if (object@option == "both") {
hyper.param <- sapply(prange, function(x) if (length(x) > 0) sample(min(x):max(x), 1) else NA)
if (is.null(idx)) {
suggestions.mat[1:(2 + plen), i] <- c(0, hyper.param, N + 1)
} else {
idx <- idx[idx > 1 & idx <= N] # filter valid indices
suggestions.mat[1:(length(idx) + plen + 2), i] <- c(length(idx), hyper.param, idx, N + 1)
}
}
}
RemainpopSize <- popSize - n.suggs
if (RemainpopSize > 0) {
# 1.1 partial from suggestions
init_args <- c(
dots,
list(
popSize = RemainpopSize, prange = prange, N = N,
minDist = minDist, pchangepoint = pchangepoint,
mmax = mmax, lmax = lmax
)
)
init_args <- .filter_args(init_args, popInitialize)
pop <- do.call(popInitialize, init_args)
pop <- cbind(pop, suggestions.mat)
} else if (RemainpopSize == 0) {
# 1.2 completely from suggestions
pop <- suggestions.mat
} else {
# 1.3 some errors
stop("Number of suggested chromosome must be less than specified population size.")
}
} else {
# 2. completely generated by popInitialize
init_args <- c(
dots,
list(
popSize = popSize, prange = prange, N = N,
minDist = minDist, pchangepoint = pchangepoint,
mmax = mmax, lmax = lmax
)
)
init_args <- .filter_args(init_args, popInitialize)
pop <- do.call(popInitialize, init_args)
}
obj_shared <- .filter_args(dots, ObjFunc)
obj_formals <- setdiff(names(formals(ObjFunc)), "...")
first_param <- obj_formals[1]
has_plen <- "plen" %in% obj_formals
## evaluate the fitness (Parallel or NOT)
if (parallel) {
nAvaCore <- detectCores()
if (is.null(nCore)) {
stop(paste0("Missing number of computing cores (", nAvaCore, " cores available)."))
}
registerDoParallel(cores = nCore)
popFit <- foreach::foreach(j = 1:popSize, .combine = "c") %dopar% {
chrom <- pop[1:(pop[1, j] + plen + 2), j]
call_list <- c(
setNames(list(chrom), first_param),
if (has_plen) list(plen = plen) else list(),
obj_shared
)
do.call(ObjFunc, call_list)
}
} else {
popFit <- rep(NA, popSize)
for (j in 1:popSize) {
chrom <- pop[1:(pop[1, j] + plen + 2), j]
call_list <- c(
setNames(list(chrom), first_param),
if (has_plen) list(plen = plen) else list(),
obj_shared
)
popFit[j] <- do.call(ObjFunc, call_list)
}
}
object@population <- pop
object@fitness <- popFit
count <- 0
bestfit <- rep(NA, maxgen)
bestchrom <- matrix(0, nrow = lmax, ncol = maxgen)
repeat{
# indicator for c("crossover", "mutation")
# flag[1]=1 indicating no cross-over
# flag[2]=1 indicating no mutation
flag <- rep(0, 2)
##### step 2: parents selection
sel_args <- .filter_args(dots, selection)
parents <- do.call(selection, c(list(pop, popFit), sel_args))
dad <- parents$dad
mom <- parents$mom
##### step 3: crossover
a1 <- runif(1)
if (a1 <= pcrossover) {
cross_args <- .filter_args(dots, crossover)
child <- do.call(crossover, c(
list(
mom = mom, dad = dad,
prange = prange, minDist = minDist,
lmax = lmax, N = N
),
cross_args
))
} else {
child <- dad
flag[1] <- 1
}
##### step 4: mutation
a2 <- runif(1)
if (a2 <= pmutation) {
mut_args <- .filter_args(dots, mutation)
child <- do.call(mutation, c(
list(
child, prange, minDist,
pchangepoint, lmax, mmax, N
),
mut_args
))
} else {
flag[2] <- 1
}
##### step 5: form new generation
# steady state method:
# replace the least fit in the current pop with child if child is better.
flagsum <- flag[1] + flag[2]
if (flagsum < 2) {
# flagsum < 2 indicating new individual produced and fitness evaluation needed
child_chrom <- child[1:(child[1] + plen + 2)]
call_list <- c(
setNames(list(child_chrom), first_param),
if (has_plen) list(plen = plen) else list(),
obj_shared
)
fitChild <- do.call(ObjFunc, call_list)
leastfit <- max(popFit) # with largest fitness value
if (fitChild < leastfit) {
# indicating child is better than the worst one and replace
pp <- which.max(popFit)
pop[, pp] <- child
popFit[pp] <- fitChild
}
}
object@population <- pop
object@fitness <- popFit
# best popFit with minimized fitness value
count <- count + 1
genbest <- which.min(popFit)
bestfit[count] <- popFit[genbest]
bestchrom[, count] <- pop[, genbest]
object@bestfit <- bestfit
# check convergence once count >= maxconv
# if after maxconv consecutive generations, the overall best does not change, then stop
if (count >= maxconv) {
tmpbestfit <- bestfit[(count - maxconv + 1):count]
decision <- checkConv(tmpbestfit, maxconv, tol)
if (monitoring) {
cat("\n My decision:", decision)
}
if (decision == 1) {
overbestfit <- bestfit[count]
overbestchrom <- bestchrom[, count]
overbestchrom <- overbestchrom[1:(overbestchrom[1] + plen + 2)]
if (monitoring) {
cat("\n==== No.", count, "Generation ====")
cat("\n overall bestfit =", overbestfit)
cat("\n overall bestchrom =", overbestchrom, "\n")
}
pp <- which(is.na(bestfit))[1] - 1
bestfit <- bestfit[1:pp]
bestchrom <- bestchrom[, 1:pp]
object@count <- count
object@convg <- 0
object@bestfit <- bestfit
break
}
}
overbestfit <- bestfit[count]
overbestchrom <- bestchrom[, count]
object@overbestfit <- bestfit[count]
object@overbestchrom <- overbestchrom[1:(overbestchrom[1] + plen + 2)]
if (monitoring) {
cat("\n==== No.", count, "Generation ====")
cat("\n overall bestfit =", overbestfit)
cat("\n overall bestchrom =", overbestchrom, "\n")
}
if (count >= maxgen) {
object@count <- count
object@convg <- 1
break
}
}
return(object)
}
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.