cptgaisl: Island model based genetic algorithm

View source: R/cptgaisl.R

cptgaislR Documentation

Island model based genetic algorithm

Description

Perform the modified island-based genetic algorithm (IslandGA) for multiple changepoint detection. Minimization of an objective function using genetic algorithm (GA). The algorithm can be run sequentially or in explicit parallelisation.

Usage

cptgaisl(
  ObjFunc,
  N,
  prange = NULL,
  popSize = 200,
  numIslands = 5,
  pcrossover = 0.95,
  pmutation = 0.3,
  pchangepoint = 0.01,
  minDist = 1,
  mmax = NULL,
  lmax = NULL,
  maxMig = 1000,
  maxgen = 50,
  maxconv = 100,
  option = "cp",
  monitoring = FALSE,
  parallel = FALSE,
  nCore = NULL,
  tol = 1e-05,
  seed = NULL,
  popInitialize = "random_population",
  suggestions = NULL,
  selection = "selection_linearrank",
  crossover = "uniformcrossover",
  mutation = "mutation",
  ...
)

Arguments

ObjFunc

The fitness function to be minimized. Users can specify any R or Rcpp function as the fitness function, setting the input as the potential solution to the optimization problem and returning a numerical value as the output/fitness. Depending on the user-specified chromosome representation, the optimization task can be changepoint detection only or changepoint detection plus model order selection, which can be specified via the option parameter. When option="both", the list prange must be specified to give the range of model orders.

N

The sample size of the time series.

prange

The default value is NULL for changepoint detection only task. If both model order selection and changepoint detection are required, the prange argument should be provided as a list. Each element in this list must specify the range for a corresponding model order parameter, and the length of the list object of prange must match the number of order parameters to be estimated.

popSize

An integer representing the total number of individuals in each generation, which equal to the number of islands multiplied by the size of each island (i.e., popSize = numIslands × Islandsize).

numIslands

An integer representing the number of islands (sub-populations).

pcrossover

The probability that the crossover operator will apply to the two selected parents' chromosomes to produce the offspring. The typical value is close to 1, with the default setting in this package being 0.95.

pmutation

The probability that the mutation operator applies on one individual chromosome. Similar to the natural mutation process, new genetic information is introduced to the offspring chromosome with a relatively small probability (close to 0), with a default value of 0.3.

pchangepoint

The probability that a changepoint has occurred. User could change this probability based on domain knowledge and the time series length. This probability is used during population initialization and in the creation of new chromosomes by the mutation operator. By default, the mutation operator function generates a new individual as the mutated offspring.

minDist

The minimum length between two adjacent changepoints. Default value equals to one.

mmax

The maximum number of changepoints allowed in the time series data corresponds to the maximum possible length of \boldsymbol{\tau}. For a time series of length 1000 and we only want to detect the changepoint (option="cp"), the default value is 499. The suggested value should be based on the length of the time series. For instance, if a time series has length N, the recommended mmax should be N/2-1. It is suggested to add the number of model hyperparameters if both changepoint detection and model order selection tasks are of-interested simultaneously (option="both").

lmax

The maximum possible length of the chromosome representation. For a time series of length 1000 and we only want to detect the changepoint (option="cp"), the default value is 501. The suggested value should be based on the length of the time series. For instance, if a time series has length N, the recommended lmax should be 2+N/2-1. It is suggested to add the number of model hyperparameters if both changepoint detection and model order selection tasks are of-interested simultaneously (option="both").

maxMig

An integer indicates the maximum number of migrations. After conducting maxMig migrations, the island model GA stops.

maxgen

An integer indicates the maximum number of generations that each island (subpopulation) undergoes before migration. It also determines the requency of migration. The migration will occur after maxgen generations for each island (sub-population).

maxconv

An integer value is also used for algorithm termination. If the overall best-fitted value remains unchanged after maxconv consecutive migrations, the island model GA will terminate.

option

A character string controls the optimization task. "cp" indicates the task is changepoint detection only. "both" indicates the task will include both changepoint detection and model order selection.

monitoring

A logical value with either TRUE or FALSE, indicating whether to print out summarized results (current best fitness function value and its corresponding $C$) for each generation from the GA.

parallel

A logical value with TRUE or FALSE, indicating whether to use multiple cores for parallel computation. Different from the basic GA, in this approach, each island evolves independently. New population generation—including parent selection, crossover, and mutation—is processed independently and in parallel for each island, further saving computation time. Once the new population generation is complete, the migration operator is performed sequentially.

nCore

An integer indicates the number of cores utilized in parallel computing. For the island model GA, the number of cores used in parallel is set to the numIslands for convenience.

tol

An numerical value with the default value of 1e-05. The tolerance level that helps evaluate whether the two iterations have the same fitness value, which aids in determining GA termination.

seed

An integer with the default value of NULL. An single integer allows function produce reproducible results.

popInitialize

A function or sourced function name character string. It should be designed for initializing a population. The default population initialization is random initialization with some imposed constraints. See random_population for example. The function returned object is a matrix, pop. The users can specified their own population function. It could also be a matrix object, which contain the user specified chromosome. By default, each column represents one individual chromosome. See random_population for details.

suggestions

A list object. Default value is NULL. Each element includes better or more reasonable guessed changepoints locations, which will resulting in one chromosome. If the number of provided suggestions equals to popSize, all the suggestions will be used as population. If the number of provided suggestions is less than popSize, the function from popInitialize will generate the remaining individual chromosomes. The number of provided suggestions cannot be greater than popSize. Having better suggestions can help GA converges faster.

selection

A function or sourced function name character string. This GA operator can help select mom and dad from current generation population, where dad is set to have better fit (smaller fitness function values). The default for selection uses the linear rank selection method. See selection_linearrank for example. The function returned object is a list contain the chromosomes for mom and dad.

crossover

A function or sourced function name character string. This GA operator can apply crossover to the chosen parents to produce child for next generation with specified probability. The default for crossover uses the uniform crossover method. See uniformcrossover for details in the default crossover operator. The function returned object is a vector contain the chromosomes for child.

mutation

A function or sourced function name character string. This GA operator can apply mutation to the produced child with the specified probability pmutation. See mutation for details in the default mutation operator. The function returned object is a vector contain child chromosome representation.

...

additional arguments that will be passed to the fitness function.

Details

For any pre-specified time series model with a specified set of changepoint locations, model fit is evaluated using a fitness function Q(\boldsymbol{\theta}), where \boldsymbol{\theta}=(\boldsymbol{s},\boldsymbol{\tau},\boldsymbol{\beta})' denotes the full parameter vector. Here, \boldsymbol{s} denotes the set of model hyperparameters, potentially including the AR or MA orders, the degree of ARIMA differencing, or the periodic AR order for PAR models. The vector \boldsymbol{\tau}=\{\tau_{1}, \ldots, \tau_{m}\} specifies the changepoint locations, with the number of changepoints m inferred as part of the estimation. Each individual chromosome representation is specified as a vector,

C = (m, \boldsymbol{s}, \boldsymbol{\tau}, N+1)',

where m represents the number of changepoints and is also equivalent to the length of vector \boldsymbol{\tau} containing all the changepoint locations. \boldsymbol{s} contains the integer-valued parameter for time series model structure specification, such as AR, MA, or PAR orders. If no model order selection is desired, then \boldsymbol{s} is omitted and the GA detects changepoints only. The changepoint locations in \boldsymbol{\tau} are encoded as interger values between 2 and N, allowing the length of \boldsymbol{\tau} to vary dynamically with m. The value N+1 is appended at the end of C to serve as a delimiter marking the boundary of the chromosome.

Value

Return an object class cptgaisl-class. See cptgaisl-class for a more detailed description.

Examples



N <- 1000
XMatT <- matrix(1, nrow = N, ncol = 1)
Xt <- ts.sim(
  beta = 0.5, XMat = XMatT, sigma = 1, phi = 0.5, theta = NULL,
  Delta = c(2, -2), CpLoc = c(250, 750), seed = 1234
)

## Multiple changepoint detection without model order selection

# without suggestions
GAISL.res <- cptgaisl(ObjFunc = ARIMA.BIC, N = N, XMat = XMatT, Xt = Xt)
summary(GAISL.res)
plot(GAISL.res, data = Xt)

# with suggestions
suggestions <- list(NULL, 250, c(250, 500), c(250, 625), c(250, 500, 750))
GAISL.res <- cptgaisl(ObjFunc = ARIMA.BIC, N = N, suggestions = suggestions, XMat = XMatT, Xt = Xt)
summary(GAISL.res)
plot(GAISL.res, data = Xt)


## Multiple changepoint detection with model order selection

prange <- list(ar = c(0, 3), ma = c(0, 3))

# without suggestions
GAISL.res <- cptgaisl(
  ObjFunc = ARIMA.BIC.Order, N = N, prange = prange,
  option = "both", XMat = XMatT, Xt = Xt
)
summary(GAISL.res)
plot(GAISL.res, data = Xt)

# with suggestions
suggestions <- list(NULL, 250, c(250, 500), c(250, 625), c(250, 500, 750))
GAISL.res <- cptgaisl(
  ObjFunc = ARIMA.BIC.Order, N = N, prange = prange,
  suggestions = suggestions, option = "both", XMat = XMatT, Xt = Xt
)
summary(GAISL.res)
plot(GAISL.res, data = Xt)


changepointGA documentation built on Nov. 5, 2025, 6:54 p.m.