clust_cp: Clustering time dependent observations with common change...
In BayesChange: Bayesian Methods for Change Points Analysis

clust_cp

R Documentation

Clustering time dependent observations with common change points.

Description

The clust_cp function cluster observations with common change points. Data can be time series or survival functions.

Usage

clust_cp(
  data,
  n_iterations,
  n_burnin = 0,
  params = list(),
  alpha_SM = 1,
  B = 1000,
  L = 1,
  q = 0.5,
  kernel,
  print_progress = TRUE,
  user_seed = 1234
)

Arguments

`data`	a matrix or an array If a matrix the algorithm for univariate time series is used, where each row is a time series. If an array, the algorithm is run for multivariate time series. Each slice of the array is a matrix where the rows are the dimensions of the time series.
`n_iterations`	number of MCMC iterations.
`n_burnin`	number of iterations that must be excluded when computing the posterior estimate.
`params`	a list of parameters: If the time series is univariate the following must be specified: `a`,`b`,`c` parameters of the integrated likelihood. `phi` correlation parameter in the likelihood. If the time series is multivariate the following must be specified: `k_0`,`nu_0`,`S_0`,`m_0` parameters of the integrated likelihood. `phi` correlation parameter in the likelihood. If data are survival functions: `M` number of Monte Carlo iterations when computing the likelihood of the survival function. `xi` recovery rate fixed constant for each population at each time. `a0`,`b0` parameters for the computation of the integrated likelihood of the survival functions. `I0_var` variance for the Metropolis-Hastings estimation of the proportion of infected at time 0. `p` prior average number of change points for each order.
`alpha_SM`	`\alpha` for the split-merge main algorithm.
`B`	number of orders for the normalization constant.
`L`	number of split-merge steps for the proposal step.
`q`	probability of a split in the split-merge proposal and acceleration step.
`kernel`	can be "ts" if data are time series or "epi" if data are survival functions.
`print_progress`	If TRUE (default) print the progress bar.
`user_seed`	seed for random distribution generation.

Value

A ClustCpObj class object containing

$data vector or matrix with the data.
$n_iterations number of iterations.
$n_burnin number of burn-in iterations.
$clust a matrix where each row corresponds to the output cluster of the corresponding iteration.
$orders a multidimensional array where each slice is a matrix and represent an iteration. The row of each matrix correspond the order associated to the corresponding cluster.
$time computational time.
$lkl a matrix where each row is the likelihood of each observation computed at the corresponding iteration.
$norm_vec a vector containing the normalization constant computed at the beginning of the algorithm.
I0_MCMC traceplot for I_0.
I0_MCMC_01 a 0/1 vector, the n-th element is equal to 1 if the proposed I_0 was accepted, 0 otherwise.
$kernel_ts if TRUE data are time series.
$kernel_epi if TRUE data are survival function.
$univariate_ts TRUE if data is an univariate time series, FALSE if it is a multivariate time series.

References

Corradin, R., Danese, L., KhudaBukhsh, W. R., & Ongaro, A. (2024). Model-based clustering of time-dependent observations with common structural changes. arXiv preprint arXiv:2410.09552.

Examples



## Univariate time series

data_mat <- matrix(NA, nrow = 5, ncol = 100)

data_mat[1,] <- as.numeric(c(rnorm(50,0,0.100), rnorm(50,1,0.250)))
data_mat[2,] <- as.numeric(c(rnorm(50,0,0.125), rnorm(50,1,0.225)))
data_mat[3,] <- as.numeric(c(rnorm(50,0,0.175), rnorm(50,1,0.280)))
data_mat[4,] <- as.numeric(c(rnorm(25,0,0.135), rnorm(75,1,0.225)))
data_mat[5,] <- as.numeric(c(rnorm(25,0,0.155), rnorm(75,1,0.280)))

out <- clust_cp(data = data_mat, n_iterations = 5000, n_burnin = 1000,
                 L = 1, params = list(phi = 0.5), B = 1000, kernel = "ts")

print(out)

## Multivariate time series


data_array <- array(data = NA, dim = c(3,100,5))

data_array[1,,1] <- as.numeric(c(rnorm(50,0,0.100), rnorm(50,1,0.250)))
data_array[2,,1] <- as.numeric(c(rnorm(50,0,0.100), rnorm(50,1,0.250)))
data_array[3,,1] <- as.numeric(c(rnorm(50,0,0.100), rnorm(50,1,0.250)))

data_array[1,,2] <- as.numeric(c(rnorm(50,0,0.100), rnorm(50,1,0.250)))
data_array[2,,2] <- as.numeric(c(rnorm(50,0,0.100), rnorm(50,1,0.250)))
data_array[3,,2] <- as.numeric(c(rnorm(50,0,0.100), rnorm(50,1,0.250)))

data_array[1,,3] <- as.numeric(c(rnorm(50,0,0.175), rnorm(50,1,0.280)))
data_array[2,,3] <- as.numeric(c(rnorm(50,0,0.175), rnorm(50,1,0.280)))
data_array[3,,3] <- as.numeric(c(rnorm(50,0,0.175), rnorm(50,1,0.280)))

data_array[1,,4] <- as.numeric(c(rnorm(25,0,0.135), rnorm(75,1,0.225)))
data_array[2,,4] <- as.numeric(c(rnorm(25,0,0.135), rnorm(75,1,0.225)))
data_array[3,,4] <- as.numeric(c(rnorm(25,0,0.135), rnorm(75,1,0.225)))

data_array[1,,5] <- as.numeric(c(rnorm(25,0,0.155), rnorm(75,1,0.280)))
data_array[2,,5] <- as.numeric(c(rnorm(25,0,0.155), rnorm(75,1,0.280)))
data_array[3,,5] <- as.numeric(c(rnorm(25,0,0.155), rnorm(75,1,0.280)))

out <- clust_cp(data = data_array, n_iterations = 3000, n_burnin = 1000,
                params = list(phi = 0.5, k_0 = 0.25,
                              nu_0 = 5, S_0 = diag(0.1,3,3),
                              m_0 = rep(0,3)), B = 1000,  kernel = "ts")

print(out)

## Epidemiological data



data_mat <- matrix(NA, nrow = 5, ncol = 50)

betas <- list(c(rep(0.45, 25),rep(0.14,25)),
              c(rep(0.55, 25),rep(0.11,25)),
              c(rep(0.50, 25),rep(0.12,25)),
              c(rep(0.52, 10),rep(0.15,40)),
              c(rep(0.53, 10),rep(0.13,40)))

inf_times <- list()

for(i in 1:5){

  inf_times[[i]] <- sim_epi_data(10000, 10, 50, betas[[i]], 1/8)

  vec <- rep(0,50)
  names(vec) <- as.character(1:50)

  for(j in 1:50){
    if(as.character(j) %in% names(table(floor(inf_times[[i]])))){
      vec[j] = table(floor(inf_times[[i]]))[which(names(table(floor(inf_times[[i]]))) == j)]
    }
  }
  data_mat[i,] <- vec
}

out <- clust_cp(data = data_mat, n_iterations = 100, n_burnin = 10,
                params = list(M = 100, xi = 1/8), B = 1000, kernel = "epi")

print(out)

BayesChange documentation built on June 10, 2025, 9:09 a.m.