clust_cp: Clustering time dependent observations with common change...

View source: R/clust_cp.R

clust_cpR Documentation

Clustering time dependent observations with common change points.

Description

The clust_cp function cluster observations with common change points. Data can be time series or survival functions.

Usage

clust_cp(
  data,
  n_iterations,
  n_burnin = 0,
  params = list(),
  alpha_SM = 1,
  B = 1000,
  L = 1,
  q = 0.5,
  kernel,
  print_progress = TRUE,
  user_seed = 1234
)

Arguments

data

a matrix or an array If a matrix the algorithm for univariate time series is used, where each row is a time series. If an array, the algorithm is run for multivariate time series. Each slice of the array is a matrix where the rows are the dimensions of the time series.

n_iterations

number of MCMC iterations.

n_burnin

number of iterations that must be excluded when computing the posterior estimate.

params

a list of parameters:

If the time series is univariate the following must be specified:

  • a,b,c parameters of the integrated likelihood.

  • phi correlation parameter in the likelihood.

If the time series is multivariate the following must be specified:

  • k_0,nu_0,S_0,m_0 parameters of the integrated likelihood.

  • phi correlation parameter in the likelihood.

If data are survival functions:

  • M number of Monte Carlo iterations when computing the likelihood of the survival function.

  • xi recovery rate fixed constant for each population at each time.

  • a0,b0 parameters for the computation of the integrated likelihood of the survival functions.

  • I0_var variance for the Metropolis-Hastings estimation of the proportion of infected at time 0.

  • p prior average number of change points for each order.

alpha_SM

\alpha for the split-merge main algorithm.

B

number of orders for the normalization constant.

L

number of split-merge steps for the proposal step.

q

probability of a split in the split-merge proposal and acceleration step.

kernel

can be "ts" if data are time series or "epi" if data are survival functions.

print_progress

If TRUE (default) print the progress bar.

user_seed

seed for random distribution generation.

Value

A ClustCpObj class object containing

  • $data vector or matrix with the data.

  • $n_iterations number of iterations.

  • $n_burnin number of burn-in iterations.

  • $clust a matrix where each row corresponds to the output cluster of the corresponding iteration.

  • $orders a multidimensional array where each slice is a matrix and represent an iteration. The row of each matrix correspond the order associated to the corresponding cluster.

  • $time computational time.

  • $lkl a matrix where each row is the likelihood of each observation computed at the corresponding iteration.

  • $norm_vec a vector containing the normalization constant computed at the beginning of the algorithm.

  • I0_MCMC traceplot for I_0.

  • I0_MCMC_01 a 0/1 vector, the n-th element is equal to 1 if the proposed I_0 was accepted, 0 otherwise.

  • $kernel_ts if TRUE data are time series.

  • $kernel_epi if TRUE data are survival function.

  • $univariate_ts TRUE if data is an univariate time series, FALSE if it is a multivariate time series.

References

Corradin, R., Danese, L., KhudaBukhsh, W. R., & Ongaro, A. (2024). Model-based clustering of time-dependent observations with common structural changes. arXiv preprint arXiv:2410.09552.

Examples



## Univariate time series

data_mat <- matrix(NA, nrow = 5, ncol = 100)

data_mat[1,] <- as.numeric(c(rnorm(50,0,0.100), rnorm(50,1,0.250)))
data_mat[2,] <- as.numeric(c(rnorm(50,0,0.125), rnorm(50,1,0.225)))
data_mat[3,] <- as.numeric(c(rnorm(50,0,0.175), rnorm(50,1,0.280)))
data_mat[4,] <- as.numeric(c(rnorm(25,0,0.135), rnorm(75,1,0.225)))
data_mat[5,] <- as.numeric(c(rnorm(25,0,0.155), rnorm(75,1,0.280)))

out <- clust_cp(data = data_mat, n_iterations = 5000, n_burnin = 1000,
                 L = 1, params = list(phi = 0.5), B = 1000, kernel = "ts")

print(out)

## Multivariate time series


data_array <- array(data = NA, dim = c(3,100,5))

data_array[1,,1] <- as.numeric(c(rnorm(50,0,0.100), rnorm(50,1,0.250)))
data_array[2,,1] <- as.numeric(c(rnorm(50,0,0.100), rnorm(50,1,0.250)))
data_array[3,,1] <- as.numeric(c(rnorm(50,0,0.100), rnorm(50,1,0.250)))

data_array[1,,2] <- as.numeric(c(rnorm(50,0,0.100), rnorm(50,1,0.250)))
data_array[2,,2] <- as.numeric(c(rnorm(50,0,0.100), rnorm(50,1,0.250)))
data_array[3,,2] <- as.numeric(c(rnorm(50,0,0.100), rnorm(50,1,0.250)))

data_array[1,,3] <- as.numeric(c(rnorm(50,0,0.175), rnorm(50,1,0.280)))
data_array[2,,3] <- as.numeric(c(rnorm(50,0,0.175), rnorm(50,1,0.280)))
data_array[3,,3] <- as.numeric(c(rnorm(50,0,0.175), rnorm(50,1,0.280)))

data_array[1,,4] <- as.numeric(c(rnorm(25,0,0.135), rnorm(75,1,0.225)))
data_array[2,,4] <- as.numeric(c(rnorm(25,0,0.135), rnorm(75,1,0.225)))
data_array[3,,4] <- as.numeric(c(rnorm(25,0,0.135), rnorm(75,1,0.225)))

data_array[1,,5] <- as.numeric(c(rnorm(25,0,0.155), rnorm(75,1,0.280)))
data_array[2,,5] <- as.numeric(c(rnorm(25,0,0.155), rnorm(75,1,0.280)))
data_array[3,,5] <- as.numeric(c(rnorm(25,0,0.155), rnorm(75,1,0.280)))

out <- clust_cp(data = data_array, n_iterations = 3000, n_burnin = 1000,
                params = list(phi = 0.5, k_0 = 0.25,
                              nu_0 = 5, S_0 = diag(0.1,3,3),
                              m_0 = rep(0,3)), B = 1000,  kernel = "ts")

print(out)

## Epidemiological data



data_mat <- matrix(NA, nrow = 5, ncol = 50)

betas <- list(c(rep(0.45, 25),rep(0.14,25)),
              c(rep(0.55, 25),rep(0.11,25)),
              c(rep(0.50, 25),rep(0.12,25)),
              c(rep(0.52, 10),rep(0.15,40)),
              c(rep(0.53, 10),rep(0.13,40)))

inf_times <- list()

for(i in 1:5){

  inf_times[[i]] <- sim_epi_data(10000, 10, 50, betas[[i]], 1/8)

  vec <- rep(0,50)
  names(vec) <- as.character(1:50)

  for(j in 1:50){
    if(as.character(j) %in% names(table(floor(inf_times[[i]])))){
      vec[j] = table(floor(inf_times[[i]]))[which(names(table(floor(inf_times[[i]]))) == j)]
    }
  }
  data_mat[i,] <- vec
}

out <- clust_cp(data = data_mat, n_iterations = 100, n_burnin = 10,
                params = list(M = 100, xi = 1/8), B = 1000, kernel = "epi")

print(out)



BayesChange documentation built on June 10, 2025, 9:09 a.m.