compute_optimal_encoding: Compute the optimal encoding for each state


Compute the optimal encoding for categorical functional data using an extension of multiple correspondence analysis to stochastic processes.


compute_optimal_encoding(
  data,
  basisobj,
  computeCI = TRUE,
  nBootstrap = 50,
  propBootstrap = 1,
  method = c("precompute", "parallel"),
  verbose = TRUE,
  nCores = max(1, ceiling(detectCores()/2)),
  ...
)



data: data.frame containing three columns, id (identifier of the trajectory), time (time at which a change occurs) and state (associated state). All individuals must begin at the same time T0 and end at the same time Tmax (use cut_data).


basisobj: basis created using the fda package (cf. create.basis).


computeCI: if TRUE, perform a bootstrap to estimate the variance of the encoding function coefficients.


nBootstrap: number of bootstrap samples.


propBootstrap: size of bootstrap samples relative to the number of individuals: propBootstrap * number of individuals.


method: computation method, either "parallel" or "precompute". "precompute" computes all integrals in advance (efficient when the number of unique time values is low).


verbose: if TRUE, print some information during the computation.


nCores: number of cores used for parallelization (only if method == "parallel"). Defaults to half of the available cores.


...: parameters for the integrate function (see details).
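
As an illustration of the layout expected for the data.frame argument, the following base R sketch builds a small data set by hand and checks that all trajectories share the same start and end times. The values and the variable names (toy, starts, ends) are invented for the example; cut_data remains the documented way to enforce a common Tmax.

```r
# Toy data in long format: one row per state change.
# The values are invented for illustration only.
toy <- data.frame(
  id    = c(1, 1, 1, 2, 2),
  time  = c(0, 1.5, 5, 0, 5),
  state = c("A", "C", "C", "G", "G")
)

# Every trajectory should start at the same T0 and end at the same Tmax.
starts <- tapply(toy$time, toy$id, min)
ends   <- tapply(toy$time, toy$id, max)
stopifnot(length(unique(starts)) == 1, length(unique(ends)) == 1)
```

If the check fails on real data, the trajectories most likely need to be truncated to a common end time first.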


See the vignette for the mathematical background: RShowDoc("cfda", package = "cfda")

Extra parameters (...) for the integrate function can be:

  • subdivisions the maximum number of subintervals.

  • rel.tol relative accuracy requested.

  • abs.tol absolute accuracy requested.
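
These are arguments of base R's integrate function. A minimal sketch of how they are used on a simple integrand, independent of cfda (the integrand and the tolerance values are arbitrary choices for the example):

```r
# Integrate t^2 over [0, 1]; the exact value is 1/3.
res <- integrate(
  function(t) t^2, lower = 0, upper = 1,
  subdivisions = 200L,  # maximum number of subintervals
  rel.tol = 1e-8,       # relative accuracy requested
  abs.tol = 1e-10       # absolute accuracy requested
)
res$value      # numerical estimate of the integral
res$abs.error  # estimate of the absolute error
```

Since these extra parameters are forwarded to integrate, they would presumably be supplied by name in the call itself, for example compute_optimal_encoding(d_JK2, b, subdivisions = 200, rel.tol = 1e-8).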


A list containing:

  • eigenvalues eigenvalues

  • alpha optimal encoding coefficients associated with each eigenvector

  • pc principal components

  • F matrix containing the F_{(x,i)(y,j)}

  • V matrix containing the V_{(x,i)}

  • G covariance matrix of V

  • basisobj basisobj input parameter

  • pt output of estimate_pt function

  • bootstrap Only if computeCI = TRUE. Output of every bootstrap run

  • varAlpha Only if computeCI = TRUE. Variance of alpha parameters

  • runTime Total elapsed time
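
To make the roles of nBootstrap and propBootstrap concrete, here is a generic sketch of a bootstrap over individuals in base R. It is not the package's implementation: the statistic (a mean of random toy values), the rounding with ceiling and all variable names other than nBootstrap and propBootstrap are assumptions made for the example.

```r
set.seed(1)
n_ind <- 20            # number of individuals (toy value)
nBootstrap <- 50       # number of bootstrap samples
propBootstrap <- 0.5   # sample size relative to n_ind
x <- rnorm(n_ind)      # toy per-individual quantity

# Size of each bootstrap sample: propBootstrap * number of individuals
# (rounding up is an assumption of this sketch).
size <- ceiling(propBootstrap * n_ind)

# Draw individuals with replacement and recompute the statistic each time.
estimates <- vapply(seq_len(nBootstrap), function(b) {
  idx <- sample.int(n_ind, size = size, replace = TRUE)
  mean(x[idx])
}, numeric(1))

# Bootstrap estimate of the variance of the statistic.
var(estimates)
```

In the function itself, the quantity recomputed on each bootstrap sample is the set of alpha coefficients, and varAlpha holds their estimated variance.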


Cristian Preda, Quentin Grimonprez


  • Deville J.C. (1982) Analyse de données chronologiques qualitatives : comment analyser des calendriers ?, Annales de l'INSEE, No 45, p. 45-104.

  • Deville J.C. et Saporta G. (1980) Analyse harmonique qualitative, DIDAY et al. (editors), Data Analysis and Informatics, North Holland, p. 375-389.

  • Saporta G. (1981) Méthodes exploratoires d'analyse de données temporelles, Cahiers du B.U.R.O, Université Pierre et Marie Curie, 37-38, Paris.

  • Preda C, Grimonprez Q, Vandewalle V. Categorical Functional Data Analysis. The cfda R Package. Mathematics. 2021; 9(23):3074.

# Load cfda and fda (create.bspline.basis comes from fda)
library(cfda)
library(fda)

# Simulate the Jukes-Cantor model of nucleotide replacement
K <- 4
Tmax <- 5
PJK <- matrix(1 / 3, nrow = K, ncol = K) - diag(rep(1 / 3, K))
lambda_PJK <- c(1, 1, 1, 1)
d_JK <- generate_Markov(
  n = 10, K = K, P = PJK, lambda = lambda_PJK, Tmax = Tmax,
  labels = c("A", "C", "G", "T")
)
# make all trajectories end at the same time Tmax
d_JK2 <- cut_data(d_JK, Tmax)

# create basis object
m <- 5
b <- create.bspline.basis(c(0, Tmax), nbasis = m, norder = 4)

# compute encoding
encoding <- compute_optimal_encoding(d_JK2, b, computeCI = FALSE, nCores = 1)

# plot the optimal encoding
plot(encoding)

# plot the first two components
plotComponent(encoding, comp = c(1, 2))

# extract the optimal encoding
get_encoding(encoding, harm = 1)

