struct_em: Learn the structure and the parameters of a Gaussian mixture graphical model with incomplete data

Description
This function learns the structure and the parameters of a Gaussian mixture graphical model with incomplete data using the structural EM algorithm. At each iteration, the parametric EM algorithm is performed to complete the data and update the parameters (E step). The completed data are then used to update the structure (M step), and so on. Each iteration is guaranteed to increase the scoring function until convergence to a local maximum (Koller and Friedman, 2009). In practice, due to the sampling process inherent in particle-based inference, it may happen that the monotonic increase no longer occurs when approaching the local maximum, resulting in an earlier termination of the algorithm.
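The overall loop can be summarised with the following schematic R sketch. It is a conceptual illustration only, not the package's implementation: the e_step, m_step and score_fun arguments are hypothetical placeholders for the roles played in struct_em by parametric EM inference and structure learning (see param_em and struct_learn in the See Also section).

# Conceptual sketch of the structural EM loop, NOT the gmgm internals.
# e_step, m_step and score_fun are hypothetical placeholder functions
# supplied by the caller.
structural_em_sketch <- function(model, data, e_step, m_step, score_fun,
                                 max_iter_sem = 5) {
  score_old <- -Inf
  for (iter in seq_len(max_iter_sem)) {
    completed <- e_step(model, data)       # complete the data, update parameters
    model <- m_step(model, completed)      # update the structure
    score_new <- score_fun(model, completed)
    # Each iteration should increase the score; with particle-based
    # inference the increase may stop early near a local maximum.
    if (score_new <= score_old) {
      break
    }
    score_old <- score_new
  }
  model
}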
Usage

struct_em(
  gmgm,
  data,
  nodes = structure(gmgm)$nodes,
  arcs_cand = tibble(lag = 0),
  col_seq = NULL,
  score = "bic",
  n_part = 1000,
  max_part_sim = 1e+06,
  min_ess = 1,
  max_iter_sem = 5,
  max_iter_pem = 5,
  verbose = FALSE,
  ...
)
Arguments

gmgm
An object of class gmbn or gmdbn.
data
A data frame containing the data used for learning. Its columns must explicitly be named after the nodes of gmgm and the columns of col_seq.
nodes
A character vector containing the nodes whose local conditional models are learned (by default all the nodes of gmgm).
arcs_cand
A data frame containing the candidate arcs for addition or removal (by default all possible non-temporal arcs). The column from identifies the start node of each arc, the column to its end node, and the column lag the time lag between them (a small illustrative sketch follows the argument descriptions below).
col_seq
A character vector containing the column names of data that describe the observation sequence. If NULL (the default), all the observations belong to a single sequence.
score
A character string ("aic", "bic" or "loglik") corresponding to the scoring function.
n_part
A positive integer corresponding to the number of particles generated for each observation (if gmgm is a gmbn object) or observation sequence (if gmgm is a gmdbn object) during inference.
max_part_sim
An integer greater than or equal to n_part corresponding to the maximum number of particles simulated at the same time during inference (used to prevent memory overflow).
min_ess
A numeric value in [0, 1] corresponding to the minimum ESS (expressed as a proportion of n_part) under which the particles are resampled during inference.
max_iter_sem
A non-negative integer corresponding to the maximum number of iterations of the structural EM algorithm.
max_iter_pem
A non-negative integer corresponding to the maximum number of iterations of the parametric EM algorithm.
verbose
A logical value indicating whether iterations in progress are displayed.
...
Additional arguments passed on to the functions used internally for inference and learning (for example max_comp in the examples below).
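To make the nodes and arcs_cand arguments concrete, here is a small illustrative sketch (not taken from the package documentation). It reuses gmbn_1, data_1 and the max_comp argument exactly as defined in the Examples section below; the candidate arcs and the restriction to the FAT and WAIST nodes are purely hypothetical choices.

# Illustrative only: reuses gmbn_1 and data_1 from the Examples section.
# Candidate arcs are given as a data frame with columns from and to
# (and optionally lag for temporal arcs).
arcs_cand_small <- data.frame(from = c("AGE", "HEIGHT", "WEIGHT"),
                              to = c("FAT", "WAIST", "WAIST"))

# Learn only the local models of FAT and WAIST: the nodes argument
# restricts the structural search to these two nodes.
res_small <- struct_em(gmbn_1, data_1, nodes = c("FAT", "WAIST"),
                       arcs_cand = arcs_cand_small, verbose = TRUE,
                       max_comp = 3)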
Value

A list with elements:
gmgm
The final gmbn or gmdbn object.
data
A data frame (tibble) containing the complete data used to learn the final gmbn or gmdbn object.
seq_score
A numeric matrix containing the sequence of scores measured after the E and M steps of each iteration.
References

Koller, D. and Friedman, N. (2009). Probabilistic Graphical Models: Principles and Techniques. The MIT Press.
See Also

param_em, param_learn, struct_learn
Examples

# Example 1: learn a Gaussian mixture Bayesian network (gmbn) from
# incomplete body measurement data.
set.seed(0)
data(data_body)
data_1 <- data_body
data_1$GENDER[sample.int(2148, 430)] <- NA
data_1$AGE[sample.int(2148, 430)] <- NA
data_1$HEIGHT[sample.int(2148, 430)] <- NA
data_1$WEIGHT[sample.int(2148, 430)] <- NA
data_1$FAT[sample.int(2148, 430)] <- NA
data_1$WAIST[sample.int(2148, 430)] <- NA
data_1$GLYCO[sample.int(2148, 430)] <- NA
gmbn_1 <- add_nodes(NULL,
                    c("AGE", "FAT", "GENDER", "GLYCO", "HEIGHT", "WAIST",
                      "WEIGHT"))
arcs_cand_1 <- data.frame(from = c("AGE", "GENDER", "HEIGHT", "WEIGHT", NA,
                                   "AGE", "GENDER", "AGE", "FAT", "GENDER",
                                   "HEIGHT", "WEIGHT", "AGE", "GENDER",
                                   "HEIGHT"),
                          to = c("FAT", "FAT", "FAT", "FAT", "GLYCO",
                                 "HEIGHT", "HEIGHT", "WAIST", "WAIST",
                                 "WAIST", "WAIST", "WAIST", "WEIGHT",
                                 "WEIGHT", "WEIGHT"))
res_learn_1 <- struct_em(gmbn_1, data_1, arcs_cand = arcs_cand_1,
                         verbose = TRUE, max_comp = 3)

# Example 2: learn a Gaussian mixture dynamic Bayesian network (gmdbn) from
# incomplete air quality data.
set.seed(0)
data(data_air)
data_2 <- data_air
data_2$NO2[sample.int(7680, 1536)] <- NA
data_2$O3[sample.int(7680, 1536)] <- NA
data_2$TEMP[sample.int(7680, 1536)] <- NA
data_2$WIND[sample.int(7680, 1536)] <- NA
gmdbn_1 <- gmdbn(b_2 = add_nodes(NULL, c("NO2", "O3", "TEMP", "WIND")),
                 b_13 = add_nodes(NULL, c("NO2", "O3", "TEMP", "WIND")))
arcs_cand_2 <- data.frame(from = c("NO2", "NO2", "NO2", "O3", "TEMP", "TEMP",
                                   "WIND", "WIND"),
                          to = c("NO2", "O3", "O3", "O3", NA, NA, NA, NA),
                          lag = c(1, 0, 1, 1, 0, 1, 0, 1))
res_learn_2 <- struct_em(gmdbn_1, data_2, arcs_cand = arcs_cand_2,
                         col_seq = "DATE", verbose = TRUE, max_comp = 3)
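The returned list can then be inspected as follows. This is a usage sketch building on res_learn_1 from the first example above and assumes only the elements documented in the Value section.

# Inspect the elements documented in the Value section, using res_learn_1
# from the first example above.
res_learn_1$gmgm        # final gmbn object
head(res_learn_1$data)  # completed data (tibble) used to learn the final model
res_learn_1$seq_score   # scores measured after the E and M steps of each iteration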