Performs an F-fold cross-validation of clusterwise multiblock analyses. The function is usually run for several numbers of clusters and of dimensions in order to select their optimal values.
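As a rough illustration of what an F-fold cross-validation computes, the sketch below runs the procedure on an ordinary linear model in base R. It is not the clusterwise multiblock algorithm and not the package's implementation: the helper `cv_mse`, the simulated data and the random fold assignment are all assumptions made for illustration.

```r
# Hypothetical helper (not part of the package): F-fold cross-validation
# of a plain linear model, returning one mean squared error per fold for
# the calibration part and for the held-out (validation) part.
cv_mse <- function(y, x, FOLD = 5) {
  n <- length(y)
  dat <- data.frame(y = y, x = x)
  # Randomly assign each observation to one of the FOLD folds
  fold <- sample(rep(seq_len(FOLD), length.out = n))
  mse.cal <- numeric(FOLD)
  mse.val <- numeric(FOLD)
  for (f in seq_len(FOLD)) {
    cal <- fold != f                      # observations used to fit the model
    fit <- lm(y ~ x, data = dat[cal, ])
    mse.cal[f] <- mean(residuals(fit)^2)  # error on the calibration data
    pred <- predict(fit, newdata = dat[!cal, ])
    mse.val[f] <- mean((dat$y[!cal] - pred)^2)  # error on the held-out fold
  }
  list(mse.cal = mse.cal, mse.val = mse.val)
}

# Simulated data, for illustration only
set.seed(1)
x <- rnorm(40)
y <- 2 * x + rnorm(40, sd = 0.5)
res <- cv_mse(y, x, FOLD = 5)

# Average the per-fold root mean squared errors
mean(sqrt(res$mse.cal))
mean(sqrt(res$mse.val))
```

The validation error is the quantity to compare across candidate models, since the calibration error keeps decreasing as the model becomes more flexible.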
Y |
a matrix or data frame containing the dependent variable(s) |
X |
a matrix or data frame containing the explanatory variables |
blo |
vector of the numbers of variables in each explanatory dataset |
option |
an option for the block weighting (by default, the first option is chosen): |
G |
an integer giving the number of clusters |
H |
an integer giving the number of dimensions of the component-based model |
FOLD |
an integer giving the number of folds of the F-fold cross-validation procedure, between 2 and 10 (10 by default) |
INIT |
an integer giving the number of initializations required for the clusterwise analysis (20 by default) |
method |
an option for the multiblock method to be applied (by default, the first option is chosen): |
Gamma |
a numeric value for the regularization parameter of the multiblock regularized regression, between 0 and 1 (NULL by default). The value (
parallel.level |
Level of parallel computing, i.e. initializations are carried out simultaneously (high by default) |
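To make the role of blo concrete, the sketch below splits a toy matrix into its explanatory blocks. It assumes that the blocks occupy consecutive columns of X in the order given by blo; the data and the names `block.id` and `blocks` are made up for illustration and are not taken from the package.

```r
# Toy explanatory matrix: 10 variables forming two blocks of 5 (made-up data)
set.seed(1)
X <- matrix(rnorm(6 * 10), nrow = 6, ncol = 10)
blo <- c(5, 5)

# blo has to account for every column of X
stopifnot(sum(blo) == ncol(X))

# Block membership of each column, assuming consecutive columns per block
block.id <- rep(seq_along(blo), times = blo)

# Split the columns of X into a list of block matrices
blocks <- lapply(seq_along(blo), function(k) X[, block.id == k, drop = FALSE])
sapply(blocks, ncol)
```

Checking that `sum(blo)` equals `ncol(X)` before running a long cross-validation is a cheap way to catch a mis-specified block structure.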
A list containing the following components is returned:
call |
the matching call |
sqrmse.cal |
the squared Root Mean Squared Error from the F calibration datasets |
sqrmse.val |
the squared Root Mean Squared Error from the F prediction datasets |
Stephanie Bougeard (stephanie.bougeard@anses.fr)
Bougeard, S., Abdi, H., Saporta, G., Niang, N., Submitted, Clusterwise analysis for multiblock component methods.
data(simdata.red)
Data.X <- simdata.red[c(1:8, 21:28), 1:10]
Data.Y <- simdata.red[c(1:8, 21:28), 11:13]
res1 <- list()
res2 <- list()
## Note that the options (INIT=2) and (parallel.level = "low") are chosen to quickly
## illustrate the function.
## For real data, instead choose (INIT=20) to avoid local optima and (parallel.level = "high")
## to improve the computing speed.
for (H in 1:2) {
  print(paste("H=", H, sep = ""))
  res1[[H]] <- cw.tenfold(Y = Data.Y, X = Data.X, blo = c(5, 5), option = "none", G = 1, H = H,
                          FOLD = 2, INIT = 2, method = "mbpls", Gamma = NULL, parallel.level = "low")
  res2[[H]] <- cw.tenfold(Y = Data.Y, X = Data.X, blo = c(5, 5), option = "none", G = 2, H = H,
                          FOLD = 2, INIT = 2, method = "mbpls", Gamma = NULL, parallel.level = "low")
}
res1.cal <- unlist(lapply(1:2, function(x) mean(sqrt(res1[[x]]$sqrmse.cal), na.rm=TRUE)))
res1.val <- unlist(lapply(1:2, function(x) mean(sqrt(res1[[x]]$sqrmse.val), na.rm=TRUE)))
res2.cal <- unlist(lapply(1:2, function(x) mean(sqrt(res2[[x]]$sqrmse.cal), na.rm=TRUE)))
res2.val <- unlist(lapply(1:2, function(x) mean(sqrt(res2[[x]]$sqrmse.val), na.rm=TRUE)))
rmse.cal <- rbind(res1.cal, res2.cal)
rmse.val <- rbind(res1.val, res2.val)
rownames(rmse.cal) <- rownames(rmse.val) <- paste("G", 1:2, sep = "=")
colnames(rmse.cal) <- colnames(rmse.val) <- paste("H", 1:2, sep = "=")
par(mfrow=c(1,2))
matplot(t(rmse.cal), type = "o", ylab = "RMSE of calibration", xlab = "Model dimension (H)",
main = "Calibration", col = c("steelblue", "darkorange"), pch = c(0, 5), lwd = c(3, 3))
legend("center", inset = .05, legend = rownames(rmse.cal), pch = c(0, 5), lwd = c(3, 3),
col = c("steelblue", "darkorange"), horiz = TRUE, title = "Cluster number (G)")
matplot(t(rmse.val), type = "o", ylab = "RMSE of prediction", xlab = "Model dimension (H)",
main = "Prediction", col = c("steelblue", "darkorange"), pch = c(0, 5), lwd = c(3, 3))
legend("center", inset = .05, legend = rownames(rmse.val), pch = c(0, 5), lwd = c(3, 3),
col = c("steelblue", "darkorange"), horiz = TRUE, title = "Cluster number (G)")
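Besides reading the plots, the pair (G, H) with the lowest prediction error can be looked up programmatically. The sketch below shows one simple way to do so on a matrix laid out like the RMSE tables above (clusters in rows, dimensions in columns). The values in `rmse.toy` are placeholders invented to make the sketch runnable, not results of the analysis, and taking the plain minimum is only one possible selection rule.

```r
# Placeholder prediction RMSE values, invented purely for illustration
rmse.toy <- matrix(c(1.20, 0.95,
                     1.10, 0.90),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(paste("G", 1:2, sep = "="),
                                   paste("H", 1:2, sep = "=")))

# Row and column position of the smallest prediction error
best <- which(rmse.toy == min(rmse.toy, na.rm = TRUE), arr.ind = TRUE)

# Labels of the selected number of clusters and of dimensions
best.G <- rownames(rmse.toy)[best[1, "row"]]
best.H <- colnames(rmse.toy)[best[1, "col"]]
c(best.G, best.H)
```

When several models have nearly the same prediction error, choosing the one with fewer clusters and dimensions may be preferable to taking the strict minimum.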