Nothing
#' @title Linked (D)GP emulator construction
#'
#' @description
#'
#' `r new_badge("updated")`
#'
#' This function constructs a linked (D)GP emulator for a model chain or network.
#'
#' @param struc the structure of the linked emulator, which can take one of two forms:
#' - `r lifecycle::badge("deprecated")` a list contains *L* (the number of layers in a systems of computer models) sub-lists,
#' each of which represents a layer and contains (D)GP emulators (represented by
#' instances of S3 class `gp` or `dgp`) of computer models. The sub-lists are placed in the list
#' in the same order of the specified computer model system's hierarchy. **This option is deprecated and will be removed in the next release.**
#' - `r new_badge("new")` a data frame that defines the connection structure between emulators in the linked system, with the following columns:
#' * `From_Emulator`: the ID of the emulator providing the output. This ID must match the `id` slot
#' in the corresponding emulator object (produced by [gp()] or [dgp()]) within `emulators` argument of [lgp()], or it should
#' be special value `"Global"`, indicating the global inputs to the model chain or network. The `id` slot
#' is either automatically generated by [gp()] or [dgp()], or can be manually specified via the `id` argument in these functions or set with the
#' [set_id()] function.
#' * `To_Emulator`: the ID of the emulator receiving the input, also matching the `id` slot in the
#' corresponding emulator object.
#' * `From_Output`: a single integer specifying the output dimension of the `From_Emulator` that is being connected to the
#' input dimension of the `To_Emulator` specified by `To_Input`. If `From_Emulator` is `"Global"`, then `From_Output`
#' indicates the dimension of the global input passed to the `To_Emulator`.
#' * `To_Input`: a single integer specifying the input dimension of the `To_Emulator` that is receiving the `From_Output` dimension
#' from the `From_Emulator`.
#'
#' Each row represents a single one-to-one connection between a specified output dimension of `From_Emulator`
#' and a corresponding input dimension of `To_Emulator`. If multiple connections are required between
#' two emulators, each connection should be specified in a separate row.
#'
#' **Note:** When using the data frame option for `struc`, the `emulators` argument must be provided.
#' @param emulators `r new_badge("new")` a list of emulator objects, each containing an `id` slot that uniquely identifies it within the
#' linked system. The `id` slot in each emulator object must match the `From_Emulator`/`To_Emulator` columns in `struc`.
#'
#' If the same emulator is used multiple times within the linked system, the list must contain distinct copies
#' of that emulator, each with a unique ID stored in their `id` slot. Use the [set_id()] function to produce copies with different IDs
#' to ensure each instance can be uniquely referenced.
#' @param B the number of imputations used for prediction. Increase the value to refine representation of
#' imputation uncertainty. If the system consists of only GP emulators, `B` is set to `1` automatically. Defaults to `10`.
#' @param activate `r new_badge("new")` a bool indicating whether the initialized linked emulator should be activated:
#' - If `activate = FALSE`, [lgp()] returns an inactive linked emulator, allowing inspection of its structure using [summary()].
#' - If `activate = TRUE`, [lgp()] returns an active linked emulator, ready for prediction and validation using [predict()] and [validate()], respectively.
#'
#' Defaults to `TRUE`. This argument is only applicable when `struc` is specified as a data frame.
#' @param verb `r new_badge("new")` a bool indicating if the trace information on linked (D)GP emulator construction should be printed during the function call.
#' Defaults to `TRUE`. This argument is only applicable when `struc` is specified as a data frame.
#' @param id an ID to be assigned to the linked (D)GP emulator. If an ID is not provided (i.e., `id = NULL`), a UUID
#' (Universally Unique Identifier) will be automatically generated and assigned to the emulator. Defaults to `NULL`.
#'
#' @return An S3 class named `lgp` that contains three slots:
#' * `id`: A number or character string assigned through the `id` argument.
#' * `constructor_obj`: a list of 'python' objects that stores the information of the constructed linked emulator.
#' * `emulator_obj`, a 'python' object that stores the information for predictions from the linked emulator.
#' * `specs`: a list that contains
#' 1. `seed`: the random seed generated to produce the imputations. This information is stored for reproducibility
#' when the linked (D)GP emulator (that was saved by [write()] with the light option `light = TRUE`) is loaded back
#' to R by [read()].
#' 2. `B`: the number of imputations used to generate the linked (D)GP emulator.
#'
#' `r new_badge("new")` If `struc` is a data frame, `specs` also includes:
#' 1. `metadata`: a data frame providing configuration details for each emulator in the linked system, with following columns:
#' - `Emulator`: the ID of an emulator.
#' - `Layer`: the layer in the linked system where the emulator is positioned. A lower `Layer` number indicates
#' a position closer to the input, with layer numbering increasing as you move away from the input.
#' - `Pos_in_Layer`: the position of the emulator within its layer. A lower `Pos_in_Layer` number
#' indicates a position higher up in that layer.
#' - `Total_Input_Dims`: the total number of input dimensions of the emulator.
#' - `Total_Output_Dims`: the total number of output dimensions of the emulator.
#' 2. `struc`: The linked system structure, as supplied by `struc`.
#'
#' The returned `lgp` object can be used by
#' * [predict()] for linked (D)GP predictions.
#' * [validate()] for OOS validation.
#' * [plot()] for validation plots.
#' * [summary()] to summarize the constructed linked (D)GP emulator.
#' * [write()] to save the linked (D)GP emulator to a `.pkl` file.
#'
#' @details See further examples and tutorials at <`r get_docs_url()`>.
#' @examples
#' \dontrun{
#'
#' # load the package and the Python env
#' library(dgpsi)
#'
#' # model 1
#' f1 <- function(x) {
#' (sin(7.5*x)+1)/2
#' }
#' # model 2
#' f2 <- function(x) {
#' 2/3*sin(2*(2*x - 1))+4/3*exp(-30*(2*(2*x-1))^2)-1/3
#' }
#' # linked model
#' f12 <- function(x) {
#' f2(f1(x))
#' }
#'
#' # training data for Model 1
#' X1 <- seq(0, 1, length = 9)
#' Y1 <- sapply(X1, f1)
#' # training data for Model 2
#' X2 <- seq(0, 1, length = 13)
#' Y2 <- sapply(X2, f2)
#'
#' # emulation of model 1
#' m1 <- gp(X1, Y1, name = "matern2.5", id = "emulator1")
#' # emulation of model 2
#' m2 <- dgp(X2, Y2, depth = 2, name = "matern2.5", id = "emulator2")
#'
# # specify linked emulator structure
#' struc <- data.frame(From_Emulator = c("Global", "emulator1"),
#' To_Emulator = c("emulator1", "emulator2"),
#' From_Output = c(1, 1),
#' To_Input = c(1, 1))
#' emulators <- list(m1, m2)
#'
#' # construct the linked emulator for visual inspection
#' m_link <- lgp(struc, emulators, activate = FALSE)
#'
#' # visual inspection
#' summary(m_link)
#'
#' # build the linked emulator for prediction
#' m_link <- lgp(struc, emulators, activate = TRUE)
#' test_x <- seq(0, 1, length = 300)
#' m_link <- predict(m_link, x = test_x)
#'
#' # OOS validation
#' validate_x <- sample(test_x, 20)
#' validate_y <- sapply(validate_x, f12)
#' plot(m_link, validate_x, validate_y, style = 2)
#'
#' # write and read the constructed linked emulator
#' write(m_link, 'linked_emulator')
#' m_link <- read('linked_emulator')
#' }
#' @md
#' @export
lgp <- function(struc, emulators = NULL, B = 10, activate = TRUE, verb = TRUE, id = NULL) {
if ( is.null(pkg.env$dgpsi) ) {
init_py(verb = F)
if (pkg.env$restart) return(invisible(NULL))
}
if ( is.data.frame(struc) ){
#if ( mode!='validate' & mode!='activate' ) stop("'mode' can only be either 'validate' or 'activate'.", call. = FALSE)
if ( verb ) message("Processing emulators ...", appendLF = FALSE)
struc <- validate_emulator_data(struc, emulators)
metadata <- infer_metadata_from_struc(struc)
metadata$Pos_in_Layer <- stats::ave(seq_along(metadata$Emulator), metadata$Layer, FUN = seq_along)
lgp_struc <- process_emulators(emulators, metadata)
metadata <- update_metadata(metadata, emulators)
if ( verb ) {
Sys.sleep(0.5)
message(" done")
}
} else {
lifecycle::deprecate_warn(
when = "2.5.0",
what = I("Providing `struc` in list form"),
details = c(
i = "Support for providing `struc` as a list will be removed in the next release.",
i = "To maintain compatibility with future versions, please update your code to use the new
data frame format of `struc` and pass a list of emulators to the new `emulators` argument."
)
)
lgp_struc <- struc
}
if ( verb & is.data.frame(struc) ) message("Linking and synchronizing emulators ...", appendLF = FALSE)
B <- as.integer(B)
L <- length(lgp_struc)
extracted_struc <- list()
for ( l in 1:L ) {
layer <- list()
K <- length(lgp_struc[[l]])
for (k in 1:K) {
cont <- pkg.env$copy$deepcopy(lgp_struc[[l]][[k]]$container_obj)
if (is.data.frame(struc)){
# check cont and restore link and input_dim in case old versions of emulators are used for the new option
if (cont$type=='gp'){
if ( !is.null(cont$structure$connect) ){
inverse_order <- order(c(cont$structure$input_dim, cont$structure$connect) + 1)
cont$structure$input <- cbind(cont$structure$input, cont$structure$global_input)[,inverse_order,drop=F]
cont$structure$global_input <- NULL
cont$structure$input_dim <- (1:length(inverse_order))-1
cont$structure$connect <- NULL
if (length(cont$structure$length)!=1) cont$structure$length <- cont$structure$length[inverse_order]
}
} else {
exist.connect <- FALSE
for (item in cont$structure[[1]]){
if ( !is.null(item$connect) ){
exist.connect <- TRUE
break
}
}
if ( exist.connect ){
for (item in cont$structure[[1]]){
if ( !is.null(item$connect) ){
inverse_order <- order(c(item$input_dim, item$connect) + 1)
item$input <- cbind(item$input, item$global_input)[,inverse_order,drop=F]
item$global_input <- NULL
item$input_dim <- (1:length(inverse_order))-1
item$connect <- NULL
if (length(item$length)!=1) item$length <- item$length[inverse_order]
}
}
}
}
emulator_id <- lgp_struc[[l]][[k]]$id
if ( l==1 ){
input_connections <- subset(struc, struc[["To_Emulator"]] == emulator_id)
global_inputs <- input_connections$To_Input
global_outputs <- input_connections$From_Output
other_emulator_inputs <- NULL
other_emulator_outputs <- NULL
} else {
input_connections <- subset(struc, struc[["To_Emulator"]] == emulator_id)
global_inputs <- input_connections$To_Input[input_connections$From_Emulator == "Global"]
global_outputs <- input_connections$From_Output[input_connections$From_Emulator == "Global"]
if ( length(global_inputs)==0 ) global_inputs <- NULL
if ( length(global_outputs)==0 ) global_outputs <- NULL
other_emulator_inputs <- c() # To store the corresponding input dimensions
other_emulator_outputs <- vector("list", l-1)
for ( i in seq_len(l-1) ){
emulators_in_layer <- lgp_struc[[i]]
col_start <- 1
column_indices <- c() # To store the ordered indices for the emulator in question
# Loop through each emulator in the layer
for ( j in 1:length(emulators_in_layer) ) {
emu_ij_id <- lgp_struc[[i]][[j]]$id
# check the number of output dimensions
num_outputs <- ncol(lgp_struc[[i]][[j]]$data$Y)
# Determine the column range in pooled matrix for the current emulator
col_end <- col_start + num_outputs - 1
# Check if this emulator feeds into the emulator in question
if (emu_ij_id %in% input_connections$From_Emulator) {
# Identify specific output dimensions that feed into the target emulator and map to columns in pooled matrix
relevant_rows <- input_connections$From_Emulator == emu_ij_id
relevant_outputs <- input_connections$From_Output[relevant_rows]
relevant_input_dims <- input_connections$To_Input[relevant_rows]
# Convert these output dimensions to corresponding column indices in Y
relevant_columns <- col_start + relevant_outputs - 1
column_indices <- c(column_indices, relevant_columns)
other_emulator_inputs <- c(other_emulator_inputs, relevant_input_dims)
}
# Update col_start for the next emulator
col_start <- col_end + 1
}
other_emulator_outputs[[i]] <- column_indices
}
}
if (cont$type=='gp'){
if (l==1){
cont$structure$input_dim <- reticulate::np_array(as.integer(global_inputs - 1))
cont$structure$input <- cont$structure$input[,global_inputs,drop=F]
if (length(cont$structure$length)!=1) cont$structure$length <- cont$structure$length[global_inputs]
idx_py <- linked_idx_r_to_py(global_outputs)
cont$set_local_input(idx_py)
} else {
#if ( is.null(other_emulator_inputs) ){
# cont$structure$input_dim <- reticulate::np_array(as.integer(global_inputs - 1))
# cont$structure$input <- cont$structure$input[,global_inputs,drop=F]
# if (length(cont$structure$length)!=1) cont$structure$length <- cont$structure$length[global_inputs]
# idx_py <- linked_idx_r_to_py(other_emulator_inputs)
# cont$set_local_input(idx_py)
#} else {
cont$structure$input_dim <- reticulate::np_array(as.integer(other_emulator_inputs - 1))
if ( !is.null(global_inputs) ) {
cont$structure$connect <- reticulate::np_array(as.integer(global_inputs - 1))
cont$structure$global_input <- cont$structure$input[,global_inputs,drop=F]
}
cont$structure$input <- cont$structure$input[,other_emulator_inputs,drop=F]
if (length(cont$structure$length)!=1) cont$structure$length <- cont$structure$length[c(other_emulator_inputs, global_inputs)]
idx_py <- linked_idx_r_to_py(other_emulator_outputs)
cont$set_local_input(idx_py)
#}
}
#DGP
} else {
if (l==1){
for ( i in 1:length(cont$structure) ) {
for ( item in cont$structure[[i]] ){
if (i==1){
item$input_dim <- reticulate::np_array(as.integer(global_inputs - 1))
item$input <- item$input[,global_inputs,drop=F]
if (length(item$length)!=1) item$length <- item$length[global_inputs]
} else {
if ( !is.null(item$connect) ) item$connect <- reticulate::np_array(as.integer(reorder_connect(item$connect + 1, global_inputs) - 1))
}
}
}
idx_py <- linked_idx_r_to_py(global_outputs)
cont$set_local_input(idx_py)
} else{
# if ( is.null(other_emulator_inputs) ){
# for ( i in 1:length(cont$structure) ) {
# for ( item in cont$structure[[i]] ){
# if (i==1){
# item$input_dim <- reticulate::np_array(as.integer(global_inputs - 1))
# item$input <- item$input[,global_inputs,drop=F]
# if (length(item$length)!=1) item$length <- item$length[global_inputs]
# } else {
# if ( !is.null(item$connect) ) item$connect <- reticulate::np_array(as.integer(reorder_connect(item$connect + 1, global_inputs) - 1))
# }
# }
# }
# idx_py <- linked_idx_r_to_py(other_emulator_inputs)
# cont$set_local_input(idx_py)
# } else {
for ( i in 1:length(cont$structure) ) {
for ( item in cont$structure[[i]] ){
if (i==1){
item$input_dim <- reticulate::np_array(as.integer(other_emulator_inputs - 1))
if ( !is.null(global_inputs) ) {
item$connect <- reticulate::np_array(as.integer(global_inputs - 1))
item$global_input <- item$input[,global_inputs,drop=F]
}
item$input <- item$input[,other_emulator_inputs,drop=F]
if (length(item$length)!=1) item$length <- item$length[c(other_emulator_inputs, global_inputs)]
} else {
if ( !is.null(item$connect) ) item$connect <- reticulate::np_array(as.integer(reorder_connect(item$connect + 1, c(other_emulator_inputs, global_inputs)) - 1))
}
}
}
idx_py <- linked_idx_r_to_py(other_emulator_outputs)
cont$set_local_input(idx_py)
# }
}
}
# to be deprecated
} else {
if ( is.null(cont$local_input_idx) ){
stop(sprintf("Emulator %i in Layer %i has no 'linked_idx' specified. Use set_linked_idx() to specify this attribute.", k, l), call. = FALSE)
}
if ( l==1 ){
if (cont$type=='gp'){
if ( !is.null(cont$structure$connect) ){
inverse_order <- order(c(cont$structure$input_dim, cont$structure$connect) + 1)
cont$structure$input <- cbind(cont$structure$input, cont$structure$global_input)[,inverse_order,drop=F]
cont$structure$global_input <- NULL
cont$structure$input_dim <- (1:length(inverse_order))-1
cont$structure$connect <- NULL
if (length(cont$structure$length)!=1) cont$structure$length <- cont$structure$length[inverse_order]
}
} else {
exist.connect <- FALSE
for (item in cont$structure[[1]]){
if ( !is.null(item$connect) ){
exist.connect <- TRUE
break
}
}
if ( exist.connect ){
for (item in cont$structure[[1]]){
if ( !is.null(item$connect) ){
inverse_order <- order(c(item$input_dim, item$connect) + 1)
item$input <- cbind(item$input, item$global_input)[,inverse_order,drop=F]
item$global_input <- NULL
item$input_dim <- (1:length(inverse_order))-1
item$connect <- NULL
if (length(item$length)!=1) item$length <- item$length[inverse_order]
}
}
}
}
}
}
layer[[k]] <- cont
}
extracted_struc[[l]] <- layer
}
if ( verb & is.data.frame(struc) ) {
Sys.sleep(0.1)
message(" done")
}
if ( is.data.frame(struc) ){
#if ( mode == "validate" ) {
# if ( verb ) message("Validating the linked emulator ...", appendLF = FALSE)
#} else if ( mode == "activate" ){
if ( activate ){
if ( verb ) message("Activating the linked emulator ...", appendLF = FALSE)
}
res <- list(constructor_obj = extracted_struc)
res[['id']] <- if (is.null(id)) uuid::UUIDgenerate() else id
if ( activate ) {
seed <- sample.int(100000, 1)
set_seed(seed)
obj <- pkg.env$dgpsi$lgp(all_layer = extracted_struc, N = B)
res[['emulator_obj']] <- obj
res[['specs']][['seed']] <- seed
res[['specs']][['B']] <- B
}
res[['specs']][['struc']] <- struc
res[['specs']][['metadata']] <- metadata
if ( activate ){
if ( verb ) {
Sys.sleep(0.1)
message(" done")
}
}
} else {
res <- list(constructor_obj = extracted_struc)
res[['id']] <- if (is.null(id)) uuid::UUIDgenerate() else id
seed <- sample.int(100000, 1)
set_seed(seed)
obj <- pkg.env$dgpsi$lgp(all_layer = extracted_struc, N = B)
res[['emulator_obj']] <- obj
res[['specs']][['seed']] <- seed
res[['specs']][['B']] <- B
}
class(res) <- "lgp"
return(res)
}
validate_emulator_data <- function(struc, emulators) {
if (is.null(emulators)) {
stop("When `struc` is a data frame, `emulators` must be supplied and not NULL.", call. = FALSE)
}
# Ensure necessary columns are present in struc
required_struc_cols <- c("From_Emulator", "To_Emulator", "From_Output", "To_Input")
if ( !all(tolower(required_struc_cols) %in% tolower(names(struc))) ) {
stop('`struc` must contain the columns: "From_Emulator", "To_Emulator", "From_Output", and "To_Input".', call. = FALSE)
}
matched_cols <- match(tolower(required_struc_cols), tolower(names(struc)))
names(struc)[matched_cols] <- required_struc_cols
struc$From_Emulator[tolower(struc$From_Emulator) == "global"] <- "Global"
# Check that all emulators in struc (From_Emulator and To_Emulator) exist in metadata, ignoring "Global"
struc_emulators <- unique(c(struc$From_Emulator, struc$To_Emulator))
struc_emulators <- setdiff(struc_emulators, "Global") # Remove "Global" from the list
# Check that all emulators in metadata have a corresponding object in emulators
emulator_ids <- sapply(emulators, function(e) e$id)
missing_emulators <- setdiff(struc_emulators, emulator_ids)
if (length(missing_emulators) > 0) {
stop("The following emulators in `struc` are missing from `emulators`: ", paste(missing_emulators, collapse = ", "), call. = FALSE)
}
for (emulator in emulators) {
expected_dims <- ncol(emulator$data$X)
emulator_id <- emulator$id
# Filter struc for connections to this emulator
emulator_inputs <- struc$To_Input[struc$To_Emulator == emulator_id]
# Check that we have the expected number of unique input dimensions
if (length(emulator_inputs) < expected_dims) {
stop(sprintf("Emulator '%s' has incomplete input dimensions specified in `struc`. Expected %d, found %d.",
emulator_id, expected_dims, length(emulator_inputs)), call. = FALSE)
} else if ( length(emulator_inputs) > expected_dims ) {
stop(sprintf("Emulator '%s' has duplicate specifications for input dimensions in `struc`. Each input dimension must be specified only once.", emulator_id), call. = FALSE)
}
}
return(struc)
}
process_emulators <- function(emulators, metadata) {
# Order metadata by Layer and by Pos_in_Layer
metadata <- metadata[order(metadata$Layer, metadata$Pos_in_Layer), ]
# Determine the number of layers
num_layers <- max(metadata$Layer)
# Initialize structured_emulators list
structured_emulators <- vector("list", num_layers)
# Populate each layer's sub-list with emulator objects
for (layer in seq_len(num_layers)) {
# Get metadata for the current layer
layer_metadata <- subset(metadata, metadata[["Layer"]] == layer)
# Get emulators for the current layer, ordered by Pos_in_Layer if it exists
layer_emulators <- lapply(layer_metadata$Emulator, function(emulator_id) {
# Find the matching emulator in the emulators list by id
matching_emulator <- emulators[[which(sapply(emulators, function(e) e$id) == emulator_id)]]
return(matching_emulator)
})
# Assign the ordered list of emulators to the current layer's sub-list
structured_emulators[[layer]] <- layer_emulators
}
return(structured_emulators)
}
infer_metadata_from_struc <- function(struc) {
layers <- list()
candidate_first_layer <- struc$To_Emulator[struc$From_Emulator == "Global"]
first_layer_emulators <- unique(candidate_first_layer[!candidate_first_layer %in% unique(struc$To_Emulator[struc$From_Emulator != "Global"])])
layers[[1]] <- first_layer_emulators
processed_emulators <- first_layer_emulators
# Iteratively find subsequent layers
layer_num <- 2
while (TRUE) {
last_layer_emulators <- layers[[layer_num - 1]]
next_layer_candidates <- struc$To_Emulator[struc$From_Emulator %in% last_layer_emulators]
next_layer_emulators <- unique(next_layer_candidates[!next_layer_candidates %in% unique(struc$To_Emulator[!struc$From_Emulator %in% c(processed_emulators, "Global")])])
# Break if no more emulators can be assigned to new layers (all have been processed)
if (length(next_layer_emulators) == 0) break
# Add the validated next layer emulators to layers
layers[[layer_num]] <- next_layer_emulators
# Update processed emulators
processed_emulators <- c(processed_emulators, next_layer_emulators)
# Move to the next layer
layer_num <- layer_num + 1
}
metadata <- do.call(rbind, lapply(seq_along(layers), function(layer) {
data.frame(Emulator = layers[[layer]], Layer = layer, stringsAsFactors = FALSE)
}))
# Ensure metadata is sorted by Layer and return
metadata <- metadata[order(metadata$Layer), ]
rownames(metadata) <- NULL # Reset row names
return(metadata)
}
reorder_connect <- function(connect, ord) {
ord_inv <- integer(length(ord))
ord_inv[ord] <- seq_along(ord)
connect_ord <- ord_inv[connect]
return(connect_ord)
}
update_metadata <- function(metadata, emulators) {
io_dims <- t(sapply(metadata$Emulator, function(emulator_id) {
emulator <- emulators[[which(sapply(emulators, function(e) e$id) == emulator_id)]]
c(Total_Input_Dims = ncol(emulator$data$X),
Total_Output_Dims = ncol(emulator$data$Y),
Type = class(emulator),
Vecchia = emulator$specs$vecchia
)# Directly return both as scalars
}))
# Convert to a data frame and combine with metadata
metadata <- cbind(metadata, as.data.frame(io_dims))
row.names(metadata) <- NULL
return(metadata)
}
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.