View source: R/dist_match_trans_learn.R
Description

This function performs distribution mapping based transfer learning (DMTL) regression for a given pair of target (primary) and source (secondary) datasets. The source-domain data are used to train a predictive model. The target features, whose response values are unknown, are transferred to the source domain via distribution matching, and the corresponding source-domain response values are predicted with that model. These predictions are then transferred back to the original target space by applying distribution matching again. Hence, this function needs an unmatched pair of target datasets (features and response values) and a matched pair of source datasets.
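The three mapping steps described above can be made concrete with a short base-R sketch. This is an illustration under stated assumptions, not the package's implementation: distribution matching is approximated here by per-feature empirical quantile mapping (`ecdf()` followed by `quantile()`), and `lm()` stands in for the predictive model. The helper names `quantile_map()` and `dmtl_sketch()` are hypothetical.

```r
# Hypothetical helper (not part of DMTL): map values `x`, which follow the
# distribution of `from`, onto the distribution of `to` by matching
# empirical CDFs (quantile mapping).
quantile_map <- function(x, from, to) {
  p <- ecdf(from)(x)                  # empirical CDF of `from` evaluated at x
  unname(quantile(to, probs = p))     # inverse empirical CDF of `to`
}

# Hypothetical sketch of the DMTL idea; NOT the package's implementation.
# Assumptions: per-feature quantile mapping and lm() as the predictive model.
dmtl_sketch <- function(target_set, source_set) {
  Xs <- as.data.frame(source_set$X)
  # 1. Train a model on the matched source data
  fit <- lm(y ~ ., data = cbind(Xs, y = source_set$y))
  # 2. Map each target feature onto the corresponding source feature distribution
  Xt <- as.matrix(target_set$X)
  Xt_mapped <- sapply(seq_len(ncol(Xt)), function(j)
    quantile_map(Xt[, j], from = Xt[, j], to = as.matrix(source_set$X)[, j]))
  Xt_mapped <- as.data.frame(matrix(Xt_mapped, nrow = nrow(Xt)))
  names(Xt_mapped) <- names(Xs)
  # 3. Predict source-space responses for the mapped target features
  y_src <- unname(predict(fit, newdata = Xt_mapped))
  # 4. Map the predictions back onto the target response distribution
  y_tgt <- quantile_map(y_src, from = source_set$y, to = target_set$y)
  list(target = y_tgt, source = y_src)
}
```

Because the final step maps through the empirical distribution of the target `y`, the sketch's target-space predictions always fall inside the observed range of the target responses.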
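Usage

DMTL(
  target_set,
  source_set,
  use_density = …,
  pred_model = …,
  model_optimize = …,
  sample_size = …,
  random_seed = …,
  all_pred = …,
  get_verbose = …,
  allow_parallel = …
)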
Arguments

target_set: List containing the target datasets. A named list with components X (features) and y (response values).

source_set: List containing the source datasets. A named list with components X (features) and y (response values).

use_density: Flag for using a kernel density estimate instead of histogram counts as the distribution estimate. Defaults to …

pred_model: String indicating the underlying predictive model. The currently available options are: … (the Examples use "RF").

model_optimize: Flag for model parameter tuning. If TRUE, the model parameters are tuned by grid search.

sample_size: Sample size for estimating the distributions of the target and source datasets. Defaults to …

random_seed: Seed for the random number generator (for reproducible outcomes). Defaults to …

all_pred: Flag for also returning the prediction values in the source space. If TRUE, the function returns a named list instead of a vector.

get_verbose: Flag for displaying the progress when optimizing the predictive model, i.e., when model_optimize = TRUE.

allow_parallel: Flag for allowing parallel processing when performing grid search, i.e., when model_optimize = TRUE.
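The use_density flag chooses between two ways of estimating a distribution. To illustrate the general difference (an assumption about the technique, not DMTL's internal code), the hypothetical helper `estimate_cdf()` below builds an empirical CDF either from histogram counts via `hist()` or from a Gaussian kernel density estimate via `density()`.

```r
# Hypothetical helper (not part of DMTL): estimate a CDF on a grid either
# from histogram counts or from a kernel density estimate.
estimate_cdf <- function(x, use_density = FALSE, n_grid = 512) {
  if (use_density) {
    d <- density(x, n = n_grid)          # kernel density on an even grid
    grid <- d$x
    cdf <- cumsum(d$y) / sum(d$y)        # even spacing, so dx cancels out
  } else {
    h <- hist(x, breaks = "Sturges", plot = FALSE)  # histogram counts
    grid <- h$breaks[-1]                 # right edge of each bin
    cdf <- cumsum(h$counts) / sum(h$counts)
  }
  list(grid = grid, cdf = cdf)
}

set.seed(8644)
x <- rnorm(1000)
cdf_hist <- estimate_cdf(x, use_density = FALSE)  # coarse, step-like
cdf_kde  <- estimate_cdf(x, use_density = TRUE)   # smooth, 512 grid points
```

The histogram estimate has only as many points as there are bins, while the kernel estimate is smooth and evaluated on a fine grid, at the cost of a bandwidth choice.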
Value

If all_pred = FALSE, a vector containing the final prediction values in the target space. If all_pred = TRUE, a named list with two components, target and source, i.e., the predictions in the original target space and in the source space, respectively.
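Because the return type depends on all_pred, downstream code may need to handle both shapes. A minimal sketch, using a hypothetical helper `target_predictions()` and mock objects in place of real DMTL() output:

```r
# Hypothetical helper (not part of DMTL): return the target-space predictions
# whether DMTL() was called with all_pred = FALSE (a vector) or
# all_pred = TRUE (a named list with components target and source).
target_predictions <- function(res) {
  if (is.list(res)) {
    stopifnot(all(c("target", "source") %in% names(res)))
    res$target
  } else {
    res
  }
}

# Mock results standing in for the two documented return shapes
as_vector <- c(0.1, 0.2)
as_list   <- list(target = 1:3, source = 4:6)
target_predictions(as_vector)
target_predictions(as_list)
```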
Note

The datasets in target_set (i.e., X and y) do not need to be matched (i.e., have the same number of rows), since the response values are used only to estimate the distribution for mapping, while the feature values are used for both mapping and final prediction. In contrast, the datasets in source_set (i.e., X and y) must have matched samples.
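The matching rule above can be checked before calling DMTL(). The function below is a hypothetical sketch, not part of the package: it only enforces what this note states, namely that both inputs are lists with X and y, and that the source X and y have the same number of samples while the target pair may differ.

```r
# Hypothetical input check (not part of DMTL): enforce the documented rule
# that source X and y are matched, while target X and y may differ in length.
check_dmtl_inputs <- function(target_set, source_set) {
  for (s in list(target_set, source_set))
    stopifnot(is.list(s), all(c("X", "y") %in% names(s)))
  if (nrow(as.matrix(source_set$X)) != length(source_set$y))
    stop("source_set$X and source_set$y must have matched samples")
  invisible(TRUE)
}

# Unmatched target (5 feature rows, 8 responses) is allowed
ok <- check_dmtl_inputs(list(X = matrix(0, 5, 3), y = rnorm(8)),
                        list(X = matrix(0, 10, 3), y = rnorm(10)))
```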
It is recommended to normalize the two response vectors (y) so that they lie in the same range. If normalization is not performed, DMTL() uses the range of the target y values as the prediction range.
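One way to follow this recommendation is min-max scaling, which places both response vectors on [0, 1]. The helpers below are a hypothetical sketch; min-max scaling is an assumed choice, since the documentation does not prescribe a method. Storing the original range makes it possible to map normalized target predictions back to the original target units.

```r
# Hypothetical helpers (not part of DMTL): rescale a response vector to
# [0, 1] and back. Assumes y is not constant (diff(range(y)) > 0).
minmax_scale <- function(y) {
  rng <- range(y)
  structure((y - rng[1]) / diff(rng), range = rng)  # keep range for inversion
}
minmax_unscale <- function(z, rng) z * diff(rng) + rng[1]

y <- c(2, 4, 6, 10)
z <- minmax_scale(y)                     # 0.00 0.25 0.50 1.00
y_back <- minmax_unscale(as.vector(z), attr(z, "range"))
```

In a DMTL workflow, both y vectors would be scaled before building target_set and source_set, and the returned target-space predictions would be unscaled with the target range.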
Examples

set.seed(8644)
## Generate two datasets with different underlying distributions...
x1 <- matrix(rnorm(3000, 0.3, 0.6), ncol = 3)
dimnames(x1) <- list(paste0("sample", 1:1000), paste0("f", 1:3))
y1 <- 0.3*x1[, 1] + 0.1*x1[, 2] - x1[, 3] + rnorm(1000, 0, 0.05)
x2 <- matrix(rnorm(3000, 0, 0.5), ncol = 3)
dimnames(x2) <- list(paste0("sample", 1:1000), paste0("f", 1:3))
y2 <- -0.2*x2[, 1] + 0.3*x2[, 2] - x2[, 3] + rnorm(1000, 0, 0.05)
## Model datasets using DMTL & compare with a baseline model...
library(DMTL)
target <- list(X = x1, y = y1)
source <- list(X = x2, y = y2)
y1_pred <- DMTL(target_set = target, source_set = source, pred_model = "RF")
y1_pred_bl <- RF_predict(x_train = x2, y_train = y2, x_test = x1)
print(performance(y1, y1_pred, measures = c("MSE", "PCC")))
print(performance(y1, y1_pred_bl, measures = c("MSE", "PCC")))