smtl | R Documentation |
smtl: make model-fitting function
smtl(
  y,
  X,
  study = NA,
  s,
  commonSupp = FALSE,
  warmStart = TRUE,
  lambda_1 = 0,
  lambda_2 = 0,
  lambda_z = 0,
  scale = TRUE,
  maxIter = 10000,
  LocSrch_maxIter = 50,
  messageInd = TRUE,
  model = TRUE,
  independent.regs = FALSE
)
y |
A numeric outcome vector (for multi-task/domain generalization problems) or a numeric outcome matrix (for multi-label problems) |
X |
A matrix of covariates |
study |
A vector of integers specifying task (or study/domain) ID. This should be set to NA for Multi-Label problems, but is required for Multi-Task and Domain Generalization problems. |
s |
An integer specifying the sparsity level |
commonSupp |
A boolean specifying whether to constrain solutions to have a common support |
warmStart |
A boolean specifying whether a warm start model is fit internally before the final model. Warm starts improve solution quality but make fitting slower. |
lambda_1 |
A numeric vector of ridge penalty hyperparameter values |
lambda_2 |
A numeric vector of betaBar penalty hyperparameter values (to borrow strength across coefficient values) |
lambda_z |
A numeric vector of zBar penalty hyperparameter values (to borrow strength across coefficient supports) |
scale |
A boolean specifying whether to center and scale covariates before model fitting. Either way, coefficient estimates are returned on the original scale of the covariates (as they were before centering/scaling). |
maxIter |
An integer specifying the maximum number of coordinate descent iterations before |
LocSrch_maxIter |
An integer specifying the maximum number of local search iterations |
messageInd |
A boolean specifying whether to include messages (verbose) |
model |
A boolean indicating whether to return design matrix and outcome vector |
independent.regs |
A boolean specifying whether to fit independent regressions (instead of multi-task). This ensures there is NO information sharing via active sets or penalties |
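The three penalty arguments can be easier to reason about with numbers in hand. The sketch below is not sMTL code: `penalty_terms` is a hypothetical helper, and it assumes the penalties are squared Euclidean norms (of the coefficients for `lambda_1`, and of the distances to the across-task means of the coefficients and of the support indicators for `lambda_2` and `lambda_z`). The package's exact scaling and handling of the intercept may differ.

```r
# Hypothetical helper, not part of sMTL. `beta` is a coefficient matrix
# with one row per covariate and one column per task.
penalty_terms <- function(beta, lambda_1 = 0, lambda_2 = 0, lambda_z = 0) {
  z        <- (beta != 0) * 1   # support indicators: 1 where a coefficient is non-zero
  beta_bar <- rowMeans(beta)    # across-task mean of each coefficient
  z_bar    <- rowMeans(z)       # across-task mean of each support indicator
  c(
    ridge   = lambda_1 * sum(beta^2),               # assumed form of the ridge penalty
    betaBar = lambda_2 * sum((beta - beta_bar)^2),  # assumed form: shrinks values together
    zBar    = lambda_z * sum((z - z_bar)^2)         # assumed form: shrinks supports together
  )
}

# Two tasks with partly different supports
beta_het <- cbind(task_1 = c(1, 0, 2), task_2 = c(1, 3, 0))
pen_het  <- penalty_terms(beta_het, lambda_1 = 0.001, lambda_2 = 0.5, lambda_z = 0.25)

# Two tasks with identical supports: the zBar term vanishes
beta_com <- cbind(task_1 = c(1, 0, 2), task_2 = c(2, 0, 1))
pen_com  <- penalty_terms(beta_com, lambda_1 = 0.001, lambda_2 = 0.5, lambda_z = 0.25)

print(pen_het)
print(pen_com)
```

Under these assumptions, a larger `lambda_z` pushes tasks toward selecting the same covariates without forcing them to, whereas `commonSupp = TRUE` imposes a common support as a hard constraint.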
A list (object of S3 class).
beta |
Matrix of coefficient estimates, where column j contains the estimates from task j. |
reg_type |
String specifying whether model is |
K |
Integer that indicates number of tasks. |
s |
An integer that indicates sparsity level. |
commonSupp |
Boolean indicating whether supports are common across tasks. |
warmStart |
A Boolean indicating whether to fit a MTL model as a warm start. |
grid |
A dataframe including grid of hyperparameters that model is fit on. |
maxIter |
An integer specifying the maximum number of iterations of block CD. |
LocSrch_maxIter |
An integer specifying the maximum number of iterations of local search. |
independent.regs |
A boolean indicating whether each task is fit independently of the others (no shared active sets). |
AS_multiplier |
An integer specifying the active set multiplier. |
X_train |
A Matrix: the design matrix (row concatenated across tasks). |
y_train |
The outcome vector or matrix. |
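To make the layout of `beta` concrete, the sketch below computes linear predictions by hand from a coefficient matrix. It assumes the first row of `beta` holds the intercepts, which matches how `B` is built in the package example but is not stated explicitly here. `manual_predict` is a hypothetical helper written for illustration; `sMTL::predict` remains the supported way to obtain predictions, and its return shape is not assumed to match.

```r
# Hypothetical helper, not part of sMTL. Assumes row 1 of `beta` is the
# intercept and the remaining rows line up with the columns of X.
manual_predict <- function(beta, X) {
  X <- as.matrix(X)
  stopifnot(nrow(beta) == ncol(X) + 1)  # one intercept row plus one row per covariate
  cbind(1, X) %*% beta                  # result: one column of predictions per task
}

# Toy coefficient matrix: 2 covariates, 2 tasks
beta  <- cbind(task_1 = c(1, 2, 0), task_2 = c(-1, 0, 3))
X_new <- rbind(c(1, 1), c(2, 0))

preds <- manual_predict(beta, X_new)
print(preds)
```

Each column of the result uses only that task's coefficients, so a row of `X_new` gets a different prediction under each task-specific model.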
## Not run: 
if (identical(Sys.getenv("AUTO_JULIA_INSTALL"), "true")) { ## The examples are quite time consuming

  ## Do initialization and automatic installation if necessary

  # load package
  library(sMTL)
  smtl_setup()

  #####################################################################################
  ##### simulate data
  #####################################################################################
  set.seed(1) # fix the seed to get a reproducible result
  K <- 4 # number of datasets
  p <- 100 # covariate dimension
  s <- 5 # support size
  q <- 7 # size of subset of covariates that can be non-zero for any task
  n_k <- 50 # task sample size
  N <- n_k * K # full dataset sample size (K tasks with n_k observations each)
  X <- matrix( rnorm(N * p), nrow = N, ncol = p) # full design matrix
  B <- matrix(1 + rnorm(K * (p+1) ), nrow = p + 1, ncol = K) # betas before making sparse
  Z <- matrix(0, nrow = p, ncol = K) # matrix of supports
  y <- vector(length = N) # outcome vector

  # randomly sample support to make betas sparse
  for(j in 1:K) Z[1:q, j] <- sample( c( rep(1,s), rep(0, q - s) ), q, replace = FALSE )
  B[-1,] <- B[-1,] * Z # make betas sparse and ensure all models have an intercept

  task <- rep(1:K, each = n_k) # vector of task labels (indices)

  # iterate through and make each task specific dataset
  for(j in 1:K){
    indx <- which(task == j) # indices of task
    e <- rnorm(n_k)
    y[indx] <- B[1, j] + X[indx,] %*% B[-1,j] + e
  }
  colnames(B) <- paste0("beta_", 1:K)
  rownames(B) <- paste0("X_", 1:(p+1))

  print("Betas")
  print(round(B[1:8,],2))

  #####################################################################################
  ##### fit Multi-Task Learning Model for Heterogeneous Support
  #####################################################################################
  mod <- sMTL::smtl(y = y,
                    X = X,
                    study = task,
                    s = 5,
                    commonSupp = FALSE,
                    lambda_1 = 0.001,
                    lambda_2 = 0,
                    lambda_z = 0.25)

  print(round(mod$beta[1:8,],2))

  # make predictions
  preds <- sMTL::predict(model = mod, X = X[1:5,])

  #####################################################################################
  ##### fit Multi-Task Learning Model for Common Support
  #####################################################################################
  library(sMTL)
  # the Julia path below is machine-specific
  sMTL::smtl_setup(path = "/Applications/Julia-1.5.app/Contents/Resources/julia/bin")

  mod <- sMTL::smtl(y = y,
                    X = X,
                    study = task,
                    s = 5,
                    commonSupp = TRUE,
                    lambda_1 = 0.001,
                    lambda_2 = 0.5)

  print(round(mod$beta[1:8,],2))
}

## End(Not run)
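After a fit, it can be useful to check which covariates each task selected and whether the supports coincide. The sketch below uses a small stand-in matrix so that it runs without sMTL or Julia; in practice `mod$beta` would take the place of `beta_hat`. It assumes the first row holds the intercepts and is excluded from the support.

```r
# Stand-in for a fitted coefficient matrix (assumption: row 1 = intercept).
# Replace with mod$beta after fitting.
beta_hat <- rbind(c(0.9, 1.1),
                  c(2.0, 0.0),
                  c(0.0, 1.5),
                  c(1.2, 1.3))
colnames(beta_hat) <- c("task_1", "task_2")

support      <- beta_hat[-1, , drop = FALSE] != 0          # TRUE where a covariate is selected
support_size <- colSums(support)                           # selected covariates per task
shared       <- which(rowSums(support) == ncol(support))   # covariates selected by every task
is_common    <- all(support == support[, 1])               # TRUE only if all supports match

print(support_size)
print(shared)
print(is_common)
```

With `commonSupp = TRUE` one would expect `is_common` to be `TRUE`, and in either setting each entry of `support_size` should not exceed the sparsity level `s`.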