Gives the user a single function for tuning the mactivate fitting algorithms f_fit_gradient_01, f_fit_hybrid_01, and f_fit_gradient_logistic_01.
param_sensitivity
    Large positive scalar numeric.
bool_free_w
    Scalar logical. Allow values of
w0_seed
    Scalar numeric. Usually in [0,1]. Initial value(s) for the multiplicative activation layer, W.
max_internal_iter
    Scalar non-negative integer. Hybrid only. How many activation descent passes to make before refitting the primary effects.
w_col_search
    Scalar character. When
bool_headStart
    Scalar logical. Gradient only. When
antifreeze
    Scalar logical. Hybrid only. New with v0.6.5. When
ss_stop
    Small positive scalar numeric. Convergence tolerance.
escape_rate
    Scalar numeric no less than one, and likely no greater than, say, 1.01. Affinity for exiting a column search over W.
step_size
    Positive scalar numeric. Initial gradient step size (in both the gradient and hybrid fitting algorithms) for all parameters.
Wadj
    Positive scalar numeric. Controls the gradient step size (in both the gradient and hybrid fitting algorithms) of W.
force_tries
    Scalar non-negative integer. Force a minimum number of fitting recursions.
lambda
    Scalar numeric. Ridge regularizer. The actual diagonal loading imposed upon the precision matrix is equal to
tol
    Small positive scalar numeric. Hybrid only. Similar to arg ss_stop.
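The lambda entry above describes diagonal loading of the precision matrix. As a point of reference, the sketch below shows generic ridge loading in base R; the exact scaling mactivate applies to lambda is not stated here, so the plain form t(X) %*% X + lambda * I is an assumption for illustration only.

```r
## Generic ridge (diagonal) loading of the precision matrix t(X) %*% X.
## NOTE: the precise scaling mactivate applies to lambda is not given in
## this help page; the unscaled form below is illustrative only.
set.seed(1)
N <- 100; d <- 3
X <- matrix(rnorm(N * d), N, d)
y <- drop(X %*% c(1, -1, 0.5)) + rnorm(N)

ridge_coef <- function(X, y, lambda) {
    ## solve (X'X + lambda * I) b = X'y
    solve(t(X) %*% X + lambda * diag(ncol(X)), t(X) %*% y)
}

b_ols   <- ridge_coef(X, y, 0)    # lambda = 0 recovers least squares
b_ridge <- ridge_coef(X, y, 10)   # positive lambda shrinks the coefficients
```

With lambda = 0 the result matches lm(y ~ X - 1) exactly; any positive lambda reduces the squared norm of the coefficient vector.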
Fitting a mactivate model to data can be dramatically affected by these tuning hyperparameters. At one extreme, one set of hyperparameters may cause the fitting algorithm to exit, fruitlessly, almost immediately; another set may send it running for hours. An ideal hyperparameterization fits the data expeditiously.
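This tradeoff can be seen on a toy problem. The sketch below is generic gradient descent with an improvement-threshold stopping rule, not mactivate's algorithm; the ss_stop argument of this toy function merely plays the same role as the control parameter of the same name: too loose a tolerance exits early, far from the optimum, while a tight one runs many more iterations and converges.

```r
## Toy illustration (NOT mactivate's algorithm): gradient descent on a
## quadratic, stopping once the objective improves by less than ss_stop.
descend <- function(ss_stop, step_size = 0.1, max_iter = 10^5) {
    f  <- function(w) sum((w - c(2, -3))^2)   # minimum at (2, -3)
    df <- function(w) 2 * (w - c(2, -3))      # gradient
    w <- c(0, 0)
    for (i in 1:max_iter) {
        w_new <- w - step_size * df(w)
        if (f(w) - f(w_new) < ss_stop) break  # improvement below tolerance
        w <- w_new
    }
    list(w = w, iters = i)
}

loose <- descend(ss_stop = 10^(-1))   # quits early, well short of (2, -3)
tight <- descend(ss_stop = 10^(-12))  # runs longer, lands near (2, -3)
```

The loose tolerance stops after a handful of iterations with a visibly biased answer; the tight one takes several times as many iterations but essentially reaches the minimum.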
Named list to be passed as the mact_control arg in the fitting functions.

f_fit_gradient_01, f_fit_hybrid_01, f_fit_gradient_logistic_01.
library(mactivate)
set.seed(777)
d <- 20
N <- 50000
X <- matrix(rnorm(N*d, 0, 1), N, d)
colnames(X) <- paste0("x", 1:d)
############# primary effect slopes
b <- rep_len( c(-1, 1), d )
ystar <-
X %*% b +
1 * (X[ , 1]) * (X[ , 2]) * (X[ , 3]) -
1 * (X[ , 2]) * (X[ , 3]) * (X[ , 4]) * (X[ , 5])
Xall <- X
errs <- rnorm(N, 0, 1)
errs <- 3 * (errs - mean(errs)) / sd(errs)
sd(errs)
y <- ystar + errs ### response
yall <- y
Nall <- N
############# hybrid example
### this control setting will exit too quickly
### compare this with example below
xcmact <-
f_control_mactivate(
param_sensitivity = 10^5,
w0_seed = 0.1,
max_internal_iter = 500,
w_col_search = "one",
ss_stop = 10^(-5),
escape_rate = 1.01,
Wadj = 1/1,
lambda = 1/1000,
tol = 10^(-5)
)
m_tot <- 4
Uall <- Xall
xxnow <- Sys.time()
xxls_out <-
f_fit_hybrid_01(
X = Xall,
y = yall,
m_tot = m_tot,
U = Uall,
m_start = 1,
mact_control = xcmact,
verbosity = 1
)
cat( difftime(Sys.time(), xxnow, units="mins"), "\n" )
yhatG <- predict(object=xxls_out, X0=Xall, U0=Uall, mcols=m_tot )
sqrt( mean( (yall - yhatG)^2 ) )
####################### this control setting should fit
####################### (will take a few minutes)
xcmact <-
f_control_mactivate(
param_sensitivity = 10^10, ### make more sensitive
w0_seed = 0.1,
max_internal_iter = 500,
w_col_search = "one",
ss_stop = 10^(-14), ### make stopping insensitive
escape_rate = 1.001, #### discourage quitting descent
Wadj = 1/1,
lambda = 1/10000,
tol = 10^(-14) ### make tolerance very small
)
m_tot <- 4
Uall <- Xall
xxnow <- Sys.time()
xxls_out <-
f_fit_hybrid_01(
X = Xall,
y = yall,
m_tot = m_tot,
U = Uall,
m_start = 1,
mact_control = xcmact,
verbosity = 1
)
cat( difftime(Sys.time(), xxnow, units="mins"), "\n" )
yhatG <- predict(object=xxls_out, X0=Xall, U0=Uall, mcols=m_tot )
sqrt( mean( (yall - yhatG)^2 ) )
xxls_out
Xstar <- f_mactivate(U=Uall, W=xxls_out[[ m_tot+1 ]][[ "What" ]])
colnames(Xstar) <- paste0("xstar_", seq(1, m_tot))
Xall <- cbind(Xall, Xstar)
xlm <- lm(yall~Xall)
summary(xlm)
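For comparison with the RMSE achieved above, the following self-contained rerun (base R only, no mactivate required) fits a main-effects-only regression to the same data-generating process. Because it cannot represent the two product terms, its residual RMSE stays above the noise sd of 3, near sqrt(11), since each product of independent standard normals contributes unit variance.

```r
## Baseline check (base R only): a main-effects-only fit to the same
## data-generating process leaves the interaction variance in the residuals.
set.seed(777)
d <- 20
N <- 50000
X <- matrix(rnorm(N * d, 0, 1), N, d)
b <- rep_len(c(-1, 1), d)
ystar <-
    drop(X %*% b) +
    X[, 1] * X[, 2] * X[, 3] -
    X[, 2] * X[, 3] * X[, 4] * X[, 5]
y <- ystar + rnorm(N, 0, 3)

xlm0  <- lm(y ~ X)                     # primary effects only
rmse0 <- sqrt(mean(resid(xlm0)^2))     # approx sqrt(9 + 1 + 1) = sqrt(11)
```

The two missed product terms each add unit variance on top of the noise variance of 9, so rmse0 sits near 3.32 rather than 3, which is the gap the multiplicative activation layer is meant to close.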