gmjmcmc: Main algorithm for GMJMCMC (Genetically Modified MJMCMC)

gmjmcmc R Documentation

Description

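Runs the GMJMCMC (Genetically Modified MJMCMC) algorithm: non-linear features are generated from the covariates using the supplied transformations, and each of the P populations of features is explored with MJMCMC.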

Usage

gmjmcmc(
  x,
  y,
  transforms,
  P = 10,
  N = 100,
  N.final = NULL,
  probs = NULL,
  params = NULL,
  loglik.pi = NULL,
  loglik.alpha = gaussian.loglik.alpha,
  mlpost_params = list(family = "gaussian", beta_prior = list(type = "g-prior")),
  intercept = TRUE,
  fixed = 0,
  sub = FALSE,
  verbose = TRUE
)

Arguments

x

Design matrix containing the data to use in the algorithm.

y

Response variable.

transforms

A character vector including the names of the non-linear functions to be used by the modification and the projection operator.
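As a rough illustration of what this argument expects, the sketch below uses base R only. It assumes that each entry of transforms names a function that maps a numeric vector to a numeric vector of the same length. The two functions (demo_sigmoid, demo_cbrt) are hypothetical stand-ins rather than functions shipped with the package, and the lookup via get() only illustrates how a name can be resolved to a function; it is not a description of the package internals.

```r
# Hypothetical transformations: each maps a numeric vector to a vector of the same length.
demo_sigmoid <- function(x) 1 / (1 + exp(-x))
demo_cbrt <- function(x) sign(x) * abs(x)^(1 / 3)

# The argument is a character vector of function names, not the functions themselves.
transforms <- c("demo_sigmoid", "demo_cbrt")

# Resolve each name to a function and apply it to a covariate.
x1 <- c(-8, 0, 8)
applied <- lapply(transforms, function(name) get(name, mode = "function")(x1))
names(applied) <- transforms
print(applied)
```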

P

The number of population iterations for GMJMCMC. The default value is P = 10, which was used in our initial example for illustrative purposes. However, a larger value, such as P = 50, is typically more appropriate for most practical applications.

N

The number of MJMCMC iterations per population. The default value is N = 100; however, for real applications, a larger value such as N = 1000 or higher is often preferable.

N.final

The number of MJMCMC iterations performed for the final population. By default (N.final = NULL), N.final is set to N, but for practical applications a much larger value (e.g., N.final = 1000) is recommended. Increasing N.final is particularly important if predictions and inferences are based solely on the last population.
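To get a feel for how P, N and N.final combine, the following base R sketch computes the total number of MJMCMC iterations. It assumes that the first P - 1 populations run N iterations each and that the last population runs N.final iterations; total_iterations is a hypothetical helper written for this illustration and is not part of the package.

```r
# Iteration budget implied by P, N and N.final, assuming the first P - 1
# populations run N iterations each and the final population runs N.final.
total_iterations <- function(P, N, N.final = NULL) {
  if (is.null(N.final)) N.final <- N  # documented default: N.final = N
  (P - 1) * N + N.final
}

total_iterations(P = 10, N = 100)                   # defaults: 1000
total_iterations(P = 50, N = 1000, N.final = 5000)  # larger setting: 54000
```

The chosen values would then be passed on directly, e.g. gmjmcmc(x = x, y = y, transforms = transforms, P = 50, N = 1000, N.final = 5000).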

probs

A list of various probability vectors used by GMJMCMC, generated by gen.probs.gmjmcmc. The key component probs.gen defines probabilities of different operators in the feature generation process. Defaults typically favor interactions and modifications (0.4 each) over projections and mutations (0.1 each) to encourage interpretable nonlinear features.
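The effect of the operator probabilities can be sketched in base R by sampling which operator would be used for each new feature. The vector below only copies the default values quoted above; the element names and the variable probs_gen are labels chosen for this illustration and do not necessarily match the structure returned by gen.probs.gmjmcmc.

```r
set.seed(1)

# Operator probabilities as quoted above; the names are labels for this sketch.
probs_gen <- c(interaction = 0.4, modification = 0.4, projection = 0.1, mutation = 0.1)

# Draw the operator used for each of 10,000 hypothetical feature proposals.
draws <- sample(names(probs_gen), size = 10000, replace = TRUE, prob = probs_gen)

# Empirical operator frequencies, in the same order as probs_gen.
freq <- table(factor(draws, levels = names(probs_gen))) / length(draws)
print(round(freq, 2))
```

Shifting mass towards projections or mutations in such a vector would make those operators correspondingly more frequent among the generated features.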

params

A list of various parameter vectors used by GMJMCMC, generated by gen.params.gmjmcmc.

loglik.pi

A function specifying the marginal log-posterior of the model up to a constant, including the logarithm of the model prior: \log p(M|Y) = \text{const} + \log p(Y|M) + \log p(M). The default typically assumes a Gaussian model with Zellner's g-prior with g = max(n, p^2).
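The decomposition above can be made concrete with a small base R sketch for a Gaussian linear model under Zellner's g-prior. It uses the standard closed-form Bayes factor against the intercept-only model, takes p to be the number of columns of x, and uses a hypothetical model prior (penalty). The function name and its signature are assumptions made for this illustration; this is not the package's implementation, nor the interface that gmjmcmc expects for loglik.pi.

```r
# Log-posterior of a Gaussian linear model, up to a constant, under Zellner's g-prior.
# The marginal likelihood is taken relative to the intercept-only model.
log_posterior_gprior <- function(y, x, model, log_prior) {
  n <- length(y)
  p <- ncol(x)
  g <- max(n, p^2)                      # default g quoted in the text (p = ncol(x) assumed)
  k <- sum(model)                       # number of included covariates
  if (k == 0) return(log_prior(k))      # null model: log Bayes factor is 0
  fit <- lm.fit(cbind(1, x[, model, drop = FALSE]), y)
  r2 <- 1 - sum(fit$residuals^2) / sum((y - mean(y))^2)
  log_marglik <- 0.5 * (n - 1 - k) * log1p(g) - 0.5 * (n - 1) * log1p(g * (1 - r2))
  log_marglik + log_prior(k)            # log p(Y|M) + log p(M), up to a constant
}

set.seed(42)
x <- matrix(rnorm(300), 100, 3)
y <- 2 * x[, 1] + rnorm(100)
penalty <- function(k) -k * log(100)    # hypothetical model prior, log p(M)

lp_true <- log_posterior_gprior(y, x, c(TRUE, FALSE, FALSE), penalty)  # true covariate only
lp_noise <- log_posterior_gprior(y, x, c(FALSE, TRUE, TRUE), penalty)  # noise covariates only
c(true = lp_true, noise = lp_noise)
```

In this simulated setting the model containing the true covariate should receive the clearly larger value.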

loglik.alpha

Relevant only if the non-linear projection features depend on parameters \alpha. If \alpha is estimated, this argument specifies the corresponding marginal log-likelihood. The default method sets all \alpha to 1 (fastest, but sometimes suboptimal). Alternative estimation strategies ("deep" and "random") are implemented in FBMS.

mlpost_params

A list of all parameters passed to the estimator function loglik.pi.

intercept

Logical. Whether to include an intercept in the design matrix. Default is TRUE. No variable selection is performed on the intercept.

fixed

Integer specifying the number of leading columns in the design matrix to always include in the model. Default is 0.

sub

Logical. If TRUE, uses subsampling or a stochastic approximation approach to the likelihood rather than the full likelihood. Default is FALSE.

verbose

Logical. Whether to print messages during execution. Default is TRUE for gmjmcmc and FALSE for the parallel version.

Value

A list containing the following elements:

models

All models per population.

mc.models

All models accepted by mjmcmc per population.

populations

All features per population.

marg.probs

Marginal feature probabilities per population.

model.probs

Marginal model probabilities per population.

model.probs.idx

Indices of the models to which the entries of model.probs refer, per population.

best.margs

Best marginal model probability per population.

accept

Acceptance rate per population.

accept.tot

Overall acceptance rate.

best

Best marginal model probability throughout the run, represented as the maximum value in unlist(best.margs).
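The relation between best and best.margs can be sketched with a mock list in base R. The numbers below are hypothetical and are not output from an actual run.

```r
# Mock per-population best values (hypothetical numbers, one per population).
best.margs <- list(-152.3, -148.9, -149.4)

# best is the maximum over all populations.
best <- max(unlist(best.margs))

# Index of the population in which that maximum was reached.
which_population <- which.max(unlist(best.margs))

best              # -148.9
which_population  # 2
```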

Examples

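library(FBMS)

# Simulated data: 100 observations, 6 covariates, pure-noise response
x <- matrix(rnorm(600), 100)
y <- matrix(rnorm(100), 100)

# Short illustrative run with two populations and two transformations
result <- gmjmcmc(x = x, y = y, transforms = c("p0", "exp_dbl"), P = 2)
summary(result)
plot(result)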


FBMS documentation built on Sept. 13, 2025, 1:09 a.m.