bayesian_model: Fit a Bayesian Generalized Linear Regression Model (BGLR)

View source: R/bayesian_model.R

bayesian_modelR Documentation

Fit a Bayesian Generalized Linear Regression Model (BGLR)

Description

bayesian_model() is a wrapper of the BGLR::BGLR() and BGLR::Multitrait() functions to fit a Bayesian regresssion model. You can fit univariate models for numeric and categorical response variables and multivariate models for numeric responses only.

Usage

bayesian_model(
  x,
  y,
  iterations_number = 1500,
  burn_in = 500,
  thinning = 5,
  covariance_structure = list(df0 = 5, S0 = NULL, type = "Unstructured"),
  records_weights = NULL,
  response_groups = NULL,
  testing_indices = NULL,
  validate_params = TRUE,
  seed = NULL,
  verbose = TRUE
)

Arguments

x

(list) The predictor (independent) variable(s). It is expected a list with nested list's where each inner list is named and represents a predictor effect. Such inner list's must have the two names: x (matrix) with the predictor variable that is going to be converted to numeric and model (character(1)) (case not sensitive) with the type of model to apply to this predictor term, the available models are "FIXED", "BGBLUP", "BRR", "Bayes_Lasso", "Bayes_A", "Bayes_B" and "Bayes_C". In multivariate models you can only use "FIXED", "BGBLUP" and "BRR". "BRR" by default.

y

(data.frame | vector | matrix) The response (dependent) variable(s). If it is a data.frame or a matrix with 2 or more columns, a multivariate model is assumed, a univariate model otherwise. In univariate models if y is character, logical or factor a categorical response is assumed, numeric otherwise. In multivariate models all responses are coerced to numeric. y can contain missing values (NA) which represent the observations to be used as testing set along the provided indices in testing_indices parameter.

iterations_number

(numeric(1)) Number of iterations to fit the model. 1500 by default.

burn_in

(numeric(1)) Number of items to burn at the beginning of the model. 500 by default.

thinning

(numeric(1)) Number of items to thin the model. 5 by default.

covariance_structure

(list) (Only for multivariate models) A named list used to define the co-variance matrix for model residuals. This list must have the fileds type (character(1)) (case not sensitive) with one of the following values "Unstructured", "Diagonal", "Factor_analytic" or "Recursive", df0 (numeric(1)) with the degrees of freedom and S0 with the covariance matrix of size ⁠t x t⁠, where t is the number of response variables. By default the next list is used: list(df0 = 5, S0 = NULL, type = "Unstructured").

records_weights

(numeric) (only for univariate models with a numeric response variables) A vector of weights. If weights are provided the residual variance of each data-point is set to be proportional to the inverse of the squared-weight. NULL by default.

response_groups

(factor) (only for univariate models) A vector of the same length as y that associates observations with groups, each group will have an associated variance component for the error term. NULL by default.

testing_indices

(numeric) The records' indices to be used as testing set along all that contain missing values in y. NULL by default.

validate_params

(logical(1)) Should the parameters be validated? It is not recommended to set this parameter to FALSE because if something fails a non meaningful error is going to be thrown. TRUE by default.

seed

(numeric(1)) A value to be used as internal seed for reproducible results. NULL by default.

verbose

(logical(1)) Should the progress information be printed? TRUE by default.

Details

Since BGLR functions works a little different than other most common R packages for machine learning bayesian_model functions adapts to it. Unlike other functions, if you want to fit a bayesian model and make some predictions you have to provide the whole data (for training and testing) and the records' indices to be used as testing (testing_indices). All records with NA values in y are considered as part of testing set too. After fitting the model, the predicted values can be obtained with the predict function, with no more parameter than the model, see Examples section below for more information.

Value

An object of class "BayesianModel" that inherits from classes "Model" and "R6" with the fields:

  • fitted_model: An object of class BGLR::BGLR() with the model.

  • x: The final list used to fit the model.

  • y: The final vector or matrix used to fit the model.

  • execution_time: A difftime object with the total time taken to tune and fit the model.

  • removed_rows: A numeric vector with the records' indices (in the provided position) that were deleted and not taken in account in tunning nor training.

  • removed_x_cols: A numeric vector with the columns' indices (in the provided positions) that were deleted and not taken in account in tunning nor training.

  • ...: Some other parameters for internal use.

See Also

predict.BayesianModel(), coef.Model()

Other models: deep_learning(), generalized_boosted_machine(), generalized_linear_model(), mixed_model(), partial_least_squares(), random_forest(), support_vector_machine()

Examples

# Use all default hyperparameters ----------------------------------------------
x <- list(list(x = to_matrix(iris[, -5]), model = "BRR"))
y <- iris$Species
model <- bayesian_model(x, y, testing_indices = c(1:5, 51:55, 101:105))

# Obtain the model's coefficients
coef(model)

# Predict using the fitted model
predictions <- predict(model, indices = c(1:5, 51:55, 101:105))
# Obtain the predicted values
predictions$predicted
# Obtain the predicted probabilities
predictions$probabilities

# Obtain the predict values of custom individuals ------------------------------
x <- list(
  list(x = to_matrix(iris$Species), model = "fixed"),
  list(x = to_matrix(iris[, c(3, 4)]), model = "bayes_a")
)
y <- iris$Sepal.Length
y[c(5, 10, 15, 60, 80, 120, 130)] <- NA
model <- bayesian_model(
  x,
  y,
  iterations_number = 2000,
  burn_in = 500
)

# Predict using the fitted model
predictions <- predict(model, indices = c(5, 10, 15, 60, 80, 120, 130))
# Obtain the predicted values
predictions$predicted

# Obtain the Predicted values of all individuals using the fitted model
predictions <- predict(model, indices = seq_along(y))
# Obtain the predicted values
predictions$predicted

# Multivariate analysis --------------------------------------------------------
x <- list(list(x = to_matrix(iris[, -c(1, 2)]), model = "fixed"))
y <- iris[, c(1, 2)]
model <- bayesian_model(x, y, iterations_number = 2000)

# Predict using the fitted model
predictions <- predict(model, indices = 1:50)
# Obtain the predicted values of the first response variable
predictions$Sepal.Length$predicted
# Obtain the predicted values of the second response variable
predictions$Sepal.Width$predicted

# Obtain the predictions in a data.frame not in a list
predictions <- predict(model, indices = 1:50, format = "data.frame")
head(predictions)

# Genomic selection ------------------------------------------------------------
data(Maize)

# Data preparation of G
Line <- model.matrix(~ 0 + Line, data = Maize$Pheno)
Env <- model.matrix(~ 0 + Env, data = Maize$Pheno)
# Compute cholesky
Geno <- cholesky(Maize$Geno)
# G matrix
LineGeno <- Line %*% Geno

# Identify the model
X <- list(
  Env = list(x = Env, model = "FIXED"),
  LinexGeno = list(x = LineGeno, model = "BRR")
)
y <- Maize$Pheno$Y

# Set seed for reproducible results
set.seed(2022)
folds <- cv_kfold(records_number = nrow(LineGeno), k = 5)

Predictions <- data.frame()

# Model training and predictions
for (i in seq_along(folds)) {
  cat("*** Fold:", i, "***\n")
  fold <- folds[[i]]

  # Model training
  model <- bayesian_model(
    x = X,
    y = y,
    testing_indices = fold$testing,
    iterations_number = 1000,
    burn_in = 500
  )

  # Prediction of testing set
  predictions <- predict(model, indices = fold$testing)

  # Predictions for the i-th fold
  FoldPredictions <- data.frame(
    Fold = i,
    Line = Maize$Pheno$Line[fold$testing],
    Env = Maize$Pheno$Env[fold$testing],
    Observed = y[fold$testing],
    Predicted = predictions$predicted
  )
  Predictions <- rbind(Predictions, FoldPredictions)
}

head(Predictions)
# Compute the summary of all predictions
summaries <- gs_summaries(Predictions)

# Summaries by Line
head(summaries$line)

# Summaries by Environment
summaries$env

# Summaries by Fold
summaries$fold

brandon-mosqueda/SKM documentation built on Feb. 8, 2025, 5:24 p.m.