View source: R/partial_least_squares.R

partial_least_squares | R Documentation
partial_least_squares()
is a wrapper around the pls::plsr()
function that fits
a partial least squares regression model. You can fit univariate and
multivariate models, for numeric responses only.
partial_least_squares(
x,
y,
method = "kernel",
scale = FALSE,
validate_params = TRUE,
seed = NULL,
verbose = TRUE
)
x: (matrix) The matrix of predictor variables.

y: (vector | matrix | data.frame) The response variable(s). Only numeric responses are supported; for multivariate models, provide one column per response.

method: (character(1)) The method used to fit the model. The examples below use "kernel" (the default), "wide_kernel" and "orthogonal".

scale: (logical(1)) Whether to scale the data before fitting the model. FALSE by default.

validate_params: (logical(1)) Whether to validate the provided parameters. TRUE by default.

seed: (numeric(1)) A seed for random number generation, for reproducible results. NULL by default.

verbose: (logical(1)) Whether to print progress information. TRUE by default.
Note that all columns without variance (where all records have the same value) are removed. The positions of such columns are returned in the removed_x_cols field of the returned object.
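Conceptually, the zero-variance filtering can be sketched in base R as follows. This is only an illustration with hypothetical data, not the package's internal implementation:

```r
# Hypothetical matrix with a constant (zero-variance) column "b"
x <- cbind(a = c(1, 2, 3), b = c(5, 5, 5), c = c(2, 4, 6))

# Columns where all records have the same value
constant_cols <- which(apply(x, 2, function(col) length(unique(col)) == 1))
constant_cols  # column 2 ("b") is the kind of column reported in removed_x_cols
```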
This function performs random cross validation with 10 folds in order to find
the optimal number of components. This optimal value is used by default when
you call predict
on the fitted model, but you can specify a different number
of components when making predictions.
All records with missing values (NA),
either in x
or in y,
are
removed. The positions of the removed records are returned in the
removed_rows
field of the returned object.
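The record filtering just described behaves like base R's complete-case handling. A minimal sketch with hypothetical data (not the package's internal code):

```r
# Hypothetical data with missing values
x <- matrix(c(1, NA, 3, 4, 5, 6), ncol = 2)
y <- c(10, 20, NA)

# Records with an NA in either x or y are dropped;
# their indices correspond to the removed_rows field
keep <- complete.cases(x, y)
which(!keep)  # rows 2 and 3 would be removed
```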
An object of class "PartialLeastSquaresModel"
that inherits from classes
"Model"
and "R6",
with the following fields:

fitted_model
: The fitted model returned by pls::plsr()
(an object of class "mvr").

x
: The final matrix
used to fit the model.

y
: The final vector
or matrix
used to fit the model.

optimal_components_num
: A numeric
value with the optimal number of
components obtained with cross validation and used to fit the model.

execution_time
: A difftime
object with the total time taken to tune and
fit the model.

removed_rows
: A numeric
vector with the indices of the removed records (positions in the
originally provided data) that were not taken into account during tuning
or training.

removed_x_cols
: A numeric
vector with the indices of the removed columns (positions in the
originally provided data) that were not taken into account during tuning
or training.

...
: Other fields for internal use.
predict.PartialLeastSquaresModel(), coef.Model()

Other models:
bayesian_model(),
deep_learning(),
generalized_boosted_machine(),
generalized_linear_model(),
mixed_model(),
random_forest(),
support_vector_machine()
# Use all default hyperparameters -------------------------------------------
x <- to_matrix(iris[, -1])
y <- iris$Sepal.Length
model <- partial_least_squares(x, y)
# Obtain the optimal number of components to use with predict
model$optimal_components_num
# Obtain the model's coefficients
coef(model)
# Predict using the fitted model
predictions <- predict(model, x)
# Obtain the predicted values
predictions$predicted
# Predict with a non-optimal number of components ---------------------------
x <- to_matrix(iris[, -1])
y <- iris$Sepal.Length
model <- partial_least_squares(x, y, method = "orthogonal")
# Obtain the optimal number of components to use with predict
model$optimal_components_num
# Predict using the fitted model with the optimal number of components
predictions <- predict(model, x)
# Obtain the predicted values
predictions$predicted
# Predict using the fitted model without the optimal number of components
predictions <- predict(model, x, components_num = 2)
# Obtain the predicted values
predictions$predicted
# Obtain the model's coefficients
coef(model)
# Obtain the execution time taken to tune and fit the model
model$execution_time
# Multivariate analysis -----------------------------------------------------
x <- to_matrix(iris[, -c(1, 2)])
y <- iris[, c(1, 2)]
model <- partial_least_squares(x, y, method = "wide_kernel")
# Predict using the fitted model
predictions <- predict(model, x)
# Obtain the predicted values of the first response variable
predictions$Sepal.Length$predicted
# Obtain the predicted values of the second response variable
predictions$Sepal.Width$predicted
# Obtain the predictions in a data.frame not in a list
predictions <- predict(model, x, format = "data.frame")
head(predictions)
# Genomic selection ------------------------------------------------------------
data(Wheat)
# Data preparation of G
Line <- model.matrix(~ 0 + Line, data = Wheat$Pheno)
# Compute the Cholesky decomposition
Geno <- cholesky(Wheat$Geno)
# G matrix
X <- Line %*% Geno
y <- Wheat$Pheno$Y
# Set seed for reproducible results
set.seed(2022)
folds <- cv_kfold(records_number = nrow(X), k = 3)
Predictions <- data.frame()
# Model training and predictions
for (i in seq_along(folds)) {
cat("*** Fold:", i, "***\n")
fold <- folds[[i]]
# Identify the training and testing sets
X_training <- X[fold$training, ]
X_testing <- X[fold$testing, ]
y_training <- y[fold$training]
y_testing <- y[fold$testing]
# Model training
model <- partial_least_squares(
x = X_training,
y = y_training,
scale = TRUE,
method = "kernel"
)
# Prediction of testing set
predictions <- predict(model, X_testing)
# Predictions for the i-th fold
FoldPredictions <- data.frame(
Fold = i,
Line = Wheat$Pheno$Line[fold$testing],
Env = Wheat$Pheno$Env[fold$testing],
Observed = y_testing,
Predicted = predictions$predicted
)
Predictions <- rbind(Predictions, FoldPredictions)
}
head(Predictions)
# Compute the summary of all predictions
summaries <- gs_summaries(Predictions)
# Summaries by Line
head(summaries$line)
# Summaries by Environment
summaries$env
# Summaries by Fold
summaries$fold