---
title: "Getting Started with olr: Optimal Linear Regression"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting Started with olr: Optimal Linear Regression}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```r
# Inject CSS so long code lines wrap in the rendered vignette
cat("
<style>
pre code {
  white-space: pre-wrap;
  word-wrap: break-word;
  overflow-x: auto;
  font-size: 90%;
}
</style>
")
```

```r
knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)
library(olr)
library(ggplot2)
```
The `olr` package provides a systematic way to identify the best linear regression model by testing all combinations of predictor variables. You can choose to optimize based on either R-squared or adjusted R-squared.
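To make the idea of testing all predictor combinations concrete, here is a minimal base-R sketch of an exhaustive subset search. It is not the `olr` source code: `best_subset_lm()` is a hypothetical helper written for illustration, and the built-in `mtcars` data stands in for the crude oil dataset.

```r
# Illustrative only: a base-R sketch of exhaustive subset search.
# best_subset_lm() is a hypothetical helper, not a function from olr.
best_subset_lm <- function(data, response, predictors, adjr2 = TRUE) {
  best_fit <- NULL
  best_score <- -Inf
  # Try every subset size, from one predictor up to all of them
  for (k in seq_along(predictors)) {
    subsets <- combn(predictors, k, simplify = FALSE)
    for (vars in subsets) {
      fit <- lm(reformulate(vars, response = response), data = data)
      s <- summary(fit)
      # Score the fit by the chosen criterion
      score <- if (adjr2) s$adj.r.squared else s$r.squared
      if (score > best_score) {
        best_score <- score
        best_fit <- fit
      }
    }
  }
  best_fit
}

# Built-in mtcars data stands in for the crude oil dataset
candidates <- c("wt", "hp", "qsec", "drat")
fit_adj <- best_subset_lm(mtcars, "mpg", candidates, adjr2 = TRUE)
fit_r2  <- best_subset_lm(mtcars, "mpg", candidates, adjr2 = FALSE)

# Number of non-empty subsets examined: 2^4 - 1
n_subsets <- 2^length(candidates) - 1

formula(fit_adj)
formula(fit_r2)
```

Because R-squared cannot decrease when a predictor is added, maximizing it ends up selecting the model with every candidate. A search of this kind over 12 predictors would cover 2^12 - 1 = 4095 non-empty subsets.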
```r
# Load data
crudeoildata <- read.csv(system.file("extdata", "crudeoildata.csv", package = "olr"))
dataset <- crudeoildata[, -1]

# Define variables
responseName <- 'CrudeOil'
predictorNames <- c('RigCount', 'API', 'FieldProduction', 'RefinerNetInput',
                    'OperableCapacity', 'Imports', 'StocksExcludingSPR',
                    'NonCommercialLong', 'NonCommercialShort',
                    'CommercialLong', 'CommercialShort', 'OpenInterest')
```
```r
# Full model using R-squared
model_r2 <- olr(dataset, responseName, predictorNames, adjr2 = FALSE)

# Adjusted R-squared model
model_adjr2 <- olr(dataset, responseName, predictorNames, adjr2 = TRUE)
```
```r
# Actual values
actual <- dataset[[responseName]]
fitted_r2 <- model_r2$fitted.values
fitted_adjr2 <- model_adjr2$fitted.values

# Data frame for ggplot
plot_data <- data.frame(
  Index = seq_along(actual),
  Actual = actual,
  R2_Fitted = fitted_r2,
  AdjR2_Fitted = fitted_adjr2
)

# Plot actual values against the full-model (R-squared) fit.
# linewidth replaces the deprecated size argument for lines (ggplot2 >= 3.4.0).
ggplot(plot_data, aes(x = Index)) +
  geom_line(aes(y = Actual), color = "black", linewidth = 1, linetype = "dashed") +
  geom_line(aes(y = R2_Fitted), color = "steelblue", linewidth = 1) +
  labs(
    title = "Full Model (R-squared): Actual vs Fitted Values",
    subtitle = "Observation Index used in place of dates (parsed from original dataset)",
    x = "Observation Index",
    y = "CrudeOil % Change"
  ) +
  theme_minimal()
```
```r
# Plot actual values against the adjusted R-squared fit.
# linewidth replaces the deprecated size argument (ggplot2 >= 3.4.0).
ggplot(plot_data, aes(x = Index)) +
  geom_line(aes(y = Actual), color = "black", linewidth = 1, linetype = "dashed") +
  geom_line(aes(y = AdjR2_Fitted), color = "limegreen", linewidth = 1.1) +
  labs(
    title = "Optimal Model (Adjusted R-squared): Actual vs Fitted Values",
    subtitle = "Observation Index used in place of dates (parsed from original dataset)",
    x = "Observation Index",
    y = "CrudeOil % Change"
  ) +
  theme_minimal() +
  theme(plot.background = element_rect(color = "limegreen", linewidth = 2))
```
| Metric                | `adjr2 = FALSE` (All 12 Predictors) | `adjr2 = TRUE` (Best Subset of 7 Predictors) |
|-----------------------|-------------------------------------|----------------------------------------------|
| Adjusted R-squared    | 0.6145                              | 0.6531 ✅ (higher is better)                 |
| Multiple R-squared    | 0.7018                              | 0.699                                        |
| Residual Std. Error   | 0.02388                             | 0.02265 ✅ (lower is better)                 |
| F-statistic (p-value) | 8.042 (1.88e-07)                    | 15.26 (3.99e-10) ✅ (stronger model)         |
| Model Complexity      | 12 predictors                       | 7 predictors ✅ (simpler, more robust)       |
| Significant Coeffs    | 4                                   | 6 ✅ (more signal, less noise)               |
| R² Difference         | n/a                                 | ~0.003 ❗ (negligible)                       |
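The Adjusted R-squared row differs from the Multiple R-squared row because it penalizes each additional predictor. The sketch below writes out the standard textbook formula for a model with an intercept and checks it against `summary.lm()`. The helper `adj_r2()` is a hypothetical name introduced here, and the built-in `mtcars` data is used in place of the crude oil dataset so the example runs without `olr` installed.

```r
# Adjusted R-squared from R-squared, sample size n, and predictor count p
# (standard formula for a linear model that includes an intercept)
adj_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

# Check the formula against summary.lm() on the built-in mtcars data
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
s <- summary(fit)

manual <- adj_r2(s$r.squared, n = nrow(mtcars), p = 3)
all.equal(manual, s$adj.r.squared)

# The other metric rows can be read from the same summary object
metrics <- c(
  r_squared     = s$r.squared,
  adj_r_squared = s$adj.r.squared,
  resid_std_err = s$sigma,
  f_statistic   = unname(s$fstatistic["value"])
)
round(metrics, 4)
```

Since the vignette reads `fitted.values` from the objects returned by `olr()`, the same `summary()` fields are probably available for `model_r2` and `model_adjr2`, though that is an assumption about the return type rather than something this example verifies.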
- The `olr()` function automates model selection by testing every valid predictor combination.
- Use `adjr2 = TRUE` to prioritize models that balance accuracy and parsimony.
- The adjusted R² model outperformed the full model on:
  - Adjusted R²
  - F-statistic
  - Residual error
  - Model simplicity
  - Number of significant coefficients
👉 Use adjusted R² (`adjr2 = TRUE`) in practice to avoid overfitting and ensure interpretability.
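The overfitting risk can be seen in a small experiment: adding a predictor that is pure noise never lowers R-squared, whereas adjusted R-squared is free to fall. This is a sketch on the built-in `mtcars` data with an arbitrary seed, not a result from the crude oil dataset.

```r
set.seed(42)  # arbitrary seed so the noise column is reproducible

df <- mtcars
df$noise <- rnorm(nrow(df))  # a predictor unrelated to mpg by construction

base_fit  <- lm(mpg ~ wt + hp, data = df)
noisy_fit <- lm(mpg ~ wt + hp + noise, data = df)

r2_gain    <- summary(noisy_fit)$r.squared - summary(base_fit)$r.squared
adj_change <- summary(noisy_fit)$adj.r.squared - summary(base_fit)$adj.r.squared

# r2_gain is never negative for nested least-squares fits;
# adj_change can be negative when the extra predictor adds little
c(r2_gain = r2_gain, adj_change = adj_change)
```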
Created by Mathew Fok • Author of the `olr` package
Contact: quiksilver67213@yahoo.com