knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(MLR)
To fit a linear model, simply input a formula and dataset to the regression_fit() fit function. This produces a list containing the following objects:
formula: the model formuladimensions: the number of observations and covariatesresiduals: the model residuals, as a vectorfitted_values: the fitted values of the model, as a vectorcoefficients: the estimated coefficients, as a table, which also contains the results of a two-sided t-testconfidence_intervals: (1-alpha)% confidence intervals for each coefficient, as a matrixvariances: the variance-covariance matrixanova: the results of an ANOVA test, as a listr_squared: the R-squared and adjusted R-square, as a vector of length 2data(iris) form <- Sepal.Length ~ Sepal.Width model1 <- regression_fit(formula = form, data = iris) print(model1$coefficients)
Below, we test to ensure that our results align with the standard output from base R alternatives. One advantage of regression_fit(), you might observe, is that we can access important model components without needing to use additional functions like summary() or vcov().
model2 <- lm(formula = form, data = iris) model2_sum <- summary(model2) #Check coefficients all.equal(model1$coefficients, model2_sum$coefficients) #Check confidence intervals all.equal(model1$confidence_intervals,confint(model2)) #Check residuals all.equal(model1$residuals, model2$residuals, check.attributes = F) #Check fitted values all.equal(model1$fitted_values, model2$fitted.values, check.attributes = F) #Check variance-covariance all.equal(model1$variances, vcov(model2), check.attributes = F) #Check F test all.equal(model1$anova$f_value, model2_sum$fstatistic["value"], check.attributes = F) #Check R-squared model2_rsq <- c(model2_sum$r.squared, model2_sum$adj.r.squared) all.equal(model1$r_squared, model2_rsq, check.attributes = F)
Next, we compare the efficiency of regression_fit() to lm() using the bench package. We can see in the output table below that the lm() function is marginally faster on average.
library(bench) bench::mark( regression_fit(form, iris)$coefficients, summary(lm(form, iris))$coefficients )
We can fit a multiple regression model in the same way as an SLR. Here, we have also specified a particular alpha to use in our confidence intervals, instead of using the default of 0.05.
form2 <- Sepal.Length ~ Sepal.Width + Petal.Length + Species model1 <- regression_fit(formula = form2, data = iris, alpha = 0.1) print(model1$coefficients)
Once again, we test the results below. All match the output provided by base R functions.
model2 <- lm(formula = form2, data = iris) model2_sum <- summary(model2) #Check coefficients all.equal(model1$coefficients, model2_sum$coefficients) #Check confidence intervals all.equal(model1$confidence_intervals,confint(model2, level = 0.9)) #Check residuals all.equal(model1$residuals, model2$residuals, check.attributes = F) #Check fitted values all.equal(model1$fitted_values, model2$fitted.values, check.attributes = F) #Check variance-covariance all.equal(model1$variances, vcov(model2), check.attributes = F) #Check F test all.equal(model1$anova$f_value, model2_sum$fstatistic["value"], check.attributes = F) #Check R-squared model2_rsq <- c(model2_sum$r.squared, model2_sum$adj.r.squared) all.equal(model1$r_squared, model2_rsq, check.attributes = F)
Lastly, we compare the speed of regression_fit() and lm() for multiple linear regression. Like before, we find that the base R function might be marginally faster, but the results do vary slightly from one trial to the next.
bench::mark( regression_fit(form2, iris)$coefficients, summary(lm(form2, iris))$coefficients )
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.