lin_least_squares_train_test: lin_least_squares_train_test
In rguin26/HW4:


View source: R/lin_least_squares_train_test.R

Performs linear regression using ordinary least squares (OLS) or weighted least squares (WLS), with or without an intercept, on a randomly selected portion of the instances provided. The remaining instances are set aside for testing. This works with both simple and multiple linear regression. The function estimates the beta coefficients of the least squares regression equation from the training set, then applies the fitted model to the testing set to generate predictions for data that was not seen during training.
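As an illustration of the fitting step described above, the coefficients can be obtained by solving the (weighted) normal equations. This is a minimal sketch, not the package's source: the helper name `fit_least_squares_sketch` and its explicit weight vector `w` are assumptions made for the example, and this page does not say how the function chooses its weights when `weighted = TRUE`.

```r
# Hypothetical helper for illustration; not part of the HW4 package.
fit_least_squares_sketch <- function(x, y, intercept = TRUE, w = NULL) {
  X <- as.matrix(x)
  if (is.null(colnames(X))) colnames(X) <- paste0("x", seq_len(ncol(X)))
  # Prepend a column of ones when an intercept is requested
  if (intercept) X <- cbind("(intercept)" = 1, X)
  # OLS is the special case of WLS in which every weight equals 1
  if (is.null(w)) w <- rep(1, nrow(X))
  # X * w scales row i of X by w[i], so t(X * w) equals X' W
  XtW <- t(X * w)
  # Solve (X' W X) beta = X' W y for beta
  beta <- solve(XtW %*% X, XtW %*% y)
  drop(beta)
}
```

With `w = NULL` the result should agree with `coef(lm(y ~ x))` up to numerical precision. Solving the normal equations directly is simple to read but can be less numerically stable than a QR-based fit when predictors are nearly collinear.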

lin_least_squares_train_test(x, y, intercept = TRUE, weighted = FALSE, train_set_prop = 0.8)

`x`	matrix, data frame, or vector containing the values of the predictor(s)
`y`	vector of target values, one for each instance (row) of `x`
`intercept`	TRUE by default, which computes the beta coefficients with an intercept included; if set to FALSE, no intercept is used
`weighted`	FALSE by default, which computes the beta coefficients using ordinary least squares (OLS); if set to TRUE, the beta coefficients are computed using weighted least squares (WLS)
`train_set_prop`	0.8 by default; this proportion of instances is randomly selected from `x` and `y`, at identical indexes, to train the linear least squares model, and the remaining instances of `x` and `y` are used to test it
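The random split controlled by `train_set_prop` can be pictured roughly as follows. This is a sketch under assumptions: the helper name `split_train_test_sketch` is hypothetical, and rounding the training set size to the nearest whole instance is a choice made here, since this page does not state how the function rounds.

```r
# Hypothetical helper for illustration; not part of the HW4 package.
split_train_test_sketch <- function(x, y, train_set_prop = 0.8) {
  x <- as.data.frame(x)
  n <- nrow(x)
  # Draw the training row indexes without replacement
  train_idx <- sample(n, size = round(train_set_prop * n))
  list(
    x_train = x[train_idx, , drop = FALSE],
    y_train = y[train_idx],
    # The same indexes are excluded from x and y, so rows stay paired
    x_test  = x[-train_idx, , drop = FALSE],
    y_test  = y[-train_idx]
  )
}
```

Calling `set.seed()` before the split makes the selection reproducible. The negative indexing assumes at least one training instance is drawn, so a proportion of 0 would need separate handling.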

Works only with numeric data.

A list of objects containing the estimated beta coefficient for each predictor, along with fitted values, residuals, and model evaluation metrics for both the training and testing sets:

beta - vector of beta coefficients estimated from the training subset of the initial data, with each value corresponding to its respective column in x, starting with (intercept) if intercept = TRUE
training_fitted_values - vector of fitted values for the training subset of x used to build the model, where each value is the model's prediction for that training instance
training_residuals - vector of residuals for the training subset of x, calculated as the training fitted value minus the actual value for each instance
training_model_eval_metrics - vector of the model's evaluation metrics on the training subset: sum of squared errors (SSE), mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), r-squared, and adjusted r-squared
testing_fitted_values - vector of fitted values for the testing subset of x, where each value is the model's prediction for that testing instance
testing_residuals - vector of residuals for the testing subset of x, calculated as the testing fitted value minus the actual value for each instance
testing_model_eval_metrics - vector of the model's evaluation metrics on the testing subset: sum of squared errors (SSE), mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), r-squared, and adjusted r-squared
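The evaluation metrics listed above can be computed from actual and fitted values along these lines. This is a sketch using the standard textbook definitions, with a hypothetical helper name `eval_metrics_sketch`. Dividing SSE by the number of instances to get MSE, and the exact form of adjusted r-squared, are assumptions here, and the package may define them differently.

```r
# Hypothetical helper for illustration; not part of the HW4 package.
eval_metrics_sketch <- function(actual, fitted, n_predictors) {
  n <- length(actual)
  err <- fitted - actual                 # fitted minus actual, as described above
  sse <- sum(err^2)                      # sum of squared errors
  mse <- sse / n                         # assumption: divide by n, not n - p - 1
  sst <- sum((actual - mean(actual))^2)  # total sum of squares
  r2 <- 1 - sse / sst
  # Adjusted r-squared penalizes r-squared for the number of predictors
  adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)
  c(SSE = sse, MSE = mse, RMSE = sqrt(mse), MAE = mean(abs(err)),
    r_squared = r2, adj_r_squared = adj_r2)
}
```

On a testing set, r-squared computed this way can be negative, which indicates that the model predicts worse than the mean of the testing targets.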

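# Example 1
set.seed(1)  # make the random data and the train/test split reproducible
# 200 instances of 5 random numeric predictors
x <- data.frame(matrix(sample(100000, 200*5, replace=TRUE), ncol = 5))
# 200 random target values
y <- sample(100, 200, replace=TRUE)
# Fit OLS with an intercept on 80% of the instances and test on the rest
model_stats <- lin_least_squares_train_test(x, y)
model_stats$beta
model_stats$training_model_eval_metrics
model_stats$testing_model_eval_metrics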