genset | R Documentation |
Generate multiple data sets to demonstrate the importance of multiple regression. The data sets are generated from an initial input data set so that they have the same summary statistics (mean, median, and standard deviation) but opposing regression results (significance of the predictor variables). The initial data set should have one continuous response variable and two predictor variables (both continuous, or one continuous and one categorical with 2 levels) that are statistically significant in a linear regression model.
genset(y, x1, x2, method=c(1,2), option=c("x1","x2","both"), n, decrease, output)
y |
response variable (continuous). |
x1 |
first predictor variable (continuous). |
x2 |
second predictor variable (continuous or categorical with 2 levels). If variable is categorical then argument is |
method |
the method |
option |
the variable(s) that will not be statistically significant in the new data set (
n |
the number of iterations. Default is 2000 iterations. |
decrease |
indicates an increase or decrease in level of significance. |
output |
shows the iterations. |
The summary statistics are within a (predetermined) tolerance level and, when rounded, will be the same as those of the original data set. The standard convention of 0.05 is used as the significance level threshold. Fewer than n=2000 iterations may or may not be sufficient, depending on the initial data set.
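The two checks described above (matching rounded summary statistics and the 0.05 significance threshold) can be sketched with base R alone. The helper names `summary_stats`, `same_summary`, and `coef_pvalues` are hypothetical and are not part of the genset package, and the `digits` argument is an assumed rounding level rather than the package's internal tolerance.

```r
# Hypothetical helpers (not part of genset) illustrating the two checks.

# Mean, median, and standard deviation of a numeric vector.
summary_stats <- function(x) {
  c(mean = mean(x), median = median(x), sd = sd(x))
}

# TRUE when two vectors share the same summary statistics after rounding.
# `digits` is an assumed rounding level, not genset's internal tolerance.
same_summary <- function(a, b, digits = 1) {
  isTRUE(all.equal(round(summary_stats(a), digits),
                   round(summary_stats(b), digits)))
}

# p-values of the slope coefficients (intercept dropped) of a fitted model.
coef_pvalues <- function(fit) {
  summary(fit)$coefficients[-1, "Pr(>|t|)"]
}

# Example with the built-in mtcars data.
fit <- lm(mpg ~ hp + wt, data = mtcars)
coef_pvalues(fit) < 0.05                    # which predictors pass the 0.05 threshold
same_summary(mtcars$mpg, rev(mtcars$mpg))   # reordering leaves the statistics unchanged
```

Applying `same_summary` to each column of the original and generated data sets, and `coef_pvalues` to the two fitted models, gives a quick confirmation that only the regression results differ.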
Lori Murray and John Wilson
Murray, L.L. & Wilson, J.G. (2021). Generating data sets for teaching the importance of regression analysis. Decision Sciences Journal of Innovative Education (DSJIE), Vol 19 (2), 157-166.
## Choose variables of interest
y <- mtcars$mpg
x1 <- mtcars$hp
x2 <- mtcars$wt
## Create a dataframe
set1 <- data.frame(y, x1, x2)
## Check summary statistics
multi.fun <- function(x) {
  c(mean = mean(x), median = median(x), sd = sd(x))
}
round(multi.fun(set1$y), 0)
round(multi.fun(set1$x1), 1)
round(multi.fun(set1$x2), 1)
## Fit linear regression model
## to verify regressors are statistically
## significant (p-value < 0.05)
summary(lm(y ~ x1 + x2, data=set1))
## Set seed to reproduce same data set
set.seed(101)
set2 <- genset(y, x1, x2, method=1, option="x1", n=1000)
## Verify summary statistics match set 1
round(multi.fun(set2$y), 0)
round(multi.fun(set2$x1), 1)
round(multi.fun(set2$x2), 1)
## Fit linear regression model
## to verify x1 is not statistically
## significant (p-value > 0.05)
summary(lm(y ~ x1 + x2, data=set2))
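To view the opposing regression results side by side, the slope p-values of the two data sets can be collected into one table. The sketch below is an illustration only: `compare_pvalues` is a hypothetical helper, not a genset function, and it assumes both data frames have continuous columns named `y`, `x1`, and `x2`. Because the sketch is meant to run without the package, a shuffled copy of `x1` stands in for a generated data set. That shuffle is not genset's algorithm, and `stand_in` would be replaced by the output of `genset()` in practice.

```r
# Hypothetical helper (not part of genset): slope p-values for two data
# frames that each have continuous columns y, x1, and x2.
compare_pvalues <- function(original, generated) {
  pvals <- function(d) {
    summary(lm(y ~ x1 + x2, data = d))$coefficients[c("x1", "x2"), "Pr(>|t|)"]
  }
  rbind(original = pvals(original), generated = pvals(generated))
}

set1 <- data.frame(y = mtcars$mpg, x1 = mtcars$hp, x2 = mtcars$wt)

# Stand-in for a generated data set: x1 is shuffled, which keeps its mean,
# median, and standard deviation but breaks its link with y.
# This is NOT genset's algorithm; replace `stand_in` with genset() output.
set.seed(101)
stand_in <- transform(set1, x1 = sample(x1))

compare_pvalues(set1, stand_in)          # rows: data sets; columns: x1, x2
compare_pvalues(set1, stand_in) < 0.05   # TRUE where a predictor is significant
```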