Introduction

The code for this assignment is stored in the 'meerkat' package. It contains helper functions that are not exported and for which no visible documentation was written. At the start of each part, I give a link to the R-code hosted on GitHub where you can view the code.

PART I

The code for the t-test can be found here

1. Create a function that computes the t-test.

See the code for the t-statistic here and the pooled variance here

The t_test function can be found here

2. Test your function with the following data obtained from a study by air force psychologists conducting research into the relative effectiveness of training pilots (see Table 1). The first training method consists of computer simulated flight instruction (CSFI) while the second training method uses traditional flight instruction (TFI). The 18 pilots were randomly assigned to the two experimental conditions CSFI and TFI and the performance test scores in each condition are displayed in Table 1. Calculate the t-statistic with your newly created function and check the results by analyzing the same data with the in-built R function t.test and compare the results.

library(meerkat)
# Create two variables
CSFI <- c(2,5,5,6,6,7,8,9)
TFI <- c(1,1,2,3,3,4,5,7,7,8)

# Calculate t-test
set.seed(500)
tt1 <- meerkat::t_test(CSFI, TFI, variance_equal = TRUE, 
                       bootstrap_ssize = 0.7, 
                       R=1000)
tt1

The results indicate that there is no statistically different performance between pilots who use traditional flight instructions and pilots who are trained using computer-simulated flight instruction.

When we compare this to the built-in R function, we get:

t.test(CSFI, TFI, var.equal = TRUE)

From the t-test We observe that the 95\% confidence interval contains $0$. This means that we may conclude there is no effect.

3. Add code to your function that calculates the p-value for a two-sided t-test.

See the code here

4. The last part of your function consists of organizing the output in the form of a list. The output should consist of at least the value of the t-statistic and the p-value. You are free to add other statistics of interests.

The print method of the t_test() function prints the information returned by this function in a neatly organized way.

The t_test() function returns an S3 object containing:

t1 <- t_test(CSFI, TFI)
names(t1)

See the code here, here, and here

PART II

The code for the linear model can be found here

# Load gala data
data("gala")

1. Use the in-built R-function for linear regression and the summary() function to obtain the output for a multiple linear regression analysis on the variables described. Obtain predicted values and residuals using the extractor functions. Make a plot of the predicted values against the residuals. Give an interpretation of the results of the regression model and the plot

# Run a linear regression model
mod1 <- lm("Species ~ Area + Elevation + Endemics", data = gala)
summary(mod1)
# Retrieve residuals and predicted values
residuals_mod1 <- resid(mod1)
predicted_mod1 <- predict(mod1)

# Plot
plot(predicted_mod1, residuals_mod1)

2. Obtain the same estimates as in the previous step, but now with matrix algebra

# Make subset of the variables
y <- as.matrix(gala[, "Species"])
# Create a formula
form <- formula("Species ~ Area + Elevation + Endemics")
# Create a model matrix
X <- model.matrix(form, gala)
# Apply the formula for the linear model
linmod <- solve( t(X) %*% X ) %*% t(X) %*% y
# Create the predicted values
yhat <- X %*% linmod
# Residuals
resid <- (y - yhat)
# Plot
plot(yhat, resid)

The results obtained are the same as those obtained from the lm() function

print.listof(list("Coefficients" = cbind(linmod, coef(mod1))))

3. Once you have the correct code to obtain the estimates with matrix algebra, write a function that has as input a data set suitable for multiple linear regression

See the code of the linear_model() function here. The helper functions in this file calculate residuals, predicted values, t-statistics, p-values, $R^2$, F-statistic etc.

# Load the meerkat library
library(meerkat)
# Create the model using the formula and the gala data
gala_mod <- linear_model(form, gala)
# Print the model
gala_mod

Like the lm() function, we can call summary() on the model

# Call the summary function
summary(gala_mod)

The plot() method for the linear_model() function plots the predicted values versus the residuals

plot(gala_mod)


JasperHG90/meerkat documentation built on May 31, 2019, 5:39 a.m.