In pointonjoel/Kenometrics: Support Tools for Econometrics Classes

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

library(Kenometrics)

Importing data

When importing data, students may often be overwhelmed with the specifics needed when importing from a variety of sources. Having one function with txt and Excel integration simplifies things further.

To import data to the current environment, there are two ways depending on the type of data. It is recommended that users also import the wooldridge package, which contains all of the datasets used in the textbook's examples.

Kenometrics provides support for a variety of other types of datasets. Simply specify the name of the file/dataset and the type of data.

#Code needed to obtain the NASA file on the user's computer
file = system.file("extdata", "NASA.xlsx", package="Kenometrics")
my_wd =substr(file,0,nchar(file)-nchar("NASA.xlsx")-1)

#Once the working directory has been obtained, the get_data function can be used
NASA_data <- get_data("NASA", type="excel", file_loc = my_wd)
print(NASA_data)

Whilst the function defaults the working directory to the current location of the workspace, students are encouraged to set a working directory prior to sourcing data. This can be done by using:

#my_wd="C:/Documents"
#And then for usage:
#setwd(my_wd)

Note the use of "/" rather than "\", as is customary in R.

T-tests

Once data has been imported, a t-test can then be conducted. Ordinarily, this is possible through the completing a linear regression and finding the summary of it.

hseinv_data <- get_data("hseinv", type="txt", file_loc = my_wd)
simple_model <- lm(lprice ~ lpop + linvpc , data=hseinv_data)
summary(simple_model)

However, this assumes that the hypothesised population mean is 0; there is no scope to change it. To conduct a T-Test where there is a non-0 population mean is simple with Kenometrics:

#t_stat <- ttest("linvpc", some_data, "simple_model", pop_mean = -1)
#print(t_stat)

Chow tests

With Kenometrics, a Chow test can easily be conducted. The function works by adding both an intercept and a slope dummy, allowing the user to select whether the test is for the intercept, slope, or both.

chow(1960, "year", hseinv_data, "linvpc ~ lpop", "both")

Making data time series

In order to add lagged variables to regressions, the data needs to be declared time series. This can be done easily, with the ability to override the old data frame or make a new one:

#Code needed to obtain the data file on the user's computer
file <-  system.file("extdata", "NASA.xlsx", package="Kenometrics")
my_wd <- substr(file,0,nchar(file)-nchar("NASA.xlsx")-1)
hseinv_data <- get_data("hseinv", "txt", file_loc=my_wd)

#Using the function to make a new data set that is time series
TS_data <- make_ts("year", hseinv_data)

#Using the function to override the old data with time series data
#Note that this is strongly discouraged, see below.
hseinv_data <- make_ts("year", hseinv_data)

The user can then use the dynlm function to add lagged variables to their model.

library(dynlm)
lagged_model <- dynlm(lpop ~ price + L(linvpc,0:2), data=TS_data)
summary(lagged_model)

When users convert to time series data, the ADF and chow test functions automatically convert this to numeric form in order to proceed. If users do, for whatever reason, want to convert back from time series (to numeric class), a function has been created to allow for this:

make_numeric(TS_data, "year")

However, this is discouraged because the initial data types are lost in the process of converting from time series and back to numeric. Therefore, users are encouraged to not override the initial dataset but instead use time series function to make a new time series data frame, also keeping the initial one.

ADF test

An ADF can ordinarily be run without Kenometrics. However, students often find the results confusing and difficult to understand. Kenometrics selects an appropriate test (defaulting to "constant, no trend") for students automatically, yet leaving the option for users to select their own. Additionally, if the variable is found to be non-stationary, the function calculates the 1st difference and returns the original dataset with a the 1st difference appended. This allows students to then run regressions knowing that all variables are stationary.

#overrides the previous dataset
#If the variable is stationary then no change is made
#If the variable is non-stationary then the 1st difference is appended
#We converted the data to time series, above, so we must add a time variable:
hseinv_data <- ADF("lpop", hseinv_data, type="type1", time_var = "year")

LRM

The final function available to users is the LRM function, which calculates the Long Run Multiplier/Propensity of a FDL model. It is currently configured to only allow 3 lags of one variable, and from this the LRM is calculated. Additional functionality to enable users to specify the number of lags is expected to be released in a future update. The LRM also calculates the t-statistic of the LRM and prints this, but only returns the LRM, as shown below.

#Sourcing the data from the user's computer
file <- system.file("extdata", "fertil3.txt", package="Kenometrics")
fertil3_data <- read.delim(file)

#Using the function
LR_prop <- LRM("gfr", "pe", 3, other_vars=c("ww2", "pill"), df=fertil3_data)
print(LR_prop)