Are you taking the course “Advanced statistical programming with R” in the summer term 2020? Then this page is for you. It explains what you need to do to get the credits for this module. 😉
In this course, you are going work on a programming project in a group of three students. Each group needs to develop an R package, in which they implement a statistical method for the so-called location-scale regression model class, and a small simulation study to evaluate their method and implementation. In September, you need to submit a term paper about your project.
This git repository contains the asp20model package for R, which
implements the location-scale regression model class and will be the
basis for your package.
Please note: If you are planning to take this course, you need to preregister on Stud.IP before March 31. Follow this link to sign up: https://studip.uni-goettingen.de/dispatch.php/course/details?sem_id=5b8822a839a628e8166648a57f4f1eac.
Update on the Coronavirus: Unsurprisingly, the Coronavirus crisis makes some changes to this course necessary. All previously announced dates remain valid, but I am going to offer the introductory session and the Q\&A session as YouTube livestreams. See below for the links. The intermediate and final presentations might be replaced with short written reports, depending on the further developments. I am going to make a final decision on the presentations before the Q\&A session on May 4.
The asp20model package uses R6 classes, so make sure to get familiar
with mutable objects, inheritance, etc. See the chapter on R6 classes in
[3], available online: https://adv-r.hadley.nz/r6.html.
The asp20model package is hosted on https://gitlab.gwdg.de (the
GitLab instance of the computing center of the university), where you
are going to develop your R package as well. I created a GitLab group
for this course and a GitLab project for each group of students.
Please note: To start working on your project, log in to GitLab with your student account once and send me an email. After that, I can add you to the group and the project.
GitLab is an online platform where you can host git repositories, very similar to GitHub, and git is a popular version control system for software development. Systematic, version-controlled software development is an essential component of this course, so if you are not yet familiar with git, please take a look at the git chapter in [4], available online: http://r-pkgs.had.co.nz/git.html.
Please note: Your code will only be visible to me, the project supervisors, and your fellow students. Feel free to make mistakes and ask questions!
As mentioned above, we are going to work with the location-scale regression model class in this course. So, what is location-scale regression? Let’s take a step back and look at the standard linear model first. It is defined as
y_i = \boldsymbol{x}_i' \boldsymbol{\beta} + \varepsilon_i, \text{ where } \varepsilon_i \overset{i.i.d.}{\sim} \mathcal{N}(0, \sigma^2).
From the definition, it follows that
y_i \overset{ind.}{\sim} \mathcal{N}(\boldsymbol{x}_i' \boldsymbol{\beta}, \sigma^2),
and from this representation, the model can easily be extended with a
second linear predictor for the standard deviation. For this purpose, we
introduce a covariate vector $\boldsymbol{z}_i$ and a parameter vector
$\boldsymbol{\gamma}$:
y_i \overset{ind.}{\sim} \mathcal{N}(\boldsymbol{x}_i' \boldsymbol{\beta}, (\exp(\boldsymbol{z}_i' \boldsymbol{\gamma}))^2).
What is the $\exp$ function doing in this formula? Well, one thing
that is a little bit tricky about predicting the standard deviation is
that the standard deviation needs to be positive, while the linear
predictor $\boldsymbol{z}_i' \boldsymbol{\gamma}$ can become negative
for some choices of the parameter vector $\boldsymbol{\gamma}$. Hence,
to ensure that our predictions for the standard deviation are valid, we
introduce the $\exp$ function as a so-called response function for the
standard deviation and define $\sigma_i = \exp(\boldsymbol{z}_i'
\boldsymbol{\gamma})$.
Why should we care about location-scale regression? It is not uncommon to observe heteroscedastic data like in the figure below. The standard way to deal with heteroscedastic residuals in a linear model is FGLS estimation, but FGLS does not provide an interpretable description of the variance structure of the data. A location-scale regression model can fill this gap, if we have access to explanatory variables for the variance or the standard deviation of the response variable.
n <- 500
x <- runif(n)
y <- x + rnorm(n, sd = exp(-3 + 2 * x))
plot(x, y)
abline(0, 1, lwd = 2)
curve(x + 1.96 * exp(-3 + 2 * x), -0.1, 1.1, add = TRUE)
curve(x - 1.96 * exp(-3 + 2 * x), -0.1, 1.1, add = TRUE)

asp20model packageFirst, you need to install the asp20model package from GitLab and load
it:
## install.packages("devtools")
## devtools::install_gitlab("asp20/asp20model", host = "gitlab.gwdg.de")
library(asp20model)
Using the data from the plot above, we can set up a location-scale
regression model. Note that the first command only sets up the design
matrices and the parameter vectors but does not do any kind of
inference. All parameters are initialized with a value of 0. We can use
the loglik() method to obtain the log-likelihood of the model in the
initial state (i.e. with all parameters set to 0):
model <- LocationScaleRegression$new(y ~ x, ~ x)
model$beta
#> [1] 0 0
model$gamma
#> [1] 0 0
model$loglik()
#> [1] -552.6567
Now, let’s update the parameter values manually and see how the log-likelihood changes:
model$beta <- c(0.1, 0.1)
model$loglik()
#> [1] -516.8169
For most inference algorithms, we also need the gradient of the
log-likelihood with respect to the parameters, which we can obtain with
the grad() method:
model$grad_beta()
#> (Intercept) x
#> 174.5098 125.7585
model$grad_gamma()
#> (Intercept) x
#> -385.3047 -157.7906
Finally, the asp20model package also comes with a simple gradient
descent algorithm for
maximum likelihood inference. Let’s apply it to our model:
gradient_descent(model)
#> Finishing after 1000 iterations
model$beta
#> (Intercept) x
#> 5.891795 4.627923
model$gamma
#> (Intercept) x
#> 1.7398635 0.5971787
model$grad_beta()
#> (Intercept) x
#> -66.29382 -28.97002
model$grad_gamma()
#> (Intercept) x
#> -4.136286 -12.307770
Whoops, that didn’t work very well! The algorithm didn’t converge, the
estimated parameters are far away from the true values we used to
simulate the data ($\hat{\boldsymbol{\beta}}$ should be close to (0,
1), $\hat{\boldsymbol{\gamma}}$ should be close to (-3, 2)), and the
gradient is not close to 0. Let’s try again with a smaller step size and
more iterations:
gradient_descent(model, stepsize = 1e-06, maxit = 500000)
#> Finishing after 260393 iterations
model$beta
#> (Intercept) x
#> -0.003708951 1.010004226
model$gamma
#> (Intercept) x
#> -3.040401 2.123791
model$grad_beta()
#> (Intercept) x
#> -8.143048e-07 3.635703e-06
model$grad_gamma()
#> (Intercept) x
#> -0.0005285036 0.0009999457
These results do look better, but they also show how inefficient the gradient descent algorithm is. The algorithm took more than a quarter million iterations to converge!
Your task is to do better than me and implement a more efficient
inference algorithm for the location-scale regression model class. Use
my code for the gradient_descent() function as an example. Build on
the LocationScaleRegression R6 class and its methods and extend them
if necessary. Think about inheritance and whether your extensions could
be useful for other groups. If yes, feel free to share your code on
GitLab and open a pull request in the asp20model repository.
asp20bootboot R packageasp20cvgamlss R packageasp20lassoasp20boostasp20regPriorasp20ssPriorasp20hMCMCasp20iwlsMCMCasp20plotcar
and gamlss R packagesasp20userasp20* packages, implementation of convenience functions,
application to the datasets::airquality datasetTo get the credits for this module, you need to…
asp20model package and
adds the functionality described above. Your code needs to be
comprehensible, well documented, and covered by automated tests.You may earn “bonus points” for active exchange with other groups,
preferably on GitLab via issues and pull requests. For example, the four
groups working on Bayesian inference could develop a common API for the
specification of priors. If you need additional features in the
asp20model package, feel free to open issues and pull requests in this
repository.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.