knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "man/figures/README-", out.width = "100%" ) set.seed(1337)
Are you taking the course "Advanced statistical programming with R" in the summer term 2020? Then this page is for you. It explains what you need to do to get the credits for this module. r emo::ji("wink")
In this course, you are going work on a programming project in a group of three students. Each group needs to develop an R package, in which they implement a statistical method for the so-called location-scale regression model class, and a small simulation study to evaluate their method and implementation. In September, you need to submit a term paper about your project.
This git repository contains the asp20model
package for R, which implements the location-scale regression model class and will be the basis for your package.
Please note: If you are planning to take this course, you need to preregister on Stud.IP before March 31. Follow this link to sign up: https://studip.uni-goettingen.de/dispatch.php/course/details?sem_id=5b8822a839a628e8166648a57f4f1eac.
Update on the Coronavirus: Unsurprisingly, the Coronavirus crisis makes some changes to this course necessary. All previously announced dates remain valid, but I am going to offer the introductory session and the Q&A session as YouTube livestreams. See below for the links. The intermediate and final presentations might be replaced with short written reports, depending on the further developments. I am going to make a final decision on the presentations before the Q&A session on May 4.
The asp20model
package uses R6 classes, so make sure to get familiar with mutable objects, inheritance, etc. See the chapter on R6 classes in [3], available online: https://adv-r.hadley.nz/r6.html.
The asp20model
package is hosted on https://gitlab.gwdg.de (the GitLab instance of the computing center of the university), where you are going to develop your R package as well. I created a GitLab group for this course and a GitLab project for each group of students.
Please note: To start working on your project, log in to GitLab with your student account once and send me an email. After that, I can add you to the group and the project.
GitLab is an online platform where you can host git repositories, very similar to GitHub, and git is a popular version control system for software development. Systematic, version-controlled software development is an essential component of this course, so if you are not yet familiar with git, please take a look at the git chapter in [4], available online: http://r-pkgs.had.co.nz/git.html.
Please note: Your code will only be visible to me, the project supervisors, and your fellow students. Feel free to make mistakes and ask questions!
As mentioned above, we are going to work with the location-scale regression model class in this course. So, what is location-scale regression? Let's take a step back and look at the standard linear model first. It is defined as
y_i = \boldsymbol{x}_i' \boldsymbol{\beta} + \varepsilon_i, \text{ where } \varepsilon_i \overset{i.i.d.}{\sim} \mathcal{N}(0, \sigma^2).
From the definition, it follows that
y_i \overset{ind.}{\sim} \mathcal{N}(\boldsymbol{x}_i' \boldsymbol{\beta}, \sigma^2),
and from this representation, the model can easily be extended with a second linear predictor for the standard deviation. For this purpose, we introduce a covariate vector \$\boldsymbol{z}_i
\$ and a parameter vector \$\boldsymbol{\gamma}
\$:
y_i \overset{ind.}{\sim} \mathcal{N}(\boldsymbol{x}_i' \boldsymbol{\beta}, (\exp(\boldsymbol{z}_i' \boldsymbol{\gamma}))^2).
What is the \$\exp
\$ function doing in this formula? Well, one thing that is a little bit tricky about predicting the standard deviation is that the standard deviation needs to be positive, while the linear predictor \$\boldsymbol{z}_i' \boldsymbol{\gamma}
\$ can become negative for some choices of the parameter vector \$\boldsymbol{\gamma}
\$. Hence, to ensure that our predictions for the standard deviation are valid, we introduce the \$\exp
\$ function as a so-called response function for the standard deviation and define \$\sigma_i = \exp(\boldsymbol{z}_i' \boldsymbol{\gamma})
\$.
Why should we care about location-scale regression? It is not uncommon to observe heteroscedastic data like in the figure below. The standard way to deal with heteroscedastic residuals in a linear model is FGLS estimation, but FGLS does not provide an interpretable description of the variance structure of the data. A location-scale regression model can fill this gap, if we have access to explanatory variables for the variance or the standard deviation of the response variable.
n <- 500 x <- runif(n) y <- x + rnorm(n, sd = exp(-3 + 2 * x)) plot(x, y) abline(0, 1, lwd = 2) curve(x + 1.96 * exp(-3 + 2 * x), -0.1, 1.1, add = TRUE) curve(x - 1.96 * exp(-3 + 2 * x), -0.1, 1.1, add = TRUE)
asp20model
packageFirst, you need to install the asp20model
package from GitLab and load it:
install.packages("devtools") devtools::install_gitlab("asp20/asp20model", host = "gitlab.gwdg.de") library(asp20model)
Using the data from the plot above, we can set up a location-scale regression model. Note that the first command only sets up the design matrices and the parameter vectors but does not do any kind of inference. All parameters are initialized with a value of 0. We can use the loglik()
method to obtain the log-likelihood of the model in the initial state (i.e. with all parameters set to 0):
model <- LocationScaleRegression$new(y ~ x, ~ x) model$beta model$gamma model$loglik()
Now, let's update the parameter values manually and see how the log-likelihood changes:
model$beta <- c(0.1, 0.1) model$loglik()
For most inference algorithms, we also need the gradient of the log-likelihood with respect to the parameters, which we can obtain with the grad()
method:
model$grad_beta() model$grad_gamma()
Finally, the asp20model
package also comes with a simple gradient descent algorithm for maximum likelihood inference. Let's apply it to our model:
gradient_descent(model) model$beta model$gamma model$grad_beta() model$grad_gamma()
Whoops, that didn't work very well! The algorithm didn't converge, the estimated parameters are far away from the true values we used to simulate the data (\$\hat{\boldsymbol{\beta}}
\$ should be close to (0, 1), \$\hat{\boldsymbol{\gamma}}
\$ should be close to (-3, 2)), and the gradient is not close to 0. Let's try again with a smaller step size and more iterations:
gradient_descent(model, stepsize = 1e-06, maxit = 500000) model$beta model$gamma model$grad_beta() model$grad_gamma()
These results do look better, but they also show how inefficient the gradient descent algorithm is. The algorithm took more than a quarter million iterations to converge!
Your task is to do better than me and implement a more efficient inference algorithm for the location-scale regression model class. Use my code for the gradient_descent()
function as an example. Build on the LocationScaleRegression
R6 class and its methods and extend them if necessary. Think about inheritance and whether your extensions could be useful for other groups. If yes, feel free to share your code on GitLab and open a pull request in the asp20model
repository.
asp20boot
boot
R packageasp20cv
gamlss
R packageasp20lasso
asp20boost
asp20regPrior
asp20ssPrior
asp20hMCMC
asp20iwlsMCMC
asp20plot
car
and gamlss
R packagesasp20user
asp20*
packages, implementation of convenience functions, application to the datasets::airquality
datasetTo get the credits for this module, you need to...
asp20model
package and adds the functionality described above. Your code needs to be comprehensible, well documented, and covered by automated tests.You may earn "bonus points" for active exchange with other groups, preferably on GitLab via issues and pull requests. For example, the four groups working on Bayesian inference could develop a common API for the specification of priors. If you need additional features in the asp20model
package, feel free to open issues and pull requests in this repository.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.