knitr::opts_chunk$set(echo = TRUE)
This document is designed both as a tutorial and as a reference. In the practical, it is probably a good idea to read through it sequentially. In the weeks of the Data Challenge, you will be able to come back to different sections and use them as you need.
We all know that with a dynamic document/notebook it is tempting to click through the pieces of code, pressing "play" as quickly as possible. That is not the way to get the most out of this tutorial. Some of the code chunks might not be things you need. Sometimes, the most useful thing might be to follow links to some of the additional resources and spend time reading those. And don't just focus on the code sections: time spent with a pen and paper designing the analysis might be some of the best-spent time!
Visual inspection of the data is invaluable for understanding what the code is doing. The tutorial does not contain much visual inspection, as we can't print participant data on the internet. Add head(), str() and other statements to get a feel for the data as you work through the tutorials.
Do be aware when using RMarkdown that it is easy to share participant data "by accident" if you print it as you go.
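As a minimal sketch of the kind of inspection meant here, the chunk below runs head() and str() on a toy data frame. The data frame and its column names (eid, overall_activity, sex) are invented for illustration and are not taken from the tutorial data, so nothing printed here relates to real participants.

```r
# Toy data frame standing in for participant data (all names and values are invented)
toy <- data.frame(
  eid = 1:6,
  overall_activity = c(25.1, 31.4, 28.0, 22.7, 35.9, 29.3),
  sex = factor(c("Female", "Male", "Female", "Male", "Female", "Male"))
)

head(toy, n = 3)  # first three rows
str(toy)          # column types and a preview of the values
dim(toy)          # number of rows and columns
```

When working with the real data, remember that anything these calls print ends up in the knitted document.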
If you have a question, feel free to speak to any of the tutors or contact Rosemary Walmsley.
There are probably bugs. If you find them, please let me (Rosemary) know!
As discussed in the lecture, it's a good idea to think in detail about what your question is before getting started. (Yes, we know this is stating the obvious... But just treat this section as permission to spend at least 20 minutes of the practical thinking through some of these things :) )
For this workshop, we'll start with the seemingly simple question: How is overall physical activity associated with risk of ischaemic heart disease?
You will have a different question for the Data Challenge.
One particular issue you might want to consider is whether you want to exclude some people from the study population. For example, you might exclude people who already have ischaemic heart disease, as they can't go on to develop the disease. You might also exclude people who have problems with data quality.
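The exclusions described above could be sketched as follows, keeping a count at each step so that the flow of participants can be reported. This is an illustration on invented data: the columns prevalent_ihd and good_wear_time are hypothetical names, not variables from the tutorial dataset.

```r
# Toy cohort (invented values); column names are hypothetical
toy <- data.frame(
  eid = 1:6,
  prevalent_ihd = c(FALSE, TRUE, FALSE, FALSE, FALSE, TRUE),
  good_wear_time = c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE)
)

n_start <- nrow(toy)

# Step 1: exclude people who already have ischaemic heart disease at baseline
no_prior <- toy[!toy$prevalent_ihd, ]
n_after_ihd <- nrow(no_prior)

# Step 2: exclude people whose device data fail the quality check
analysis <- no_prior[no_prior$good_wear_time, ]
n_final <- nrow(analysis)

cat("Start:", n_start,
    "| after prevalent-disease exclusion:", n_after_ihd,
    "| after data-quality exclusion:", n_final, "\n")
```

Recording the counts as you go makes it straightforward to draw a participant flow diagram later.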
Further reading on selection into UK Biobank:
In this case, we might ask how we can operationalise the vague concept "physical activity". Do we mean something like total energy expenditure? Or time spent achieving a certain level of intensity? And how can that be defined from the wearable device data?
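To make the contrast between these definitions concrete, here is a small sketch that computes two candidate exposure variables from the same toy acceleration series. The values, the 30-second epoch length and the 100 milli-g threshold are all assumptions chosen for illustration, not recommendations.

```r
# Toy acceleration series: one value (in milli-g) per epoch; values are invented
acc_mg <- c(5, 12, 40, 130, 250, 90, 8, 3, 110, 20)
epoch_seconds <- 30  # assumed epoch length

# Definition 1: overall activity as the average acceleration across all epochs
overall_activity <- mean(acc_mg)

# Definition 2: time spent at or above an intensity threshold
# (the 100 milli-g cut-point is an illustrative assumption)
threshold_mg <- 100
minutes_above <- sum(acc_mg >= threshold_mg) * epoch_seconds / 60

overall_activity
minutes_above
```

The two summaries can rank the same people differently, which is why it is worth deciding which one matches your question before modelling.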
How can ischaemic heart disease be defined based on the available data sources? For example, which hospital admission codes are relevant? Read more:
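As one possible illustration of matching admission codes, the sketch below flags records whose ICD-10 code falls in the block I20 to I25 (ischaemic heart diseases). The table layout, the column name diag_icd10 and the dot-free code format are assumptions made for this example, and whether this set of codes is the right one for your analysis is a decision you should justify yourself.

```r
# Toy hospital admission records (invented); diag_icd10 is a hypothetical column name
admissions <- data.frame(
  eid = c(1, 1, 2, 3, 4),
  diag_icd10 = c("I251", "J459", "I219", "K359", "I200"),
  stringsAsFactors = FALSE
)

# Flag codes beginning I20, I21, ..., I25 (assumes codes are stored without a dot)
is_ihd <- grepl("^I2[0-5]", admissions$diag_icd10)

# Participants with at least one matching admission
ihd_ids <- unique(admissions$eid[is_ihd])
ihd_ids
```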
We might be interested in a time-to-event outcome and so in using survival analysis methods (such as Cox regression). Do we expect a linear association (on the appropriate scale), or do we need to use methods that can cope with non-linearity?
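A minimal sketch of this comparison, using the survival and splines packages on simulated data, is given below. Everything about the data (sample size, effect size, censoring times, variable names) is made up for illustration. The natural spline with 3 degrees of freedom is just one of several ways to allow for non-linearity.

```r
library(survival)
library(splines)

set.seed(1)
n <- 500
sim <- data.frame(activity = rnorm(n, mean = 28, sd = 8))

# Simulated event times: higher activity gives a lower hazard (log-hazard slope -0.03)
event_time <- rexp(n, rate = 0.05 * exp(-0.03 * (sim$activity - 28)))
censor_time <- runif(n, min = 5, max = 10)
sim$time <- pmin(event_time, censor_time)
sim$event <- as.integer(event_time <= censor_time)

# Cox model assuming a linear association on the log-hazard scale
fit_linear <- coxph(Surv(time, event) ~ activity, data = sim)

# Cox model allowing a non-linear association through a natural cubic spline
fit_spline <- coxph(Surv(time, event) ~ ns(activity, df = 3), data = sim)

summary(fit_linear)$conf.int   # hazard ratio per unit of activity, with 95% CI
anova(fit_linear, fit_spline)  # likelihood ratio test of the added flexibility
```

Because the linear model is nested within the spline model, the likelihood ratio test gives a rough indication of whether the extra flexibility is supported by the data. Plotting the fitted curve is usually more informative than the test alone.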
Can we address confounding by adjusting for possible confounders? Can we assess the impact of reverse causality (e.g. by excluding an initial period of follow-up)?
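Both ideas can be sketched on simulated data, as below: a crude model, a model adjusted for age and sex, and a sensitivity analysis that drops anyone whose follow-up ended within the first 2 years. The data, the variable names, the choice of confounders and the 2-year window are all assumptions for illustration only.

```r
library(survival)

set.seed(2)
n <- 500
sim <- data.frame(
  age = runif(n, min = 45, max = 75),
  sex = factor(sample(c("Female", "Male"), n, replace = TRUE))
)

# Activity declines with age, so age confounds the activity-outcome association
sim$activity <- 40 - 0.2 * sim$age + rnorm(n, sd = 6)
rate <- 0.02 * exp(0.05 * (sim$age - 60) - 0.03 * (sim$activity - 28))
event_time <- rexp(n, rate = rate)
censor_time <- runif(n, min = 5, max = 10)
sim$time <- pmin(event_time, censor_time)
sim$event <- as.integer(event_time <= censor_time)

# Crude and confounder-adjusted models
fit_crude <- coxph(Surv(time, event) ~ activity, data = sim)
fit_adjusted <- coxph(Surv(time, event) ~ activity + age + sex, data = sim)

# Sensitivity analysis: drop anyone whose follow-up ended in the first 2 years
late <- sim[sim$time > 2, ]
fit_late <- coxph(Surv(time, event) ~ activity + age + sex, data = late)

# Compare the log hazard ratio for activity across the three models
round(c(crude = coef(fit_crude)[["activity"]],
        adjusted = coef(fit_adjusted)[["activity"]],
        excluding_early = coef(fit_late)[["activity"]]), 3)
```

If the estimate moves noticeably after adjustment or after excluding early follow-up, that is a prompt to think harder about confounding or reverse causality, not proof of either.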
Read more: