knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(here) library(tidyverse) library(questionr) library(data.table)
UKB.COVID19
is an R package designed to process and analyze COVID-19 data from the UK Biobank (UKBB). It provides tools to summarize COVID-19 test results, perform association tests between COVID-19 outcomes and potential risk factors, and generate input files for genome-wide association studies (GWAS).
To install the UKB.COVID19
package, you can use the following commands:
# Install from CRAN # install.packages("UKB.COVID19") # Load the package library(UKB.COVID19)
Before using the package, ensure you have access to the UKBB COVID-19 data. You will need to download the relevant datasets and have them ready for analysis. Due to the restriction of using UKBB data, we illustrate the use cases using simulated data, which are located in the package UKB.COVID19/inst/extdata/ and can be retrieved with function covid_example
.
ukb.tab.file <- covid_example("sim_ukb.tab.gz") ukb.tab <- fread((ukb.tab.file)) head(ukb.tab)
The risk_factor
function generates a covariate table with risk factors using UKBB main tab data. Automatically returns sex, age at birthday in 2020, SES, self-reported ethnicity, most recently reported BMI, most recently reported pack-years, whether they reside in aged care (based on hospital admissions data, and covid test data) and blood type. Function also allows user to specify fields of interest (field codes, provided by UK Biobank), and allows the users to specify more intuitive names, for selected fields.
Note: the ukb.tab file must include fields: f.eid, f.31.0.0, f.34.0.0, f.189.0.0, f.21001., f.21000., f.20161.
covar <- risk_factor(ukb.data=covid_example("sim_ukb.tab.gz"), ABO.data=covid_example("sim_covid19_misc.txt.gz"), hesin.file=covid_example("sim_hesin.txt.gz"), res.eng=covid_example("sim_result_england.txt.gz")) head(covar)
The makePhenotypes
function summarizes COVID-19 test results, death register data, and hospital inpatient data. It generates phenotypes for COVID-19 susceptibility, severity, and mortality.
susceptibility <- makePhenotypes(ukb.data=covid_example("sim_ukb.tab.gz"), res.eng=covid_example("sim_result_england.txt.gz"), death.file=covid_example("sim_death.txt.gz"), death.cause.file=covid_example("sim_death_cause.txt.gz"), hesin.file=covid_example("sim_hesin.txt.gz"), hesin_diag.file=covid_example("sim_hesin_diag.txt.gz"), hesin_oper.file=covid_example("sim_hesin_oper.txt.gz"), hesin_critical.file=covid_example("sim_hesin_critical.txt.gz"), code.file=covid_example("coding240.txt.gz"), pheno.type = "susceptibility") head (susceptibility)
severity <- makePhenotypes(ukb.data=covid_example("sim_ukb.tab.gz"), res.eng=covid_example("sim_result_england.txt.gz"), death.file=covid_example("sim_death.txt.gz"), death.cause.file=covid_example("sim_death_cause.txt.gz"), hesin.file=covid_example("sim_hesin.txt.gz"), hesin_diag.file=covid_example("sim_hesin_diag.txt.gz"), hesin_oper.file=covid_example("sim_hesin_oper.txt.gz"), hesin_critical.file=covid_example("sim_hesin_critical.txt.gz"), code.file=covid_example("coding240.txt.gz"), pheno.type = "severity")
mortality <- makePhenotypes(ukb.data=covid_example("sim_ukb.tab.gz"), res.eng=covid_example("sim_result_england.txt.gz"), death.file=covid_example("sim_death.txt.gz"), death.cause.file=covid_example("sim_death_cause.txt.gz"), hesin.file=covid_example("sim_hesin.txt.gz"), hesin_diag.file=covid_example("sim_hesin_diag.txt.gz"), hesin_oper.file=covid_example("sim_hesin_oper.txt.gz"), hesin_critical.file=covid_example("sim_hesin_critical.txt.gz"), code.file=covid_example("coding240.txt.gz"), pheno.type = "mortality")
The log_cov
function performs association tests using logistic regressions. This is an example of association tests between COVID-19 susceptibility and three risk factors: sex, age and BMI.
log_cov(pheno=susceptibility, covariates=covar, phe.name="pos.neg", cov.name=c("sex", "age", "bmi"))
The comorbidity.summary
function scans all the hospitalisation records with a given time period and generates comorbidity summary table. The following example is to generate a comorbidity summary table that includes all the primary and secondary diagnoses in the hospital inpatient data after 16 March 2020.
comorb <- comorbidity_summary(ukb.data=covid_example("sim_ukb.tab.gz"), hesin.file=covid_example("sim_hesin.txt.gz"), hesin_diag.file=covid_example("sim_hesin_diag.txt.gz"), ICD10.file=covid_example("ICD10.coding19.txt.gz"), primary = FALSE, Date.start = "16/03/2020") comorb[1:6,1:10]
The comorbidity.asso
function performs association tests between comorbidity categories and selected phenotype using logistic regression models.This is an example of association tests between COVID-19 susceptibility and all comorbidities. It shows NAs when fitted probabilities numerically 0 or 1 occurred in the logistic regression models.
comorb.asso <- comorbidity_asso(pheno=susceptibility, covariates=covar, cormorbidity=comorb, population="white", cov.name=c("sex","age","bmi","SES","smoke","inAgedCare"), phe.name="pos.neg", ICD10.file=covid_example("ICD10.coding19.txt.gz")) head(comorb.asso, 4)
Here is an example workflow of analysing the risk factors of COVID-19 susceptibility, that combines the main functions of the UKB.COVID19 package:
# Load the package library(UKB.COVID19) # Summarize COVID-19 risk factors covar <- risk_factor(ukb.data=covid_example("sim_ukb.tab.gz"), ABO.data=covid_example("sim_covid19_misc.txt.gz"), hesin.file=covid_example("sim_hesin.txt.gz"), res.eng=covid_example("sim_result_england.txt.gz")) # Summarize COVID-19 test results susceptibility <- makePhenotypes(ukb.data=covid_example("sim_ukb.tab.gz"), res.eng=covid_example("sim_result_england.txt.gz"), death.file=covid_example("sim_death.txt.gz"), death.cause.file=covid_example("sim_death_cause.txt.gz"), hesin.file=covid_example("sim_hesin.txt.gz"), hesin_diag.file=covid_example("sim_hesin_diag.txt.gz"), hesin_oper.file=covid_example("sim_hesin_oper.txt.gz"), hesin_critical.file=covid_example("sim_hesin_critical.txt.gz"), code.file=covid_example("coding240.txt.gz"), pheno.type = "susceptibility") # Perfrom association tests log_cov(pheno=susceptibility, covariates=covar, phe.name="pos.neg", cov.name=c("sex", "age", "bmi")) # Generate comorbidity table comorb <- comorbidity_summary(ukb.data=covid_example("sim_ukb.tab.gz"), hesin.file=covid_example("sim_hesin.txt.gz"), hesin_diag.file=covid_example("sim_hesin_diag.txt.gz"), ICD10.file=covid_example("ICD10.coding19.txt.gz"), primary = FALSE, Date.start = "16/03/2020") # Perform association tests between COVID-19 phenotype and comorbidities comorb.asso <- comorbidity_asso(pheno=susceptibility, covariates=covar, cormorbidity=comorb, population="white", cov.name=c("sex","age","bmi","SES","smoke","inAgedCare"), phe.name="pos.neg", ICD10.file=covid_example("ICD10.coding19.txt.gz"))
The UKB.COVID19 package provides comprehensive tools for analyzing COVID-19 data from the UK Biobank. By following this vignette, users can efficiently summarize data, perform association tests, and prepare files for genetic analysis. For more detailed information, refer to the package documentation and the GitHub repository.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.