knitr::opts_chunk$set( echo = TRUE, message = TRUE, warning = TRUE ) h2 = .5 nthreads = 4
First, we will load the packages that we will need for the example:
library(LTFHPlus) library(dplyr) library(MASS) library(future.apply) library(tidyr)
For simplicity's sake, we will use the same prevalence for both sexes. However, it is worth noting that in real-world applications, the cumulative incidence proportions (CIPs) needed for the thresholds will rarely be identical for both sexes. A common way to estimate the CIP is with the Aalen-Johansen estimator that accounts for competing events.
Single trait LT-FH++ requires 3 types of input, namely the liability-scale heritability, the family relationships and the status, age or age of onset, sex, and birth year for each family member, and finally cumulative incidence proportions (CIPs). The more detailed cumulative incidence proportions, the better. In the LT-FH++ paper, we used CIPs that were stratified by birth year and sex and fixed the upper and lower liability threshold for cases to $\Phi(1 - CIP)$, since it provided a very accurate estimate for the full liability. Each CIP curve was a function of age, where age represented the age of onset or the current age of a control. If less detailed information, such as CIPs stratified by sex, but not birth year, we recommend setting the lower threshold to be $\Phi(1 - CIP)$ and the upper limit to infinity for cases. With this information, the thresholds needed for the age-dependent liability threshold model can be assigned. We have provided a function called prepare_LTFHPlus_input
that can help users convert the input into a suitable format. An example of how the input may look and how to use the function can be found in From CIP and family to LT-FH++ input.
We have implemented a function that allows users to simulate under the liability threshold model for a given family structure, i.e. mother, father, sibling, or sibling1, sibling2, child1, child2, etc. The function simply needs a family structure, heritability, and population prevalence.For simplicity's sake, we will use the same prevalence for both sexes. However, it is worth noting that in real-world applications, the cumulative incidence proportions needed for the thresholds, will rarely be identical for both sexes. The function simulate_under_LTM
outputs a list with two tibbles. The first one sim_obs
contains the underlying liabilities, status, and age or age of onset for each family and family members. The second tibble thresholds
contains the formatted input that is ready to be input to estimate the genetic liability.
sims <- simulate_under_LTM(fam_vec = c("m","f","s1"), n_fam = NULL, add_ind = TRUE, h2 = h2, n_sim = 10, pop_prev = .1) sims
thresholds
keeps track of families and family members with fam_id
and indiv_id
. Here we simply use dummy variables for both. In real-world data, the indiv_id
is usually some pseudonymized identifier unique to each individual. There is total freedom to set fam_id
to whatever, as long as it is unique to each family. One choice could be the pseudonymized identifier for the index person in a family, or simply numerate the families from $1$ to the total number of families. Next, role
identifies each family member's relationship to the index person. They follow a system shown in the documentation for construct_covmat
. If a family member does not fit one of the shown, a suitable covariance matrix cannot be constructed. If more detailed family structures are needed, please contact one of the maintainers of LTFHPlus. Finally, the lower and upper liability thresholds are provided for each family member. Previously, we utilised list of lists in R to link information to the correct individuals. Going forward, we will still support this format, but in order to ease the use of non-R users, we have opted for a long format, where each row is an indivdual instead of a family, such that the input can be generated with the user's preferred software.
If you would like to use LTFHPlus, then the above can provide a template for the input format. The object sims
is a list of two entries, namely sim_obs
and thresholds
. The values in sim_obs
are values such as the true underlying liabilities, simulated ages and onset ages. etc.. The values in thresholds
are ready to be used in estimate_liability()
. Generating this input from ones own register data can be tedious, as it requires identifying parents (role for parents are $m$ and $f$), then one can identify siblings (roles for siblings are $s1, s2, etc.$), and so on for all available roles. The available roles can be seen in the documentation for estimate_liability()
.
With all input parameters generated and a chosen heritability, we have everything we need to run LT-FH++. We highly recommend the package Future to parallelize the computations for LT-FH++. Since all modern computers have more than 2 cores available to them, it should be available to all users. If a user is not able to parallelize, then LTFHPlus can still be used, but it will be far slower. The future framework also allows for easy use on high performance computing (HPC) clusters. In this example, we will utilize r nthreads
threads.
#Setting up parallelization backend plan(tweak(multisession, workers = nthreads)) #performs LT-FH++ analysis simu_liab = estimate_liability(.tbl = sims$thresholds, h2 = h2, pid = "indiv_ID", fam_id = "fam_ID", role = "role") simu_liab
From here the object simu_liab
can be saved and used in your GWAS method of choice.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.