knitr::opts_chunk$set(echo = TRUE)
library(LTFHPlus) library(dplyr)
Here we will simply simulate a potential input format. We create tbæ
, which contains information on each person to attach thresholds to. It should contain each family member along with the needed information. Here, we simply use the proband with no family members. Next, we create CIP
, which contains the cumulative incidence proportions. The CIP is stratified by birth year and sex for illustrative purposes. If users only have CIPs stratified by sex, it would simply have one fewer columns. Please note that all values shown here are only for illustrative purposes.
n_sim = 10 tbl = tibble( fam_id = paste0("fam", 1:n_sim), pid = 1:n_sim, role = rep("o", n_sim), sex = sample(x = 0:1, size = n_sim, replace = T), status = sample(size = n_sim, x = 0:1, replace = T), age = sample(size = n_sim, x = 1:90, replace = T), birth_year = 2023 - age, aoo = purrr::map2_dbl(.x = status, .y = age, .f = ~ ifelse(.x == 1, sample(size = 1, x = 1:.y), NA)) ) %>% print() #### THIS IS DUMMY CIP. DO NOT USE FOR REAL-WORLD DATA USE #### CIP = expand.grid(list(age = 1:100, birth_year = 1900:2024, sex = 0:1)) %>% group_by(sex, birth_year) %>% mutate(cip = (1:n() - 1)/n() * .1) %>% ungroup() %>% print() #### THIS IS DUMMY CIP. DO NOT USE FOR REAL-WORLD DATA USE ####
Assigning thresholds to each person in tbl
can now be done with the function prepare_LTFHPlus_input
. The thresholds can be assigned in two ways. The first is matching directly on the combinations of birth year, sex, and age of each person to the combinations that are present in the CIP
object. The second uses interpolation to predict the CIP value between the observed combinations of birth year, sex, and age that is present in the CIP object. Currently, only interpolation with xgboost package is supported. The interpolation can be useful, since real-world data often lead to ages or age of onsets that can be expressed as decimals and rounding may lead to large jumps in CIP values.
The outputs below can be subset such that only the required information is left. For direct input into estimate_liability()
, only the family and personal id columns are needed as well as role (if the graph input is not used) and the lower and upper columns.
Without using interpolation, meaning we match on the combinations of birth year, sex and age that are present in both the tbl
and CIP
objects. The thresholds can then be assigned in the following way:
tbl2 = prepare_LTFHPlus_input(.tbl = tbl, CIP = CIP, age_col = "age", aoo_col = "aoo", CIP_merge_columns = c("age","birth_year", "sex"), CIP_cip_col = "cip", status_col = "status", use_fixed_case_thr = F, fam_id_col = "fam_id", personal_id_col = "pid", interpolation = NA, min_CIP_value = 1e-4) tbl2
If decimal ages and ages of onset are present, then we can interpolate the cip values with the xgboost package. The input does not change, except for interpolate = "xgboost"
. The current input data does not contain decimal values, but for illustrative purposes, we will use it as-is. Parameters can be passed to xgboost through the bst.params
variable.
tbl2_xgb = prepare_LTFHPlus_input(.tbl = tbl, CIP = CIP, age_col = "age", aoo_col = "aoo", CIP_merge_columns = c("age","birth_year", "sex"), CIP_cip_col = "cip", status_col = "status", use_fixed_case_thr = F, fam_id_col = "fam_id", personal_id_col = "pid", interpolation = "xgboost", xgboost_itr = 30, min_CIP_value = 1e-4) tbl2_xgb
From here, the above objects tbl2
and tbl2_xgb
can be subset to the relevant columns and used in estimate_liability()
. See LT-FH++ Example for an example of this.
The objects can also be subset to contain just the family and personal ID columns, as well as the lower and upper columns, and then used as input in prepare_graph()
to assign each individual with the threshold information as attributes. See LT-FH++ Graph Example for details.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.