compute_km_weights: Compute Kaplan-Meier type weights for (matched) nested...
In nyilin/SamplingDesignTools: Tools for Dealing with Complex Sampling Designs

compute_km_weights

R Documentation

Compute Kaplan-Meier type weights for (matched) nested case-control (NCC) sample

Description

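Computes Kaplan-Meier (KM) type weights for a (matched) nested case-control (NCC) sample, either from the full cohort or, when the cohort is not available, from the NCC data together with a table of the number of subjects at risk.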

Usage

compute_km_weights(
  cohort = NULL,
  ncc = NULL,
  id_name = NULL,
  risk_table_manual = NULL,
  t_start_name = NULL,
  t_name = NULL,
  sample_stat = NULL,
  t_match_name = t_name,
  y_name = NULL,
  match_var_names = NULL,
  n_per_case,
  return_risk_table = FALSE,
  km_names = c("km_prob", "km_weight")
)

Arguments

`cohort`	Cohort data with at least the following information on each subject: start time (if not 0 for all subjects) and end time of follow-up, censoring status and matching variables (if any). A `data.frame` or a matrix with column names.
`ncc`	(Matched) NCC data, if `cohort` is not available. This data should not include the ID of each matched set, but should include the actual event/censoring time of each subject. A `data.frame` or a matrix with column names.
`id_name`	Name of the column indicating subject ID in `ncc`, if `cohort` is not available.
`risk_table_manual`	Number of subjects at risk at the event time of each case in the NCC, if `cohort` is not available. A `data.frame` or a matrix with column names. See Details.
`t_start_name`	Name of the variable in `cohort` or `ncc` for the start time of follow-up. A `string`. Default is `NULL`, i.e., every subject started the follow-up at time 0.
`t_name`	Name of the variable in `cohort` or `ncc` for the time of event or censoring. A `string`. Note that if `ncc` is supplied, in order to correctly compute the weight for each sampled control this should be the actual time of censoring, not the time of event of the case in the same matched set.
`sample_stat`	A numeric vector containing sampling and status information for each subject in `cohort`: use 0 for non-sampled controls, 1 for sampled (and kept) controls, and integers >=2 for events. The length of this vector must be the same as the number of rows in `cohort`.
`t_match_name`	Name of the column of event time in each matched set in `ncc`, possibly coarsened to the same level as `t_event` in `risk_table_manual`. A `string`. Default is `t_name`, i.e., not coarsened.
`y_name`	Name of the column of censoring status in `cohort` or `ncc`, with 1 for event and 0 for censoring. A `string`.
`match_var_names`	Name(s) of the match variable(s) in `cohort` or `ncc` used when drawing the NCC. A `string` vector. Default is `NULL`, i.e., the NCC was only time-matched. The corresponding column in `risk_table_manual` must have the same name.
`n_per_case`	Number of controls matched to each case.
`return_risk_table`	Whether the risk table should be returned. Default is `FALSE`.
`km_names`	Column names for the KM-type probability (the first element) and weight (the second element) computed, if these two columns are to be attached to each subject in the input data. Default is `c("km_prob", "km_weight")`.
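
As an illustration of the kind of information `risk_table_manual` carries, the following base R sketch counts the subjects at risk at each distinct event time in a toy cohort. The data, the object names (`toy_cohort`, `toy_risk_table`) and the two-column layout are assumptions made for illustration; only the column names `t_event` and `n_at_risk` are taken from this page, and the sketch assumes every subject starts follow-up at time 0 and that there are no matching variables.

```r
# Toy cohort (hypothetical values): follow-up time and event indicator
toy_cohort <- data.frame(
  t      = c(2, 4, 4, 5, 7, 8),
  status = c(1, 0, 1, 0, 1, 0)
)

# Distinct event times, in increasing order
t_event <- sort(unique(toy_cohort$t[toy_cohort$status == 1]))

# A subject is at risk at an event time if it is still under follow-up then,
# i.e., its own event/censoring time is not earlier than that event time
n_at_risk <- sapply(t_event, function(s) sum(toy_cohort$t >= s))

# Assumed layout: one row per event time
toy_risk_table <- data.frame(t_event = t_event, n_at_risk = n_at_risk)
toy_risk_table
```

With matching variables, the same count would be taken within each stratum, and the stratum columns would carry the names given in `match_var_names`.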

Details

When the full cohort is not available, computing the correct weight for each sampled control requires the actual time of event or censoring of each subject in the NCC sample, which should be specified as `t_name` in the input. Since the number of subjects in each risk set is supplied separately in this scenario (i.e., as n_at_risk in `risk_table_manual`), `t_match_name` is required to map each control to the appropriate risk set. `t_match_name` may be the same as `t_name` if the exact risk set is available in n_at_risk, but when the full cohort is not available the risk set is usually approximated using a coarsened version of `t_name`. For example, when controls were drawn from a population registry by matching on the exact date of death of cases, birth cohort and gender, the number at risk may be approximated by the population size in the year of the event, within the same birth cohort and gender. In this scenario `t_match_name` would be the year of `t_name`.
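
The coarsening step described above can be sketched in base R as follows. All data and names here (`toy_ncc`, `t_case`, `t_yr`, `toy_pop`) are hypothetical, and the sketch only shows how a year-level matching column could be derived and looked up against yearly population counts; it does not show how `compute_km_weights` itself consumes these columns.

```r
# Hypothetical NCC data: `t` is each subject's own event/censoring date and
# `t_case` is the event date of the case in the subject's matched set
toy_ncc <- data.frame(
  id     = c(1, 2, 3, 4),
  t      = as.Date(c("2001-03-15", "2004-11-02", "2003-07-21", "2003-12-30")),
  t_case = as.Date(c("2001-03-15", "2001-03-15", "2003-07-21", "2003-07-21")),
  fail   = c(1, 0, 1, 0)
)

# Coarsen the matched event date to calendar year; a column like this could
# play the role of t_match_name
toy_ncc$t_yr <- as.integer(format(toy_ncc$t_case, "%Y"))

# Hypothetical yearly population counts standing in for the exact risk sets
toy_pop <- data.frame(t_event = c(2001L, 2003L), n_at_risk = c(52000, 50500))

# Each subject is mapped to the approximate risk set of its matched set
toy_lookup <- merge(toy_ncc, toy_pop, by.x = "t_yr", by.y = "t_event")
toy_lookup
```

With birth cohort and gender as matching variables, the population table would have one row per year and stratum, and the lookup would also join on those columns.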

Examples

library(SamplingDesignTools)
# Load mini cohort
data("mini_cohort")
mini_cohort
# Manually prepare a 1:1 NCC data
ncc <- rbind(
  data.frame(Set = 1, Map = c(1, 5), Time = mini_cohort$t[1], 
             Fail = c(1, 0), t = mini_cohort$t[c(1, 5)]), 
  data.frame(Set = 2, Map = c(3, 4), Time = mini_cohort$t[3], 
             Fail = c(1, 0), t = mini_cohort$t[c(3, 4)]), 
  data.frame(Set = 3, Map = c(4, 10), Time = mini_cohort$t[4], 
             Fail = c(1, 0), t = mini_cohort$t[c(4, 10)]), 
  data.frame(Set = 4, Map = c(6, 7), Time = mini_cohort$t[6], 
             Fail = c(1, 0), t = mini_cohort$t[c(6, 7)]), 
  data.frame(Set = 5, Map = c(8, 9), Time = mini_cohort$t[8], 
             Fail = c(1, 0), t = mini_cohort$t[c(8, 9)]), 
  data.frame(Set = 6, Map = c(9, 10), Time = mini_cohort$t[9], 
             Fail = c(1, 0), t = mini_cohort$t[c(9, 10)]) 
)
rownames(ncc) <- NULL
ncc
# Map the NCC sample to the original cohort, break the matching, identify the
# subjects selected into the NCC, and return this subset with KM type weights
# computed for them.
# First create the sampling and status indicator:
sample_stat <- numeric(nrow(mini_cohort))
sample_stat[unique(ncc$Map[ncc$Fail == 0])] <- 1
sample_stat[ncc$Map[ncc$Fail == 1]] <- 2
# Then find the sampled subset and compute weights:
ncc_nodup <- compute_km_weights(
  cohort = mini_cohort, t_name = "t", y_name = "status",
  sample_stat = sample_stat, n_per_case = 1
)
ncc_nodup
# Alternatively, if the cohort is not available, the weights can be computed
# as long as the number of subjects at risk at event times in each stratum is
# available elsewhere, and the actual time of event/censoring is available
# for each subject in the NCC.
# Compute the number of subjects at risk from mini_cohort:
risk_table <- compute_risk_table(cohort = mini_cohort, t_name = "t", 
                                 y_name = "status")
risk_table
# The following command computes the same weights as in ncc_nodup:
ncc_nodup_v2 <- compute_km_weights(
  ncc = ncc[, -1], risk_table_manual = risk_table,
  id_name = "Map", t_match_name = "Time", t_name = "t", y_name = "Fail",
  n_per_case = 1
)
ncc_nodup_v2
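
For intuition about the quantity computed in the examples above, here is a minimal base R sketch of a commonly used KM-type weight for a sampled control: one over the probability of being sampled at least once across the event times at which the subject was at risk. This is an assumption-laden illustration, not the package's implementation: the helper `km_type_weight` is hypothetical, and the sketch assumes no tied event times, no matching variables, follow-up starting at time 0, and that cases receive weight 1.

```r
# Hypothetical helper: KM-type weight for one sampled control.
#   t_end      the control's own event/censoring time
#   t_event    distinct event times of the cases
#   n_at_risk  number of subjects at risk at each event time
#   n_per_case number of controls drawn per case
km_type_weight <- function(t_end, t_event, n_at_risk, n_per_case) {
  # Event times at which the control was still under follow-up
  at_risk <- t_event <= t_end
  # Probability of NOT being drawn at each such event time; the case itself
  # is excluded from the pool of eligible controls, hence n_at_risk - 1
  p_not <- 1 - pmin(1, n_per_case / (n_at_risk[at_risk] - 1))
  # Probability of being drawn at least once, and its inverse as the weight
  p_sampled <- 1 - prod(p_not)
  1 / p_sampled
}

# Toy risk table (hypothetical values) and a control censored at time 5
w <- km_type_weight(t_end = 5, t_event = c(2, 4, 7),
                    n_at_risk = c(6, 5, 2), n_per_case = 1)
w
```

In this toy case the control was at risk at event times 2 and 4, so its sampling probability is 1 - (1 - 1/5) * (1 - 1/4) = 0.4 and its weight is 2.5. A subject that was never at risk at any event time would get probability 0 and an infinite weight under this sketch.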

nyilin/SamplingDesignTools documentation built on Nov. 20, 2022, 8:07 a.m.