compute_km_weights: Compute Kaplan-Meier type weights for (matched) nested...

View source: R/km_type_weights.R

compute_km_weights    R Documentation

Compute Kaplan-Meier type weights for (matched) nested case-control (NCC) sample

Description

Compute Kaplan-Meier type weights for (matched) nested case-control (NCC) sample

Usage

compute_km_weights(
  cohort = NULL,
  ncc = NULL,
  id_name = NULL,
  risk_table_manual = NULL,
  t_start_name = NULL,
  t_name = NULL,
  sample_stat = NULL,
  t_match_name = t_name,
  y_name = NULL,
  match_var_names = NULL,
  n_per_case,
  return_risk_table = FALSE,
  km_names = c("km_prob", "km_weight")
)

Arguments

cohort

Cohort data with at least the following information on each subject: start time (if not 0 for all subjects) and end time of follow-up, censoring status and matching variables (if any). A data.frame or a matrix with column names.

ncc

(Matched) NCC data, if cohort is not available. This data should not include the ID of each matched set, but should include the actual event/censoring time of each subject. A data.frame or a matrix with column names.

id_name

Name of the column indicating subject ID in ncc, if cohort is not available.

risk_table_manual

Number of subjects at risk at the event time of each case in the NCC, if cohort is not available. A data.frame or a matrix with column names. See Details.

t_start_name

Name of the variable in cohort or ncc for the start time of follow-up. A string. Default is NULL, i.e., every subject started the follow-up at time 0.

t_name

Name of the variable in cohort or ncc for the time of event or censoring. A string. Note that if ncc is supplied, this should be each subject's own time of event or censoring, not the event time of the case in the same matched set; otherwise the weight of each sampled control cannot be computed correctly.

sample_stat

A numeric vector containing sampling and status information for each subject in cohort: use 0 for non-sampled controls, 1 for sampled (and kept) controls, and integers >=2 for events. The length of this vector must be the same as the number of rows in cohort.

t_match_name

Name of the column of event time in each matched set in ncc, possibly coarsened to the same level as t_event in risk_table_manual. A string. Default is t_name, i.e., not coarsened.

y_name

Name of the column of censoring status in cohort or ncc, with 1 for event and 0 for censoring. A string.

match_var_names

Name(s) of the match variable(s) in cohort or ncc used when drawing the NCC. A string vector. Default is NULL, i.e., the NCC was only time-matched. The corresponding column(s) in risk_table_manual must have the same name(s).

n_per_case

Number of controls matched to each case.

return_risk_table

Whether the risk table should be returned. Default is FALSE.

km_names

Column names for the KM-type probability (the first element) and weight (the second element) computed, if these two columns are to be attached to each subject in the input data. Default is c("km_prob", "km_weight").
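To make the two quantities named by km_names concrete, the following base R sketch computes a KM-type inclusion probability and its inverse as the weight on a toy cohort. It assumes the usual formulation attributed to Samuelsen (1997): a non-case is ever sampled with probability 1 minus the product, over event times at which it was at risk, of 1 - m / (n - 1), where m is the number of controls per case and n is the size of the risk set. The data frame toy and all object names are hypothetical, and the package's own handling of ties, delayed entry and matching strata may differ.

```r
# Toy cohort (hypothetical data, not mini_cohort): exit time and event status
toy <- data.frame(
  id     = 1:6,
  t      = c(2, 3, 5, 6, 8, 9),
  status = c(1, 0, 1, 0, 0, 0)
)
m <- 1  # controls per case, playing the role of n_per_case

# Distinct event times and the number of subjects still at risk at each one
# (assumption: a subject is at risk at time s if its exit time is >= s)
event_times <- sort(unique(toy$t[toy$status == 1]))
n_at_risk <- sapply(event_times, function(s) sum(toy$t >= s))

# KM-type probability of ever being included in the NCC sample
km_prob <- sapply(seq_len(nrow(toy)), function(i) {
  if (toy$status[i] == 1) return(1)       # cases are always included
  at_risk <- event_times <= toy$t[i]      # event times subject i was exposed to
  # A subject censored before the first event gets probability 0 here
  1 - prod(1 - m / (n_at_risk[at_risk] - 1))
})
km_weight <- 1 / km_prob

cbind(toy, km_prob, km_weight)
```

In this toy example the subject censored at time 3 was at risk only at the first event (risk set of 6), so its probability is 1/5 and its weight is 5. The weights are only meaningful for subjects that were actually selected into the NCC sample.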

Details

When the full cohort is not available, the correct weight for each sampled control can only be computed if the actual time of event or censoring of each subject in the NCC sample is kept; this should be specified as t_name in the input. Since the number of subjects in each risk set is supplied separately in such a scenario (i.e., as n_at_risk), t_match_name is required to map each control to the appropriate risk set. t_match_name may be the same as t_name if the exact risk set is available in n_at_risk, but when the full cohort is not available the risk set is usually approximated by using a coarsened version of t_name. For example, when controls were drawn from a population registry by matching on the exact date of death of cases, birth cohort and gender, the number at risk may be approximated by the population size of the same birth cohort and gender in the year of the event. In this scenario t_match_name would be the year of t_name.
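The coarsening described above can be sketched in base R as follows. All data are made up: ncc_toy, risk_table_toy and the registry counts are hypothetical, times are expressed as decimal calendar years, and the column names t_event and n_at_risk simply follow the names mentioned on this page. The sketch only checks that every NCC row maps to exactly one approximate risk set; it does not call compute_km_weights.

```r
# Hypothetical 1:1 NCC data: 'Time' is the exact event time of the case in
# each matched set, 't' is each subject's own time of event or censoring
ncc_toy <- data.frame(
  Map    = c(1, 2, 3, 4),
  Time   = c(2001.20, 2001.20, 2003.55, 2003.55),
  t      = c(2001.20, 2004.84, 2003.55, 2003.99),
  Fail   = c(1, 0, 1, 0),
  gender = c("F", "F", "M", "M")
)

# Coarsen the matched event time to calendar year, the resolution at which
# the (hypothetical) registry reports population sizes
ncc_toy$year <- floor(ncc_toy$Time)

# Hypothetical risk table: population size by year and gender
risk_table_toy <- data.frame(
  t_event   = c(2001, 2003, 2001, 2003),
  gender    = c("F", "F", "M", "M"),
  n_at_risk = c(5200, 5100, 4900, 4750)
)

# Look up the approximate risk set size for every NCC row
lookup <- merge(ncc_toy, risk_table_toy,
                by.x = c("year", "gender"), by.y = c("t_event", "gender"),
                all.x = TRUE)
lookup <- lookup[order(lookup$Map), c("Map", "year", "gender", "n_at_risk")]
lookup
```

With data shaped like this, the coarsened column ("year" here) would be the natural candidate for t_match_name and "gender" for match_var_names, while t_name would still point to the exact time column.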

See Also

compute_risk_table

Examples

library(SamplingDesignTools)
# Load mini cohort
data("mini_cohort")
mini_cohort
# Manually prepare a 1:1 NCC data set
ncc <- rbind(
  data.frame(Set = 1, Map = c(1, 5), Time = mini_cohort$t[1], 
             Fail = c(1, 0), t = mini_cohort$t[c(1, 5)]), 
  data.frame(Set = 2, Map = c(3, 4), Time = mini_cohort$t[3], 
             Fail = c(1, 0), t = mini_cohort$t[c(3, 4)]), 
  data.frame(Set = 3, Map = c(4, 10), Time = mini_cohort$t[4], 
             Fail = c(1, 0), t = mini_cohort$t[c(4, 10)]), 
  data.frame(Set = 4, Map = c(6, 7), Time = mini_cohort$t[6], 
             Fail = c(1, 0), t = mini_cohort$t[c(6, 7)]), 
  data.frame(Set = 5, Map = c(8, 9), Time = mini_cohort$t[8], 
             Fail = c(1, 0), t = mini_cohort$t[c(8, 9)]), 
  data.frame(Set = 6, Map = c(9, 10), Time = mini_cohort$t[9], 
             Fail = c(1, 0), t = mini_cohort$t[c(9, 10)]) 
)
rownames(ncc) <- NULL
ncc
# Map the NCC sample to the original cohort, break the matching, identify the
# subjects selected into the NCC, and return this subset with KM type weights
# computed for them.
# First create the sampling and status indicator:
sample_stat <- numeric(nrow(mini_cohort))
sample_stat[unique(ncc$Map[ncc$Fail == 0])] <- 1
sample_stat[ncc$Map[ncc$Fail == 1]] <- 2
# Then find the sampled subset and compute weights:
ncc_nodup <- compute_km_weights(
  cohort = mini_cohort, t_name = "t", y_name = "status",
  sample_stat = sample_stat, n_per_case = 1
)
ncc_nodup
# Alternatively, if the cohort is not available, the weights can be computed
# as long as the number of subjects at risk at event times in each stratum is
# available elsewhere, and the actual time of event/censoring is available
# for each subject in the NCC.
# Compute the number of subjects at risk from mini_cohort:
risk_table <- compute_risk_table(cohort = mini_cohort, t_name = "t", 
                                 y_name = "status")
risk_table
# The following command computes the same weights as in ncc_nodup:
ncc_nodup_v2 <- compute_km_weights(
  ncc = ncc[, -1], risk_table_manual = risk_table,
  id_name = "Map", t_match_name = "Time", t_name = "t", y_name = "Fail",
  n_per_case = 1
)
ncc_nodup_v2

nyilin/SamplingDesignTools documentation built on Nov. 20, 2022, 8:07 a.m.