survivalpath: Build Survival Path Model Using Dynamic Time Series Data...

View source: R/survivalpath.R

survivalpath    R Documentation

Build Survival Path Model Using Dynamic Time Series Data (DTSD) object

Description

Survival path mapping for dynamic prediction of cancer patients using time-series survival data. This is the core function that builds the survival path tree model based on the Akaike information criterion (AIC) and user-specified arguments.

Usage

survivalpath(
  DTSD,
  time_slices,
  treatments = NULL,
  num_categories = 2,
  p.value = 0.05,
  minsample = 15,
  degreeofcorrelation = 0.7,
  rates = 365
)

Arguments

DTSD

A DTSD class object. See function generatorDTSD() for details.

time_slices

Numeric. The total number of time slices, counted from the first, to include in the survival path model.

treatments

A list object; the default is NULL. Specifies the interventions or exposures received by each observation at different time slices. The treatment or exposure variables specified here are not used in constructing the survival path model.

num_categories

Numeric; the default is 2. The maximum number of branches into which each node can split.

p.value

Numeric; the default is 0.05. Significance threshold for hypothesis testing: variables with a p value below p.value in univariate analysis are treated as significant candidate variables and undergo further feature selection.
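The screening step described for p.value can be illustrated with a minimal sketch. It is not the package's internal code: the helper screen_univariate() is hypothetical, the Wald test from a univariate Cox model is an assumption about how significance is assessed, and the lung dataset from the survival package stands in for real time-slice data.

```r
library(survival)  # recommended package shipped with R; provides coxph() and lung

# Hypothetical helper (not part of SurvivalPath): fit one Cox model per
# candidate variable and keep those whose p value is below the threshold.
screen_univariate <- function(data, time, status, candidates, p.value = 0.05) {
  pvals <- vapply(candidates, function(v) {
    f <- as.formula(paste0("Surv(", time, ", ", status, ") ~ ", v))
    fit <- coxph(f, data = data)
    # Wald test p value for the single coefficient (assumed test)
    summary(fit)$coefficients[1, "Pr(>|z|)"]
  }, numeric(1))
  list(pvalues = pvals, selected = names(pvals)[pvals < p.value])
}

res <- screen_univariate(lung, "time", "status",
                         candidates = c("age", "sex", "ph.ecog", "meal.cal"))
res$pvalues   # one p value per candidate
res$selected  # candidates passing the threshold
```

Lowering p.value makes the screen stricter, so fewer variables reach the later feature selection step.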

minsample

Numeric; the default is 15. Minimum sample size for branching.

degreeofcorrelation

Numeric; the default is 0.7. When the correlation between two variables exceeds this value, the variables are considered collinear. For each such pair, the Akaike information criterion (AIC) values of the two variables are compared, with each variable serving in turn as the sole predictor of the outcome; the variable with the smaller AIC value will be removed.
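A rough sketch of the collinearity check may help. It assumes Pearson correlation compared in absolute value, which this page does not specify. It uses the lung dataset from the survival package as stand-in data. The names d, vars, flagged and aic_single are illustrative only. The sketch flags highly correlated pairs and reports the single-predictor AIC of each member; which of the two is then dropped follows the rule stated in the argument description.

```r
library(survival)

# Complete cases only, so both single-predictor models use the same rows
d <- na.omit(lung[, c("time", "status", "ph.ecog", "ph.karno", "age")])
vars <- c("ph.ecog", "ph.karno", "age")

# Absolute pairwise correlations among candidate variables (assumed: Pearson)
cm <- abs(cor(d[, vars]))
high <- which(cm > 0.7 & upper.tri(cm), arr.ind = TRUE)
flagged <- data.frame(var1 = rownames(cm)[high[, "row"]],
                      var2 = colnames(cm)[high[, "col"]],
                      stringsAsFactors = FALSE)

# AIC of a Cox model with a single predictor
aic_single <- function(v) {
  AIC(coxph(as.formula(paste("Surv(time, status) ~", v)), data = d))
}
flagged$aic1 <- vapply(flagged$var1, aic_single, numeric(1))
flagged$aic2 <- vapply(flagged$var2, aic_single, numeric(1))
flagged  # one row per pair exceeding the correlation threshold
```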

rates

Numeric; the default is 365. The time point at which the rate of the outcome is calculated for each node in the survival path model.
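As an illustration of what a rate at a fixed time point can look like, the sketch below reads a Kaplan-Meier survival probability at day 365. It assumes the reported rate is a Kaplan-Meier estimate, which this page does not confirm, and it treats the whole lung dataset from the survival package as a stand-in for the patients in one node.

```r
library(survival)

# Kaplan-Meier fit for one group of patients (stand-in for a tree node)
fit <- survfit(Surv(time, status) ~ 1, data = lung)

# Estimated survival probability at day 365, matching the default rates = 365
surv_365 <- summary(fit, times = 365)$surv
surv_365
```

Changing the times argument, like changing rates, moves the time point at which the probability is read from the curve.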

Details

After data pre-processing, the main function computes survival paths under user-defined parameters for covariates, significance level, minimum bifurcation sample size and number of time slices to analyze. The resulting survival paths can be visualized as a tree diagram.

Value

The survivalpath function returns an object that includes data, tree, df and maxpath.

data

Describes the grouping variables and their values for each observation at different time slices.

tree

A treedata object, which facilitates creation of the tree diagram and mapping of patients' personalized survival paths.

df

A data.frame object containing the node numbers corresponding to each observation at different time slices in the survival path tree model tree. The data frame has three added columns. parent_node is the upper node to which the observation belongs; it identifies the group of participants used for modeling and feature selection. sub_node is the node to which the observation is assigned after subdivision of the parent_node; this information is used for model evaluation and comparison. variable_value indicates the reason for the transfer from the parent_node to the sub_node.

maxpath

The longest path length in the survival path model.

Note

The idea of developing the SurvivalPath R package stems from our previous exploratory work, in which we attempted to achieve dynamic prognosis prediction by establishing survival paths based on the time-series data of patients with hepatocellular carcinoma (HCC). The survival path approach we proposed provides a potential solution for dynamic prognosis prediction and management of cancer patients by constructing survival path maps from the key prognostic factors returned after analysis of structured time-series survival data. More importantly, the survival path model can be easily understood and used by clinicians, in contrast to black-box models. The SurvivalPath R package is a newly developed tool to facilitate fast building of survival path models, with the aim of promoting standardization of this methodology. In this package we optimized the feature selection process. One-to-one collinearity analysis is embedded (as an argument) to screen out non-collinear candidate variables before formal feature selection in the main function, which reduces the confounding impact of potential collinearity on feature selection in the Cox model. In addition, the SurvivalPath R package is now compatible with continuous variables: the classifydata function enables automatic binary classification of continuous variables and their entry into the model. This methodology is still young, and we welcome efforts from all over the world to improve it.

Author(s)

Lujun Shen and Tao Zhang

References

Lujun Shen. (2018) Dynamically prognosticating patients with hepatocellular carcinoma through survival paths mapping based on time-series data. Nat Commun. 2018 Jun 8;9(1):2230. doi: 10.1038/s41467-018-04633-7. PMID: 29884785; PMCID: PMC5993743. https://www.nature.com/articles/s41467-018-04633-7.pdf

Examples

library(SurvivalPath)
library(dplyr)
data("DTSDHCC")
# Randomly select a subset of cases for the demo
id <- DTSDHCC$ID[!duplicated(DTSDHCC$ID)]
set.seed(123)
id <- sample(id, 500)
miniDTSDHCC <- DTSDHCC[DTSDHCC$ID %in% id, ]
# Convert multi-row time-series data into time-slice data
dataset <- timedivision(miniDTSDHCC, "ID", "Date", period = 90,
                        left_interval = 0.5, right_interval = 0.5)
# Create a DTSD object from the time-slice data
resu <- generatorDTSD(dataset, periodindex = "time_slice", IDindex = "ID",
                      timeindex = "OStime_day", statusindex = "Status_of_death",
                      variable = c("Age", "Amount.of.Hepatic.Lesions",
                                   "Largest.Diameter.of.Hepatic.Lesions",
                                   "New.Lesion", "Vascular.Invasion",
                                   "Local.Lymph.Node.Metastasis",
                                   "Distant.Metastasis", "Child_pugh_score", "AFP"),
                      predict.time = 365 * 1)
# Build the survival path model; this step takes minutes
result <- survivalpath(resu, time_slices = 9)

# Draw the survival path tree
library(ggplot2)
library(ggtree)
mytree <- result$tree

ggtree(mytree, color = "black", linetype = 1, size = 1.2, ladderize = TRUE) +
  theme_tree2() +
  # Node label
  geom_text2(aes(label = label), hjust = 0.6, vjust = -0.6, size = 3.0) +
  # Node number / sample number / median survival time / survival rate
  geom_text2(aes(label = paste(node, size, mytree@data$survival,
                               mytree@data$survivalrate, sep = "/")),
             hjust = 0.6, vjust = -1.85, size = 3.0) +
  # Point size grows with the number of observations in the node
  geom_point2(aes(shape = isTip, color = isTip),
              size = mytree@data$size %/% 200 + 1, show.legend = FALSE) +
  labs(x = "TimePoints",
       y = "Survival",
       subtitle = "node_name/sample number/Median survival time/Survival rate",
       title = "Survival Tree") +
  theme(legend.title = element_blank(), legend.position = c(0.1, 0.9))


SurvivalPath documentation built on July 4, 2022, 1:05 a.m.