README.md
In jiawen012/Survival-Analysis-NCT00364013: NCT00364013

Survival-Analysis-NCT00364013

This is a created R package for survival analysis of NCT00364013. To use this package please first library the package in console.

library(NCT00364013)

Structure of the document:

A description of the study
Load data and packages
Exploratory Data Analysis
Intuition about the survival theories
Kaplan-Meier Analytics
Cox proportional hazard models
Conclusion\

The primary purpose of the study is to assess whether panitumumab in combination with infusional 5-fluorouracil, leucovorin, and oxaliplatin (FOLFOX) chemotherapy improves progression-free survival (PFS) compared to FOLFOX alone as first-line therapy for metastatic colorectal cancer (mCRC) among subjects with wild-type KRAS tumors (subjects whose tumors contain nonmutated KRAS) and subjects with mutant KRAS tumors. In this study, participants are randomized in a 1:1 ratio to first-line therapy consisting of either:

• Arm 1(test): FOLFOX with panitumumab or

• Arm 2(standard): FOLFOX alone

Subjects will complete an EQ-5D every 4 weeks ± 1 week, starting at baseline and continuing while receiving assigned study treatment until disease progression and once at the safety followup visit. If withdrawal from assigned study treatment occurs prior to disease progression (eg due to unacceptable toxicities) subjects should continue to complete the EQ-5D every 8 weeks ± 1 week until disease progression. Researchers did primary analysis, safety interim analyses and efficacy interim analyses in this study to evaluate the combined drug efficiency.

To avoid repeated data manipulation, here I did not save all .sas7bat files in the package, so the code here is a little bit different from what we did in the homework 1. In conclusion, I just stored the cleaned and processed data. To see more about our data cleaning process, please review our homework 1.

summarize_table(data_cleaned)

 ATRTFOLFOX.alone ATRTPanitumumab...FOLFOX     dthdy         PRSURGN PRSURGY   dth         age       SEXFemale SEXMale                   trt      B_ECOG  B_ECOGFully.active
 0:468            0:467                    Min.   :  1.0      0:842   0: 93   0:245   Min.   : 1.00   0:579     0:356  FOLFOX alone        :466   0:934   0:415  
 1:467            1:468                    1st Qu.:176.5      1: 93   1:842   1:690   1st Qu.:29.00   1:356     1:579  Panitumumab + FOLFOX:469   1:  1   1:520 
                                           Median :323.0                              Median :36.00                                                          
                                           Mean   :331.4                              Mean   :34.98   
                                           3rd Qu.:494.0                              3rd Qu.:43.00                                                         
                                           Max.   :657.0                              Max.   :58.00                                                                                                            
 B_ECOGIn.bed.less.than.50..of.the.time   B_ECOGSymptoms.but.ambulatory pfscr      pfsdycr  
 0:889                                    0:567                         0:124   Min.   :  1.0  
 1: 46                                    1:368                         1:811   1st Qu.: 91.0 
                                                                                Median :186.0  
                                                                                Mean   :203.7 
                                                                                3rd Qu.:318.5  
                                                                                Max.   :466.0

From the brief summary we can see that:

61.93% of patients are women;
73.8% of patients died before the end; of the study, and the mean of survival period is 331.4;
Age ranges from 1 to 58 years old ;
90.05% of the subjects had a prior surgery ;
Patient status: 55.61% of the subjects are fully active, 5.92% of the subjects are in bed less than 50% of their time, and 39.36% of the subjects have symptoms but ambulatory.

To get a basic sense of the relationship among different variables, we drew the following graph.

plot_trt(tmp)

Figure_1

plot_panels(tmp)

Figure_2

We explored the relationship between the experimental group's and the control group's dthdy, race, age, weight, height and pfsdycr. The upper right part contains correlation values, graphs on the diagonal line is the data distribution histograms, and the lower left part shows the corresponding scatter plots. The red and purple points are the data of the experimental group and the control group respectively. From the figure, we could see that there is no correlation between other variables except for the dthdy and pfs.

Below we will begin to analyze the dthdy and pfs(progression-free survival) of the two groups.

plot_hist(tmp)

Figure_3

We noticed that there isn't clear difference in PFS days and death days between test or standard treatment groups. Also, we did not find any significant difference in death days between test or standard treatment groups.

State a concise hypothesis?

Our hypothesis is that:

Panitumumab with FOLFOX chemotherapy DOES NOT improve progression-free survival (PFS) and Overall Survival (OS) rate compared to FOLFOX alone as first-line therapy for metastatic colorectal cancer (mCRC) among subjects with wild-type KRAS tumors (subjects whose tumors contain nonmutated KRAS) and subjects with mutantKRAS tumors.

Here, we plan to use survival analysis tools to test our hypothesis as follows.

Survival analysis helps when the time element (an event occuring) is taken into account. We want to estimate the time to reach a specific event with survival analysis. In our study, if there is no obvious difference in estimated progression-free survival and overall survival signal between Arm1 and Arm2, then our hypothesis is TRUE.

The event is the endpoint in the study, the specific outcome we want to measure. In our study what we want to estimate is the death or survival situations among patients in arm1 or arm2.

This refers to incomplete data, the event did not occur for a subject during the time of the study, there are several cases:

Patient did not experience any event during the study, and we do not know if the event occured (for instance, we do not know if the patient ultimately survived or not after the study).
Right censored subjects: the patient withdrew from the trial or data is lost for some reason (follow-up didn't occur on the patient, or the patient experienced a different event).

All patients not experiencing the event during the time of the study will be censored at the latest recorded time point.

Main purpose: visualzing survival curves, works well with categorical data, for numerical data we'll use the Cox proportional hazard models.

Kaplan-Meier statistic measures the probability that a patient will survive past a specific point in time. At $t=0$ , the statistic is 1 (or 100%). When $t$ increases infinitely, the statistic becomes 0.

The plot of the KM estimator is a series of decreasing horizontal steps, approaching the true survival function.

The survival probability of surviving after time = t is noted $S(t$ ). It is the product of all the prior survival probabilities up to time = t.

Figure_4

The basic formula is:

Figure_5

$t(i$ ) is a time when at least an event happened, $d(i$ ) is the number of events (death or recurring disease for instance), that happened at time $t(i$ ), $n(i$ ) is the number of of individuals that survive (did not have an event or where not censored). Source: https://en.wikipedia.org/wiki/Kaplan%E2%80%93Meier_estimator

This is based on conditional probabilities, each new proportion conditional on the previous proportions.

This test is performed on the curves of the Kaplan-Meier method.\ For two (or more) different survival curves, it tests the null hypothesis that both curves are equal. If the p-value is below the significance level (generally $alpha=0.05$ ), then we have convincing statistical evidence that at least two curves differ.

Main purpose: describing the simultaneous effect of several variables on the rate of a particular event happening at a specific point in time.

The Cox model is defined with the hazard function, (http://latex.codecogs.com/svg.latex?h(t)): it describes the probability of a hazard of a subject to survive to time = $t$. The function returns a proportion, from 0 to 1.

It measures the instanteous risk of death. It has a memory-less property, the likelihood of something happening at time = $t$ has no relation to what happened in the past. The function at year y applies to all subjects alive that year, without taking into account who died in previous years.

Covariates are the predictors we use in the model that looks pretty much like a multiple linear regression.

Hazard function:

Figure_6

$t$ = survival time
$h(t$ ) = hazard function taking as arguments n covariates noted $x_{1}$ to $x_{n}$
$b_{1}$ , …, $b_{1}$ = coefficients or weights of each covariate
$h_{0}$ = baseline hazard, i.e. value of the hazard when all x's are equal to zero.
Hazard ratios $\log(b$ ) are noted HR: if $HR=1$ : no effect if $HR<1$ : reduction in hazard, we call the associated covariate a good prognostic factor if $HR>1$ : increase in hazard, we call the associated covariate a bad prognostic factor

The survivor function describes the probability of "not having an event", whereas the hazard function describes the probability of the event occurring.

The survival function is the probability that a subject survives from the time origin of the study to a specified future time $t$ .

The hazard function is the probability that a subject under observation at a time $t$ has an event at that time.

We want to fit a survival curve for time until the event on the x-axis and status on the y-axis, with an explanatory variable like gender, type of treatment, ...

We use the survfit() function from the survival package, in combination with the Surv() function, which provides a survival object containing failure time and censoring information. I encapsulated the code into a function named survival_fit, and here I directly use it.

In this first example, we use the treatment class (standard or test):

survival_fit(tmp)

       records n.max n.start events   *rmean *se(rmean) median         0.95LCL 0.95UCL
atrt=0     468   468     468    403 367.6323   15.04755    275  atrt=0     239     286
atrt=1     467   467     467    408 353.5541   14.58030    279  atrt=1     239     292

Call: survfit(formula = Surv(pfsdycr, pfscr) ~ atrt, data = tmp)

         n events median 0.95LCL 0.95UCL
atrt=0 468    403    275     239     286
atrt=1 467    408    279     239     292

The short summary above shows that test treatment group has 403 events and standard group has 408 events, with a median survival time of 275 days for the test treatment group and 279 days for the standard group.403 vs. 408 events, 275 vs. 279 pfs days, from which we almost cannot tell the difference in PFS and OS between test treatment group and standard treatment group!

However, we need to visualize our model and calculate the p-value to further illustrate and test our hypothesis, as shown in the following section.

We use the plot_survival function from my package to fit a cox model. This function calls ggsurvplot() from Survminer, which offers more options in just one plot.

plot_survival(survival_fit(tmp))

Figure_8 Figure_7

Interpretation:

The x-axis represents the survival time in days.
The y-axis shows the probability of survival time, related to the number of days on the x-axis.
Each event (death in this case) is shown by a vertical drop of the curve.\ Vertical ticks (although hardly noticeable on the plot) show a censored patient.
The curve always start at 1 (no events occured or in this case all patients are alive), then decreases and if the study would last infinitely, the curve would tend towards 0 (no subjects left due to event or censoring).
The p-value=0.76 in PFS estimation model, which is both extremely high, we fail to reject the null hypothesis that both curves are different.
In our hypothesis, there is no statistical evidence that both curves are different or, the treatment type doesn't impact the survival time, i.e. our hypothesis holds.
Like stated earlier and visible here, the median survival time (the point where survival probability is 0.5) for standard treatment is 279 days, against only 275 days for the test treatment. There is no statistical evidence that both curves are different.

We use the summarize_cox function from my package to fit a cox model. This function calls coxph() function from the survival package, can also contain some other data processing code.

Fit univariate model

summarize_cox(tmp)

Call:
coxph(formula = Surv(pfsdycr, pfscr) ~ atrt, data = tmp_d)

        coef exp(coef) se(coef)     z     p
atrt 0.04000   1.04081  0.07032 0.569 0.569

Likelihood ratio test=0.32  on 1 df, p=0.5695
n= 935, number of events= 811

Interpreting the summary output:

The "z" column gives the Wald statistic value (ratio of each regression coefficient to its standard error $z=coef/SE(coef$ ). The wald statistic evaluates, whether the beta coefficient of a given variable is statistically significantly different from 0. With $z=0.569$ and $p=0.569$ , type of treatment doesn't have any statistically significant coefficients.

It the regression coefficients (coef) has a positive sign, the hazard (risk of death) is higher, i.e. the prognosis is worse, for subjects with higher values of that variable. The output gives the hazard ratio (HR) for the second group relative to the first group, that is, test treatment versus standard. The beta coefficient for atrt = 0.04 indicates that test patients have higher risk of death (higher survival rates) than standard patients. But let's keep in mind that the p-value was high and the variable itself is not statistically significant.

Hazard ratios , that is the exponentiated coefficients noted 1.04, give the effect size of the covariates. For example, having test treatment increases the hazard by a factor of 1.04.

The global statistical significance of the model gives p-values for three tests: likelihood-ratio test, Wald test, and score logrank. These three methods test the null hypothesis that all beta va;ues are zero and are asymptotically equivalent. For large n, they return similar results. For small n, we use the likelihood ratio test.

After brief exploratory analysis using tables, visualizations and analysis under two survival analysis models. We can say that our hypothesis:

Panitumumab with FOLFOX chemotherapy DOES NOT improve progression-free survival (PFS) and Overall Survival (OS) rate compared to FOLFOX alone as first-line therapy for metastatic colorectal cancer (mCRC) among subjects with wild-type KRAS tumors (subjects whose tumors contain nonmutated KRAS) and subjects with mutantKRAS tumors.

is CORRECT.

jiawen012/Survival-Analysis-NCT00364013 documentation built on Dec. 21, 2021, 12:04 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

jiawen012/Survival-Analysis-NCT00364013
NCT00364013

README.md
In jiawen012/Survival-Analysis-NCT00364013: NCT00364013

Survival-Analysis-NCT00364013

Summary

1. A description of the study

2. Load data and packages

3. Exploratory data analysis

4. Survival analysis

4.1. Event

4.2. Censored Data

4.3 Kaplan-Meier

4.4. Log-Rank Test

4.5. Cox proportional hazard models

4.6. Difference between survival and hazard functions

5. Kaplan-Meier Analytics

5.1. Fitting a Kaplan-Meier Model

5.2. Visualizing the model

6. Cox proportional hazard models

6.1. Fitting a Cox Model

7. Conclusion

R Package Documentation

Browse R Packages

We want your feedback!

jiawen012/Survival-Analysis-NCT00364013 NCT00364013

README.md In jiawen012/Survival-Analysis-NCT00364013: NCT00364013

Survival-Analysis-NCT00364013

Summary

1. A description of the study

2. Load data and packages

3. Exploratory data analysis

4. Survival analysis

4.1. Event

4.2. Censored Data

4.3 Kaplan-Meier

4.4. Log-Rank Test

4.5. Cox proportional hazard models

4.6. Difference between survival and hazard functions

5. Kaplan-Meier Analytics

5.1. Fitting a Kaplan-Meier Model

5.2. Visualizing the model

6. Cox proportional hazard models

6.1. Fitting a Cox Model

7. Conclusion

R Package Documentation

Browse R Packages

We want your feedback!

jiawen012/Survival-Analysis-NCT00364013
NCT00364013

README.md
In jiawen012/Survival-Analysis-NCT00364013: NCT00364013