Survival analysis of packages
In inst/data is sd.rds, an R data table of package data for survival analysis. The data was snapshotted from CRAN via rsync to inst/data/cran.csv. Information on package life and death was inferred from this snapshot, and package file sizes were also taken from it. The snapshot was taken at the date stored in now.rds.
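The exact rule used to infer life and death from the snapshot is not spelled out here. The following is a minimal sketch of one way to get a Surv(life, event) pair, assuming each package has a first-release date and an archive date (NA if still on CRAN). The function name make_surv_data and its arguments are hypothetical, and life is measured in days only as an assumption.

```r
# Hypothetical sketch: survival time and event indicator per package.
# first_release: date each package first appeared on CRAN
# archived:      date it was archived, or NA if still on CRAN
# now:           snapshot date (as stored in now.rds)
make_surv_data <- function(first_release, archived, now) {
  # A package counts as dead if it was archived on or before the snapshot
  died <- !is.na(archived) & archived <= now
  # Live packages are right-censored at the snapshot date
  end <- rep(now, length(first_release))
  end[died] <- archived[died]
  data.frame(life = as.numeric(end - first_release, units = "days"),
             event = as.integer(died))
}
```

Censoring live packages at the snapshot date is what lets coxph use them: they contribute survival time up to now without counting as deaths.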
Dependency counts were obtained from pkgsearch::cran_event_history.
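How a "count" is formed from a dependency field is not stated. As a hedged illustration only, assuming the raw DESCRIPTION-style text of a Depends, Imports or Suggests field is available, a count could be the number of comma-separated package entries. The helper count_deps is hypothetical, and excluding R itself from Depends is an assumption.

```r
# Hypothetical helper: count the packages named in a DESCRIPTION-style
# dependency field such as "R (>= 3.5.0), stats, utils (>= 3.0)".
# Whether the study counts R itself is not stated; here it is excluded.
count_deps <- function(field) {
  if (is.na(field) || !nzchar(trimws(field))) return(0L)
  # Split on commas, strip version constraints in parentheses, trim spaces
  pkgs <- trimws(sub("\\(.*\\)", "", strsplit(field, ",")[[1]]))
  # Drop empty entries and the R version requirement
  pkgs <- pkgs[nzchar(pkgs) & pkgs != "R"]
  length(pkgs)
}
```

For a vector of fields, vapply(fields, count_deps, integer(1)) returns one count per version.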
To add new covariates, use the package names in sd as the list of packages in the study and, if possible, use now as the time point. It may not be possible to roll API outputs back to that date; in that case, the values as of query time will have to do.
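A minimal sketch of attaching a new covariate, assuming sd has a column named package holding the package names (the real column name may differ) and that the new covariate arrives as a table keyed the same way. The helper add_covariate is hypothetical.

```r
# Hypothetical sketch: attach a new per-package covariate to the study data.
# Assumes sd has a column `package`; pass `by` if the real name differs.
add_covariate <- function(sd, newcov, by = "package") {
  # Left join keeps every package in the study; packages missing from
  # newcov get NA for the new covariate
  merge(sd, newcov, by = by, all.x = TRUE, sort = FALSE)
}
```

The left join keeps the study population fixed to the packages in sd, as the text asks. The snapshot date itself can presumably be loaded the same way as sd.rds, with readRDS(system.file("data", "now.rds", package = "cransurv")).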
I don't really want to get into the complications of time-varying covariates, so I have boiled everything down into single covariates. For file size and dependency counts I use the integrated average of a stair-step graph with a step at each package version, extended on the right to either the date of death or the date in now.
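The integrated average is the area under the step function divided by its time span: with versions released at t_1 < ... < t_n carrying values x_1, ..., x_n and end date T, it is the sum of x_i (t_{i+1} - t_i) divided by (T - t_1), taking t_{n+1} = T. A minimal sketch follows, assuming the average starts at the first release; the function name step_mean is hypothetical.

```r
# Time-weighted ("integrated") average of a stair-step covariate.
# dates:  release date of each package version, in increasing order
# values: covariate value for each version (e.g. file size, dependency count)
# end:    date of death, or the snapshot date if the package is still alive
step_mean <- function(dates, values, end) {
  stopifnot(length(dates) == length(values),
            !is.unsorted(as.numeric(dates)),
            end >= dates[length(dates)])
  # Each value holds from its own release until the next release (or `end`)
  widths <- as.numeric(diff(c(dates, end)), units = "days")
  total <- sum(widths)
  # Degenerate case: no elapsed time, so fall back to the latest value
  if (total == 0) return(values[length(values)])
  # Area under the step function divided by the time span covered
  sum(values * widths) / total
}
```

For example, step_mean(as.Date(c("2020-01-01", "2020-01-11")), c(100, 200), as.Date("2020-01-21")) gives 150, since each value holds for 10 days. Unlike a plain mean over versions, this weights a long-lived version more than a quick patch release.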
> library(survival)
> sd <- readRDS(system.file("data", "sd.rds", package = "cransurv"))
> m <- coxph(Surv(life, event) ~ meansize + Depends + Imports + Suggests, data = sd)
> summary(m)
Call:
coxph(formula = Surv(life, event) ~ meansize + Depends + Imports +
    Suggests, data = sd)

  n= 17813, number of events= 2455

               coef  exp(coef)   se(coef)      z Pr(>|z|)
meansize  3.637e-08  1.000e+00  5.590e-09  6.507 7.68e-11 ***
Depends   7.401e-02  1.077e+00  1.103e-02  6.712 1.92e-11 ***
Imports   6.513e-03  1.007e+00  7.013e-03  0.929    0.353
Suggests -5.382e-02  9.476e-01  1.042e-02 -5.166 2.39e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
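The exp(coef) column gives hazard ratios per single unit, which for meansize rounds to 1.000 because one unit is tiny. A short sketch rescaling the coefficients above to more readable increments; the increments chosen, and the assumption that meansize is in bytes, are illustrative only.

```r
# Coefficients copied from the summary(m) output above; with the fitted
# model in hand, coef(m) returns the same named vector.
b <- c(meansize = 3.637e-08, Depends = 7.401e-02,
       Imports = 6.513e-03, Suggests = -5.382e-02)
# Illustrative increments: 1e6 units of meansize (about 1 MB if the unit
# is bytes, which is an assumption), one extra Depends or Imports entry,
# and five extra Suggests entries.
delta <- c(meansize = 1e6, Depends = 1, Imports = 1, Suggests = 5)
# Under proportional hazards, the hazard ratio for a change of delta in a
# covariate is exp(coef * delta).
hr <- exp(b * delta)
round(hr, 3)
```

This gives roughly 1.037, 1.077, 1.007 and 0.764. Assuming event = 1 marks a package's removal from CRAN, each extra Depends entry is associated with about a 7.7% higher hazard and five extra Suggests entries with about a 24% lower hazard, while the Imports effect is not distinguishable from zero (p = 0.353).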