README.md

cransurv

Survival analysis of CRAN packages

Data

The file inst/data/sd.rds is an R data table of package data for survival analysis.

The data was snapshotted from CRAN via rsync into inst/data/cran.csv. Package life and death were inferred from this snapshot, as were package file sizes. The snapshot was taken on the date stored in now.rds.
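A minimal sketch of how survival times could be derived from such a snapshot. It assumes a package's life runs from its first release either to the date it was archived (an event) or, if it is still on CRAN, to the snapshot date (censored). The function name `life_event` and its arguments are hypothetical, and life is measured in days here. The units and exact rules used to build sd may differ.

```r
# Hypothetical helper (not part of cransurv): derive survival time and
# event indicator from per-package dates.
#   first_release: Date of each package's first CRAN release
#   archived:      Date each package was archived, NA if still on CRAN
#   now:           snapshot date (as stored in now.rds)
life_event <- function(first_release, archived, now) {
  event <- !is.na(archived)        # TRUE = package "died" (was archived)
  end <- archived
  end[!event] <- now               # censor surviving packages at the snapshot date
  data.frame(life  = as.numeric(end - first_release),  # days at risk
             event = as.integer(event))
}

# Example with made-up dates: one archived package, one still on CRAN.
d <- life_event(as.Date(c("2015-01-01", "2018-06-01")),
                as.Date(c("2017-01-01", NA)),
                now = as.Date("2020-02-01"))
d
```

The `event` column follows the usual `Surv(time, event)` convention of 1 for an observed death and 0 for a censored observation.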

Dependency counts were obtained from pkgsearch::cran_event_history.
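The processing of the pkgsearch output isn't described here. As an illustration only, suppose each package version's dependencies are available as a DESCRIPTION-style field string (for example a Depends, Imports or Suggests entry). Under that assumption, a count could be taken as below. The helper `count_deps` is hypothetical. It is also an assumption whether the `R (>= ...)` entry in Depends was counted in the original data; this sketch excludes it.

```r
# Hypothetical helper: count packages listed in a DESCRIPTION-style
# dependency field such as "R (>= 3.5.0), methods, stats (>= 3.0)".
# Assumption: the "R" entry is not counted as a package dependency.
count_deps <- function(field) {
  if (is.null(field) || is.na(field) || !nzchar(trimws(field))) return(0L)
  entries <- trimws(strsplit(field, ",", fixed = TRUE)[[1]])
  entries <- entries[nzchar(entries)]       # drop empty pieces, e.g. from a trailing comma
  pkgs <- sub("\\s*\\(.*$", "", entries)    # strip version constraints like "(>= 3.0)"
  sum(pkgs != "R")
}

count_deps("R (>= 3.5.0), methods, stats (>= 3.0)")
```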

To add new covariates, use the package names in sd as the list of packages in the study and, where possible, use the date in now as the time point. Some APIs may not let you roll their outputs back to that date, in which case the covariate will reflect a later state.
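A minimal sketch of attaching a new covariate. It assumes sd has a column of package names called `package` (a hypothetical name; check `names(sd)`) and that the new values are in a data frame with the same key column. A left join keeps every package in the study, and packages with no value get `NA`.

```r
# Hypothetical helper: left-join a new covariate onto the study table,
# keeping exactly the packages already in `sd`.
#   sd:     study table with one row per package
#   newcov: data frame with the key column plus the new covariate(s)
#   by:     name of the package-name column (assumed to be "package")
add_covariate <- function(sd, newcov, by = "package") {
  out <- merge(sd, newcov, by = by, all.x = TRUE, sort = FALSE)
  # Guard against duplicate keys in newcov silently adding rows.
  stopifnot(nrow(out) == nrow(sd))
  out
}

# Toy example: package "b" has no value for the new covariate.
toy_sd <- data.frame(package = c("a", "b", "c"), life = c(100, 200, 300))
toy_cov <- data.frame(package = c("a", "c"), stars = c(5, 7))
out <- add_covariate(toy_sd, toy_cov)
out
```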

I don't want to get into the complications of time-varying covariates, so I have boiled each one down to a single value per package. For file size and dependency counts I use the time-integrated average of a stair-step function that changes value at each package version date. The function is extended on the right to either the date of death or the snapshot date in now.
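The averaging can be sketched as follows. The sketch assumes each version's value holds from its release date until the next version, or until the right-hand end date, with time measured in days. The name `step_average` is hypothetical, and this is not necessarily the code used to build sd.

```r
# Time-weighted (integrated) average of a stair-step function.
#   dates:  Dates of each package version, in increasing order
#   values: covariate value (e.g. file size) in force from each date onward
#   end:    right-hand limit: the date of death or the snapshot date (now)
step_average <- function(dates, values, end) {
  stopifnot(length(dates) == length(values),
            !is.unsorted(dates), end >= max(dates))
  widths <- as.numeric(diff(c(dates, end)))  # days each value was in force
  if (sum(widths) == 0) return(values[length(values)])  # zero-length history
  sum(values * widths) / sum(widths)         # area under the steps / total time
}

# Example: size 100 for 10 days, then size 200 for 20 days.
avg <- step_average(as.Date(c("2019-01-01", "2019-01-11")),
                    c(100, 200),
                    end = as.Date("2019-01-31"))
avg
```

Weighting by time in force means a version that lived for years dominates one that was replaced within days. A plain mean over versions would treat both equally.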

Fitting

> library(survival)
> sd <- readRDS(system.file("data", "sd.rds", package="cransurv"))
> m <- coxph(Surv(life,event) ~ meansize+Depends+Imports + Suggests, sd)
> summary(m)
Call:
coxph(formula = Surv(life, event) ~ meansize + Depends + Imports + 
    Suggests, data = sd)

  n= 17813, number of events= 2455 

               coef  exp(coef)   se(coef)      z Pr(>|z|)    
meansize  3.637e-08  1.000e+00  5.590e-09  6.507 7.68e-11 ***
Depends   7.401e-02  1.077e+00  1.103e-02  6.712 1.92e-11 ***
Imports   6.513e-03  1.007e+00  7.013e-03  0.929    0.353    
Suggests -5.382e-02  9.476e-01  1.042e-02 -5.166 2.39e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
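In a Cox model, `exp(coef)` is the hazard ratio for a one-unit increase in a covariate. For meansize the per-unit ratio is so close to 1 that it prints as 1.000, but rescaling to a more meaningful increment makes the effect visible. The sketch below copies the coefficients from the output above. It assumes meansize is measured in bytes, and the 1 MB increment is chosen only for illustration.

```r
# Coefficients copied from the summary(m) output above.
coefs <- c(meansize = 3.637e-08, Depends = 7.401e-02,
           Imports = 6.513e-03, Suggests = -5.382e-02)
# Illustrative increments: 1 MB of mean size (assuming meansize is in bytes)
# and one additional dependency for each count covariate.
increment <- c(meansize = 1e6, Depends = 1, Imports = 1, Suggests = 1)
hr <- exp(coefs * increment)   # hazard ratio for each increment
round(hr, 3)
```

Ratios above 1 indicate a higher hazard of package death (a shorter life), and ratios below 1 indicate a lower hazard. The Imports term is not significant in the fit above (p = 0.353).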


barryrowlingson/cransurv documentation built on Feb. 6, 2020, 4:41 a.m.