```
# Install sml
library(devtools);
install_github('linxihui/sml');
```

Cox's model assumes that the proportional hazards are proportional among all observations. With such assumption, a partial likelihood comes up, which can also be treated as a profile and conditional likelihood, which does not involve a baseline hazard function, and thus parameters of interest gain more degree freedom to estimate.

The Cox partial likelihood looks like

where $\theta_i$'s are usually called *links*, or *log risk* (as $\exp{\theta_i}$'s are called *risk*).

In Cox's model,

Gaussian process regression and classification have been shown to be powerful. However, no extension to survival is available in R. However, the extension is quite straight forward, by assuming that $\theta$ are sampled from a Gaussian process (prior), i.e.,

where $K = (k_{ij}) = K(x_i, x_j)$ is the kernel matrix.

The posterior distribution,

Define

After taking the logarithm, we get

The max-a-posterior(MAP) estimate $\hat\alpha_{\text{\tiny MAP}}$ (and thus $\hat\theta_{\text{\tiny MAP}}$) can be estimated by ridge regression solver (such as *glmnet*). This is implemented as *gpsrc*.

```
# Survival Gaussian process regression
library(survival);
library(sml);
set.seed(123);
data(pbc, package = 'randomForestSRC');
pbc <- na.omit(pbc);
i.tr <- sample(nrow(pbc), 100);
gp <- gpsrc(Surv(days, status) ~., data = pbc[i.tr, ], kernel = 'laplacedot');
gp.pred <- predict(gp, pbc[-i.tr, ]);
gp.cInd <- survConcordance(Surv(days, status) ~ gp.pred, data = pbc[-i.tr, ]);
cat('C-index:', gp.cInd$concordance, '\n');
```

```
## C-index: 0.8492154
```

Extreme learning machine (ELM) was first introduced as a quick method to train a single hidden layer neural network. However, ELM is more related to SVM. Indeed, neural network trained to learn the hidden features and SVM maps input to high dimension feature space which is determinant. Instead, ELM randomly maps input to high dimension feature space. This random feature mapping turns out to work well.

Denote $m$ as number of hidden features, $z_1, \cdots, z_m$, where $z_k = \sigma(x_i^T w_{ij})$, where $w_{ij}$ is randomly and independently sampled and $\sigma$ is a chosen activating function. A Cox model, or a ridge-regularized Cox model is then applied on the hidden features. Usually, $m$ is chosen to be big, probably even bigger than the number of sample $n$, like 500 or 1000 and ridge regulaization applied.

The question is, why random mapping even works? Since usually $m > p$ (input dimension) and $w_{ij}$ are independently sampled, thus

almost surely. However, after applying a non-linear activating function,

```
# Survival ELM
elm.f <- elm(Surv(days, status) ~., data = pbc[i.tr, ], nhid = 500);
elm.pred <- predict(object = elm.f, newdata = pbc[-i.tr, ], type = 'link');
elm.cInd <- survConcordance(Surv(days, status) ~ elm.pred, data = pbc[-i.tr, ]);
cat('C-index:', elm.cInd$concordance, '\n');
```

```
## C-index: 0.839515
```

Let $K = (k_{ij}) = K(x_i, x_j)$ be the kernel matrix. The weighted Kaplan-Meier estimate for survival function is

The weighted Nelson-Aalen estimator for the cummulative hazard function is

```
# Kernel-weighted KNN Kaplan-Meier
kkm.pred <- kkm(
Surv(days, status) ~., data = pbc[i.tr, ], xtest = pbc[-i.tr, ],
kernel = 'laplacedot', kpar = list(sigma = 0.1), k = 50
);
kkm.cInd <- survConcordance(
Surv(days, status) ~ I(1 - kkm.pred$test.predicted.survival[, 30]),
data = pbc[-i.tr, ]
);
cat('C-index:', kkm.cInd$concordance, '\n');
plot(kkm.pred, subset = sample(length(i.tr), 10), lwd = 2);
```

```
## C-index: 0.8446505
```

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.