knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(rstan)
library(ggplot2)

In this example, Uber shows how they model the time when a promotion begins and a customer takes a second ride. The challenge to not bias data is that the promotion or that promotions monitoring can end at a hard deadline, and no longer monitored (say 30 days). They are trying to model a response variable time to second ride within a promotion period. When the 30 days end the promotion, a number of customers will have not appeared for a second ride. However, that is not to mean they will not ever take a second ride, in fact, with Probabilistic AI, they can model these right-censored data also known as suspensions without discrimination integrated into a linear model.

Probabilistic AI programming, in this author's opinion, appears to flip the way to approach modelling. Classical statistics teaches one direction, bayesian modelling with probAI appears to mix it up. At the end of the day, it's all about the MCMC generation of a posterior samples and inspection of the posterior distribition. The following text does a real good example a linear model. Linear model with right-censored data.

McElreath, Richard. Statistical rethinking: A Bayesian course with examples in R and Stan. Chapman and Hall/CRC, 2018.

In the uber example they want to see how variables a and b, in a linear relationship "can be sampled from a posterior". We even sample the input varibles too, just like linear regresion. For more details on the example:

https://eng.uber.com/modeling-censored-time-to-event-data-using-pyro/

Generate data

n = 50000
a = 2
b = 4
c = 8

data <- data.frame(x = numeric(n),
                   y = numeric(n),
                   y_obs = numeric(n),
                   observed = character(n))

data["x"] <- rnorm(n, 0, 0.34)

link <- 1/log(1+exp(a*data$x+b))

data["y"] <- rexp(n, rate = link)

data["observed"] <- data$y < c

data["y_obs"] <- data["y"]
data[data$observed == FALSE,"y_obs"] <- c

ggplot(data = data) +
  geom_point(aes(x = x, y = y, colour = "Actual"), colour = "blue") +
  geom_point(aes(x = x, y = y_obs, colour = "Observed"), colour = "turquoise") +
  geom_smooth(aes(x = x, y = y), method = lm, colour = "blue") +
  geom_smooth(data = subset(data, observed == TRUE), aes(x = x, y = y), method = lm, colour = "turquoise")

For info on fitting model to censored data see:

https://stackoverflow.com/questions/40289457/stan-using-target-syntax

Hamada p108 - explaination of censored data modelling

modelString = "
  data {
    int<lower=0> Nobs;
    int <lower=0> Ncen;
    vector[Nobs] xobs;
    vector[Ncen] xcen;
    vector[Nobs] yobs;
    vector[Ncen] ycen;
  }

  parameters {
    real model_a;
    real model_b;
  }

  model {
    vector[Nobs] linkobs;
    vector[Ncen] linkcen;

    linkobs = 1 ./ log(1 + exp(model_a*xobs + model_b));
    linkcen = 1 ./ log(1 + exp(model_a*xcen + model_b));

    target += exponential_lpdf(yobs | linkobs);
    target += exponential_lccdf(ycen | linkcen);

    model_a ~ normal(0,10);
    model_b ~ normal(0,10);
  }
"

uber_model <- rstan::stan_model(model_code = modelString)
x_obs <- data[data$observed == TRUE,"x"]
x_cen <- data[data$observed == FALSE,"x"]
y_obs <- data[data$observed == TRUE,"y_obs"]
y_cen <- data[data$observed == FALSE,"y_obs"]


stan_data_list <- list (Nobs = NROW(x_obs),
                        Ncen = NROW(x_cen),
                        xobs = x_obs,
                        xcen = x_cen,
                        yobs = y_obs,
                        ycen = y_cen
)

Fit model to data

output <- rstan::sampling(uber_model, data = stan_data_list, chains = 4, iter = 4000, control = list(adapt_delta = 0.9))

Plot histograms of posteriors for parameters a and b.

stan_hist(output, c("model_a", "model_b"))


a-segar/foocafeReliability documentation built on Feb. 28, 2020, 5:47 a.m.