`sherlock`

Causal Machine Learning for Population Segment Discovery and Analysis

Authors: Nima Hejazi and Wenjing Zheng

Causal Segmentation Analysis with `sherlock`

The sherlock R package implements an approach for population segmentation analysis (or subgroup discovery) using recently developed techniques from causal machine learning. Using data from randomized A/B experiments or observational studies (quasi-experiments), sherlock takes as input a set of user-selected candidate segment dimensions (often a subset of the measured pre-treatment covariates) and discovers segments of the study population based on the estimated heterogeneity of their response to the treatment under consideration. To quantify this treatment response heterogeneity, the conditional average treatment effect (CATE) is estimated using a nonparametric, doubly robust framework [@vanderweele19; @vdL15; @Luedtke16a; @Luedtke16b] that incorporates state-of-the-art ensemble machine learning [@vdl2007super; @coyle2021sl3] in the estimation procedure.
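To make the idea of doubly robust CATE estimation concrete, here is a minimal base R sketch of one common construction, the augmented inverse probability weighted (AIPW) pseudo-outcome, averaged within levels of a single segment dimension. It is not the sherlock API and may differ from the estimator the package implements. The simulated data, the variable names (`segment`, `x`, `a`, `y`), and the simple `glm`/`lm` fits standing in for ensemble learners are all assumptions made for illustration. Sample splitting and inference are omitted.

```r
# Illustrative sketch only: simulated data and parametric fits stand in for
# the ensemble learners; none of these names come from the sherlock package.
set.seed(2021)
n <- 5000
segment <- sample(c("A", "B", "C"), n, replace = TRUE)  # hypothetical segment dimension
x <- rnorm(n)                                           # additional baseline covariate
a <- rbinom(n, 1, 0.5)                                  # randomized treatment assignment
true_cate <- c(A = 0, B = 1, C = 2)                     # treatment effect varies by segment
y <- 0.5 * x + a * true_cate[segment] + rnorm(n)
dat <- data.frame(y = y, a = a, x = x, segment = factor(segment))

# Nuisance estimates: propensity score and outcome regression
g_fit <- glm(a ~ x + segment, family = binomial(), data = dat)
q_fit <- lm(y ~ a * segment + x, data = dat)
g1 <- predict(g_fit, type = "response")                 # P(A = 1 | covariates)
q1 <- predict(q_fit, newdata = transform(dat, a = 1))   # predicted outcome if treated
q0 <- predict(q_fit, newdata = transform(dat, a = 0))   # predicted outcome if untreated
qa <- ifelse(dat$a == 1, q1, q0)                        # prediction at observed treatment

# Doubly robust (AIPW) pseudo-outcome: its conditional mean given the
# covariates is the CATE if either nuisance model is correctly specified
pseudo <- (dat$a / g1 - (1 - dat$a) / (1 - g1)) * (dat$y - qa) + (q1 - q0)

# Segment-level CATE estimates: average the pseudo-outcome within each segment
cate_by_segment <- tapply(pseudo, dat$segment, mean)
print(round(cate_by_segment, 2))
```

In this simulation the segment-level estimates should land close to the simulated effects of 0, 1, and 2, since treatment is randomized and the outcome model is correctly specified. Segments with larger estimated effects are the natural candidates for differential treatment decisions.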

For background and details on using sherlock, see the package vignette and the documentation site. An overview of the statistical methodology is available in our conference manuscript [@hejazi2021framework] from CODE @ MIT 2021.

Installation

Install the most recent version from the master branch on GitHub via remotes:

remotes::install_github("Netflix/sherlock")

Issues

If you encounter any bugs or have any specific feature requests, please file an issue.

Citation

After using the sherlock R package, please cite the following:

    @software{netflix2021sherlock,
      author = {Hejazi, Nima S and Zheng, Wenjing and {Netflix, Inc.}},
      title = {{sherlock}: Causal machine learning for segment discovery
        and analysis},
      year = {2021},
      note = {R package version 0.2.0},
      doi = {10.5281/zenodo.5652010},
      url = {https://github.com/Netflix/sherlock}
    }

    @article{hejazi2021framework,
      author = {Hejazi, Nima S and Zheng, Wenjing and Anand, Sathya},
      title = {A framework for causal segmentation analysis with machine
        learning in large-scale digital experiments},
      year = {2021},
      journal = {Conference on Digital Experimentation at {MIT}},
      volume = {(8\textsuperscript{th} annual)},
      publisher = {MIT Press},
      url = {https://arxiv.org/abs/2111.01223}
    }