matchRanges: Generate a covariate-matched control set of ranges

matchRangesR Documentation

Generate a covariate-matched control set of ranges

Description

matchRanges() uses a propensity score-based method to generate a covariate-matched control set of DataFrame, GRanges, or GInteractions objects.

Usage

matchRanges(focal, pool, covar, method = "nearest", replace = TRUE, ...)

## S4 method for signature 
## 'DF_OR_df_OR_dt,
##   DF_OR_df_OR_dt,
##   formula,
##   character_OR_missing,
##   logical_OR_missing'
matchRanges(focal, pool, covar, method, replace)

## S4 method for signature 
## 'GRanges,GRanges,formula,character_OR_missing,logical_OR_missing'
matchRanges(focal, pool, covar, method, replace)

## S4 method for signature 
## 'GInteractions,
##   GInteractions,
##   formula,
##   character_OR_missing,
##   logical_OR_missing'
matchRanges(focal, pool, covar, method, replace)

Arguments

focal

A DataFrame, GRanges, or GInteractions object containing the focal data to match.

pool

A DataFrame, GRanges, or GInteractions object containing the pool from which to select matches.

covar

A rhs formula with covariates on which to match.

method

A character describing which matching method to use. supported options are either 'nearest', 'rejection', or 'stratified'.

replace

TRUE/FALSE describing whether to select matches with or without replacement.

...

Additional arguments.

Details

Available inputs for focal and pool include data.frame, data.table, DataFrame, GRanges, or GInteractions. data.frame and data.table inputs are coerced to DataFrame objects and returned as MatchedDataFrame while GRanges and GInteractions objects are returned as MatchedGRanges or MatchedGInteractions, respectively.

Value

A covariate-matched control set of data.

Methodology

matchRanges uses propensity scores to perform subset selection on the pool set such that the resulting matched set contains similar distributions of covariates to that of the focal set. A propensity score is the conditional probability of assigning an element (in our case, a genomic range) to a particular outcome (Y) given a set of covariates. Propensity scores are estimated using a logistic regression model where the outcome Y=1 for focal and Y=0 for pool, over the provided covariates covar.

Matching methods

  • method = 'nearest': Nearest neighbor matching with replacement. Finds the nearest neighbor by using a rolling join with data.table. Matching without replacement is not currently supported.

  • method = 'rejection': (Default) Rejection sampling with or without replacement. Uses a probability-based approach to select options in the pool that match the focal distribition.

  • method = 'stratified': Iterative stratified sampling with or without replacement. Bins focal and pool propensity scores by value and selects matches within bins until all focal items have a corresponding match in pool.

References

matchRanges manuscript:

Eric S. Davis, Wancen Mu, Stuart Lee, Mikhail G. Dozmorov, Michael I. Love, Douglas H. Phanstiel. 2023. "matchRanges: Generating null hypothesis genomic ranges via covariate-matched sampling." Bioinformatics. doi: 10.1093/bioinformatics/btad197

Examples

## Match with DataFrame
set.seed(123)
x <- makeExampleMatchedDataSet(type = 'DataFrame')
matchRanges(focal = x[x$feature1,],
            pool = x[!x$feature1,],
            covar = ~feature2 + feature3)

## Match with GRanges
set.seed(123)
x <- makeExampleMatchedDataSet(type = "GRanges")
matchRanges(focal = x[x$feature1,],
            pool = x[!x$feature1,],
            covar = ~feature2 + feature3)

## Match with GInteractions
set.seed(123)
x <- makeExampleMatchedDataSet(type = "GInteractions")
matchRanges(focal = x[x$feature1,],
            pool = x[!x$feature1,],
            covar = ~feature2 + feature3)

## Nearest neighbor matching with replacement
set.seed(123)
x <- makeExampleMatchedDataSet(type = 'DataFrame')
matchRanges(focal = x[x$feature1,],
            pool = x[!x$feature1,],
            covar = ~feature2 + feature3,
            method = 'nearest',
            replace = TRUE)

## Rejection sampling without replacement
set.seed(123)
x <- makeExampleMatchedDataSet(type = 'DataFrame')
matchRanges(focal = x[x$feature1,],
            pool = x[!x$feature1,],
            covar = ~feature2 + feature3,
            method = 'rejection',
            replace = FALSE)

## Stratified sampling without replacement
set.seed(123)
x <- makeExampleMatchedDataSet(type = 'DataFrame')
matchRanges(focal = x[x$feature1,],
            pool = x[!x$feature1,],
            covar = ~feature2 + feature3,
            method = 'stratified',
            replace = FALSE)


nullranges/nullranges documentation built on Aug. 29, 2023, 12:13 a.m.