atlas: Association testing by combining several matching thresholds



Description

Computes association test p-values from a generalized linear model for each considered threshold, then computes a single p-value for the combination of all considered thresholds through Fisher's method using perturbation resampling.
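As a rough illustration of the combination step, the sketch below computes Fisher's statistic, -2 * sum(log(p)), on a set of made-up per-threshold p-values. It is not the code used by atlas: the p-values, the helper name fisher_stat, and the way null draws are generated (independent uniforms) are all assumptions chosen to show the mechanics. P-values obtained at different thresholds from the same data are correlated, so the chi-square reference distribution is not guaranteed to hold, which is presumably why a resampling-based null is used instead.

```r
# Fisher's combination statistic for k p-values: -2 * sum(log(p)).
# fisher_stat is a hypothetical helper, not a function from ludic.
fisher_stat <- function(p) -2 * sum(log(p))

# Made-up p-values, one per matching threshold (illustrative only).
p_thresholds <- c(0.04, 0.10, 0.20, 0.35, 0.50)
stat <- fisher_stat(p_thresholds)

# Reference distribution valid only for INDEPENDENT p-values:
# chi-square with 2k degrees of freedom.
p_indep <- stats::pchisq(stat, df = 2 * length(p_thresholds),
                         lower.tail = FALSE)

# Resampling-style alternative: compare the observed statistic with
# statistics computed on draws from a null distribution.
# Assumption: the null draws here are independent uniforms, used purely to
# show the mechanics; atlas obtains its null draws by perturbation instead.
set.seed(1)
nb_perturb <- 200
null_stats <- replicate(nb_perturb,
                        fisher_stat(stats::runif(length(p_thresholds))))
p_resampled <- mean(null_stats >= stat)

c(chi_square = p_indep, resampled = p_resampled)
```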

Usage

atlas(
  match_prob,
  y,
  x,
  covar = NULL,
  thresholds = seq(from = 0.1, to = 0.9, by = 0.2),
  nb_perturb = 200,
  dist_family = c("gaussian", "binomial"),
  impute_strategy = c("weighted average", "best")
)
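The vector-valued defaults shown for dist_family and impute_strategy follow a common R idiom in which the function resolves the argument with match.arg(), so that the first element acts as the default. The sketch below shows that idiom on a hypothetical wrapper named resolve_choices; whether atlas itself uses match.arg() internally is an assumption, not something stated on this page.

```r
# Hypothetical wrapper mirroring the vector-valued defaults of atlas().
resolve_choices <- function(dist_family = c("gaussian", "binomial"),
                            impute_strategy = c("weighted average", "best")) {
  # match.arg() returns the first allowed value when the argument is left at
  # its default, and raises an error for a value outside the allowed set.
  dist_family <- match.arg(dist_family)
  impute_strategy <- match.arg(impute_strategy)
  list(dist_family = dist_family, impute_strategy = impute_strategy)
}

# Defaults resolve to the first element of each vector.
resolve_choices()

# Explicit choices are passed through unchanged.
resolve_choices(dist_family = "binomial", impute_strategy = "best")
```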

Arguments

match_prob

matching probabilities matrix (e.g. obtained through recordLink) of dimensions n1 x n2.

y

response variable of length n1. Only binary phenotypes are supported at the moment.

x

a matrix or a data.frame of predictors of dimensions n2 x p. An intercept is automatically added within the function.

covar

a matrix or a data.frame of dimensions n3 x p containing the variables to adjust for in the test. Default is NULL, in which case there is no adjustment.

thresholds

a vector (possibly of length 1) containing the different thresholds to use to call a match. Default is seq(from = 0.1, to = 0.9, by = 0.2).

nb_perturb

the number of perturbations used for the p-value combination. Default is 200.

dist_family

a character string indicating the distribution family for the glm. Currently, only 'gaussian' and 'binomial' are supported. Default is 'gaussian'.

impute_strategy

a character string indicating which strategy to use to impute x from the matching probabilities match_prob. Either "best" (in which case the most probable match above the threshold is imputed) or "weighted average" (in which case a weighted mean is imputed for each individual who has at least one match with a posterior probability above the threshold). Default is "weighted average".
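To make the two imputation strategies concrete, here is a minimal sketch of one plausible reading of them for a single threshold. It is not the package's implementation: the helper name impute_x, the strict comparison with the threshold, the renormalization of the weights, and the use of NA for individuals without any match above the threshold are all assumptions.

```r
# Sketch only: one plausible reading of the two strategies, NOT ludic's code.
# impute_x is a hypothetical helper.
impute_x <- function(match_prob, x, threshold,
                     impute_strategy = c("weighted average", "best")) {
  impute_strategy <- match.arg(impute_strategy)
  x <- as.matrix(x)
  stopifnot(ncol(match_prob) == nrow(x))
  # Assumption: rows with no match above the threshold stay missing.
  x_imputed <- matrix(NA_real_, nrow = nrow(match_prob), ncol = ncol(x))
  for (i in seq_len(nrow(match_prob))) {
    # Assumption: "above the threshold" is read as a strict inequality.
    above <- which(match_prob[i, ] > threshold)
    if (length(above) == 0) next
    if (impute_strategy == "best") {
      # Take the predictors of the single most probable candidate.
      best <- above[which.max(match_prob[i, above])]
      x_imputed[i, ] <- x[best, ]
    } else {
      # Assumption: weights are the matching probabilities of the retained
      # candidates, renormalized to sum to 1.
      w <- match_prob[i, above]
      x_imputed[i, ] <- colSums(x[above, , drop = FALSE] * w) / sum(w)
    }
  }
  x_imputed
}

# Toy input: 2 individuals (rows) and 3 candidate records (columns).
mp <- rbind(c(0.9, 0.6, 0.1),
            c(0.2, 0.1, 0.3))
x_toy <- cbind(c(1, 3, 5))

impute_x(mp, x_toy, threshold = 0.5, impute_strategy = "best")
impute_x(mp, x_toy, threshold = 0.5, impute_strategy = "weighted average")
```

In this toy input the first individual has two candidates above 0.5, so "best" picks the first record while "weighted average" blends the first two; the second individual has no candidate above 0.5 and is left missing under both strategies.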

Value

a list containing the following:

References

Zhang HG, Hejblum BP, Weber G, Palmer N, Churchill S, Szolovits P, Murphy S, Liao KP, Kohane I and Cai T, ATLAS: An automated association test using probabilistically linked health records with application to genetic studies, JAMIA, in press (2021). doi: 10.1101/2021.05.02.21256490.

Examples

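library(ludic)

# Number of simulated data sets (1 keeps the example fast; e.g. 5000 for a size study)
n_sims <- 1

# Simulate one data set and return the influence-function p-values
mysim <- function(i){
  # n2 = 99 records with p = 2 predictors
  x <- matrix(stats::rnorm(n = 99 * 2), nrow = 99, ncol = 2)
  # n1 x n2 = 103 x 99 matrix of matching probabilities
  match_prob <- matrix(stats::rbeta(n = 103 * 99, 1, 2), nrow = 103, ncol = 99)

  # Gaussian alternative:
  # y <- stats::rnorm(n = 103, mean = 1, sd = 0.5)
  # return(atlas(match_prob, y, x, dist_family = "gaussian")$influencefn_pvals)

  # Binary response generated independently of x
  y <- stats::rbinom(n = 103, size = 1, prob = 0.5)
  return(atlas(match_prob, y, x, dist_family = "binomial")$influencefn_pvals)
}
# Parallel alternative:
# res <- pbapply::pblapply(1:n_sims, mysim, cl = parallel::detectCores() - 1)
res <- lapply(1:n_sims, mysim)

# Proportion of simulations with p < 0.05, for every column except the last two
size <- sapply(1:(ncol(res[[1]]) - 2),
               FUN = function(i){
                 rowMeans(sapply(res, function(m){m[, i] < 0.05}), na.rm = TRUE)
               }
)
rownames(size) <- rownames(res[[1]])
# Drop the names of the last two columns, which are not summarized above
colnames(size) <- colnames(res[[1]])[-(-1:0 + ncol(res[[1]]))]
size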

ludic documentation built on Aug. 18, 2021, 5:08 p.m.