forward_test: Forward inclusion tests with latent factor mixed models
In cayek/MatrixFactorizationR: Latent Factor Mixed Models


This function tests for association between each column of the response matrix, Y, and the explanatory variables, X, by recursively conditioning on the top hits in the set of explanatory variables. The conditional tests are based on LFMMs with a ridge penalty.
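The recursive conditioning described above can be illustrated with a minimal base R sketch. It is not the package's implementation: it uses ordinary least squares (`lm`) instead of an LFMM with latent factors, it assumes that each top hit is added as a covariate next to X in the following test, and the function name `forward_sketch` and the simulated data are hypothetical.

```r
## Simulated data (hypothetical): 200 individuals, 50 response variables
set.seed(1)
n <- 200; p <- 50
x <- rnorm(n)                        # explanatory variable
Y <- matrix(rnorm(n * p), n, p)      # response matrix
Y[, 5]  <- Y[, 5]  + 1.0 * x         # two planted associations
Y[, 20] <- Y[, 20] + 0.8 * x

## Forward inclusion with plain linear models (assumption: no latent factors)
forward_sketch <- function(Y, x, niter = 3) {
  candidates <- integer(0)
  log.p <- numeric(0)
  for (it in seq_len(niter)) {
    remaining <- setdiff(seq_len(ncol(Y)), candidates)
    ## covariates: x plus the responses selected in earlier iterations
    covars <- cbind(x, Y[, candidates, drop = FALSE])
    colnames(covars) <- c("x", paste0("y", candidates))
    lp <- vapply(remaining, function(j) {
      fit <- lm(Y[, j] ~ covars)
      ## row 2 holds the coefficient of x (row 1 is the intercept)
      log(summary(fit)$coefficients[2, 4])
    }, numeric(1))
    best <- which.min(lp)
    candidates <- c(candidates, remaining[best])
    log.p <- c(log.p, lp[best])
  }
  list(candidates = candidates, log.p = log.p)
}

res <- forward_sketch(Y, x, niter = 3)
res$candidates   # column indices of the top hit in each iteration
res$log.p        # uncorrected log p-values of those top hits
```

As with `forward_test`, the log p-values returned by this sketch come from a selection step and are only useful as a sanity check.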

forward_test(Y, X, K, niter = 5, scale = FALSE, candidate.list = NULL, rev.confounder = TRUE, lambda = 1e-05)

`Y`	a response variable matrix with n rows and p columns. Each column is a response variable (numeric).
`X`	an explanatory variable matrix with n rows and d = 1 column (e.g., a phenotype).
`K`	an integer for the number of latent factors in the regression model.
`niter`	an integer value for the number of forward inclusion tests.
`scale`	a boolean value, `TRUE` if the explanatory variable, X, is scaled (recommended option).
`candidate.list`	a vector of integers corresponding to response variables (columns in Y), which are known candidates for association. If `NULL`, a list of candidates is built during the algorithm run.
`rev.confounder`	a boolean value. If `TRUE`, confounders are re-evaluated in each conditional test, which may take some time (default = `TRUE`).
`lambda`	a numeric value for the regularization parameter.

The response variable matrix Y and the explanatory variable are centered.
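As an illustration of what centering means here, the following sketch uses base R's `scale()` on a small made-up matrix. It assumes that column-wise mean centering is intended; the source does not state how the package performs it internally.

```r
## Toy response matrix (hypothetical values): 4 rows, 2 columns
Y <- matrix(c(1, 2, 3, 4, 10, 20, 30, 40), nrow = 4)

## Center each column without rescaling it
Yc <- scale(Y, center = TRUE, scale = FALSE)
colMeans(Yc)   # column means are zero up to rounding error

## Center and scale a toy explanatory variable to unit variance
X <- matrix(c(2, 4, 6, 8), ncol = 1)
Xs <- scale(X, center = TRUE, scale = TRUE)
sd(Xs[, 1])    # standard deviation is one up to rounding error
```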

a list with the following attributes:

candidates a vector of niter response variables (column labels in Y) detected as top hits in each conditional association analysis.
log.p a vector of uncorrected log p-values, useful for checking that the algorithm behaves well but not reliable for testing.

cayek, francoio

library(lfmm)
data("example.data")
Y <- example.data$genotype
X <- example.data$phenotype #scaled variable

## Fit an LFMM, i.e., compute B, U, V:
mod.lfmm <- lfmm_ridge(Y = Y,
                       X = X, 
                       K = 6)
                       
## performs initial association testing using the fitted model:
pv <- lfmm_test(Y = Y, 
                X = X,
                lfmm = mod.lfmm,
                calibrate = "gif")
## Manhattan plot 
plot(-log10(pv$calibrated.pvalue), 
      pch = 19, 
      cex = .2,
      col = "grey")
      
## Start forward tests (3 iterations)       
obj <- forward_test(Y, 
                    X, 
                    K = 6, 
                    niter = 3, 
                    scale = TRUE)

## Record log p-values for the 3 top hits
log.p <- obj$log.p
log.p

## Check whether each top hit is exactly one of the causal SNPs (labelled from 1 to 20)
obj$candidates %in% example.data$causal.set

## Check for candidates within a distance of 20 SNPs (about 10kb)
theta <- 20
## For each top hit, index (1-20) of the causal SNP lying within theta
hit.3 <- as.numeric(
  apply(sapply(obj$candidates,
               function(x) abs(x - example.data$causal.set) < theta),
        2,
        which))
## Number of hits for each causal SNP (1-20)
table(hit.3)


## Continue forward tests (2 additional iterations)       
obj <- forward_test(Y, 
                    X, 
                    K = 6, 
                    niter = 2,
                    candidate.list = obj$candidates,
                    scale = TRUE)

## Record log p-values for all 5 top hits
log.p <- c(log.p, obj$log.p)
log.p

## Check whether each top hit is exactly one of the causal SNPs (labelled from 1 to 20)
obj$candidates %in% example.data$causal.set

## Check for candidates within a distance of 5 SNPs (about 2.5kb)
theta <- 5
## For each top hit, index (1-20) of the causal SNP lying within theta
hit.5 <- as.numeric(
  apply(sapply(obj$candidates,
               function(x) abs(x - example.data$causal.set) < theta),
        2,
        which))
## Number of hits for each causal SNP (1-20)
table(hit.5)

## Plot the log p-value of the top hit at each iteration
plot(log.p, xlab = "Conditional test iteration", ylab = "Top hit log(p)")
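
The hit-counting step in the example relies on every candidate lying within theta of exactly one causal SNP; otherwise `apply(..., 2, which)` returns a list and `as.numeric()` fails. A more defensive variant is sketched below on made-up positions. The vectors `candidates` and `causal.set` are hypothetical stand-ins for `obj$candidates` and `example.data$causal.set`.

```r
## Hypothetical positions, standing in for the objects used in the example
candidates <- c(102, 498, 1310)
causal.set <- c(100, 500, 900)
theta <- 20

## Distance from each candidate (rows) to each causal SNP (columns)
d <- abs(outer(candidates, causal.set, "-"))

## Index of the nearest causal SNP for each candidate
nearest <- apply(d, 1, which.min)

## A candidate counts as a hit only if its nearest causal SNP is within theta
is.hit <- d[cbind(seq_along(candidates), nearest)] < theta
hits <- ifelse(is.hit, nearest, NA_integer_)

## Number of hits per causal SNP; candidates without a hit are counted as NA
table(hits, useNA = "ifany")
```

Candidates with no causal SNP within theta are reported as `NA` instead of raising an error, and a candidate close to several causal SNPs is assigned to the nearest one.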