Home

/

GitHub

/

smolnos/pulver

/

pulverize: Computes p-values for interaction terms from linear...

pulverize: Computes p-values for interaction terms from linear...
In smolnos/pulver: Parallel Ultra-Rapid p-Value Computation for Linear Regression Interaction Terms

Description Usage Arguments Details Value References Examples

View source: R/pulver.R

Given matrices ymat, xmat, and zmat, pulverize evaluates every linear regression

y = b0 + b1 x + b2 z + b3 xz

where y, x, and z are columns of ymat, xmat, and zmat, respectively, and returns the p-value for the null hypothesis b3=0. For example, if ymat, xmat, and zmat have 200, 300, and 400 columns, then pulverize evaluates 24 million models (200 * 300 * 400) and returns the p-value for the interaction term for each model.

1
2
3

pulverize(ymat, xmat, zmat, output_file = NULL, colnames = c("y", "x", "z"),
  pvalue_threshold = NULL, cores = 1L, overwrite = FALSE,
  suppress_return = FALSE, .fp = NULL)

`ymat, xmat, zmat`	Matrices of type "double", number of rows for all matrices are the number of observations and must have the same size
`output_file`	Output file if `NULL` no output file will be created (default: `NULL`)
`colnames`	Column names for result table (default: "y", "x", and "z")
`pvalue_threshold`	Report only p-values below threshold (default: `NULL`, i.e., report all p-values)
`cores`	Number of cores to use for parallelization (default: 1)
`overwrite`	If `TRUE` overwrite `output_file` (default: `FALSE`)
`suppress_return`	If `TRUE` return `NULL` instead of a data frame with p-values for the interaction term (default: `FALSE`)
`.fp`	File pointer, only for internal use. Necessary for the function `pulverize_all` to join results from different input files to the same output file (dafault: `NULL`)

For reasons of computational efficiency, pulverize returns only p-values and only for the interaction term xz. Fast run time is achieved by using the correlation coefficient between the outcome variable y and the interaction term xz to test the null-hypothesis, which avoids the costly computation of inversions. Additional employed time-saving operations are a rearrangement of the order when iterating through the different matrices, and implementing the core algorithm in the fast programming language C++.

Once interesting models are identified based on the resulting p-values, the number of models will be greatly reduced. At this point additional model characteristics, e.g. effect estimates and standard errors, can be obtained via traditional methods such as R's lm function.

Matrices ymat, xmat, and zmat must have column names. Missing values are imputed using their column means. If pvalue_threshold is supplied, only p-values (p < pvalue_threshold) strictly below the threshold are included in the returned data frame and saved in output_file.

In cases where the resulting data frame would be too large to fit in memory, it is possible to write the results to output_file without returning a data frame by setting suppress_return to TRUE.

The column names of the results table are by default set to "y", "x", and "z", but can be changed using the colnames argument.

An error will be signaled if output_file already exists. Setting overwrite to TRUE will silently overwrite the file.

The computation can be parallelized by specifying a number of cores greater than 1. By default only a single CPU is used. Note that parallelization is only supported in environments with C/C++ compilers that support OpenMP.

If suppress_return is FALSE, a data frame with columns "y", "x", "z", and "pvalue" containing the p-values for the interaction term xz of the above linear model is returned. Otherwise NULL is returned.

Shabalin, Andrey A (2012) Bioinformatics: Matrix eQTL: ultra fast eQTL analysis via large matrix operations, Oxford Univ Press, *28*, 1353-1358

nobs <- 100
y <- matrix(rnorm(nobs * 2), ncol = 2, dimnames = list(paste0("row", 1:nobs),
paste0("column", 1:2)))
x <- matrix(rnorm(nobs * 3), ncol = 3, dimnames = list(paste0("row", 1:nobs),
paste0("column", 1:3)))
z <- matrix(rnorm(nobs * 4), ncol = 4, dimnames = list(paste0("row", 1:nobs),
paste0("column", 1:4)))
pulverize(y, x, z)

smolnos/pulver documentation built on May 30, 2019, 5:03 a.m.

smolnos/pulver index

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

smolnos/pulver
Parallel Ultra-Rapid p-Value Computation for Linear Regression Interaction Terms

pulverize: Computes p-values for interaction terms from linear...
In smolnos/pulver: Parallel Ultra-Rapid p-Value Computation for Linear Regression Interaction Terms

Description

Usage

Arguments

Details

Value

References

Examples

Related to pulverize in smolnos/pulver...

R Package Documentation

Browse R Packages

We want your feedback!

smolnos/pulver Parallel Ultra-Rapid p-Value Computation for Linear Regression Interaction Terms

pulverize: Computes p-values for interaction terms from linear... In smolnos/pulver: Parallel Ultra-Rapid p-Value Computation for Linear Regression Interaction Terms

Description

Usage

Arguments

Details

Value

References

Examples

Related to pulverize in smolnos/pulver...

R Package Documentation

Browse R Packages

We want your feedback!

smolnos/pulver
Parallel Ultra-Rapid p-Value Computation for Linear Regression Interaction Terms

pulverize: Computes p-values for interaction terms from linear...
In smolnos/pulver: Parallel Ultra-Rapid p-Value Computation for Linear Regression Interaction Terms