The 'xyz' Algorithm for Fast Interaction Search in High-Dimensional Data

The R-Package xyz contains an algorithm to estimate product interactions between a response vector $Y$ and predictors $X$. For example if all data were binary in ${-1,1}$ one would like to find the pair $(l,k)$ maximizing: $$ \begin{equation} \mathbb{P}(Y=X_l X_k). \end{equation} $$ A brute force approach would cycle through all possible pairs, which implies a quadratic run-time if there are $p$ variables. The xyz-Algorithm provably solves this problem in sub-quadratic runt-ime. Its run-time depends on the interaction strength of the strongest pair. xyz can recover pairs in almost linerar run-time if the interaction is strong.

Functions

The xyz package offers two functions for interaction search:

Examples

xyz_search:

source('vignettecode.R')
set.seed(1)
#set dimensions
n<-100;p<-1000;
#create data
X<-matrix(2*rbinom(n*p,1,0.5)-1,n,p)
Y<-X[,1]*X[,2]

#find top 10 interactions
result<-xyz_search(X,Y,L=5,N=10,binary=TRUE,negative=TRUE)
#the first element contains the interaction pairs the second element contains their strength
print(result)

These were now just L=10 runs. This means we discovered the interaction in about $\mathcal{O}(np^{1.005})$ operations instead of $\mathcal{O}(np^2)$.

xyz_regression:

#set dimensions
n<-100;p<-1000;
#create data
X<-matrix(rnorm(n*p),n,p)

#build model
Y<-3*X[,5]+2*X[,1]*X[,2]-3*X[,7]*X[,4]+rnorm(n)

#find top 10 interactions
result<-xyz_regression(X,Y,L=10,n_lambda=10,alpha=0.9)
#the first element contains the main effects and the third the interaction effects, we look at the fifth lambda
print(result)
plot(result)
#predict
predict(result,matrix(rnorm(10*p),10,p))


Try the xyz package in your browser

Any scripts or data that you put into this service are public.

xyz documentation built on May 2, 2019, 10:25 a.m.