# Perform Prototype or F Tests for Significance of Groups of Predictors in the Univariate Model

### Description

Perform prototype or F tests for significance of groups of predictors in the univariate model. Choose one of the exact likelihood ratio (ELR) prototype test, the approximate likelihood ratio (ALR) prototype test, the F test, or the marginal screening (MS) prototype test. Tests can be selective or non-selective. Selective tests can use either a non-sampling or a hit-and-run null reference distribution.

### Usage


### Arguments

`x` |
input matrix of dimension |

`y` |
response variable. Vector of length *n*, assumed to be quantitative. |

`type` |
type of test to be performed. Can only select one at a time. Options include the exact and approximate likelihood ratio prototype tests of Reid et al (2015) (ELR, ALR), the F test and the marginal screening prototype test of Reid and Tibshirani (2015) (MS). Default is ELR. |

`selected.col` |
preselected columns specified by user. Vector of indices in the set {1, 2, ..., |

`lambda` |
regularisation parameter for the lasso fit. Must be supplied when |

`mu` |
mean parameter for the response. See Details below. If supplied, it is first subtracted from the response to yield a mean-zero (at the population level) vector for which we proceed with testing. If |

`sigma` |
error standard deviation for the response. See Details below. If not supplied, it is assumed to be 1. Required for the computation of some of the test statistics. |

`hr.iter` |
number of hit-and-run samples required in the reference distribution of a selective test. Applies only if |

`hr.burn.in` |
number of burn-in hit-and-run samples. These are generated first so as to make subsequent hit-and-run realisations less dependent on the observed response. Samples are then discarded and do not inform the null reference distribution. |

`verbose` |
should progress be printed? |

`tol` |
convergence threshold for iterative optimisation procedures. |
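
To make the roles of `hr.iter` and `hr.burn.in` concrete, here is a minimal sketch of a generic hit-and-run sampler for a Gaussian vector restricted to a set of the form *Ay <= b*. It is not the package's sampler, which may condition on further quantities. The function `hit_and_run`, its argument `y0` and the toy constraint are hypothetical and only illustrate how burn-in draws are generated first and then discarded.

```r
# Hypothetical helper, NOT part of prototest: generic hit-and-run for
# y ~ N(0, sigma^2 I) restricted to {y : A y <= b}.
hit_and_run <- function(y0, A, b, sigma = 1, hr.iter = 1000, hr.burn.in = 100) {
  stopifnot(all(A %*% y0 <= b))            # the starting point must be feasible
  y <- y0
  out <- matrix(NA_real_, nrow = hr.iter, ncol = length(y0))
  for (i in seq_len(hr.burn.in + hr.iter)) {
    d <- rnorm(length(y))
    d <- d / sqrt(sum(d^2))                # uniformly random unit direction
    Ad <- drop(A %*% d)
    slack <- drop(b - A %*% y)             # non-negative while y is feasible
    # Feasible interval [t_lo, t_hi] for the step size t along y + t * d
    t_hi <- min(c(Inf, slack[Ad > 0] / Ad[Ad > 0]))
    t_lo <- max(c(-Inf, slack[Ad < 0] / Ad[Ad < 0]))
    # Along the line, the Gaussian density gives t ~ N(-y'd, sigma^2), truncated
    m <- -sum(y * d)
    u <- runif(1, pnorm(t_lo, m, sigma), pnorm(t_hi, m, sigma))
    t <- min(max(qnorm(u, m, sigma), t_lo), t_hi)   # clamp against rounding
    y <- y + t * d
    if (i > hr.burn.in) out[i - hr.burn.in, ] <- y  # burn-in draws are not stored
  }
  out
}

set.seed(3)
A <- -diag(2)                              # toy constraint: y >= 0 componentwise
b <- c(0, 0)
draws <- hit_and_run(c(1, 1), A, b, hr.iter = 200, hr.burn.in = 50)
```

Each row of `draws` is one retained replication. The first `hr.burn.in` moves only serve to carry the chain away from its starting point.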

### Details

The model underpinning each of the tests is

$$y = \mu + \theta \hat{y} + \epsilon$$

where $\epsilon$ is Gaussian with zero mean and variance $\sigma^2$, and $\hat{y}$ depends on the particular test considered.

In particular, for the ELR, ALR and F tests, we have $\hat{y} = P_M(y - \mu)$, where $P_M = X_M X_M^\dagger$ and $X_M$ is the input matrix reduced to the columns in the set $M$. The set $M$ is either provided by the user (via `selected.col`) or selected by the lasso (if `selected.col` is `NULL`). In the former case, a non-selective test is performed; in the latter, a selective test is performed, with the restrictions $Ay \le b$ set out in Lee et al. (2015).
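
The projection used by the ELR, ALR and F tests can be sketched in base R as follows. The simulated data, the column set `M` and the variable names are assumptions made for illustration, and the explicit formula for the pseudo-inverse assumes that the selected columns have full column rank.

```r
# Illustrative sketch only: project the centred response onto the column
# space of the selected columns X_M.
set.seed(1)
n <- 20
p <- 6
X <- matrix(rnorm(n * p), nrow = n)
mu <- 3                                   # assumed known mean (cf. the `mu` argument)
y <- mu + rnorm(n)
M <- c(1, 2, 5)                           # hypothetical selected column set
X_M <- X[, M, drop = FALSE]

# For full-column-rank X_M, the pseudo-inverse is (X_M' X_M)^{-1} X_M'
P_M <- X_M %*% solve(crossprod(X_M), t(X_M))
y_hat <- drop(P_M %*% (y - mu))

# Sanity check: a projection matrix is idempotent
stopifnot(isTRUE(all.equal(P_M %*% P_M, P_M)))
```

Forming the full *n* by *n* matrix is wasteful for large *n*; `qr.fitted(qr(X_M), y - mu)` gives the same fitted vector without it.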

For the marginal screening prototype (MS) test, $\hat{y} = x_{j^*}$, where $x_j$ is the $j$th column of `x` and $x_{j^*}$ is the column of maximal marginal correlation with the response.
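
As a rough illustration of the marginal screening step, the prototype column can be found with base R as below. Measuring marginal correlation by the absolute Pearson correlation is an assumption here, and the package may standardise or score columns differently. The data and variable names are hypothetical.

```r
# Illustrative sketch only: pick the column with the largest absolute
# marginal correlation with the response and use it as the prototype.
set.seed(2)
n <- 50
p <- 8
X <- matrix(rnorm(n * p), nrow = n)
y <- 2 * X[, 4] + rnorm(n)                # column 4 carries the signal

marg_cor <- abs(drop(cor(X, y)))          # |cor(x_j, y)| for each column j
j_star <- which.max(marg_cor)             # index of the prototype column
y_hat <- X[, j_star]                      # the prototype itself
```

Because `j_star` is chosen by looking at `y`, a test based on `y_hat` has to account for that selection, which is what the MS prototype test is for.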

All tests test the null hypothesis $H_0: \theta = 0$. Details of each are described in Reid et al. (2015).

### Value

A list with the following four components:

`ts` |
The value of the test statistic on the observed data. |

`p.val` |
Valid p-value of the test. |

`selected.col` |
Vector with columns selected. If initially |

`y.hr` |
Matrix with hit-and-run replications of the response. If sampled selective test was not performed, this will be |
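
When a sampled reference distribution is used, a p-value can be formed by comparing the observed statistic with the statistic recomputed on each replicated response. The sketch below shows that general Monte Carlo recipe only. The statistic `stat`, the layout of one replication per row and the add-one correction are assumptions, not a description of how `p.val` is computed by the package.

```r
# Illustrative sketch only: Monte Carlo p-value from replicated responses.
set.seed(4)
n <- 30
x1 <- rnorm(n)
stat <- function(y) abs(sum(x1 * y))          # hypothetical test statistic
y_obs <- 0.5 * x1 + rnorm(n)                  # observed response
y_rep <- matrix(rnorm(200 * n), nrow = 200)   # stand-in null replications, one per row

ts_obs <- stat(y_obs)
ts_null <- apply(y_rep, 1, stat)              # statistic on each replication
# Proportion of replications at least as extreme, with an add-one correction
p_val <- (1 + sum(ts_null >= ts_obs)) / (1 + length(ts_null))
```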

### Author(s)

Stephen Reid

### References

Reid, S. and Tibshirani, R. (2015) *Sparse regression and marginal testing using cluster prototypes*. *Biostatistics*, doi:10.1093/biostatistics/kxv049. http://arxiv.org/pdf/1503.00334v2.pdf.

Reid, S., Taylor, J. and Tibshirani, R. (2015) *A general framework for estimation and inference from clusters of features*. Available online: http://arxiv.org/abs/1511.07839.

### See Also

`prototest.multivariate`

### Examples

```r
library(prototest)

### generate data
set.seed(12345)
n = 100
p = 80
X = matrix(rnorm(n*p, 0, 1), ncol=p)
beta = rep(0, p)
beta[1:3] = 0.1 # three signal variables: number 1, 2, 3
# linear predictor: one value per observation (apply over rows of X)
signal = apply(X, 1, function(row){sum(beta*row)})
intercept = 3
y = intercept + signal + rnorm(n, 0, 1)

### treat all columns as if in same group and test for signal
# non-selective ELR test with nuisance intercept; columns 1-5 preselected
elr = prototest.univariate(X, y, "ELR", selected.col=1:5)
# selective F test with nuisance intercept; non-sampling
f.test = prototest.univariate(X, y, "F", lambda=0.01, hr.iter=0)
print(elr)
print(f.test)

### assume variables occur in 4 equally sized groups
num.groups = 4
groups = rep(1:num.groups, each=p/num.groups)
# selective ALR test -- select columns 21-25 in 2nd group; test for signal in 1st; hit-and-run
alr = prototest.multivariate(X, y, groups, 1, "ALR", 21:25, lambda=0.005, hr.iter=20000)
# non-selective MS test -- specify first column in each group; test for signal in 1st
ms = prototest.multivariate(X, y, groups, 1, "MS", c(1,21,41,61))
print(alr)
print(ms)
```