Fit a generalized linear model via penalized maximum likelihood and cross-validation. Then, compute the difference statistic
W_j = |Z_j| - |\tilde{Z}_j|
where Z_j and \tilde{Z}_j are the coefficient estimates for the jth variable and its knockoff, respectively. The value of the regularization parameter λ is selected by cross-validation and computed with glmnet.
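The statistic itself is simple arithmetic once the coefficients are available. Below is a minimal base-R sketch. It assumes the p coefficients of the original variables come first and the p knockoff coefficients second, in a single vector of length 2p. That is the layout you would get from a joint fit on cbind(X, X_k). The helper name coef_difference is hypothetical and is not part of MFKnockoffs.

```r
# Hypothetical helper (not part of MFKnockoffs): compute W_j = |Z_j| - |Z~_j|
# from a length-2p coefficient vector laid out as (Z_1..Z_p, Z~_1..Z~_p).
coef_difference <- function(Z_all) {
  stopifnot(is.numeric(Z_all), length(Z_all) %% 2 == 0)
  p <- length(Z_all) / 2
  Z <- Z_all[seq_len(p)]          # coefficients of the original variables
  Z_k <- Z_all[p + seq_len(p)]    # coefficients of their knockoffs
  abs(Z) - abs(Z_k)
}

# Toy example: variable 1 beats its knockoff, variable 2 loses to its knockoff,
# variable 3 is tied (both coefficients shrunk to zero).
W <- coef_difference(c(1.5, 0.0, 0.0, -0.2, 0.4, 0.0))
print(W)
```

A positive W_j means the original variable entered the penalized fit more strongly than its knockoff. A negative W_j means the knockoff did.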
MFKnockoffs.stat.glmnet_coef_difference(X, X_k, y, family = "gaussian",
  cores = 2, ...)
X: original design matrix (size n-by-p).
X_k: knockoff matrix (size n-by-p).
y: response vector (length n). Quantitative for family="gaussian"; non-negative counts for family="poisson". For family="binomial", y should be either a factor with two levels or a two-column matrix of counts or proportions (the second column is treated as the target class; for a factor, the last level in alphabetical order is the target class). For family="multinomial", y can be a factor with nc >= 2 levels or a matrix with nc columns of counts or proportions. For either "binomial" or "multinomial", if y is presented as a vector, it will be coerced into a factor. For family="cox", y should be a two-column matrix with columns named 'time' and 'status'; the latter is a binary variable, with '1' indicating death and '0' indicating right censoring. The function Surv() in package survival produces such a matrix. For family="mgaussian", y is a matrix of quantitative responses.
family: response type (see y above).
cores: number of cores used to compute the knockoff statistics by running cv.glmnet. Unless otherwise specified, the number of cores is set to two (if available).
...: additional arguments passed to cv.glmnet (see Details).
This function uses the glmnet package to fit a generalized linear model via penalized maximum likelihood.
The knockoff statistic W_j is the difference between the absolute value of the coefficient of the j-th variable and the absolute value of the coefficient of its knockoff.
By default, the value of the regularization parameter is chosen by 10-fold cross-validation.
The default response family is 'gaussian', for a linear regression model. Other response families (e.g. 'binomial') can be selected through the optional argument 'family'.
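The fitting step can be sketched with glmnet directly. This is a minimal illustration, not the MFKnockoffs source. It makes three assumptions. First, the original variables and knockoffs are fitted jointly on cbind(X, X_k). Second, coefficients are read at lambda.min. Third, the family is one whose coef() output is a single column with the intercept first, such as "gaussian", "binomial" or "poisson". The function name glmnet_coef_difference_sketch is hypothetical.

```r
library(glmnet)

# Hypothetical sketch (not the MFKnockoffs implementation): fit a cross-validated
# lasso on [X, X_k] and return W_j = |Z_j| - |Z~_j| for j = 1..p.
glmnet_coef_difference_sketch <- function(X, X_k, y, family = "gaussian",
                                          nfolds = 10, ...) {
  p <- ncol(X)
  fit <- cv.glmnet(cbind(X, X_k), y, family = family, nfolds = nfolds, ...)
  # Assumption: coefficients at lambda.min; drop the intercept (first row).
  Z_all <- as.vector(as.matrix(coef(fit, s = "lambda.min")))[-1]
  abs(Z_all[seq_len(p)]) - abs(Z_all[p + seq_len(p)])
}

set.seed(1)
n <- 200; p <- 20
X <- matrix(rnorm(n * p), n)
# With i.i.d. N(0, 1) columns (identity covariance), independent N(0, 1) draws
# are valid Gaussian knockoffs, so they suffice for this toy run.
X_k <- matrix(rnorm(n * p), n)
y <- 3 * X[, 1] + rnorm(n)        # only variable 1 is non-null
W <- glmnet_coef_difference_sketch(X, X_k, y)
print(round(W, 2))
```

Extra arguments such as nlambda travel through ... to cv.glmnet, the same way the documented function forwards them.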
The optional nlambda parameter controls the granularity of the grid of λ values. The default value of nlambda is 100.
If the family is 'binomial' and a lambda sequence is not provided by the user, this function generates it on a log-linear scale before calling 'glmnet'.
For a complete list of the available additional arguments, see cv.glmnet and glmnet.
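A log-linear λ sequence contains values that are equally spaced on the log scale between a largest value λ_max and a smallest value λ_min. The base-R sketch below builds such a grid. It is illustrative only. The choice λ_max = max|X^T y| / n, the ratio λ_min / λ_max = 1/2000, and the helper name log_linear_lambda are assumptions, not values or names taken from MFKnockoffs. y is assumed to be numeric, with a binary response coded 0/1. A user-supplied sequence like this can be passed as lambda through the ... argument, since cv.glmnet accepts a lambda argument.

```r
# Hypothetical helper: nlambda values equally spaced on the log scale,
# decreasing from lambda_max to lambda_max * ratio (glmnet expects a
# decreasing sequence).
log_linear_lambda <- function(X, y, nlambda = 100, ratio = 1 / 2000) {
  n <- nrow(X)
  y <- as.numeric(y)                              # assumption: numeric / 0-1 coded y
  lambda_max <- max(abs(crossprod(X, y))) / n     # assumption: |X^T y| / n scaling
  exp(seq(log(lambda_max), log(lambda_max * ratio), length.out = nlambda))
}

set.seed(1)
X <- matrix(rnorm(200 * 10), 200)
y <- rbinom(200, 1, 0.5)          # binary response coded 0/1
lam <- log_linear_lambda(X, y, nlambda = 50)
print(range(lam))
```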
A vector of statistics W of length p, with one entry per original variable.
Other statistics for knockoffs: MFKnockoffs.stat.forward_selection, MFKnockoffs.stat.glmnet_lambda_difference, MFKnockoffs.stat.lasso_coef_difference_bin, MFKnockoffs.stat.lasso_coef_difference, MFKnockoffs.stat.lasso_lambda_difference_bin, MFKnockoffs.stat.lasso_lambda_difference, MFKnockoffs.stat.random_forest, MFKnockoffs.stat.sqrt_lasso, MFKnockoffs.stat.stability_selection
library(MFKnockoffs)

# Simulate a sparse linear model: p variables, k of them non-null
p = 100; n = 200; k = 15
mu = rep(0, p); Sigma = diag(p)
X = matrix(rnorm(n*p), n)
nonzero = sample(p, k)
beta = 3.5 * (1:p %in% nonzero)
y = X %*% beta + rnorm(n)

# Gaussian model-X knockoffs for the known distribution N(mu, Sigma)
knockoffs = function(X) MFKnockoffs.create.gaussian(X, mu, Sigma)

# Basic usage with default arguments
result = MFKnockoffs.filter(X, y, knockoffs=knockoffs,
                            statistic=MFKnockoffs.stat.glmnet_coef_difference)
print(result$selected)

# Advanced usage with custom arguments: nlambda is forwarded to cv.glmnet via ...
foo = MFKnockoffs.stat.glmnet_coef_difference
k_stat = function(X, X_k, y) foo(X, X_k, y, nlambda=200)
result = MFKnockoffs.filter(X, y, knockoffs=knockoffs, statistic=k_stat)
print(result$selected)