Sparse.graph: Graphical Modeling Using a LASSO-Type Sparse Learning Algorithm


View source: R/Sparse.graph.R

Description

This function builds a Gaussian or binary graph based on the bootstrap ranking LASSO regression method.

Usage

Sparse.graph(x, graph.type = c("gaussian"), B = 5, Boots = 100, edge.rule = c("AND"), 
             kfold = 10, plot = TRUE, seed = 0123)

Arguments

x

input matrix. The dimension of the matrix is nobs x nvars; each row is a vector of observations of the variables. Gaussian or binary data is supported.

graph.type

the type of graph, Gaussian or binary. Defaults to gaussian.

B

the number of external loops for the intersection operation. Defaults to 5.

Boots

the number of internal loops for bootstrap sampling. Defaults to 100.

edge.rule

the rule indicating whether the AND-rule or the OR-rule should be used to define the edges in the graph. Defaults to AND.

kfold

the number of cross-validation folds; the default is 10. Although kfold can be as large as the sample size (leave-one-out CV), this is not recommended for large datasets. The smallest allowable value is kfold=3.

plot

logical. Should the resulting graph be plotted? Defaults to TRUE.

seed

the seed for random sampling, with the default value 0123.
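
For orientation, a minimal call on simulated Gaussian data might look as follows (a sketch only; it assumes the SparseLearner package is installed, and the small B and Boots values are chosen purely to keep the run short):

library(SparseLearner)
# Simulate a small Gaussian data matrix: 50 observations of 8 variables.
set.seed(0123)
x.sim <- matrix(rnorm(50 * 8), nrow = 50, ncol = 8)
colnames(x.sim) <- paste0("V", 1:8)
# Gaussian graph with the AND-rule; B and Boots kept small for speed only.
fit.sim <- Sparse.graph(x.sim, graph.type = c("gaussian"), B = 2, Boots = 10,
                        edge.rule = c("AND"), kfold = 3, plot = FALSE)
fit.sim$adj.matrix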

Details

The graph estimation procedure Sparse.graph is based on the L1-regularized regression model and combines a bootstrap ranking strategy with model selection via the glmnet algorithm. The glmnet algorithm fits LASSO model paths for linear and logistic regression using coordinate descent. The Sparse.graph procedure thereby identifies relevant relationships between Gaussian or binary variables and estimates network structures from data. The resulting graph consists of variables as nodes and relevant relationships as edges. Combining the LASSO penalized regression model with a bootstrap ranking strategy yields higher power and a lower false-positive rate in variable selection, and the procedure is proposed for identifying significant associations between variables in epidemiological analyses.
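
The underlying neighborhood-selection idea (references [3] and [4]) can be illustrated directly with glmnet: regress each variable on all the others, treat nonzero LASSO coefficients as candidate neighbors, and symmetrize the neighborhoods with the AND-rule or the OR-rule. The sketch below shows only this step on simulated data; it omits the bootstrap ranking layer that Sparse.graph adds on top, so it is not the package's exact implementation.

library(glmnet)
set.seed(0123)
x.dat <- matrix(rnorm(100 * 6), nrow = 100, ncol = 6)
p <- ncol(x.dat)
neighbors <- matrix(0, p, p)
for (j in 1:p) {
  # LASSO regression of variable j on the remaining variables,
  # with lambda chosen by 10-fold cross-validation.
  cvfit <- cv.glmnet(x.dat[, -j], x.dat[, j], family = "gaussian", nfolds = 10)
  beta <- as.matrix(coef(cvfit, s = "lambda.min"))[-1, 1]  # drop the intercept
  neighbors[j, -j] <- as.numeric(beta != 0)
}
# AND-rule: keep an edge only if both regressions select each other.
adj.and <- neighbors * t(neighbors)
# OR-rule: keep an edge if either regression selects the other.
adj.or <- 1 * ((neighbors + t(neighbors)) > 0)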

Value

adj.matrix

the adjacency matrix.

graph.type

the type of graph. Currently, the procedure supports Gaussian and binary data.

B

the number of external loops for the intersection operation.

Boots

the number of internal loops for bootstrap sampling.

edge.rule

the rule used to define the edges in the graph.
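
Because adj.matrix stores each selected edge as a nonzero weight (as in the example output below), it can be post-processed with base R. A small sketch, assuming fit is a fitted Sparse.graph object:

A <- fit$adj.matrix
# Symmetric binary adjacency: an edge wherever either direction is nonzero.
A.bin <- 1 * ((abs(A) + t(abs(A))) > 0)
diag(A.bin) <- 0
# Total number of edges and the neighbors of the first variable.
sum(A.bin) / 2
colnames(A.bin)[A.bin[1, ] == 1]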

References

[1] Guo, P., Zeng, F., Hu, X., Zhang, D., Zhu, S., Deng, Y. and Hao, Y. (2015). Improved Variable Selection Algorithm Using a LASSO-Type Penalty, with an Application to Assessing Hepatitis B Infection Relevant Factors in Community Residents. PLoS One, 10(7): e0134151.

[2] Friedman, J., Hastie, T. and Tibshirani, R. (2010). Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1): 1-22.

[3] Strobl, R., Grill, E. and Mansmann, U. (2012). Graphical modeling of binary data using the LASSO: a simulation study. BMC Medical Research Methodology, 12: 16.

[4] Meinshausen, N. and Buehlmann, P. (2006). High-dimensional graphs and variable selection with the Lasso. Annals of Statistics, 34(3): 1436-1462.

Examples

# Example 1: Gene network estimation using the bootstrap ranking LASSO method.
# Gaussian graph with OR-rule.
library(SIS)
data(leukemia.train)
# Genes screened by the LASSO algorithm as candidates for graphical modeling.
x <- as.matrix(leukemia.train[, -7130])
y <- as.numeric(leukemia.train[, 7130])
set.seed(0123)
cvfit <- cv.glmnet(x=x, y=y, type.measure="deviance", nfolds=3, family="binomial")
model.final <- cvfit$glmnet.fit
nzero <- as.matrix(coef(model.final, s=cvfit$lambda.min))
# To reduce the running time, only half of the significant genes are retained.
var_nz <- sort(abs(nzero[nzero[,1]!=0, ][-1]), decreasing=TRUE)
var_nz <- names(var_nz[1:(length(var_nz)/2)])
sub_data <- leukemia.train[, c(var_nz, "V7130")]
# Gene expression data subset from patients with acute myeloid leukemia.
subset_1 <- subset(sub_data, sub_data$V7130==1)
subset_1 <- as.matrix(subset_1[, -dim(subset_1)[2]])
# The parameters B and Boots are set to small values in this example to reduce
# the running time; the default values are recommended in practice.
Sparse.graph.fit1 <- Sparse.graph(subset_1, graph.type=c("gaussian"), 
                                   B=2, Boots=1, edge.rule=c("OR"))
# Print the adjacency matrix of variables.
Sparse.graph.fit1$adj.matrix

# Example 2: Gaussian graph with AND-rule.
# The parameters B and Boots are set to small values in this example to reduce
# the running time; the default values are recommended in practice.
Sparse.graph.fit2 <- Sparse.graph(subset_1, graph.type=c("gaussian"), 
                        B=2, Boots=1, edge.rule=c("AND"), plot=FALSE)
# Print the adjacency matrix of variables.
Sparse.graph.fit2$adj.matrix
# Plot the graph based on the adjacency matrix of variables using the qgraph package.
library(qgraph)
qgraph(Sparse.graph.fit2$adj.matrix, directed=FALSE, color="blue", 
        negCol="red", edge.labels=TRUE, layout="circle")

Example output

Loading required package: glmnet
Loading required package: Matrix
Loading required package: foreach
Loaded glmnet 2.0-16

Warning message:
In lognet(x, is.sparse, ix, jx, y, weights, offset, alpha, nobs,  :
  one multinomial or binomial class has fewer than 8  observations; dangerous ground
Step 1-Current iteration:  1
Step 2-Current iteration:  1
Step 1-Current iteration:  1
Step 2-Current iteration:  2
Iteration =  1
(... the same pair of step messages repeats for Iteration =  2 through Iteration =  7 ...)
There were 50 or more warnings (use warnings() to see the first 50)
      V461    V2001 V1846     V5039     V1834 V3847     V3320
V461     0 1.294008     0 -2.293668 0.0000000     0 0.0000000
V2001    0 0.000000     0 -1.584952 0.0000000     0 0.0000000
V1846    0 0.000000     0  0.000000 0.5285677     0 0.0000000
V5039    0 0.000000     0  0.000000 0.0000000     0 0.0000000
V1834    0 0.000000     0  0.000000 0.0000000     0 0.2763067
V3847    0 0.000000     0  0.000000 0.0000000     0 0.0000000
V3320    0 0.000000     0  0.000000 0.0000000     0 0.0000000
Step 1-Current iteration:  1
Step 2-Current iteration:  1
Step 1-Current iteration:  1
Step 2-Current iteration:  2
Iteration =  1
(... the same pair of step messages repeats for Iteration =  2 through Iteration =  7 ...)
There were 50 or more warnings (use warnings() to see the first 50)
      V461    V2001 V1846     V5039     V1834 V3847     V3320
V461     0 1.294008     0 -2.293668 0.0000000     0 0.0000000
V2001    0 0.000000     0 -1.584952 0.0000000     0 0.0000000
V1846    0 0.000000     0  0.000000 0.5285677     0 0.0000000
V5039    0 0.000000     0  0.000000 0.0000000     0 0.0000000
V1834    0 0.000000     0  0.000000 0.0000000     0 0.2763067
V3847    0 0.000000     0  0.000000 0.0000000     0 0.0000000
V3320    0 0.000000     0  0.000000 0.0000000     0 0.0000000
