graphclass: Regularized logistic regression classifier for networks.

graphclass

R Documentation


Description

graphclass fits a regularized logistic regression to a set of network adjacency matrices with responses, and returns an object with the classifier.

The plot method for objects of class 'graphclass' plots the coefficient matrix obtained with the function graphclass.

Usage

graphclass(
  X = NULL,
  Y = NULL,
  type = c("intersection", "union", "groups", "fusion"),
  ...
)

## Default S3 method:
graphclass(
  X = NULL,
  Y = NULL,
  Xtest = NULL,
  Ytest = NULL,
  Adj_list = NULL,
  type = c("intersection", "union", "groups", "fusion"),
  lambda = 0,
  rho = 0,
  gamma = 1e-05,
  params = NULL,
  id = "",
  verbose = F,
  D = NULL,
  Groups = NULL,
  G_penalty_factors = NULL,
  ...
)

## S3 method for class 'graphclass'
plot(object, ...)

Arguments

`X`	A matrix with the training samples, in which each row represents the vectorized (by column order) upper triangular part of a network adjacency matrix.
`Y`	A vector containing the class labels of the training samples (only 2 classes are supported for now).
`type`	Type of penalty function. Default is `"intersection"`. See Details.
`Xtest`	Optional argument for providing a matrix containing the test samples, with each row representing an upper-triangular vectorized adjacency matrix.
`Ytest`	Optional argument containing the labels of test samples.
`Adj_list`	A list of symmetric adjacency matrices with zeros on the diagonal, used as the training sample.
`lambda`	Penalty parameter lambda. Default is 0.
`rho`	Penalty parameter rho, which controls sparsity. Default is 0.
`gamma`	Ridge parameter (for numerical purposes). Default is `gamma = 1e-5`.
`params`	A list containing internal parameters for the optimization algorithm. See Details.
`verbose`	Logical; whether output is printed.
`D`	Matrix D used by the penalty to define the groups. This optional argument can be used to pass a precomputed matrix `D`, which saves time if the method is fitted multiple times. See the function `construct_D`.
`Groups`	List of lists, where each list corresponds to a grouping and each sublist to a set of indexes in X. The sublists within a grouping should be non-overlapping groups.
`G_penalty_factors`	For type = "groups", the factor by which each group is penalized. The factors should sum to 1.
`object`	A trained graphclass object.
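The row layout expected for `X` can be illustrated with base R alone. The sketch below builds toy adjacency matrices and vectorizes their upper triangles in column order. The helper names `make_adjacency` and `vectorize_upper` are hypothetical and not part of the package.

```r
# Toy data: three symmetric 4-node adjacency matrices with zero diagonal.
set.seed(1)
N <- 4
make_adjacency <- function(N) {
  A <- matrix(0, N, N)
  A[upper.tri(A)] <- runif(N * (N - 1) / 2)
  A + t(A)  # symmetrize; the diagonal stays at zero
}
Adj_list <- lapply(1:3, function(i) make_adjacency(N))

# A[upper.tri(A)] extracts the strict upper triangle in column-major order:
# (1,2), (1,3), (2,3), (1,4), (2,4), (3,4) for N = 4.
vectorize_upper <- function(A) A[upper.tri(A)]

# One row per network, one column per edge.
X <- t(sapply(Adj_list, vectorize_upper))
dim(X)  # 3 networks by 6 edges
```

With N nodes, each row of `X` has N(N-1)/2 entries, one per undirected edge.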

Details

The function graphclass fits a regularized logistic regression to classify a set of network adjacency matrices with N labeled nodes and corresponding responses. The classifier fits a matrix of coefficients B \in \mathbb{R}^{N \times N}, in which B_{ij} indicates the coefficient corresponding to the edge (i,j).

The argument type provides options to choose the penalty function. If type = "intersection" or "union", the penalty corresponds to the node selection penalty defined as

\Omega(B) = \lambda \left( \sum_{i=1}^N \sqrt{\sum_{j=1}^N B_{ij}^2} + \rho \sum_{i=1}^N \sum_{j=1}^N |B_{ij}| \right).

When type = "intersection", a symmetric restriction on B is enforced, and the penalty promotes subgraph selection. If type = "union", the penalty promotes individual node selection. See Relión et al. (2017) for more details.
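To make the node selection penalty concrete, the following base R sketch evaluates it for a small coefficient matrix. The function `node_penalty` is a hypothetical helper written for illustration; it only evaluates the formula and is not the package's fitting routine.

```r
# Node selection penalty: a group term (l2 norm of each row of B, summed over
# nodes) plus rho times the l1 norm of B, all scaled by lambda.
node_penalty <- function(B, lambda, rho) {
  group_term <- sum(sqrt(rowSums(B^2)))
  lasso_term <- sum(abs(B))
  lambda * (group_term + rho * lasso_term)
}

# Symmetric toy matrix in which node 1 is connected to nodes 2 and 3.
B <- matrix(0, 4, 4)
B[1, 2] <- B[2, 1] <- 3
B[1, 3] <- B[3, 1] <- 4

node_penalty(B, lambda = 1, rho = 0)    # 5 + 3 + 4 + 0 = 12
node_penalty(B, lambda = 0.5, rho = 1)  # 0.5 * (12 + 14) = 13
```

Node 4 has an all-zero row and contributes nothing to the group term, which is how the penalty encourages whole nodes to drop out of the model.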

The value type = "groups" corresponds to a generic group lasso penalty. The groups of edges have to be specified using the argument Groups with a list of arrays, in which each element of the list corresponds to a group, and the array gives the indexes of the variables in that group. The optional argument G_penalty_factors is an array of size equal to the number of groups, and can be used to specify a different weight for each group in the penalty (for example, when groups have different sizes).
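One way to build non-overlapping edge groups is to group edges by the pair of node communities they connect. The sketch below uses base R only. The `community` vector is hypothetical, wrapping the groups in an outer list is an assumption based on the description of the `Groups` argument, and the square-root-of-size weights are one common group lasso choice rather than a documented default.

```r
N <- 6
community <- c(1, 1, 1, 2, 2, 2)  # hypothetical node-to-community assignment

# Edges (i, j) with i < j, listed in the same column order as the columns of X.
pairs <- which(upper.tri(matrix(0, N, N)), arr.ind = TRUE)

# Label each edge by the unordered pair of communities it connects.
block <- paste(pmin(community[pairs[, "row"]], community[pairs[, "col"]]),
               pmax(community[pairs[, "row"]], community[pairs[, "col"]]),
               sep = "-")

# Each group holds the column indexes of X for the edges in one block.
edge_groups <- unname(split(seq_len(nrow(pairs)), block))
Groups <- list(edge_groups)  # assumed nesting: a single grouping

# Weights proportional to sqrt(group size), normalized to sum to 1.
w <- sqrt(lengths(edge_groups))
G_penalty_factors <- w / sum(w)
```

Every edge index appears in exactly one group, so the groups do not overlap, and the weights sum to 1 as the argument description asks.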

The optional argument params is a list that controls some internal parameters of the optimization algorithm. The elements beta_start and b_start are initial values for the optimization algorithm: beta_start is a vector with the weights of the upper triangular part of B, and b_start is the initial value of the threshold (intercept) in the logistic regression. By default, both are set to zero. The elements MAX_ITER and CONV_CRIT change the maximum number of iterations and the convergence criterion of the proximal algorithm used to fit the node selection penalty (see Relión et al., 2017). By default, MAX_ITER = 300 and CONV_CRIT = 1e-5.
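A params list that spells out the documented defaults might look like the sketch below. The element names come from the description above; the node count of 263 matches the COBRE example, and whether a partial list is accepted is not stated here, so all four elements are supplied.

```r
n_nodes <- 263                          # number of nodes in the COBRE example networks
n_edges <- n_nodes * (n_nodes - 1) / 2  # length of the upper-triangular vectorization

params <- list(
  beta_start = rep(0, n_edges),  # initial edge weights (documented default: zero)
  b_start    = 0,                # initial threshold (documented default: zero)
  MAX_ITER   = 300,              # maximum number of iterations
  CONV_CRIT  = 1e-5              # convergence criterion
)

# Hypothetical call, assuming the graphclass package and training data X, Y:
# fit <- graphclass(X = X, Y = factor(Y), type = "intersection",
#                   lambda = 1e-4, rho = 1, params = params)
```

Passing the solution of a previous fit as beta_start and b_start is a natural use of these elements when fitting over a sequence of penalty values, though that workflow is an assumption and not described on this page.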

Value

An object containing the trained graph classifier.

`beta`	Edge coefficients vector of the regularized logistic regression solution.
`b`	Intercept value.
`Yfit`	Fitted logistic regression probabilities on the training data.
`Ypred`	Predicted class for the test samples (if available).
`train_error`	Percentage of training samples that are misclassified.
`test_error`	Percentage of test samples that are misclassified (if available).
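Because `beta` is a vector of edge coefficients, it is often convenient to fold it back into the N by N matrix B. The base R sketch below does this for a toy vector, assuming `beta` follows the same column-ordered upper-triangle layout as the rows of `X`; the values are made up for illustration.

```r
N <- 4
# Toy coefficients for edges (1,2), (1,3), (2,3), (1,4), (2,4), (3,4).
beta <- c(0, 0, 1.5, 0, -2, 0)

# Fill the upper triangle in column order, then mirror it to get a symmetric B.
B <- matrix(0, N, N)
B[upper.tri(B)] <- beta
B <- B + t(B)

# Nodes with at least one nonzero edge coefficient.
active_nodes <- which(rowSums(B != 0) > 0)
active_nodes  # nodes 2, 3 and 4
```

Here node 1 has no nonzero coefficients, so it is not part of the selected subgraph.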

References

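Relión, J. D. A., Kessler, D., Levina, E., and Taylor, S. F. (2017). Network classification with applications to brain connectomics. arXiv preprint.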

Examples


# Load COBRE data
data(COBRE.data)
X <- COBRE.data$X.cobre
Y <- COBRE.data$Y.cobre

# An example of the subgraph selection penalty
gc <- graphclass(X = X, Y = factor(Y), type = "intersection",
                 lambda = 1e-4, rho = 1, gamma = 1e-5)
plot(gc)


# 5-fold cross validation
fold_index <- (seq_along(Y) %% 5) + 1

# Make penalty matrix in advance to save time
D263 <- construct_D(nodes = 263)

gclist <- list()
for(fold in 1:5) {
    foldout <- which(fold_index == fold) 
    gclist[[fold]] <- graphclass(X = X[-foldout,], Y = factor(Y[-foldout]),
                     Xtest = X[foldout,], Ytest = factor(Y[foldout]),
                     type = "intersection",
                     lambda = 1e-4, rho = 1, gamma = 1e-5,
                     D = D263)
}
# test error on each fold
lapply(gclist, function(gc) gc$test_error)

# Example for the plot method, reusing the COBRE data
data(COBRE.data)
X <- COBRE.data$X.cobre
Y <- COBRE.data$Y.cobre

# An example of the subgraph selection penalty
gc <- graphclass(X, Y = factor(Y), lambda = 1e-5, rho = 1)

plot(gc)

jesusdaniel/graphclass documentation built on Aug. 10, 2022, 3:10 p.m.