graphclass: Regularized logistic regression classifier for networks.


graphclass    R Documentation


Description

graphclass fits a regularized logistic regression to a set of network adjacency matrices with responses, and returns an object containing the trained classifier.

The plot method (plot.graphclass) plots the coefficient matrix B obtained with graphclass.

Usage

graphclass(
  X = NULL,
  Y = NULL,
  type = c("intersection", "union", "groups", "fusion"),
  ...
)

## Default S3 method:
graphclass(
  X = NULL,
  Y = NULL,
  Xtest = NULL,
  Ytest = NULL,
  Adj_list = NULL,
  type = c("intersection", "union", "groups", "fusion"),
  lambda = 0,
  rho = 0,
  gamma = 1e-05,
  params = NULL,
  id = "",
  verbose = F,
  D = NULL,
  Groups = NULL,
  G_penalty_factors = NULL,
  ...
)

## S3 method for class 'graphclass'
plot(object, ...)

Arguments

X

A matrix of training samples, in which each row is the upper triangular part of a network adjacency matrix, vectorized in column order.

Y

A vector of class labels for the training samples. Currently only two classes are supported.

type

Type of penalty function: one of "intersection", "union", "groups" or "fusion". Default is "intersection". See Details.

Xtest

Optional matrix of test samples, in the same format as X: each row is the vectorized upper triangular part of an adjacency matrix.

Ytest

Optional vector of class labels for the test samples.

Adj_list

A list of symmetric training adjacency matrices with zeros on the diagonal.

lambda

Penalty parameter lambda. Default is 0.

rho

Penalty parameter rho, which controls sparsity. Default is 0.

gamma

Ridge parameter, added for numerical stability. Default is gamma = 1e-5.

params

A list of internal parameters for the optimization algorithm. See Details.

verbose

Logical; if TRUE, output is printed during fitting.

D

Matrix D used by the penalty to define the groups. This optional argument can be used to pass a precomputed matrix D, which saves time when the method is fitted multiple times. See the function construct_D.

Groups

List of lists, where each list corresponds to a grouping and each sublist to a set of indices of the columns of X. The groups (sublists) within a grouping should not overlap.

G_penalty_factors

For type = "groups", a vector of weights, one per group, by which the penalty on each group is multiplied. The weights should sum to 1.

object

A trained graphclass object.
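The row format required for X can be built in base R from a list of adjacency matrices: indexing a matrix with upper.tri() returns its strictly upper triangular entries in column order, which matches the layout described for X. The following is a minimal sketch; the helpers adj_list_to_X and n_nodes_from_p are hypothetical and not part of the package. Since each row has p = N(N - 1)/2 entries, the number of nodes N can also be recovered from ncol(X).

```r
# Hypothetical helper (not part of graphclass): stack a list of symmetric
# adjacency matrices into a matrix with one vectorized network per row.
adj_list_to_X <- function(adj_list) {
  rows <- lapply(adj_list, function(A) {
    stopifnot(isSymmetric(A), all(diag(A) == 0))
    A[upper.tri(A)]  # strictly upper triangle, read column by column
  })
  do.call(rbind, rows)
}

# Number of nodes N implied by p = N(N - 1)/2 columns
n_nodes_from_p <- function(p) (1 + sqrt(1 + 8 * p)) / 2

# Two random undirected 4-node networks
set.seed(1)
make_net <- function(N) {
  A <- matrix(0, N, N)
  A[upper.tri(A)] <- rbinom(N * (N - 1) / 2, 1, 0.5)
  A + t(A)  # symmetrize; the diagonal stays zero
}
X_small <- adj_list_to_X(list(make_net(4), make_net(4)))
dim(X_small)                    # 2 rows, 6 columns
n_nodes_from_p(ncol(X_small))   # 4
```

For the 263-node COBRE networks used in the examples, each row of X would have 263 * 262 / 2 = 34453 entries.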

Details

The function graphclass fits a regularized logistic regression to classify a set of network adjacency matrices with N labeled nodes and corresponding responses. The classifier fits a matrix of coefficients B \in \mathbb{R}^{N \times N}, in which B_{ij} is the coefficient of the edge (i, j).
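Because each row of X holds the upper triangular entries of one network, the edge coefficients enter the model through a linear predictor x^T beta + b, where beta is the upper triangular part of B and b is the intercept. The sketch below assumes the standard logistic link, P(Y = 1 | x) = 1 / (1 + exp(-(x^T beta + b))); how graphclass codes the two classes internally is not stated here, and the helper name logistic_probabilities is hypothetical.

```r
# Hypothetical sketch: class-1 probabilities from edge coefficients beta
# (upper triangle of B, column order) and intercept b, standard logistic link.
logistic_probabilities <- function(X, beta, b) {
  eta <- drop(X %*% beta) + b  # linear predictor for each network (row of X)
  plogis(eta)                  # 1 / (1 + exp(-eta))
}

# Two toy networks with 3 nodes (3 edges each)
X_toy <- rbind(c(1, 0, 1),
               c(0, 1, 0))
beta_toy <- c(2, -1, 0.5)
logistic_probabilities(X_toy, beta_toy, b = -1)  # plogis(1.5), plogis(-2)
```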

The argument type provides options to choose the penalty function. If type = "intersection" or "union", the penalty corresponds to the node selection penalty defined as

\Omega(B) = \lambda \left( \sum_{i=1}^{N} \sqrt{\sum_{j=1}^{N} B_{ij}^2} + \rho \sum_{i=1}^{N} \sum_{j=1}^{N} |B_{ij}| \right).

When type = "intersection", B is restricted to be symmetric, and the penalty promotes subgraph selection. If type = "union", the penalty promotes individual node selection. See Relión et al. (2017) for more details.
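Evaluating Ω(B) directly can be a useful sanity check on a coefficient matrix. The following base R sketch mirrors the formula above term by term; it is an illustration, not the package's internal code, and node_selection_penalty is a hypothetical name. The row-norm term is what drives node selection: it is smallest when entire rows of B are zero, and when B is also symmetric a zero row implies a zero column, so the node drops out of the subgraph.

```r
# Sketch of the node selection penalty
#   Omega(B) = lambda * ( sum_i sqrt(sum_j B_ij^2) + rho * sum_i sum_j |B_ij| )
node_selection_penalty <- function(B, lambda, rho) {
  row_norms <- sqrt(rowSums(B^2))  # Euclidean norm of each row of B
  lambda * (sum(row_norms) + rho * sum(abs(B)))
}

# Symmetric 3-node example: row and column 3 are zero, so node 3 is not selected
B <- matrix(c(0, 3, 0,
              3, 0, 0,
              0, 0, 0), nrow = 3, byrow = TRUE)
node_selection_penalty(B, lambda = 0.1, rho = 1)  # 0.1 * ((3 + 3 + 0) + 1 * 6) = 1.2
which(rowSums(B != 0) > 0)                        # selected nodes: 1 2
```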

The value type = "groups" corresponds to a generic group lasso penalty. The groups of edges must be specified through the argument Groups as a list of arrays, in which each element of the list corresponds to a group and the array gives the indices of the variables in that group. The optional argument G_penalty_factors is an array with one entry per group, and can be used to give each group a different weight in the penalty (for example, when groups have different sizes).
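A weighted group lasso penalty of the form lambda * sum_g w_g ||beta_g||_2 can be sketched in base R as follows, with groups given as a list of index vectors and one weight per group, as with Groups and G_penalty_factors. This is the textbook group lasso form; the exact scaling that graphclass applies internally is an assumption here, and group_lasso_penalty is a hypothetical name. Weighting by the square root of the group size is a common choice when groups have different sizes.

```r
# Hypothetical sketch of a weighted group lasso penalty
#   lambda * sum_g w_g * ||beta[g]||_2
# groups: list of index vectors into beta; weights: one per group (sum to 1).
group_lasso_penalty <- function(beta, groups,
                                weights = rep(1 / length(groups), length(groups)),
                                lambda = 1) {
  stopifnot(length(weights) == length(groups))
  # Euclidean norm of the coefficients in each group
  norms <- vapply(groups, function(g) sqrt(sum(beta[g]^2)), numeric(1))
  lambda * sum(weights * norms)
}

beta <- c(3, 4, 0, 0, 1)
groups <- list(c(1, 2), c(3, 4), 5)  # three non-overlapping groups of edges
group_lasso_penalty(beta, groups)    # equal weights 1/3: (5 + 0 + 1) / 3 = 2

# Weights proportional to sqrt(group size), rescaled to sum to 1
w <- sqrt(lengths(groups))
w <- w / sum(w)
group_lasso_penalty(beta, groups, weights = w, lambda = 0.5)
```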

The optional argument params is a list that controls some internal parameters of the optimization algorithm. The elements beta_start and b_start are initial values for the optimization algorithm: beta_start is a vector with the initial weights of the upper triangular part of B, and b_start is the initial value of the threshold (intercept) in the logistic regression. By default, both are set to zero. The elements MAX_ITER and CONV_CRIT set the maximum number of iterations and the convergence criterion of the proximal algorithm used to fit the node selection penalty (see Relión et al. (2017)). By default, MAX_ITER = 300 and CONV_CRIT = 1e-5.
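A params list with the four documented elements might look as follows for 263-node networks such as the COBRE data. Whether graphclass fills in elements omitted from params is not stated here, so the sketch sets all four; the specific MAX_ITER and CONV_CRIT values are illustrative choices, not recommendations. Passing the beta and b of a previous fit as starting values (a warm start) is shown only as a commented, hypothetical use.

```r
# Sketch: a params list with the elements described above.
# p is the number of columns of X, i.e. N(N - 1)/2 for N nodes.
N <- 263
p <- N * (N - 1) / 2
params <- list(
  beta_start = rep(0, p),  # initial edge coefficients (upper triangle of B)
  b_start    = 0,          # initial intercept
  MAX_ITER   = 500,        # raise the iteration cap above the default 300
  CONV_CRIT  = 1e-6        # tighter than the default 1e-5
)
# fit <- graphclass(X = X, Y = factor(Y), lambda = 1e-4, rho = 1, params = params)
# Hypothetical warm start from an earlier fit:
# params$beta_start <- fit$beta; params$b_start <- fit$b
```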

Value

An object of class graphclass containing the trained classifier, with the following components:

beta

Vector of edge coefficients of the regularized logistic regression solution (the upper triangular part of B).

b

Intercept value.

Yfit

Fitted logistic regression probabilities on the training data.

Ypred

Predicted class for the test samples (if available).

train_error

Percentage of training samples that are misclassified.

test_error

Percentage of test samples that are misclassified (if available).
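The beta vector can be turned back into the N x N matrix B, for example to list the nodes that have at least one non-zero coefficient. This sketch assumes beta stores the strictly upper triangle of a symmetric B in column order, as the rows of X do; beta_to_B and misclass_pct are hypothetical helpers, and misclass_pct assumes the errors are reported on a 0 to 100 scale.

```r
# Hypothetical helper: rebuild a symmetric N x N coefficient matrix B from
# beta, assuming beta holds the strictly upper triangle in column order.
beta_to_B <- function(beta) {
  N <- (1 + sqrt(1 + 8 * length(beta))) / 2
  stopifnot(N == round(N))     # length(beta) must equal N(N - 1)/2
  B <- matrix(0, N, N)
  B[upper.tri(B)] <- beta
  B + t(B)
}

beta <- c(0.5, 0, 0, -1, 0, 0)  # 4 nodes, 6 edges: (1,2) = 0.5, (1,4) = -1
B <- beta_to_B(beta)
which(rowSums(B != 0) > 0)      # nodes with a non-zero coefficient: 1 2 4

# Percentage of misclassified samples (assumed 0-100 scale)
misclass_pct <- function(y_true, y_pred) {
  100 * mean(as.character(y_true) != as.character(y_pred))
}
```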

References

Relión, J. D. A., Kessler, D., Levina, E., and Taylor, S. F. (2017). Network classification with applications to brain connectomics.

See Also

plot.graphclass, predict.graphclass

Examples


# Load COBRE data
data(COBRE.data)
X <- COBRE.data$X.cobre
Y <- COBRE.data$Y.cobre

# An example of the subgraph selection penalty
gc = graphclass(X = X, Y = factor(Y), type = "intersection",
               lambda = 1e-4, rho = 1, gamma = 1e-5)
plot(gc)


# 5-fold cross validation
fold_index <- (seq_along(Y) %% 5) + 1

# Make penalty matrix in advance to save time
D263 <- construct_D(nodes = 263)

gclist <- list()
for(fold in 1:5) {
    foldout <- which(fold_index == fold) 
    gclist[[fold]] <- graphclass(X = X[-foldout,], Y = factor(Y[-foldout]),
                     Xtest = X[foldout,], Ytest = factor(Y[foldout]),
                     type = "intersection",
                     lambda = 1e-4, rho = 1, gamma = 1e-5,
                     D = D263)
}
# Test error on each fold
sapply(gclist, function(fit) fit$test_error)

data(COBRE.data)
X <- COBRE.data$X.cobre
Y <- COBRE.data$Y.cobre

# An example of the subgraph selection penalty
gc = graphclass(X, Y = factor(Y), lambda = 1e-5, rho = 1)

plot(gc)

jesusdaniel/graphclass documentation built on Aug. 10, 2022, 3:10 p.m.