gs.dim.select: Dimensionality selection for singular values using profile...

Description

Select the number of significant singular values, by finding the ‘elbow’ of the scree plot, in a principled way.

Usage

gs.dim.select(X, k = NULL, edge.attr = NULL, n = 3, threshold = FALSE,
  plot = FALSE)

Arguments

X

an object of class igraph, a numeric/complex matrix or 2-D array with n rows and d columns, or a one-dimensional vector of class "numeric" containing ordered singular values. Non-numeric inputs are embedded using irlba.

k

The embedding dimensionality. Defaults to NULL.

  • If X is of class igraph, k should satisfy k < length(V(X)). If k is NULL, it defaults to gorder(X)-1.

  • If X is a matrix or 2-D array, k should satisfy k < min(dim(X)). If k is NULL, it defaults to min(dim(X)).

edge.attr

the name of the edge attribute to use for weights if X is an object of class igraph. Should be in names(get.edge.attribute(X)). Defaults to NULL, which assumes the graph is binary.

n

the number of elbows to return. Defaults to 3.

threshold

either FALSE or an object of class numeric. If threshold is of class numeric, then all the elements that are not larger than the threshold will be ignored.

plot

logical. When TRUE, the return object includes a plot depicting the elbows.

Details

The input of the function is a numeric vector which contains the measure of ‘importance’ for each dimension.

For spectral embedding, these are the singular values of the adjacency matrix. The singular values are assumed to be generated from a Gaussian mixture distribution with two components that have different means and the same variance. The dimensionality d is chosen to maximize the likelihood when the d largest singular values are assigned to one component of the mixture and the remaining singular values are assigned to the other component.
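Written out, the criterion described above can be sketched as follows. The notation is an assumption introduced here for illustration: s_1 ≥ … ≥ s_p are the ordered singular values, φ is the Gaussian density, and d is a candidate dimensionality. The pooled-variance denominator p − 2 is one common choice and may not match what the package computes internally.

```latex
\begin{align}
\hat{\mu}_1(d) &= \frac{1}{d}\sum_{i=1}^{d} s_i,
\qquad
\hat{\mu}_2(d) = \frac{1}{p-d}\sum_{i=d+1}^{p} s_i, \\
\hat{\sigma}^2(d) &= \frac{1}{p-2}\left[\sum_{i=1}^{d}\bigl(s_i-\hat{\mu}_1(d)\bigr)^2
  + \sum_{i=d+1}^{p}\bigl(s_i-\hat{\mu}_2(d)\bigr)^2\right], \\
\ell(d) &= \sum_{i=1}^{d}\log\phi\bigl(s_i;\hat{\mu}_1(d),\hat{\sigma}^2(d)\bigr)
  + \sum_{i=d+1}^{p}\log\phi\bigl(s_i;\hat{\mu}_2(d),\hat{\sigma}^2(d)\bigr), \\
\hat{d} &= \arg\max_{1\le d<p}\,\ell(d).
\end{align}
```

Under these assumptions the second term of each Gaussian log-density sums to the constant −(p − 2)/2, so maximizing ℓ(d) amounts to choosing the split with the smallest pooled within-group sum of squares.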

This function can also be used for the general separation problem, where we assume that the left and the right of the vector are coming from two Normal distributions, with different means, and we want to know their border. See examples below.
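For the general separation problem, the idea can be illustrated in base R without the package. The helper below, profile_elbow, is a hypothetical name and not part of graphstats. It is a minimal sketch that finds a single border only, takes the vector in the order given, and assumes a pooled variance with denominator p − 2. It does not reproduce the n, threshold, or plot arguments.

```r
# Hypothetical helper (not part of graphstats): find one border between the
# left and right parts of a numeric vector by profile likelihood.
profile_elbow <- function(x) {
  p <- length(x)
  stopifnot(is.numeric(x), p >= 3)

  # Log-likelihood for every candidate split q = 1, ..., p - 1
  loglik <- vapply(seq_len(p - 1), function(q) {
    left  <- x[seq_len(q)]
    right <- x[(q + 1):p]
    # Each side gets its own mean; both share one pooled variance
    # (denominator p - 2 is an assumption of this sketch)
    ss    <- sum((left - mean(left))^2) + sum((right - mean(right))^2)
    sigma <- sqrt(ss / (p - 2))
    sum(dnorm(left,  mean(left),  sigma, log = TRUE)) +
      sum(dnorm(right, mean(right), sigma, log = TRUE))
  }, numeric(1))

  # Index of the last element assigned to the left component
  which.max(loglik)
}

# Three large values followed by four small ones
scree <- c(10, 9.5, 9, 1, 0.8, 0.5, 0.3)
border <- profile_elbow(scree)
print(border)
```

The returned index is the position of the last element assigned to the left component, which corresponds to the elbow in the scree-plot setting.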

Value

A list containing the following:

value

The singular values associated with each elbow in elbow.

elbow

The indices of the elbows.

plot

If plot is TRUE, contains a scree plot annotated with the elbows.

Author(s)

Youngser Park youngser@jhu.edu, Gabor Csardi csardi.gabor@gmail.com, and Eric Bridgeford ericwb95@gmail.com.

References

M. Zhu and A. Ghodsi (2006). Automatic dimensionality selection from the scree plot via the use of profile likelihood. Computational Statistics and Data Analysis, Vol. 51, 918–930.

See Also

gs.embed.ase

Examples

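# igraph provides sample_dot_product(), embed_adjacency_matrix(), etc.
library(igraph)
# Package name assumed from the page footer (neurodata/graphstats)
library(graphstats)

# Generate two groups of singular values from a Gaussian mixture
# whose two components have different means
sing.vals  <- c( rnorm(10, mean=1, sd=1), rnorm(10, mean=3, sd=1) )
dim.chosen <- gs.dim.select(sing.vals)
dim.chosen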

# Sample random vectors with multivariate normal distribution
# and normalize to unit length
lpvs <- matrix(rnorm(200), 10, 20)
lpvs <- apply(lpvs, 2, function(x) { (abs(x) / sqrt(sum(x^2))) })
RDP.graph  <- sample_dot_product(lpvs)
gs.dim.select( embed_adjacency_matrix(RDP.graph, 10)$D )

# Sample random vectors with the Dirichlet distribution
lpvs.dir    <- sample_dirichlet(n=20, rep(1, 10))
RDP.graph.2 <- sample_dot_product(lpvs.dir)
gs.dim.select( embed_adjacency_matrix(RDP.graph.2, 10)$D )

# Sample random vectors from hypersphere with radius 1.
lpvs.sph    <- sample_sphere_surface(dim=10, n=20, radius=1)
RDP.graph.3 <- sample_dot_product(lpvs.sph)
gs.dim.select( embed_adjacency_matrix(RDP.graph.3, 10)$D )

neurodata/graphstats documentation built on May 14, 2019, 5:19 p.m.