Description Usage Arguments Details Value Examples
View source: R/rayleigh_selection.R
Given a nerve or a clique complex and a set of features consisting of functions with support on the set of points underlying the complex, it asseses the significance of each feature in the simplicial complex by computing its scalar and vectorial Combinatorial Laplacian Score and comparing it with the null distribution that results from reshufling many times the values of the function across the point cloud. For nerve complexes, feature functions induce 0- and 1-forms in the complex by averaging the function across the points associated to 0- and 1-simplices respectively. For clique complexes, feature functions are directly 0-forms in the complex and 1-forms are obtained by averaging the function across the two vertices connected by each edge.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | rayleigh_selection(
g2,
f,
num_perms = 1000,
seed = 10,
num_cores = 1,
mc.preschedule = TRUE,
one_forms = FALSE,
weights = FALSE,
covariates = NULL,
combination.method = "KM",
optimize.p = NULL,
min_perms = 100,
pow = 1,
nextremes = c(seq(50, 250, 25), seq(300, 500, 50), seq(600, 1000, 100)),
alpha = 0.15
)
|
g2 |
an object of the class |
f |
a numeric vector or matrix specifying one or more functions with support on the set of points whose significance will be assessed in the simplicial complex. Each column corresponds to a point and each row specifies a different function. |
num_perms |
number of permutations used to build the null distribution for each
feature. When |
seed |
integer specifying the seed used to initialize the generator of permutations. By default is set to 10. |
num_cores |
integer specifying the number of cores to be used in the computation. By default only one core is used. |
mc.preschedule |
when set to TRUE parallel compulations are prescheduled, see
mclapply. Only has effect if |
one_forms |
when set TRUE the Combinatorial Laplacian Score for 1-forms is also computed. By default is set to FALSE. |
weights |
when set to TRUE it takes 2-simplices into account when computing weights. By default is set to FALSE. |
covariates |
numeric vector or matrix specifying covariate functions to be samples in
tandem with the functions in f. Each column correspond to a point and each row specifies a
different covariate function. Is ignored when set to |
combination.method |
method used to combine p-values, can be "KM" for the Kost-McDermott method or "EBM" for the empirical Brown's method (Gibbs et. al. 16). Default value is "KM". Only has an effect if g2 is a list. |
optimize.p |
string indicating the type of optimization used for computing p-values.
Must have value |
min_perms |
minimum number of permutations to be used when computing p-values, only
relevant when |
pow |
positive number indicating the power to which the samples of the null distribution and the associated
score are to be transformed before computing a GPD approximation (only used when
|
nextremes |
vector of integers with the candidate number of extremes for fitting a GDP.
Only used when |
alpha |
level of FDR control for choosing the number of extremes. Only used when
|
The calculation of p-values can be optimized by iteratively doubling the number of samples of the null distribution until convergence is reached. Two version of this iteraction scheme are implemented. In the fist one, a p-value is considered convergent if there are at least 10 samples of the null distribution that do not exceed the associated Combinatorial Laplacian Score. In the second one, a p-value is considered convergent of the condition above holds, and, in case there are less than 10 small samples a generalized Pareto distribution (GPD) is used to approximate a p-value. A p-value obtained by a GPD is considered convergent if the relative variation is small in the last 3 iteractions and the quartiles of the approximation are relatively close.
When computing a p-value using a GPD, only null distribution samples in the first quartile are considered. The Combinatorial Laplacian Score and associated null distribution samples are transformed by the function
f(x) = (1 - (x - loc)/scale)^pow
where loc is the first quartile of the null distribution and scale is the first quartile minus the 5 quartiles for the p-value estimates are obtained by sampling GDP parameters form a multivariate normal distribution.
When g2 is a simplicial complex, returns a data frame with the value of the Combinatorial Laplacian
Score for 0- and 1-forms, the p-values, and the q-values computed using Benjamini-Hochberg procedure.
If optimize.p
is set to "perm"
or "gpd"
then then number of samples at which convergence
of p-values was obtained is also returned. When g2 is a list, returns a list with Combinatorial Laplacian
Scores, individual p-values, combined p-values and q-values.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 | # Example 1
library(RayleighSelection)
gy <- nerve_complex(list(c(1,4,6,10), c(1,2,7), c(2,3,8), c(3,4,9,10), c(4,5)))
rayleigh_selection(gy,t(as.data.frame(c(0,1,1,0,0,0,0,0,0,1))), one_forms = TRUE)
# Example 2: MNIST dataset
data("mnist")
# Compute reduced representation using Laplacian eigenmap of pixels with high variance
library(dimRed)
leim <- LaplacianEigenmaps()
mnist_top <- mnist[apply(mnist, 1, var) > 10000,]
emb <- leim@fun(as(t(mnist_top), "dimRedData"), leim@stdpars)
# Compute Mapper representation using the Laplacian eigenmap as an auxiliary function and correlation
# distance as metric
library(TDAmapper)
mnist_distances <- (1.0 - cor(mnist_top))
m2 <- mapper2D(distance_matrix = mnist_distances,
filter_values = list(emb@data@data[,1], emb@data@data[,2]),
num_intervals = c(30,30),
percent_overlap = 35,
num_bins_when_clustering = 10);
# Compute the nerve complex
gg <- nerve_complex(m2$points_in_vertex)
# Compute R score, p-value, and q-value for the pixels 301st to 305th
rayleigh_selection(gg, mnist[301:305,], one_forms = TRUE)
# Compute another mapper representation with different percent_overlap
m2.2 <- mapper2D(distance_matrix = mnist_distances,
filter_values = list(emb@data@data[,1], emb@data@data[,2]),
num_intervals = c(30,30),
percent_overlap = 50,
num_bins_when_clustering = 10);
# Compute the nerve complex and combine in list
gg.list <- list(m2, nerve_complex(m2.2$points_in_vertex))
# Compute R scores, p-values and q-values
rayleigh_selection(gg.list, mnist[301:305,], one_forms = TRUE)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.