View source: R/gg_udependent.R
| gg_udependent | R Documentation |
A uvarpro fit records how strongly each variable depends on the
others. This function pulls those cross-variable dependency scores from
the fit with get.beta.entropy and
sdependent, and returns them as a tidy list that
plot.gg_udependent can draw as a network.
gg_udependent(
object,
threshold = 0.25,
q.signal = 0.75,
directed = TRUE,
min.degree = NULL,
...
)
object |
A fitted |
threshold |
Numeric; the positive dependency threshold passed on to
|
q.signal |
Quantile threshold (0–1) for picking out the signal
variables; passed on to |
directed |
Logical; |
min.degree |
Integer or |
... |
Additional arguments forwarded to |
A named list of class "gg_udependent" with elements:
$edgesData frame: variable_from, variable_to,
weight (raw cross-importance value).
$nodesData frame: variable (factor, levels by
descending degree), degree (integer; out-degree when
directed = TRUE, total degree when directed = FALSE),
selected (logical, TRUE if in sdependent's
signal set).
$graphigraph object. NULL if no dependencies
detected.
A "provenance" attribute carries threshold, q.signal,
directed, min.degree, xvar.names, and n.
UVarPro (Zhou, Lu and Ishwaran, 2026) extends the varpro framework to
the unsupervised setting: grow a forest without a response, then use
the same region-release contrasts varpro uses for supervised
importance to ask, "which variables explain the structure in the
data?" The lasso-driven variant frames each region-release contrast
as a classification task (does an observation belong to the region
or to its release?) and fits a lasso logistic regression with the
other variables as predictors. The coefficient on variable j
in the model for variable i's region-release contrast is the
entry I[i, j] of the matrix varPro::get.beta.entropy()
returns.
Read that entry as "how much does knowing j help separate
i's region from its release". A large I[i, j] says
j carries information about the structure varpro picked up in
i. varPro::sdependent thresholds that matrix at a
user-chosen cut and returns the set of "signal" variables: the
nodes with high enough out-degree to be worth keeping. We pass the
threshold through to sdependent and use the same matrix to
weight the edges of the resulting graph.
The graph is directed by default because I[i, j] and
I[j, i] are separate lasso coefficients and need not agree;
setting directed = FALSE collapses each pair by taking the
larger of the two, which is appropriate when you only want to see
that two variables are dependent, not which way the dependency
reads.
$edges has one row per surviving edge with the raw weight
I[i, j] (or, for undirected graphs, the max of the two
directions). $nodes has one row per surviving variable with
its degree (out-degree for directed, total degree for undirected)
and a selected flag for membership in the sdependent
signal set. $graph is the same information packaged as an
igraph object, with weight, degree, and
selected attached so plot.gg_udependent can render it
without recomputing anything.
screen a wide unsupervised dataset for the small set of
variables UVarPro thinks are carrying the signal: the nodes
with high degree, or those flagged selected = TRUE;
spot clusters of mutually dependent variables (hubs and the spokes around them) that may be measuring the same underlying construct;
compare two datasets, or two preprocessing pipelines, by looking at how their dependency graphs change.
An edge in this graph is a statistical dependency in the unsupervised
decomposition of the data. It is not a causal arrow. A high
I[i, j] says j predicts i's region membership,
not that j causes i.
Zhou, L., Lu, M. and Ishwaran, H. (2026). Variable priority for unsupervised variable selection. Pattern Recognition, 172:112727.
plot.gg_udependent
set.seed(42)
uv <- varPro::uvarpro(iris[, -5], ntree = 50)
gg <- gg_udependent(uv)
print(gg)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.