labelProp: Compute relevance scores for a collection of nodes based on a...

View source: R/labelProp.R

labelProp    R Documentation

Compute relevance scores for a collection of nodes based on a set of seed nodes.

Description

See: https://proceedings.neurips.cc/paper/2003/file/87682805257e619d49b8e0dfdc14affa-Paper.pdf

Usage

labelProp(
  x,
  seeds,
  method = "rw",
  beta = 0.5,
  bootstrap = FALSE,
  num_bootstraps = 100,
  prop_seeds = 0.5,
  permute = FALSE,
  num_permutations = 100,
  softmax = FALSE,
  verbose = TRUE
)

Arguments

x

(numeric) if method = rw, then x is a symmetric matrix of transition probabilities. If method = nns, then x is a matrix of vector representations (one row per node).

seeds

(list) a list of character vectors defining the classes of interest. If named, then names(seeds) will be used to define classes, otherwise classes will be labeled class1, class2 etc.
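The default naming rule described above can be illustrated with a small sketch. The helper name_seeds is hypothetical and not part of labelProp; it only mimics the documented behavior for a fully unnamed list.

```r
# Hypothetical helper (not part of labelProp): assign default class names
# "class1", "class2", ... when the seed list carries no names.
name_seeds <- function(seeds) {
  if (is.null(names(seeds))) {
    names(seeds) <- paste0("class", seq_along(seeds))
  }
  seeds
}

unnamed <- list(c("immigration", "immigrants"), c("jobs", "wages"))
named <- list(economy = c("jobs", "wages"))

names(name_seeds(unnamed))
names(name_seeds(named))
```

Naming the list explicitly, as in the Examples section, makes it easier to match rows of the output to classes.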

method

(character) either nns or rw. If nns, values are computed using cosine similarity; if rw, values are computed using spreading activation.
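To make the nns option more concrete, here is a minimal sketch of cosine-similarity scoring on a toy embedding matrix. It assumes that a class is represented by the mean of its seed vectors; the documentation does not say how labelProp aggregates seeds, so this is an illustration rather than the package's implementation. The names emb, cosine_to, and seed_centroid are hypothetical.

```r
# Toy embedding matrix: rows are nodes, columns are embedding dimensions
emb <- rbind(
  jobs   = c(1.0, 0.0),
  wages  = c(0.9, 0.1),
  border = c(0.0, 1.0)
)

# Cosine similarity between every row of `m` and a single vector `v`
cosine_to <- function(m, v) {
  as.vector(m %*% v) / (sqrt(rowSums(m^2)) * sqrt(sum(v^2)))
}

# Assumption: the class is summarized by the centroid of its seed vectors
seed_centroid <- colMeans(emb[c("jobs", "wages"), , drop = FALSE])
scores <- setNames(cosine_to(emb, seed_centroid), rownames(emb))

# Nodes pointing in the same direction as the seeds score close to 1
sort(scores, decreasing = TRUE)
```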

beta

(numeric) in (0,1), specifies the extent to which the algorithm favors local (similar labels for neighbors) vs. global (correct labels on seed words) consistency. Lower (higher) values emphasize local (global) consistency.
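The trade-off controlled by beta can be sketched with a generic label-propagation iteration on a toy graph. This is an assumption-laden illustration, not the package's code: it assumes the update scores = (1 - beta) * S * scores + beta * Y, where S is a symmetrically normalized similarity matrix and Y marks the seed nodes, which is consistent with lower beta giving more weight to neighbors and higher beta giving more weight to the seed labels. The names W, S, Y, and propagate are hypothetical.

```r
# Toy symmetric similarity matrix for a four-node path graph a - b - c - d
W <- matrix(c(0, 1, 0, 0,
              1, 0, 1, 0,
              0, 1, 0, 1,
              0, 0, 1, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(letters[1:4], letters[1:4]))

# Symmetric normalization: S = D^{-1/2} W D^{-1/2}
d <- rowSums(W)
S <- W / sqrt(outer(d, d))

# Initial labels: node "a" is the only seed
Y <- c(a = 1, b = 0, c = 0, d = 0)

# Assumed update rule: blend neighbor scores with the initial seed labels
propagate <- function(S, Y, beta, n_iter = 200) {
  scores <- Y
  for (i in seq_len(n_iter)) {
    scores <- (1 - beta) * as.vector(S %*% scores) + beta * Y
  }
  scores
}

low_beta  <- propagate(S, Y, beta = 0.1)  # scores spread far from the seed
high_beta <- propagate(S, Y, beta = 0.9)  # scores stay close to the seed

# The iteration converges to the closed form beta * (I - (1 - beta) S)^{-1} Y
closed_form <- 0.5 * solve(diag(4) - 0.5 * S, Y)
```

Comparing low_beta and high_beta shows the distant node d receiving a much larger share of the seed's score when beta is small.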

bootstrap

(logical) if TRUE, use bootstrapping: sample a proportion of the seeds (defined by prop_seeds) and re-run the algorithm on each sample. Required to obtain standard errors.

num_bootstraps

(integer) number of bootstraps to use.

prop_seeds

(numeric) proportion of seeds to sample when bootstrapping.
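The bootstrap arguments can be pictured with the following sketch, which repeatedly subsamples the seeds, recomputes a score, and reports the standard deviation across runs as the standard error. It assumes sampling without replacement and uses a stand-in scoring function (score_fun) in place of a full run of the algorithm; both choices are assumptions, since the documentation does not spell out these details.

```r
set.seed(42)

# Hypothetical stand-in for one run of the algorithm: the mean of some
# toy per-seed values. In practice this would be a full propagation run.
toy_values <- c(immigration = 0.9, immigrants = 0.8,
                immigrant = 0.7, border = 0.4)
score_fun <- function(seed_subset) mean(toy_values[seed_subset])

bootstrap_se <- function(seeds, prop_seeds = 0.5, num_bootstraps = 100) {
  # Number of seeds drawn per bootstrap (at least one)
  n_draw <- max(1, floor(prop_seeds * length(seeds)))
  draws <- vapply(seq_len(num_bootstraps), function(i) {
    # Assumption: seeds are sampled without replacement
    score_fun(sample(seeds, size = n_draw, replace = FALSE))
  }, numeric(1))
  c(score = mean(draws), std.error = sd(draws))
}

result <- bootstrap_se(names(toy_values), prop_seeds = 0.5, num_bootstraps = 100)
result
```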

permute

(logical) if TRUE, compute empirical p-values using a permutation test.

num_permutations

(numeric) number of permutations to use.
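As a rough illustration of an empirical p-value from a permutation test, the sketch below compares an observed score against scores obtained from randomly chosen seed sets of the same size. The null construction and the add-one correction are common conventions assumed here for illustration; the documentation does not state which variant labelProp uses. The names node_values, seed_stat, and empirical_p are hypothetical.

```r
set.seed(1)

# Toy per-node values and a statistic computed on a chosen seed set
node_values <- c(jobs = 0.9, wages = 0.8, unemployment = 0.85,
                 border = 0.2, visa = 0.1, asylum = 0.15,
                 weather = 0.3, music = 0.25)
seed_stat <- function(seed_names) mean(node_values[seed_names])

# Share of null scores at least as large as the observed one,
# with an add-one correction so the p-value is never exactly zero
empirical_p <- function(observed, null_scores) {
  (1 + sum(null_scores >= observed)) / (1 + length(null_scores))
}

observed <- seed_stat(c("jobs", "wages", "unemployment"))

# Null distribution: statistic for random seed sets of the same size
num_permutations <- 100
null_scores <- vapply(seq_len(num_permutations), function(i) {
  seed_stat(sample(names(node_values), size = 3))
}, numeric(1))

p_value <- empirical_p(observed, null_scores)
p_value
```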

softmax

(logical) if TRUE, the exponential of a node's score for a given class is normalized by the sum of the exponential of scores across all classes. Option is only available when two or more classes are specified.
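The normalization described for softmax can be written out directly. The sketch below applies it row-wise to a toy matrix of scores (one row per node, one column per class); the matrix and the helper softmax_rows are hypothetical and only illustrate the formula, not how labelProp stores its results.

```r
# Toy scores: rows are nodes, columns are classes
scores <- rbind(
  jobs   = c(immigration = 0.1, economy = 0.8),
  border = c(immigration = 0.7, economy = 0.2)
)

# Row-wise softmax: exp(score) divided by the sum of exp(score) over classes
softmax_rows <- function(m) {
  # Subtracting each row's maximum leaves the result unchanged
  # but avoids overflow for large scores
  e <- exp(m - apply(m, 1, max))
  e / rowSums(e)
}

probs <- softmax_rows(scores)
probs
rowSums(probs)  # each row sums to 1
```

With a single class every normalized score would equal 1, which is consistent with the option being restricted to two or more classes.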

verbose

(logical) if TRUE, show a progress bar.

Value

a data.frame or list of data.frames (one for each class) with the following columns:

node

(character) rownames of x.

class

(character) name of class. If none provided, then classes will be labeled class1, class2 etc.

score

(numeric) score assigned to node.

std.error

(numeric) std. error of score. Column is dropped if bootstrap = FALSE.

Examples



library(labelProp)

# to use the random-walk algorithm, we first build a transition matrix
transition_matrix <- build_transition_matrix(x = anes2016_glove, threads = 6L)

# define seeds (labeled nodes);
# if the list is unnamed, "class1", "class2", etc. will be used as labels
seeds <- list(
  "immigration" = c("immigration", "immigrants", "immigrant"),
  "economy" = c("jobs", "unemployment", "wages")
)

# propagate labels using rw
rw_labels <- labelProp(
  x = transition_matrix, seeds = seeds,
  method = "rw", beta = 0.5
)

# propagate labels using nns;
# note that the main input, x, is now the matrix of vector representations
nns_labels <- labelProp(x = anes2016_glove, seeds = seeds, method = "nns")

# check output for economy
rw_labels[["economy"]]
nns_labels[["economy"]]


prodriguezsosa/labelProp documentation built on May 14, 2023, 11:19 a.m.