learn_from_param_sweep: Learn successor values from observations (many parameters)
In psychNerdJae/successr: Work with Successor Algorithms


After observing relations between nodes (i.e., transitioning between states), update the successor values. This function learns from many observations and repeats the learning for each combination of the supplied parameter values. To update the successor values from a single observation, use 'update_successor' instead. To update successor values from a single set of parameters, use 'learn_from_observations' instead.

Upon seeing a transition from i to j, the update equation for the successor matrix M is M(i) <- M(i) + α δ, where δ = onehot(i, j) + γ M(j) - M(i).

Technically, M should be indexed like a matrix. But for simplicity, I write it like a single-input function that returns the associated row. Therefore, the successor algorithm updates values in a row-wise manner.

The one-hot term is a vector the length of M(i), which is filled with zeros except for a single one (1) at the location j. Hence, the one-hot vector encodes that when the agent was in state i, the next observed state was j.
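The update rule and the one-hot vector can be sketched in a few lines of base R. This is a minimal illustration of the equation above, not the package's implementation; the helper name `sr_update` and its argument names are hypothetical.

```r
# Hypothetical helper (not part of successr): one update for a transition i -> j.
sr_update <- function(M, i, j, alpha, gamma) {
  onehot <- numeric(ncol(M))   # zeros, same length as row M[i, ]
  onehot[j] <- 1               # single 1 marking the observed next state j
  delta <- onehot + gamma * M[j, ] - M[i, ]
  M[i, ] <- M[i, ] + alpha * delta   # only row i is changed
  M
}

M <- matrix(0, nrow = 3, ncol = 3)   # three states, nothing learned yet
M <- sr_update(M, i = 1, j = 2, alpha = 0.1, gamma = 0.5)
M[1, ]   # 0.0 0.1 0.0
```

Starting from an all-zero matrix, the first update simply moves row 1 a fraction α of the way toward the one-hot vector, because M(j) is still zero.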

Where does the "successor" part of "successor representation/features" come from? It refers to the γ M(j) term of the update equation. The direct relationship between i and j is entirely accounted for by the one-hot vector. But you may also want to encode longer-range relations, so that your representation of i includes not only the relationship between i and j, but also the relationship between j and k. Adding a discounted copy of M(j) to the update does this. Because γ is less than 1, you end up with larger successor values for direct connections and smaller values for indirect (e.g., long-range) connections.

The learning rate α scales how strongly each observation, through the error term δ, changes the learned successor values. The lookahead horizon γ dictates how strongly the successor state's own relations, M(j), are incorporated into the update.
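The effect of γ can be seen on a toy walk that cycles through three states. The sketch below is base R written for illustration only; `learn_walk` is a hypothetical helper, not a successr function.

```r
# Hypothetical helper (not part of successr): apply the update rule to a
# sequence of visited states, one transition at a time.
learn_walk <- function(n_states, walk, alpha, gamma) {
  M <- matrix(0, n_states, n_states)
  for (t in seq_len(length(walk) - 1)) {
    i <- walk[t]
    j <- walk[t + 1]
    onehot <- as.numeric(seq_len(n_states) == j)
    M[i, ] <- M[i, ] + alpha * (onehot + gamma * M[j, ] - M[i, ])
  }
  M
}

walk <- rep(c(1, 2, 3), times = 50)   # cycle 1 -> 2 -> 3 -> 1 -> ...

near <- learn_walk(3, walk, alpha = 0.1, gamma = 0)     # no lookahead
far  <- learn_walk(3, walk, alpha = 0.1, gamma = 0.5)   # some lookahead

near[1, 3]              # 0: with gamma = 0, only the direct link 1 -> 2 is learned
far[1, 2] > far[1, 3]   # TRUE: the direct link outweighs the indirect one
far[1, 3] > 0           # TRUE: the indirect link 1 -> 3 is now represented
```

With γ = 0 the row for state 1 only ever records its immediate successor, while a positive γ lets the value for the two-step relation grow, at a discount.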

learn_from_param_sweep(
  successor_matrix,
  observations,
  alphas,
  gammas,
  bidirectional = FALSE,
  edge_col_name = "successor_value"
)

`successor_matrix`	A square matrix created by 'initialize_successor'.
`observations`	A tibble with observations 'from' a node 'to' another.
`alphas`	Numeric vector of learning rates, each bounded in [0, 1].
`gammas`	Numeric vector of lookahead horizons, each bounded in [0, 1).
`bidirectional`	Logical. Defaults to 'FALSE', which means that only the 'from-to' relationship gets updated. If set to 'TRUE', this function will also update the 'to-from' relationship.
`edge_col_name`	The name of the column encoding the relation between two given nodes. By default, this is set to 'successor_value'.
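One plausible reading of `bidirectional` is that each observation is also applied in reverse. The base R sketch below assumes exactly that; it is not taken from the package source, and both helper names are hypothetical.

```r
# Hypothetical helper (not part of successr): one successor update for i -> j.
sr_step <- function(M, i, j, alpha, gamma) {
  onehot <- as.numeric(seq_len(ncol(M)) == j)
  M[i, ] <- M[i, ] + alpha * (onehot + gamma * M[j, ] - M[i, ])
  M
}

# Assumed behaviour of bidirectional = TRUE: also apply the reversed transition.
sr_step_both <- function(M, i, j, alpha, gamma, bidirectional = FALSE) {
  M <- sr_step(M, i, j, alpha, gamma)
  if (bidirectional) M <- sr_step(M, j, i, alpha, gamma)
  M
}

M0 <- matrix(0, 3, 3)
one_way <- sr_step_both(M0, 1, 2, alpha = 0.1, gamma = 0)
two_way <- sr_step_both(M0, 1, 2, alpha = 0.1, gamma = 0, bidirectional = TRUE)

one_way[2, 1]   # 0: the 'to-from' entry is untouched
two_way[2, 1]   # 0.1: the 'to-from' entry is updated as well
```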

A tibble with N x N rows for each combination of parameters, where N is the number of nodes, such that every row is a pairwise combination of two nodes ('from' and 'to'). Includes a column encoding the successor value between each pair of nodes (named by 'edge_col_name'), and columns indicating which alpha/gamma values were used to compute the successor values.
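To make the size of the result concrete, the long-format layout described above can be mocked up in base R. This only illustrates the shape; the column names `alpha` and `gamma` are assumptions, and the real function returns a tibble with learned values rather than placeholders.

```r
# Illustrative only: mock layout of a parameter-sweep result.
nodes  <- c("a", "b", "c")   # N = 3 nodes
alphas <- c(0.1, 0.2)
gammas <- c(0, 0.4)

layout <- expand.grid(
  from = nodes, to = nodes,
  alpha = alphas, gamma = gammas,   # assumed column names
  stringsAsFactors = FALSE
)
layout$successor_value <- NA_real_   # placeholder for the learned values

nrow(layout)   # 36 = 3 * 3 node pairs * 2 alphas * 2 gammas
```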

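library(successr)
`%>%` <- magrittr::`%>%`
# Build an undirected graph from the package's karate dataset
karate_graph <- successr::karate %>%
    tidygraph::as_tbl_graph(directed = FALSE)
# Generate a random walk over the graph to serve as the observations
karate_walk <- karate_graph %>%
    generate_random_walk(1000)
# Learn successor values for every combination of the alphas and gammas
karate_graph %>%
    initialize_successor() %>%
    learn_from_param_sweep(karate_walk, alphas = c(0.1, 0.2),
                           gammas = c(0, 0.4), bidirectional = TRUE)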