learn_from_param_sweep: Learn successor values from observations (many parameters)


View source: R/learn_from_param_sweep.R

Description

After observing relations between nodes (i.e., transitions between states), update the successor values. This function learns from many observations and repeats that learning once for every combination of parameter values. To update the successor values from a single observation, use 'update_successor' instead. To update successor values with a single set of parameters, use 'learn_from_observations' instead.
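The sweep can be pictured as a loop over every alpha/gamma combination, where each combination restarts from the same initial matrix and learns from the same observations in order. The base R sketch below is a hypothetical illustration, not successr's implementation: the helpers `sr_update` and `learn_sequence` and the four-row `observations` table are made up for the example, and the inner update follows the equation given next.

```r
# Hypothetical sketch of a parameter sweep; successr's own code may differ.
# sr_update: one application of the update equation after observing i -> j
sr_update <- function(M, i, j, alpha, gamma) {
  onehot <- numeric(ncol(M))
  names(onehot) <- colnames(M)
  onehot[j] <- 1
  M[i, ] <- M[i, ] + alpha * (onehot + gamma * M[j, ] - M[i, ])
  M
}

# learn_sequence: apply the update to every observed transition, in order
learn_sequence <- function(M, observations, alpha, gamma) {
  for (k in seq_len(nrow(observations))) {
    M <- sr_update(M, observations$from[k], observations$to[k], alpha, gamma)
  }
  M
}

states <- c("a", "b", "c")
M0 <- matrix(0, 3, 3, dimnames = list(states, states))
# Toy observations (made up for illustration)
observations <- data.frame(from = c("a", "b", "c", "a"),
                           to   = c("b", "c", "a", "b"),
                           stringsAsFactors = FALSE)

# Every alpha/gamma combination starts again from the same initial matrix M0
grid <- expand.grid(alpha = c(0.1, 0.2), gamma = c(0, 0.4))
swept <- lapply(seq_len(nrow(grid)), function(r) {
  learn_sequence(M0, observations, grid$alpha[r], grid$gamma[r])
})
length(swept)  # one learned matrix per combination, 4 here
```

Restarting from `M0` for each combination keeps the results independent, so differences between them reflect only the parameters.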

Upon seeing a transition from i to j, the update equation for the successor matrix M is M(i) <- M(i) + α δ, where δ = onehot(i, j) + γ M(j) - M(i).

Technically, M should be indexed like a matrix. For simplicity, I write it as a single-input function that returns the associated row, so M(i) is row i of M. Each update changes only row i, so the successor algorithm updates values row by row.
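As a minimal sketch, the update for a single observed transition can be written in base R as below. `sr_update` is a hypothetical helper for illustration and is not claimed to be how successr implements the update; it follows the equation above term by term and modifies only row i.

```r
# Hypothetical helper (not part of successr): apply the update equation once,
# after observing a transition from state i to state j
sr_update <- function(M, i, j, alpha, gamma) {
  # One-hot vector: zeros everywhere except a 1 in the column for state j
  onehot <- numeric(ncol(M))
  names(onehot) <- colnames(M)
  onehot[j] <- 1
  # Prediction error: delta = onehot(i, j) + gamma * M(j) - M(i)
  delta <- onehot + gamma * M[j, ] - M[i, ]
  # Only row i of the matrix changes
  M[i, ] <- M[i, ] + alpha * delta
  M
}

states <- c("a", "b", "c")
M <- matrix(0, nrow = 3, ncol = 3, dimnames = list(states, states))
M <- sr_update(M, "a", "b", alpha = 0.5, gamma = 0.9)
M["a", ]  # c(a = 0, b = 0.5, c = 0): row "a" moved halfway toward the one-hot target
```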

The one-hot term is a vector of the same length as M(i), filled with zeros except for a single one (1) at position j. Hence, the one-hot vector encodes that when the agent was in state i, the next observed state was j.
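For example, with three states named a, b, and c, observing a transition into b produces the vector below. `make_onehot` is a hypothetical helper written only to show the shape of the vector.

```r
# Hypothetical helper (not part of successr): the one-hot target for a transition into state j
make_onehot <- function(states, j) {
  v <- numeric(length(states))
  names(v) <- states
  v[j] <- 1  # single 1 at the position of the observed next state
  v
}

# Observing a -> b: this vector is used when updating row "a", and it marks column "b"
make_onehot(c("a", "b", "c"), "b")  # c(a = 0, b = 1, c = 0)
```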

Where does the "successor" part of "successor representation/features" come from? It refers to the middle term of the update equation, γ M(j). The relationship between i and j on its own is entirely accounted for by the one-hot vector. But you may also want to encode longer-range relations, so that your representation of i includes not only its relationship with j but also the relationships between j and the states that follow it (say, k). Because γ < 1 discounts each additional step, you end up with larger successor values for direct connections and smaller values for indirect (i.e., longer-range) connections.
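The effect of the γ M(j) term can be seen on a toy walk that cycles a → b → c → a. In the sketch below (hypothetical helpers `sr_update` and `learn_walk`, a made-up walk, not successr code), γ = 0 encodes only the direct transition a → b, while γ = 0.5 also gives a nonzero value to c, which is two steps away from a, and that value is smaller than the direct one.

```r
# Hypothetical helper (not part of successr): one update after observing i -> j
sr_update <- function(M, i, j, alpha, gamma) {
  onehot <- numeric(ncol(M))
  names(onehot) <- colnames(M)
  onehot[j] <- 1
  M[i, ] <- M[i, ] + alpha * (onehot + gamma * M[j, ] - M[i, ])
  M
}

states <- c("a", "b", "c")
# Toy walk that cycles a -> b -> c -> a, repeated 200 times
walk <- c(rep(states, times = 200), "a")

# Learn from every consecutive pair of states in the walk
learn_walk <- function(walk, alpha, gamma) {
  M <- matrix(0, 3, 3, dimnames = list(states, states))
  for (k in seq_len(length(walk) - 1)) {
    M <- sr_update(M, walk[k], walk[k + 1], alpha, gamma)
  }
  M
}

M_one_step   <- learn_walk(walk, alpha = 0.1, gamma = 0)
M_long_range <- learn_walk(walk, alpha = 0.1, gamma = 0.5)

M_one_step["a", ]    # only a -> b is encoded; a -> c stays exactly 0
M_long_range["a", ]  # a -> b is largest; a -> c (two steps away) is smaller but nonzero
```

With γ = 0.5 the values settle near the fixed point M(i) = onehot(i, j) + γ M(j), which for row a gives 1 / (1 - 0.5^3) ≈ 1.14 for b and half of that for c.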

The learning rate α controls how strongly each prediction error δ (including its one-hot term) changes the learned successor values. The lookahead horizon γ controls how strongly the successor state's own relations, M(j), are incorporated into the update.
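A single update from a known starting point makes the roles of α and γ concrete. The sketch below assumes row b already encodes b → c (a made-up starting value) and applies one update for a → b with several parameter values; `sr_update` is a hypothetical helper, not successr's API. Starting from zeros elsewhere, M[a, b] ends up equal to α and M[a, c] equal to α γ.

```r
# Hypothetical helper (not part of successr): one update after observing i -> j
sr_update <- function(M, i, j, alpha, gamma) {
  onehot <- numeric(ncol(M))
  names(onehot) <- colnames(M)
  onehot[j] <- 1
  M[i, ] <- M[i, ] + alpha * (onehot + gamma * M[j, ] - M[i, ])
  M
}

states <- c("a", "b", "c")
M0 <- matrix(0, 3, 3, dimnames = list(states, states))
M0["b", "c"] <- 1  # assume b -> c has already been learned

# One update for a -> b under each parameter combination
results <- expand.grid(alpha = c(0.1, 0.5), gamma = c(0, 0.4, 0.9))
results$ab <- NA_real_
results$ac <- NA_real_
for (r in seq_len(nrow(results))) {
  M <- sr_update(M0, "a", "b", results$alpha[r], results$gamma[r])
  results$ab[r] <- M["a", "b"]  # set by the one-hot term, scaled by alpha
  results$ac[r] <- M["a", "c"]  # inherited from row b, scaled by alpha * gamma
}
results
```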

Usage

learn_from_param_sweep(
  successor_matrix,
  observations,
  alphas,
  gammas,
  bidirectional = FALSE,
  edge_col_name = "successor_value"
)

Arguments

successor_matrix

A square matrix created by 'initialize_successor'.

observations

A tibble of observed transitions, where each row records a move 'from' one node 'to' another.

alphas

Numeric vector of learning rates, each bounded in [0, 1].

gammas

Numeric vector of lookahead horizons, each bounded in [0, 1).

bidirectional

Logical. Defaults to 'FALSE', which means that only the 'from-to' relationship gets updated. If set to 'TRUE', this function will also update the 'to-from' relationship.

edge_col_name

The name of the column encoding the relation between two given nodes. By default, this is set to 'successor_value'.
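One way to picture `bidirectional = TRUE` is as a second update in the reverse direction after each forward update. The sketch below is an assumption about the behaviour (in particular, that the reverse update runs immediately after the forward one), and `sr_update` and `learn_transition` are hypothetical helpers, not successr functions.

```r
# Hypothetical helper (not part of successr): one update after observing i -> j
sr_update <- function(M, i, j, alpha, gamma) {
  onehot <- numeric(ncol(M))
  names(onehot) <- colnames(M)
  onehot[j] <- 1
  M[i, ] <- M[i, ] + alpha * (onehot + gamma * M[j, ] - M[i, ])
  M
}

# Assumption: the to -> from update is applied right after the from -> to update
learn_transition <- function(M, from, to, alpha, gamma, bidirectional = FALSE) {
  M <- sr_update(M, from, to, alpha, gamma)
  if (bidirectional) {
    M <- sr_update(M, to, from, alpha, gamma)
  }
  M
}

states <- c("a", "b")
M0 <- matrix(0, 2, 2, dimnames = list(states, states))
one_way <- learn_transition(M0, "a", "b", alpha = 0.5, gamma = 0)
two_way <- learn_transition(M0, "a", "b", alpha = 0.5, gamma = 0, bidirectional = TRUE)
one_way  # only the a -> b entry is updated
two_way  # both a -> b and b -> a are updated
```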

Value

A tibble with N × N rows (one per ordered pair of the N nodes) for each combination of parameters, so that every row pairs two nodes ('from' and 'to'). It includes a column holding the successor value for each pair of nodes, and columns indicating which alpha and gamma values were used to compute it.
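The long format can be sketched by flattening each learned N × N matrix into one row per (from, to) pair and stacking the results for each parameter combination. In the base R sketch below, `to_long` is hypothetical and the `alpha`/`gamma` column names are assumptions; only the default value column name, 'successor_value', comes from `edge_col_name`.

```r
# Hypothetical reshaping helper; the alpha/gamma column names are assumptions
to_long <- function(M, alpha, gamma, edge_col_name = "successor_value") {
  out <- data.frame(
    from  = rep(rownames(M), times = ncol(M)),
    to    = rep(colnames(M), each = nrow(M)),
    value = as.vector(M),  # column-major order matches the from/to columns above
    alpha = alpha,
    gamma = gamma,
    stringsAsFactors = FALSE
  )
  names(out)[names(out) == "value"] <- edge_col_name
  out
}

states <- c("a", "b", "c")
# Made-up successor matrix, used only to show the reshaping
M <- matrix(seq_len(9) / 10, 3, 3, dimnames = list(states, states))
long <- rbind(to_long(M, alpha = 0.1, gamma = 0), to_long(M, alpha = 0.2, gamma = 0.4))
nrow(long)  # 3 x 3 rows per parameter combination, so 18 here
```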

Examples

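library(successr)
`%>%` <- magrittr::`%>%`
# Build an undirected graph from the bundled karate dataset
karate_graph <- successr::karate %>%
    tidygraph::as_tbl_graph(directed = FALSE)
# Generate a random walk over the graph to use as observations
karate_walk <- karate_graph %>%
    generate_random_walk(1000)
# Learn successor values for every combination of alpha in {0.1, 0.2} and gamma in {0, 0.4}
karate_graph %>%
    initialize_successor() %>%
    learn_from_param_sweep(karate_walk, alphas = c(0.1, 0.2), gammas = c(0, 0.4), bidirectional = TRUE)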

psychNerdJae/successr documentation built on Dec. 22, 2021, 9:56 a.m.