learn_from_observations: Learn successor values from observations


View source: R/learn_from_observations.R

Description

After observing relations between nodes (i.e., transitioning between states), update the successor values. This function learns from many observations. To update the successor values from a single observation, use 'update_successor' instead.

Upon seeing a transition from i to j, the update equation for the successor matrix M is M(i) <- M(i) + α δ, where δ = onehot(i, j) + γ M(j) - M(i).

Strictly speaking, M is a matrix and should be indexed by row and column. For simplicity, I write M(i) as a single-input function that returns row i of the matrix. Accordingly, each update changes one row of M at a time.

The one-hot term is a vector the length of M(i), which is filled with zeros except for a single one (1) at the location j. Hence, the one-hot vector encodes that when the agent was in state i, the next observed state was j.
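As a rough illustration of the update equation, the following base R sketch applies a single update by hand. The helper name sr_update_once and the all-zero starting matrix are assumptions made for this example; this is not the package's own code.

```r
# Hypothetical helper (not part of successr): apply one successor update
# for an observed transition from state i to state j.
sr_update_once <- function(M, i, j, alpha, gamma) {
  onehot <- numeric(ncol(M))  # vector of zeros, same length as a row of M
  onehot[j] <- 1              # single 1 at the observed next state j
  delta <- onehot + gamma * M[j, ] - M[i, ]  # error term for row i
  M[i, ] <- M[i, ] + alpha * delta           # row-wise update
  M
}

# Assumed starting point: a 3 x 3 matrix of zeros.
M <- matrix(0, nrow = 3, ncol = 3)
M <- sr_update_once(M, i = 1, j = 2, alpha = 0.1, gamma = 0.4)
M[1, ]  # 0.0 0.1 0.0
```

Only row 1 changes, because the update for a transition out of state i touches M(i) alone.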

Where does the "successor" part of "successor representation/features" come from? It refers to the middle term of the update equation, γ M(j). The relationship between i and j is fully captured by the one-hot vector. But you may also want to encode longer-range relations, so that your representation of i includes not only its relationship with j, but also the relationship between j and its own successor k. As a result, you end up with larger successor values for direct connections and smaller values for indirect (e.g., long-range) connections.

The learning rate α controls how strongly the error term δ of each update changes the learned successor values. The lookahead horizon γ dictates how strongly the successor state's relations are incorporated into the update.
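To make the roles of the direct and indirect terms concrete, here is a small base R sketch that replays a toy chain 1 -> 2 -> 3 under two values of γ. The helpers sr_update_once and learn_walk, the all-zero starting matrix, and the number of repeats are all assumptions for illustration, not the package's implementation.

```r
# Hypothetical helper (not from successr): one row-wise successor update.
sr_update_once <- function(M, i, j, alpha, gamma) {
  onehot <- numeric(ncol(M))
  onehot[j] <- 1
  M[i, ] <- M[i, ] + alpha * (onehot + gamma * M[j, ] - M[i, ])
  M
}

# Hypothetical helper: replay a sequence of visited states several times,
# updating the matrix after every observed transition.
learn_walk <- function(walk, n_states, alpha, gamma, n_repeats = 200) {
  M <- matrix(0, n_states, n_states)
  for (r in seq_len(n_repeats)) {
    for (t in seq_len(length(walk) - 1)) {
      M <- sr_update_once(M, walk[t], walk[t + 1], alpha, gamma)
    }
  }
  M
}

walk <- c(1, 2, 3)  # toy chain: 1 -> 2 -> 3
M_short <- learn_walk(walk, n_states = 3, alpha = 0.1, gamma = 0.1)
M_long  <- learn_walk(walk, n_states = 3, alpha = 0.1, gamma = 0.9)

M_long[1, 2] > M_long[1, 3]   # TRUE: the direct link outweighs the indirect one
M_long[1, 3] > M_short[1, 3]  # TRUE: a larger gamma strengthens the indirect link
```

State 1 never transitions directly to state 3, so the value in M(1) at position 3 arises only through the γ M(j) term.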

Usage

learn_from_observations(
  successor_matrix,
  observations,
  input_alpha,
  input_gamma,
  bidirectional = FALSE,
  edge_col_name = "successor_value"
)

Arguments

successor_matrix

A square matrix created by 'initialize_successor'.

observations

A tibble of observed transitions 'from' one node 'to' another.

input_alpha

Scalar. Learning rate bounded in [0, 1].

input_gamma

Scalar. Lookahead horizon bounded in [0, 1).

bidirectional

Logical. Defaults to 'FALSE', which means that only the 'from-to' relationship gets updated. If set to 'TRUE', this function will also update the 'to-from' relationship.

edge_col_name

The name of the column encoding the relation between two given nodes. By default, this is set to 'successor_value'.
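One way to picture the 'bidirectional' argument is as a second update with the roles of the two nodes swapped. The base R sketch below assumes exactly that, and also assumes the 'from-to' update is applied first; the helper names are hypothetical and the package may implement this differently.

```r
# Hypothetical helper (not from successr): one row-wise successor update.
sr_update_once <- function(M, i, j, alpha, gamma) {
  onehot <- numeric(ncol(M))
  onehot[j] <- 1
  M[i, ] <- M[i, ] + alpha * (onehot + gamma * M[j, ] - M[i, ])
  M
}

# Hypothetical helper: update 'from-to', and optionally 'to-from' as well.
# The order of the two updates is an assumption of this sketch.
sr_update_pair <- function(M, from, to, alpha, gamma, bidirectional = FALSE) {
  M <- sr_update_once(M, from, to, alpha, gamma)
  if (bidirectional) {
    M <- sr_update_once(M, to, from, alpha, gamma)
  }
  M
}

M0 <- matrix(0, nrow = 3, ncol = 3)
M_uni <- sr_update_pair(M0, from = 1, to = 2, alpha = 0.1, gamma = 0.4)
M_bi  <- sr_update_pair(M0, from = 1, to = 2, alpha = 0.1, gamma = 0.4,
                        bidirectional = TRUE)

M_uni[2, 1]  # 0: the reverse relationship is untouched
M_bi[2, 1]   # 0.1: the reverse relationship is learned as well
```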

Value

A tibble with N x N rows, where N is the number of nodes, such that every row is a pairwise combination of two nodes ('from' and 'to'). Includes a column encoding the successor value between each pair of nodes.
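The shape of the return value can be sketched in base R by reshaping a small matrix into one row per ordered pair of nodes. This example uses a plain data frame instead of a tibble, and the state names and matrix values are made up; it only illustrates the layout described above.

```r
# Toy 3 x 3 successor matrix with named states (values are made up).
states <- c("a", "b", "c")
M <- matrix(c(0, 0.1, 0,
              0, 0,   0.1,
              0, 0,   0),
            nrow = 3, byrow = TRUE, dimnames = list(states, states))

# Column name for the successor values, mirroring the documented default.
edge_col_name <- "successor_value"

# One row per ordered (from, to) pair: N x N rows in total.
long <- expand.grid(from = states, to = states, stringsAsFactors = FALSE)

# Look up each pair's value by row name and column name.
long[[edge_col_name]] <- M[cbind(long$from, long$to)]

nrow(long)  # 9
```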

Examples

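library(successr)  # provides karate, generate_random_walk(), etc.
`%>%` <- magrittr::`%>%`

# Build an undirected graph from the bundled karate dataset
karate_graph <- successr::karate %>%
    tidygraph::as_tbl_graph(directed = FALSE)

# Generate observations from a random walk on the graph
karate_walk <- karate_graph %>%
    generate_random_walk(1000)

# Learn successor values from the walk, updating both directions
karate_graph %>%
    initialize_successor() %>%
    learn_from_observations(karate_walk, input_alpha = 0.1, input_gamma = 0.4, bidirectional = TRUE)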

psychNerdJae/successr documentation built on Dec. 22, 2021, 9:56 a.m.