update_successor: Update successor values


View source: R/update_successor.R

Description

After observing a relation between two nodes (i.e., a transition between two states), update the successor values. This is a fairly low-level function that you will rarely need to call directly. If you have a dataframe of observations, pass it to 'learn_from_observations' instead. To see how successor values vary across parameter values for the same observations, use 'learn_from_param_sweep'.

Upon seeing a transition from i to j, the update equation for the successor matrix M is M(i) <- M(i) + α δ, where δ = onehot(i, j) + γ M(j) - M(i).

Technically, M is a matrix and should be indexed with two subscripts. For simplicity, I write M(i) as a single-input function that returns row i. The successor algorithm therefore updates values one row at a time.

The one-hot term is a vector the length of M(i), which is filled with zeros except for a single one (1) at the location j. Hence, the one-hot vector encodes that when the agent was in state i, the next observed state was j.
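The update equation and the one-hot term can be sketched in a few lines of base R. This is only an illustration of the arithmetic, not the package's implementation: the helper name 'sr_update_sketch' is hypothetical, and starting from an all-zero matrix is an assumption made for the example (the matrix returned by 'initialize_successor' may look different).

```r
# Hypothetical helper illustrating M(i) <- M(i) + alpha * delta.
# Not the package's own code.
sr_update_sketch <- function(M, alpha, gamma, i, j) {
  # One-hot vector: zeros everywhere except a 1 at position j
  onehot <- numeric(ncol(M))
  onehot[j] <- 1
  # delta = onehot(i, j) + gamma * M(j) - M(i)
  delta <- onehot + gamma * M[j, ] - M[i, ]
  # Only row i changes
  M[i, ] <- M[i, ] + alpha * delta
  M
}

# Assumed starting point: a 3x3 matrix of zeros
M <- matrix(0, nrow = 3, ncol = 3)
M <- sr_update_sketch(M, alpha = 0.1, gamma = 0.4, i = 1, j = 2)
M[1, ]  # 0.0 0.1 0.0
```

Because M(1) and M(2) both start at zero, δ reduces to the one-hot vector, so row 1 moves a fraction α of the way toward it while every other row stays untouched.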

Where does the "successor" part of "successor representation/features" come from? It refers to the middle term of the update equation, γ M(j). The direct relationship between i and j is entirely accounted for by the one-hot vector. But you may also want to encode longer-range relations, so that your representation of i includes not only its relationship with j, but also the relationship between j and a later state k. Adding a discounted copy of M(j), the row of the successor state, does exactly that. As a result, you end up with larger successor values for direct connections, and smaller values for indirect (e.g., long-range) connections.

The learning rate α controls how strongly each observation, through δ, changes the learned successor values. The lookahead horizon γ dictates how strongly the successor state's relations are incorporated into the update.
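To make the role of γ concrete, the sketch below learns the chain 2 -> 3 and then 1 -> 2 under two different horizons. It is a hedged illustration only: 'sr_update_sketch' and 'run_chain' are hypothetical helpers written for this example, and the all-zero starting matrix is an assumption rather than a statement about 'initialize_successor'.

```r
# Hypothetical helper: one row-wise successor update (not the package's code)
sr_update_sketch <- function(M, alpha, gamma, i, j) {
  onehot <- numeric(ncol(M))
  onehot[j] <- 1
  M[i, ] <- M[i, ] + alpha * (onehot + gamma * M[j, ] - M[i, ])
  M
}

# Observe 2 -> 3 first, then 1 -> 2, and return the learned row for state 1
run_chain <- function(alpha, gamma) {
  M <- matrix(0, nrow = 3, ncol = 3)  # assumed all-zero start
  M <- sr_update_sketch(M, alpha, gamma, i = 2, j = 3)
  M <- sr_update_sketch(M, alpha, gamma, i = 1, j = 2)
  M[1, ]
}

myopic     <- run_chain(alpha = 0.5, gamma = 0)    # 0.0 0.5 0.000
farsighted <- run_chain(alpha = 0.5, gamma = 0.5)  # 0.0 0.5 0.125
```

With γ = 0, state 1 learns only its direct link to state 2. With γ = 0.5, it also inherits a discounted share of what state 2 already knows about state 3, so the indirect value (0.125) is nonzero but smaller than the direct one (0.5).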

Usage

update_successor(
  input_matrix,
  alpha,
  gamma,
  previous_state,
  current_state,
  bidirectional = FALSE
)

Arguments

input_matrix

A square NxN matrix created by 'initialize_successor'.

alpha

Scalar corresponding to the learning rate, bounded in [0, 1].

gamma

Scalar corresponding to the lookahead horizon, bounded in [0, 1).

previous_state

Scalar corresponding to the previously-seen state.

current_state

Scalar corresponding to the currently-seen state.

bidirectional

Logical. Defaults to 'FALSE', which means that only the 'from-to' relationship gets updated. If set to 'TRUE', this function will also update the 'to-from' relationship.
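One way to picture the 'bidirectional' option is as a second row update with the two states swapped. The sketch below makes that assumption explicit: it applies the 'from-to' update first and then the 'to-from' update on the already-updated matrix. The ordering, the helper name 'sr_update_sketch', and the all-zero starting matrix are all assumptions for illustration; the package may handle these details differently.

```r
# Hypothetical helper, not the package's implementation
sr_update_sketch <- function(M, alpha, gamma, i, j, bidirectional = FALSE) {
  # Single row-wise update for an observed transition from -> to
  update_row <- function(M, from, to) {
    onehot <- numeric(ncol(M))
    onehot[to] <- 1
    M[from, ] <- M[from, ] + alpha * (onehot + gamma * M[to, ] - M[from, ])
    M
  }
  M <- update_row(M, i, j)    # 'from-to' relationship
  if (bidirectional) {
    M <- update_row(M, j, i)  # 'to-from' relationship (assumed to run second)
  }
  M
}

M0 <- matrix(0, nrow = 3, ncol = 3)  # assumed all-zero start
one_way <- sr_update_sketch(M0, 0.1, 0.4, i = 1, j = 2)
two_way <- sr_update_sketch(M0, 0.1, 0.4, i = 1, j = 2, bidirectional = TRUE)
```

In this sketch, 'one_way' changes only row 1, whereas 'two_way' also gives row 2 a positive entry in column 1.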

Value

A matrix with updated successor values, given the observation.

Examples

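# Make the magrittr pipe available without attaching the whole package
`%>%` <- magrittr::`%>%`
successr::karate %>%
    tidygraph::as_tbl_graph(directed = FALSE) %>%
    initialize_successor() %>%
    # Observe one transition from state 1 to state 2, updated in both directions
    update_successor(
        alpha = 0.1, gamma = 0.4,
        previous_state = 1, current_state = 2,
        bidirectional = TRUE
    ) %>%
    matrix_to_adjlist()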

psychNerdJae/successr documentation built on Dec. 22, 2021, 9:56 a.m.