View source: R/update_successor.R
Description

After observing a relation between two nodes (i.e., a transition from one state to another), update the successor values. This is a low-level function that most users will not need to call directly. If you have a data frame of observations, pass it to 'learn_from_observations' instead. To see how successor values change across parameter values for the same observations, use 'learn_from_param_sweep'.
Upon seeing a transition from i to j, the update equation for the successor matrix M is M(i) <- M(i) + α δ, where δ = onehot(i, j) + γ M(j) - M(i).
Strictly speaking, M is a matrix and should be indexed as one. For simplicity, M(i) here denotes row i of M, returned as a vector. Because the update is written in terms of M(i), the successor algorithm works row by row: observing a transition out of state i changes only row i.
The one-hot term is a vector of the same length as M(i), filled with zeros except for a single 1 at position j. It encodes the fact that, when the agent was in state i, the next observed state was j.
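The update rule can be sketched in base R as follows, assuming M is a plain numeric N x N matrix and states are integer row/column indices. The function name `successor_step_sketch` is hypothetical: it mirrors the equation above and is not the source of `update_successor`, which may differ in its details.

```r
# Hypothetical sketch of M(i) <- M(i) + alpha * delta, with
# delta = onehot(i, j) + gamma * M(j) - M(i). Not the successr implementation.
successor_step_sketch <- function(M, alpha, gamma, i, j) {
  onehot <- numeric(ncol(M))                 # zeros, one entry per state
  onehot[j] <- 1                             # single 1 at the observed next state j
  delta <- onehot + gamma * M[j, ] - M[i, ]  # prediction error for row i
  M[i, ] <- M[i, ] + alpha * delta           # only row i is changed
  M
}

# One observed transition 1 -> 2 on a 3-state matrix of zeros
M1 <- successor_step_sketch(matrix(0, nrow = 3, ncol = 3),
                            alpha = 0.1, gamma = 0.4, i = 1, j = 2)
M1[1, ]  # c(0, 0.1, 0); rows 2 and 3 are still all zero
```

Starting from zeros, the first observation simply writes α into entry (i, j), because both M(i) and M(j) are still zero.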
Where does the "successor" in "successor representation" (or "successor features") come from? It refers to the middle term of the update equation, γ M(j). The direct relationship between i and j is fully captured by the one-hot vector. But you may also want to encode longer-range relations, so that your representation of i includes not only its relationship with j but also j's relationships with its own successors, such as k. The γ M(j) term does this by pulling the row for i toward a discounted copy of the row for j. As a result, direct connections end up with larger successor values and indirect (i.e., longer-range) connections with smaller ones.
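To see the longer-range effect, the toy example below repeatedly observes the chain 1 → 2 → 3 using the same hypothetical `successor_step_sketch` (redefined so the block runs on its own). The chain, parameter values, and number of iterations are illustrative assumptions, not values from the package.

```r
# Hypothetical single-step update (same equation as above; not the successr code).
successor_step_sketch <- function(M, alpha, gamma, i, j) {
  onehot <- numeric(ncol(M))
  onehot[j] <- 1
  M[i, ] <- M[i, ] + alpha * (onehot + gamma * M[j, ] - M[i, ])
  M
}

# Toy chain 1 -> 2 -> 3, observed 50 times. State 3 is never left,
# so row 3 is never updated and stays at zero.
M <- matrix(0, nrow = 3, ncol = 3)
for (step in 1:50) {
  M <- successor_step_sketch(M, alpha = 0.5, gamma = 0.5, i = 1, j = 2)
  M <- successor_step_sketch(M, alpha = 0.5, gamma = 0.5, i = 2, j = 3)
}

round(M[1, ], 3)  # approximately c(0, 1, 0.5)
```

State 1 is never directly followed by state 3, yet M[1, 3] becomes positive. In this setup it settles at γ · M[2, 3] = 0.5, so the two-step relation is discounted by γ relative to the one-step relations.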
The learning rate α controls how far M(i) moves toward its new target on each observation. The lookahead horizon γ controls how strongly the successor state's own relations, M(j), are incorporated into that target.
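Expanding δ makes the roles of the two parameters explicit. For α in [0, 1], the update is a weighted average of the old row and a target built from the one-hot vector and the discounted successor row:

```latex
M(i) \leftarrow M(i) + \alpha\,\delta
     = (1 - \alpha)\,M(i) + \alpha\,\bigl[\operatorname{onehot}(i, j) + \gamma\,M(j)\bigr]
```

With α = 0 the row never changes, and with α = 1 it is replaced outright by the target. With γ = 0 the target is just the one-hot vector, so only direct transitions are recorded.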
Usage

update_successor(
  input_matrix,
  alpha,
  gamma,
  previous_state,
  current_state,
  bidirectional = FALSE
)
Arguments

input_matrix: A square N x N matrix created by 'initialize_successor'.
alpha: Scalar learning rate, bounded in [0, 1].
gamma: Scalar lookahead horizon, bounded in [0, 1).
previous_state: Scalar identifying the previously seen state (the 'from' state).
current_state: Scalar identifying the currently seen state (the 'to' state).
bidirectional: Logical. Defaults to 'FALSE', meaning only the 'from-to' relationship is updated. If 'TRUE', the 'to-from' relationship is updated as well.
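This page does not spell out how the 'to-from' update is computed. One plausible reading, sketched below under that assumption, is to apply the same single-step update a second time with the two states swapped. Both function names are hypothetical. Because the second update reads the already-updated row of the 'from' state, the order of the two updates matters in this sketch; `update_successor` may order or combine them differently.

```r
# Hypothetical single-step update (not the successr implementation).
successor_step_sketch <- function(M, alpha, gamma, i, j) {
  onehot <- numeric(ncol(M))
  onehot[j] <- 1
  M[i, ] <- M[i, ] + alpha * (onehot + gamma * M[j, ] - M[i, ])
  M
}

# Hypothetical wrapper: 'from-to' update, then (assumed) 'to-from' update.
bidirectional_step_sketch <- function(M, alpha, gamma, from, to,
                                      bidirectional = FALSE) {
  M <- successor_step_sketch(M, alpha, gamma, from, to)    # from -> to
  if (bidirectional) {
    M <- successor_step_sketch(M, alpha, gamma, to, from)  # to -> from (assumed)
  }
  M
}

M <- bidirectional_step_sketch(matrix(0, nrow = 3, ncol = 3), alpha = 0.1,
                               gamma = 0.4, from = 1, to = 2,
                               bidirectional = TRUE)
M[2, ]  # c(0.1, 0.004, 0): the 0.004 comes from gamma * the new M[1, ]
```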
Value

A matrix with updated successor values, given the observation.
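Examples

`%>%` <- magrittr::`%>%`
library(successr)  # provides karate, initialize_successor(), matrix_to_adjlist()

successr::karate %>%
  tidygraph::as_tbl_graph(directed = FALSE) %>%  # undirected graph
  initialize_successor() %>%                      # initial N x N successor matrix
  update_successor(alpha = 0.1, gamma = 0.4,      # observe 1 -> 2 (and 2 -> 1)
                   previous_state = 1, current_state = 2,
                   bidirectional = TRUE) %>%
  matrix_to_adjlist()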