```r
knitr::opts_chunk$set(collapse = TRUE,
                      comment = "#>",
                      fig.width = 7.1,
                      fig.height = 6,
                      fig.retina = 2)
```

```r
load_all()   # library(pbsEDM)
library(dplyr)
library(tidyr)
library(magrittr)
library(ggplot2)
```
Jump to the Where we are at section. Some earlier code probably won't work now as I've been continuously updating functions. This is not a vignette, more figuring things out as I go along, so it's unlikely to get fully rebuilt.
Adapting Luke's original mve.Rmd to dig into his multiview embedding functions. Was going to use the same default time series as in analyse_simple_time_series.Rmd, but that's univariate, so stick with his Larkin one. Andy has updated several functions and appended the names with for_sve().
Hao and Sugihara (2016) define, for, say, the Hastings and Powell x-y-z food-chain model:

multiview embedding: do every $E$-sized combination of x, y, z with lags (excluding any that have no lags of 0), take the $\sqrt{m}$ views with highest $\rho$, where $m$ is the number of $E$-dimensional combinations, and average the forecasts from each one to give the overall forecast. Important: unlike Simplex, which uses $E + 1$ nearest neighbours, this takes the single nearest neighbour from each view and averages over those. For Simplex the weighting of neighbours depends on their distance, which can be quite sensitive to noise (Hao and Sugihara).
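As a rough sketch of that recipe (hypothetical helper names make_views() and forecast_one_view(), not the pbsEDM code):

```r
# Sketch only: enumerate candidate views, score each by rho, keep the top sqrt(m),
# and average their single-nearest-neighbour forecasts.  make_views() and
# forecast_one_view() are hypothetical stand-ins, not pbsEDM functions.
multiview_forecast_sketch <- function(data, response, lags) {
  views <- make_views(lags)                            # all E-sized lag combinations
  scored <- lapply(views, forecast_one_view,
                   data = data, response = response)   # (forecast, rho) for each view
  rho <- vapply(scored, function(x) x$rho, numeric(1))
  m <- length(views)
  top <- order(rho, decreasing = TRUE)[seq_len(round(sqrt(m)))]
  mean(vapply(scored[top], function(x) x$forecast, numeric(1)))   # average the forecasts
}
```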
Luke's single_view_embedding() does one embedding, which can be multivariate, so it can basically do one of the 3d plots in Hao and Sugihara (2016). It then gets called repeatedly by multiview_embedding() to do all of them.
Not sure how $E$ gets prescribed, though could loop over it. In multiview_embedding() it looks like it will just do all potential combinations (and any $E$) based on the lags argument.
```r
larkin
plot(larkin$time, larkin$recruits, type = "o", ylim = c(0, 0.2))
```
This is a different format to what we have in salmon_sim(), so switching to ours.
```r
simulated <- EDMsimulate::salmon_sim()
simulated
```
Here's what we have in EDMsimulate::sim_and_fit and ..._realisations(), so adapt those (not evaluating here). We are going to use R_t primarily, not the default R_prime_t, but just want to be able to switch:
```r
# sim_and_fit():
sim_and_fit <- function(salmon_sim_args = list(),
                        pbsEDM_args = list(
                          lags = list(R_switch = 0,
                                      S_t = 0:3)),
                        R_switch = "R_prime_t"){
  stopifnot(R_switch %in% c("R_t", "R_prime_t"))
  stopifnot(names(pbsEDM_args$lags)[1] == "R_switch")
  names(pbsEDM_args$lags)[1] <- R_switch

  simulated <- do.call(salmon_sim, salmon_sim_args)
  simulated_use <- simulated
  simulated_use[nrow(simulated_use), R_switch] = NA   # Ensure no knowledge of it for
                                                      # pbsEDM(), so it doesn't affect
                                                      # the rho values (else it does)
  fit <- do.call(pbsEDM::pbsEDM,
                 c(list(N = simulated_use),
                   pbsEDM_args))

  to_return <- list(simulated = simulated,
                    fit = fit)   # concatenated list, with simulated tibble first then
                                 # list of all results from pbsEDM
  return(to_return)   # for now, probably want to scale down or have option to just
                      # return basic results for when doing many simulations
}

# sim_and_fit_realisations(), part of:
if(edm_fit){
  fit_edm <- do.call(pbsEDM::pbsEDM,
                     c(list(N = all_sims[[m]]),
                       pbsEDM_args))
  testthat::expect_equal(dplyr::pull(all_sims[[m]], R_switch),
                         fit_edm$N_observed[-(T+1)])   # Extra check

  fit_edm_full_series[m, ] <- t(c(m, fit_edm$N_forecast))
  res_realisations[m, "R_switch_T_edm_fit"] = fit_edm$N_forecast[T]
  # TODO double check what to do when pbsEDM arguments change
  res_realisations[m, "E"] = fit_edm$results$E   # Though will need specific lags also
                                                 # kept track of or specified
  res_realisations[m, "N_rho"] = fit_edm$results$N_rho
  res_realisations[m, "N_rmse"] = fit_edm$results$N_rmse
  res_realisations[m, "X_rho"] = fit_edm$results$X_rho
  res_realisations[m, "X_rmse"] = fit_edm$results$X_rmse
}
```
First just try a simple call to single_view_embedding, and based on the top function above, use
```r
lags_use <- list(# R_t = 0,   # not for single_view I think
                 S_t = 0:3)
```
Here's what Luke had for Larkin, then try that for mine
```r
# Larkin
if(TRUE){
  f0 <- pbsEDM::single_view_embedding(
    data = larkin,
    response = "recruits",
    lags = list(spawners = 0:8),
    index = 60,
    buffer = 10,
    window = integer(0),
    metric = "rmse",
    beyond = FALSE
  )
  f0 %>% as.data.frame()
}
```
Now try that for one of our examples, then dig into code with browser()
to
trace what is being done. Then change arguments.
```r
index_val <- 60
buffer_val <- 10
sve_on_simulated <- single_view_embedding(data = simulated,
                                          response = "R_t",
                                          lags = lags_use,
                                          index = index_val,     # not sure what this is
                                          buffer = buffer_val,   # number of forecasts prior to
                                                                 # index, not sure exactly what
                                                                 # that means
                                          window = integer(0),   # forecast metric moving window
                                                                 # width, aha - our issue with
                                                                 # before and after I think maybe
                                          metric = "rmse",       # presumably the metric to test
                                                                 # something
                                          beyond = FALSE)
sve_on_simulated %>% as.data.frame()
summary(sve_on_simulated)
```
Looks like it's only using information up to that point in time, going by the points column. Don't think we need that (i.e. use all the data) as we're going to test performance on the $T+1$ point. Deconstruct the function to understand methods (see later for simpler example):
```r
ssr <- state_space_reconstruction(data = simulated,
                                  response = "R_t",
                                  lags = lags_use)   # deconstructing later, realised it
                                                     # doesn't do first-differencing
head(simulated)
head(ssr)
```
Returns centred and scaled values. Have gone through function.
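For reference, a minimal sketch of what those centred and scaled lagged columns look like conceptually (my own illustration using scale(), not the state_space_reconstruction() source):

```r
# Minimal sketch: build lagged columns of S_t and centre/scale each one.
# Illustration of the returned matrix's structure only, not the package code.
make_lag <- function(x, k) c(rep(NA, k), head(x, length(x) - k))
lagged <- sapply(0:3, function(k) make_lag(simulated$S_t, k))
colnames(lagged) <- paste0("S_t_", 0:3)
ssr_check <- scale(lagged)   # centre and scale each column (NAs from the lags are kept)
head(ssr_check)
```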
```r
distances <- state_space_distances(ssr,
                                   index = index_val,    # time index of the first value to
                                                         # forecast - but why would you do this?
                                                         # Ignore a transient, or a library
                                                         # thing? TODO
                                   buffer = buffer_val)  # number of values to forecast before
                                                         # index. Not sure why, vaguely remember
                                                         # a discussion about it. TODO

# From single_view_embedding() for this (changing X to ssr, X_distance to distances)
#  - Rows in X are points in the SSR
#  - Each row in X_distance corresponds to a focal point in the SSR
#  - Each column in X_distance corresponds to a potential neighbour in the SSR
#  - Elements of X_distance correspond to distances to neighbours
#  - NA elements indicate disallowed neighbours for a given focal point
as_tibble(distances)
summary(distances)   # First three and last columns always NA's. Rest have at least 49.
tail(distances)

# Next:
# Compute centred and scaled forecasts ---------------------------------------
#  - Create neighbour index matrix
#  - Create neighbour matrices
#  - Project neighbour matrices
#  - Compute ssr_forecast vector
ssr_forecasts <- state_space_forecasts(ssr,
                                       distances,
                                       beyond = FALSE)
ssr_forecasts

observed <- c(dplyr::pull(simulated, "R_t"), NA)[seq_along(ssr_forecasts)]
             # Not sure what NA does, gets ignored anyway.
observed     # vector of observations

forecast <- untransform_forecasts(observed, ssr_forecasts)   # scale back to non-normalised
rows <- seq_along(forecast)

# This is what single_view_embedding() returns, set is being defined here:
sve_on_simulated_manually <- tibble(
  set = rep(0:1, c(index_val - 1, nrow(simulated) - index_val + 2))[rows],
  time = seq_len(nrow(simulated) + 1L)[rows],
  points = c(0, as.vector(apply(distances, 1, function(x) sum(!is.na(x)))))[rows],
  dim = rep(ncol(ssr), nrow(simulated) + 1L)[rows],
  observed = observed,
  forecast = forecast,
  forecast_metrics(observed, forecast, integer(0), "rmse"),   # 0 gives all NA's, but integer(0)
                                                              # does not; strange. Not quite
                                                              # sure what these are.
  superset_columns(simulated, lags_use, NULL))

superset_columns(simulated, lags_use, NULL)   # so creates extra columns, all 1's though,
                                              # not sure of the point.

expect_equal(sve_on_simulated, sve_on_simulated_manually)
# Error: `sve_on_simulated` not equal to `sve_on_simulated_manually`.
# Names: 4 string mismatches
# Length mismatch: comparison on first 12 components
# Component "mre": Modes: numeric, logical
# Component "mre": target is numeric, current is logical
# Component "rmse": Modes: numeric, logical
# Component "rmse": target is numeric, current is logical
```
Use as a template when we get to multiview_embedding(). First, Luke's Larkin call:
```r
# Larkin
if(FALSE){
  f01 <- pbsEDM::multiview_embedding(
    data = larkin,
    response = "recruits",
    lags = list(spawners = 0:8),
    index = 60,
    buffer = 10,
    window = integer(0),
    metric = "rmse",
    beyond = FALSE,
    weight = NULL,
    n_weight = 1,
    cores = NULL)
  f01$ranks   # 51,000 rows!
  f01$summary
  f01$hindsight
  f01$results
  f01$forecast
}
```
Now to do on our simulated values, using similar call:
```r
lags_use_multi <- list(R_t = 0,
                       S_t = 0:3)
# TODO Think that may not make sense, as Luke always has a response variable as
# well. So you shouldn't use R_t for both, presumably.
mve_on_simulated <- multiview_embedding(data = simulated,
                                        response = "R_t",
                                        lags = lags_use_multi,   # when I had this as lags_use,
                                                                 # first six non-NA forecast
                                                                 # values were same as for
                                                                 # lags_use_multi, but not the rest
                                        index = index_val,
                                        buffer = buffer_val,
                                        window = integer(0),
                                        metric = "rmse",
                                        beyond = FALSE,
                                        weight = NULL,
                                        n_weight = 1,
                                        cores = NULL)
mve_on_simulated
```
Deconstruct multiview_embedding()
to understand the steps.
```r
lags_use
subset_lags <- create_subset_lags(lags_use)
subset_lags
```
So that gives every combination, including some without any 0-lags. That's just univariate, so try multi:
```r
subset_lags_multi <- create_subset_lags(lags_use_multi)
# Order is not intuitive, but looks to cover them all (took a while to figure
# out formatting)
subset_lags_multi
```
Now do the calcs:
```r
mve_forecasts <- lapply(
  subset_lags_multi,
  FUN = single_view_embedding,
  data = simulated,
  response = "R_t",
  index = index_val,
  buffer = buffer_val,
  window = integer(0),
  metric = "rmse",
  beyond = FALSE,
  superset = lags_use_multi)

weighted <- weight_single_view_embeddings(mve_forecasts,
                                          "rmse",
                                          weight = NULL,
                                          n_weight = 1)
weighted

mve_on_simulated_manually <- structure(
  list(
    data = simulated,
    observed = c(dplyr::pull(simulated, "R_t"), NA),
    forecast = c(rep(NA_real_, index_val - 1), weighted$results$forecast),
    response = "R_t",
    lags = lags_use_multi,
    index = index_val,
    buffer = buffer_val,
    window = integer(0),
    metric = "rmse",
    beyond = FALSE,
    n_weight = 1,
    raw_forecasts = mve_forecasts,
    ranks = weighted$ranks,
    summary = weighted$summary,
    hindsight = weighted$hindsight,
    results = weighted$results),
  class = "multiview_embedding")

expect_equal(mve_on_simulated, mve_on_simulated_manually)
tail(mve_on_simulated$results)
```
Next - don't think this actually gives the best forecast for the next step. Does return the values for each view.
So the first combination of lags is (currently) just R_t
with lag of 0, which
is the response variable, so presumably the results come out really well, or
just a bunch of NA's. No - was thinking wrong. See #16 notes and below
```r
subset_lags_multi[[1]]
mve_on_simulated$raw_forecasts[[1]] %>% as.data.frame()
```
Actually, it shouldn't be perfect should it, as we're excluding the $t^*$
value. Hence it's using the next nearest neighbour in 1-d R_t
space, which is
obviously not itself. This is all fine then -- the 1-d space is likely a poor
representation of the dynamics. So we can include lag of 0.
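A toy illustration of that point (made-up numbers, nothing from the simulation): in 1-d the nearest neighbour of a focal value, once we exclude the value itself, is just the closest other value, which need not be a good dynamical analogue.

```r
# Toy 1-d nearest-neighbour example: exclude the focal point itself, then
# pick the closest remaining value.
x <- c(1.0, 4.0, 1.1, 9.0, 3.9)
t_star <- 1
d <- abs(x - x[t_star])
d[t_star] <- NA            # cannot be its own neighbour
which.min(d)               # nearest neighbour of point 1 is point 3
```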
So, for single_view_embedding
figure out what the following do:
So: the index value gets used in state_space_distances() in:

```r
# Exclude focal points in the training set -----------------------------------
distances[seq_len(index - buffer - 1L), ] <- NA_real_
```
We want to do leave-one-out instead, so maybe just don't need this at all? Have
done this in state_space_distances_for_sve()
and then
state_space_forecasts_for_sve()
; following is figuring out how to do those
and then doing them.
Also want to change

```r
# Exclude focal point and future neighbours ----------------------------------
distances[upper.tri(distances, diag = TRUE)] <- NA_real_
```
to just exclude the focal point -- we are okay with having future neighbours, as is the usual situation in EDM.
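Something along these lines should do it (a sketch of the intended change, not the actual edit in state_space_distances_for_sve()):

```r
# Sketch of the leave-one-out change: only the focal point itself is disallowed,
# so future neighbours remain available (the usual EDM cross-validation).
# Not the exact code in state_space_distances_for_sve().
distances_loo <- as.matrix(dist(ssr))    # full symmetric distance matrix
diag(distances_loo) <- NA_real_          # exclude each focal point as its own neighbour
```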
Go through each stage of state_space_distances()
, checking what happens
as we go along. Above we had
```r
# distances <- state_space_distances(ssr,
#                                    index = index_val,
#                                    buffer = buffer_val)
distances_1 <- state_space_distances(ssr,
                                     index = 60,   # 0 gives error
                                     buffer = 1)   # 0 gives error, not sure what this
                                                   # really means yet
```
Do smaller one to understand:
```r
# Above we defined
# ssr <- state_space_reconstruction(data = simulated,
#                                   response = "R_t",
#                                   lags = lags_use)   # = S_t of 0, 1, 2, 3
ssr_small <- ssr[1:10, ]
ssr_small
distances_small <- state_space_distances(ssr_small,
                                         index = 2,    # 0 gives error
                                         buffer = 1)   # 0 gives error, not sure what
distances_small

# Now adapted to not have index or buffer, and return symmetric matrix not lower
# triangular (plus extra NA's)
distances_small_for_sve <- state_space_distances_for_sve(ssr_small)
distances_small_for_sve
```
So, first three points of ssr_small
do not exist because of the lags. Carrying on:
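Quick check of that (the rows with any NA should be just the first max-lag rows):

```r
# Rows of ssr_small containing any NA; with lags up to 3 this should be rows 1-3.
which(apply(ssr_small, 1, function(x) any(is.na(x))))
```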
```r
ssr_forecasts_small <- state_space_forecasts(ssr_small,
                                             distances_small,
                                             beyond = FALSE)
ssr_forecasts_small

ssr_forecasts_small_for_sve <- state_space_forecasts_for_sve(ssr_small,
                                                             distances_small_for_sve)
ssr_forecasts_small_for_sve
```
```r
# Maybe change name here
sve_on_simulated_2 <- single_view_embedding(data = simulated,
                                            response = "R_t",
                                            lags = lags_use,
                                            index = index_val,     # not sure what this is
                                            buffer = buffer_val,   # number of forecasts prior to
                                                                   # index, not sure exactly what
                                                                   # that means
                                            window = integer(0),   # forecast metric moving window
                                                                   # width, aha - our issue with
                                                                   # before and after I think maybe
                                            metric = "rmse",       # presumably the metric to test
                                                                   # something
                                            beyond = FALSE)
sve_on_simulated %>% as.data.frame()
summary(sve_on_simulated)
```
Realised that state_space_reconstruction() doesn't do what we say in the Appendix of the first manuscript. So deconstructing it and writing state_space_reconstruction_for_sve().
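For reference, the intended transformation is roughly: first-difference each variable, then centre and scale the differences (a sketch of the idea only; the real function also handles the lags and the response):

```r
# Sketch of the transformation intended for state_space_reconstruction_for_sve():
# first-difference each variable, then centre and scale the differences.
# Illustration only; the difference for time t is x[t+1] - x[t], so the final
# time step has no difference (NA), matching the notes below.
first_diff_scale <- function(x) {
  d <- c(diff(x), NA)
  (d - mean(d, na.rm = TRUE)) / sd(d, na.rm = TRUE)
}
head(first_diff_scale(simulated$R_t))
```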
Do the first 15 rows of simulated, starting fresh here:
```r
simulated_small_2 <- simulated[1:15, ]

# Above had
# ssr <- state_space_reconstruction(data = simulated,
#                                   response = "R_t",
#                                   lags = lags_use)   # deconstructing later, realised it
#                                                      # doesn't do first-differencing
lags_use_multi
ssr_small_2_orig_fun <- state_space_reconstruction(data = simulated_small_2,
                                                   response = "R_t",
                                                   lags = lags_use_multi)

# Now new function that also does first-differencing; am digging into to check I
# understand, then including first differencing early on in this new one:
ssr_small_2 <- state_space_reconstruction_for_sve(data = simulated_small_2,
                                                  response = "R_t",
                                                  lags = lags_use_multi)
# Okay, in the function it puts the response variable first as "R_t_s" (I added s for
# scaled), but if we've prescribed a lag of 0 for "R_t" you get an identical column but
# labelled "R_t_s_0". BUT, not all state spaces will include a zero lag of "R_t". Stick
# with the response column for now - I think it may be slightly inefficient (could be
# calculated once for all ssr's, rather than for each ssr), but we do need the scaled
# version calculated (can also check on the non-scaled predictions).

# Example of ssr without repeated first column:
state_space_reconstruction_for_sve(data = simulated_small_2,
                                   response = "R_t",
                                   lags = list(R_t = 1,
                                               S_t = 0:3)) %>%
  head()

head(ssr_small_2)
```
So single_view_embedding_for_sve()
[cumbersome name but consistent for now
with what I've been doing] now being created as I go along, based on
single_view_embedding()
. This is to help write the function.
Starting fresh (so can just run from here):
```r
set.seed(42)
# Have realised the correlation issue between S_t and R_t so modifying code
# and rerunning from here.
h_simulated <- 0.1095 + sample(1:180) * 0.001   # has mean of 0.2
simulated_4 <- EDMsimulate::salmon_sim(h = h_simulated)
lags_use_multi <- list(R_t = 0,
                       S_t = 0:3)

# TODO maybe: not sure why this failed a while ago, might be the use of random numbers
# expect_equal(EDMsimulate::salmon_sim()[1:15, ], EDMsimulate::salmon_sim(T = 15))

# Redoing later with simulation with changing harvest rate. And change error in
# `state_space_reconstruction_for_sve.R` to something more useful.
simulated_small_4 <- simulated_4[1:15, ]   # TODO need different lags to not return an NA

ssr_new <- state_space_reconstruction_for_sve(data = simulated_small_4,
                                              lags = lags_use_multi)
ssr_new   # Has the correct NA's in top three rows and final row

response_s <- state_space_reconstruction_for_sve(data = simulated_small_4,
                                                 response_only = TRUE,
                                                 lags = list(R_t = 0))
response_s   # Just what we're considering as the response variable, properly labelled;
             # don't have a first-differenced value for T (as no T+1). This value is
             # what we want to forecast.

# Calculate the distances between all points
distances_new <- state_space_distances_for_sve(ssr_new)
distances_new
# Note that column 14 is not filled because it projects to the undefined row 15,
# so no point in having it as a neighbour. Row 14 is filled because 14 can be a
# valid t_star. TODO this doesn't happen for later example with only S_t = 1 lag
# and R_t as response.

prediction_indices <- state_space_forecasts_for_sve(ssr_new,
                                                    distances_new,
                                                    max_lag = max(unlist(lags_use_multi,
                                                                         use.names = FALSE)))
# Those are the indices to use to make the prediction of where each t_star will
# go for t_star+1.

# Take off the final one because we are shifting (predicting t_star+1 from the
# prediction indices for each t_star); need NA at beginning.
response_s_predicted <- c(NA,
                          response_s[prediction_indices[-length(prediction_indices)]])
response_s_predicted

rho_response_s <- cor(response_s,
                      response_s_predicted,
                      use = "pairwise.complete.obs") %>%
  as.vector()
rho_response_s
```
Can also calculate $\rho$ on the untransformed original response variable.
Now untransform the scaling and first differencing.
```r
N_observed <- simulated_small_4$"R_t"
N_predicted <- untransform_predictions(N_observed = N_observed,
                                       Y_predicted = response_s_predicted,
                                       max_lag = max(unlist(lags_use_multi,
                                                            use.names = FALSE)))
rho_orig <- cor(N_observed,
                N_predicted[-length(N_predicted)],
                use = "pairwise.complete.obs")
rho_orig   # Low, but only a few points to get things working

# Leave the rho and other calculations to outside of the function, though do
# need the rho for the state space.
```
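The inverse goes the other way: un-scale the predicted difference and add it to a known observation. A sketch of the idea (my illustration; untransform_predictions() does the index bookkeeping):

```r
# Sketch of the inverse transform (illustration only).  A predicted scaled difference
# d_s at time t becomes a level prediction for t+1:
#   N_hat[t + 1] = N_observed[t] + unscale_diff(d_s[t], diffs_obs)
unscale_diff <- function(d_s, diffs) {
  d_s * sd(diffs, na.rm = TRUE) + mean(diffs, na.rm = TRUE)
}
diffs_obs <- diff(N_observed)
```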
Then put that into single_view_embedding_for_sve()
, work out what should be
returned, and then call the proper thing from the relevant mve function.
```r
sve_predicted_for_sve <- single_view_embedding_for_sve(
  data = simulated_small_4,
  response = "R_t",
  lags = lags_use_multi)
sve_predicted_for_sve

rho_orig_from_function <- cor(sve_predicted_for_sve$R_t,
                              sve_predicted_for_sve$R_t_predicted,
                              use = "pairwise.complete.obs")
expect_equal(rho_orig, rho_orig_from_function)

rho_response_s_from_function <- cor(sve_predicted_for_sve$R_t_s,
                                    sve_predicted_for_sve$R_t_s_predicted,
                                    use = "pairwise.complete.obs")
rho_response_s_from_function
expect_equal(rho_response_s, rho_response_s_from_function)
```
multiview_embedding()
Keeping the function name multiview_embedding() as that will be called from elsewhere (keeping my other new ones as ..._for_sve() though). Saved Luke's as ..._original().
```r
# Doing this to work out what failed
simulated <- EDMsimulate::salmon_sim()   # HERE

sve_this_failed <- single_view_embedding_for_sve(
  data = simulated[1:15, ],   # Did fail, but figured out below and no longer does.
  response = "R_t",
  lags = list(S_t = 1))
# Originally failed with:
# Error in untransform_predictions(N_observed = response_observed,
#   Y_predicted = response_s_predicted):
#   !any(is.na(Y_predicted[(min_t_Y_predicted):length(Y_predicted)])) is not TRUE

# Originally hadn't had NA's in final column of distances, or final row. Thought it
# was meant to. Deconstructing, it just needs them in final column:
ssr_5 <- state_space_reconstruction_for_sve(data = simulated[1:15, ],
                                            lags = list(S_t = 1))
ssr_5   # So just the S_t_s_1 column, which is correct.

distances_5 <- state_space_distances_for_sve(ssr_5)
distances_5
# Did not have NA's in final columns of distances, but have now updated
# state_space_distances_for_sve() to put NA's in final column - cannot have a
# nearest neighbour being the final time point, as we do not know where it goes
# as where it goes does not exist (I previously just used the condition that
# where-it-goes has NA). Final point is well defined (since has no lag 0, so all
# the values are included), but just can't be a neighbour.

sve_this_worked <- single_view_embedding_for_sve(
  data = simulated_small_4,
  response = "R_t",
  lags = list(S_t = 1))
sve_this_worked   # Gives negative forecast though

# And this to work on correlation issue above
sve_use_multi <- single_view_embedding_for_sve(
  data = simulated_small_4,
  response = "R_t",
  lags = lags_use_multi)
sve_use_multi   # All R_t_predicted come out negative !?

# Try the longer time series, only effectively using around 13 candidate
# neighbours in that one.
sve_simulated_4 <- single_view_embedding_for_sve(
  data = simulated_4,
  response = "R_t",
  lags = lags_use_multi)
sve_simulated_4
# Can see line 28 to 29 has a jump in R_t_observed of 14.2, and R_t_s[28] is
# 4.23 (considers jump from 28 to 29, then rescaled). R_t_s[29] is -3.43 as it
# comes down again. But can see that t* = 35 gives a big jump, but then doesn't
# come back down again. Aha - since we're not using the realistic lags yet I
# think, only R_t so no detection of R_t lags (a big jump always followed by a
# big crash) (bottom right arm of Figure 1(d) in manuscript 1). But we're not
# allowing, in this single view embedding, such lags of R_t. Presumably this
# will just come out as a very poor ssr and get dumped.
```
```r
lags_St2 <- list(S_t = 2)
# This fails (during multiview below) because I didn't disallow all the nearest
# neighbours at the end, was only thinking about no lag 0, not no lag 0 or lag 1.
# So need to consider the minimum lag in state_space_distances_for_sve().
sve_simulated_St2 <- single_view_embedding_for_sve(
  data = simulated_4,
  response = "R_t",
  lags = lags_St2)
# Redid this one and it ran okay, numbers look off though.

# And this for testing also
sve_simulated_lags_many <- single_view_embedding_for_sve(
  data = simulated_4,
  response = "R_t",
  lags = list(R_t = 0:4,
              S_t = 0:3))
```
This fails - see browser() in the function. It fails for S_t with lag of 2; must be the next one I think, as the one above runs. DID THIS: needed to repeat the fix I just did, i.e. disallow the max_lag-ish candidate nearest neighbours at the end that have undefined projections. See state_space_distances_for_sve() - just tidy up the example.

```r
mve_small <- multiview_embedding(data = simulated_small_4,
                                 response = "R_t",
                                 lags = lags_use_multi)
# Earlier example was for simulated_4 not simulated_small_4; expect latter is
# just running out of lags kind of thing. Do explicitly:

# This now does not have 15 in prediction_indices, having fixed. Not obvious why
# the error arose with untransform_predictions().
sve_simulated_small_St2 <- single_view_embedding_for_sve(
  data = simulated_small_4,
  response = "R_t",
  lags = lags_St2)
sve_simulated_small_St2
```
That runs, but I thought it had something wrong, as R_t_predicted should only be values of R_t; but that's only true if not doing first-differencing and scaling, I think. diff(sve_simulated_small_St2$R_t_predicted) correctly has repeated values (i.e. the first differences are repeated).
Had thought max_lag in untransform_predictions() was wrong, and should maybe depend on the response only, but pretty sure it is okay. Given the results are so squiffy, there must be something wrong in untransform_predictions(). BUT, this is only a very short time series so likely to not be good. Retry with the longer one:
```r
sve_simulated_St2_again <- single_view_embedding_for_sve(
  data = simulated_4,
  response = "R_t",
  lags = lags_St2)
sve_simulated_St2_again
```
Runs okay, numbers not great but don't expect them to be, as only using lag of 2
for S_t
. Retry the multiview again:
mve_small_again <- multiview_embedding(data = simulated_small_4, response = "R_t", lags = lags_use_multi)
That runs until it errors, but seems to loop through everything. BUT, the results shouldn't be too great, given the short data set and short lags.
How about full shebang:
```r
set.seed(42)
h_simulated <- 0.1095 + sample(1:180) * 0.001   # has mean of 0.2
simulated_4 <- EDMsimulate::salmon_sim(h = h_simulated)
lags_use_multi <- list(R_t = 0,
                       S_t = 0:3)
lags_lots <- list(R_t = 0:4,
                  S_t = 0:8)

mve_full_shebang <- multiview_embedding(data = simulated_4,
                                        response = "R_t",
                                        lags = lags_lots)
# Currently returns:
# return(list(rho_each_subset = rho_each_subset,
#             rho_s_each_subset = rho_s_each_subset))

# Doing quickly:
sqrt_length <- round(sqrt(length(mve_full_shebang[[1]])))   # approx 2^(N/2) I think
rho_each_subset <- mve_full_shebang[[1]]
top_rho <- sort(rho_each_subset, decreasing = TRUE)[1:sqrt_length]
top_rho
rho_s_each_subset <- mve_full_shebang[[2]]
top_rho_s <- sort(rho_s_each_subset, decreasing = TRUE)[1:sqrt_length]
top_rho_s
```
That gave a top top_rho value of 0.597 (and top_rho_s of 0.592, I think) when I was incorrectly using response_predicted in untransform_predictions() instead of response_observed (i.e. adding the first differences to the known observation, which we know (!), rather than to the predicted value, which can compound errors). So re-running gives:
After correcting, the top_rho max is only 0.36 (i.e. lower?!), while the top top_rho_s is 0.67. Need to step back and do it for a single view.
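As an aside, the compounding-versus-anchoring distinction being corrected above is roughly the following, with made-up toy numbers and the unscale_diff() sketch from earlier (illustration only, not the package code):

```r
# Toy illustration: anchor each prediction to the known observation rather than
# to the previous prediction, so errors do not accumulate.
N_obs <- c(1.0, 3.0, 2.0, 5.0, 4.0)        # toy observed levels
d_s_pred <- c(0.2, -0.1, 0.5, -0.3, 0.1)   # toy predicted scaled differences
N_hat_compounding <- N_hat_anchored <- rep(NA_real_, 5)
N_hat_compounding[1] <- N_hat_anchored[1] <- N_obs[1]
for (t in 1:4) {
  step <- unscale_diff(d_s_pred[t], diff(N_obs))
  N_hat_compounding[t + 1] <- N_hat_compounding[t] + step   # errors accumulate
  N_hat_anchored[t + 1]    <- N_obs[t] + step               # anchored to the known observation
}
```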
```r
sve_7 <- single_view_embedding_for_sve(
  data = simulated_4,
  response = "R_t",
  lags = lags_lots)
sve_7_rho <- cor(sve_7$R_t,
                 sve_7$R_t_predicted,
                 use = "pairwise.complete.obs")
sve_7_rho
# 0.173301 before changing untransform_predictions() to set negative predictions
# to the minimum observed.
# Once done that the value is the same??? Should be a bit better you'd think.
# Changed to a high value (200) instead of minimum and it did change rho to
# negative, so code is changing properly. Changes to 0.17277 when I make it
# 10 * minimum, so code is working properly, just that change surprisingly made
# no difference (for this example).
```
Try small multiview again:
```r
mve_small_again_2 <- multiview_embedding(data = simulated_small_4,
                                         response = "R_t",
                                         lags = lags_use_multi)
mve_small_again_2
```
Can start from here.
It's all working okay. Try the full shebang one again with longer simulated data and more lags. Though the actual top rho values shouldn't change from the shebang run above; the values are calculated in the function now.
```r
set.seed(42)
h_simulated <- 0.1095 + sample(1:180) * 0.001   # has mean of 0.2
simulated_4 <- EDMsimulate::salmon_sim(h = h_simulated)
lags_use_multi <- list(R_t = 0,
                       S_t = 0:3)
lags_lots <- list(R_t = 0:4,
                  S_t = 0:8)

mve_full_shebang_2 <- multiview_embedding(data = simulated_4,
                                          response = "R_t",
                                          lags = lags_lots)   # Do lags_use_multi for a shorter
                                                              # and quicker one
mve_full_shebang_2$rho_prediction_from_mve
mve_full_shebang_2$rho_each_top_subset
```
Wow, fascinating. The overall $\rho$ for mve, averaging over the top subsets, is higher than every $\rho$ from every subset! So the averaging is indeed paying off, and improving results. Other results are:
```r
mve_full_shebang_2$lags_of_top_subsets           # These are in order of highest rho
mve_full_shebang_2$response_predicted_from_mve   # Overall mve prediction, including
                                                 # for the forecast T+1.
```
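I assume the overall prediction is something like the per-time-step average of the predictions from the top $\sqrt{m}$ views (a guess at the calculation, not the actual multiview_embedding() internals); e.g. with a toy matrix of three top views:

```r
# Guess at how the overall prediction relates to the individual views, with toy numbers:
# average, at each time step, the predictions from the top-ranked views.
top_view_predictions <- cbind(view_1 = c(1.0, 2.1, NA),
                              view_2 = c(0.8, 2.3, 1.9),
                              view_3 = c(1.1, 1.8, 2.0))
rowMeans(top_view_predictions, na.rm = TRUE)   # overall mve prediction per time step
```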
Can now move onto including this into sim_and_fit_realisations()
in EDMsimulate.
Still having issues in EDMsimulate (see issue 17).
```r
set.seed(42)
simulated_5 <- EDMsimulate::salmon_sim()   # Back to default h_t, not noisy, to see if
                                           # that's an issue. Shouldn't be. BUT is!
mve_args_manual = list(lags = list(R_t = 0:1,
                                   S_t = 0:3))
R_switch_manual <- "R_t"
fit_mve_manual_5 <- do.call(pbsEDM::multiview_embedding,
                            c(list(data = simulated_5,
                                   response = R_switch_manual),
                              mve_args_manual))
```
Aha - it doesn't work with h_t
constant (NULL to give default) in
salmon_sim()
, but does for simulated_4
with h_t
varying.
HERE HERE See fit_models help, have been fixing things and made new tests.
TODO Maybe - could do with a function to put the list of (top) lags into a simple table. Maybe also a function for the size of subset_lags; it just needs to be based on the total number of lags. Will need it in the function (as a check) anyway. Isn't it 2^N - 1 ish? Yes - each lag is either in or out.
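Quick check of that count (assuming create_subset_lags() returns all non-empty subsets of the individual lagged variables):

```r
# Count of candidate views: each of the N individual lags is either in or out,
# minus the empty set, so 2^N - 1 (give or take any exclusions the function applies).
n_lags <- length(unlist(lags_lots))      # 5 R_t lags + 9 S_t lags = 14
2^n_lags - 1                             # 16383
length(create_subset_lags(lags_lots))    # should match, or be close
```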
Think Andy and Carrie have not been on the same page about what is being forecast -- will chat Tuesday and resolve. TODO
We will use:

- R_t with lags of 1, 2, and 3. Andy thinks we now do want lag 0 also, see Issue 16. Check our thinking of $t^*$ is the same - discuss, as it isn't.
- S_t with lags 1 to 7. When we need the forecast for R_t we don't yet know S_t, because that is calculated at the end of the year.
The number of combinations of each dimension (the 5-dimensional, half the max of 10, being the most numerous), the total, and the square root of the total are:
```r
choose(10, 1:10)
choose(10, 1:10) %>% sum()
choose(10, 1:10) %>% sum() %>% sqrt()
```
So that's manageable.
Should calculate $\rho$ only on what we care about, i.e. R_t
.