MATSS-forecasting: Analysis and Synthesis of Forecasting Ecological Time Series

Forecasting Method Evaluation

Hao Ye 2019-07-30

# define where the cache is located
db <- DBI::dbConnect(RSQLite::SQLite(), here::here("output", "drake-cache.sqlite"))
cache <- storr::storr_dbi("datatable", "keystable", db)

# load results
loadd(full_results, cache = cache)

full_results

## # A tibble: 28 x 5
##    results          metadata       dataset           method        args    
##    <list>           <list>         <chr>             <chr>         <list>  
##  1 <df[,6] [1,770 … <named list [… data_salmon       autoarima_on… <list […
##  2 <df[,6] [4,175 … <named list [… data_RAMlegacy_c… autoarima_on… <list […
##  3 <df[,6] [3,587 … <named list [… data_RAMlegacy_s… autoarima_on… <list […
##  4 <df[,6] [3,219 … <named list [… data_RAMlegacy_r… autoarima_on… <list […
##  5 <df[,6] [1,083 … <named list [… data_Dorner2008   autoarima_on… <list […
##  6 <df[,6] [336 × … <named list [… data_SprSum_Col_… autoarima_on… <list […
##  7 <df[,6] [299 × … <named list [… data_PugSound_Ch… autoarima_on… <list […
##  8 <df[,5] [1,770 … <named list [… data_salmon       naive_one_st… <list […
##  9 <df[,5] [4,175 … <named list [… data_RAMlegacy_c… naive_one_st… <list […
## 10 <df[,5] [3,587 … <named list [… data_RAMlegacy_s… naive_one_st… <list […
## # … with 18 more rows

full_results is a tibble with 28 rows, corresponding to the combinations of different dataset and method.

First, let’s do some cleaning of the dataset names:

full_results <- full_results %>%
        mutate(dataset = sub("data_(.+)$", "\\1", dataset))

If we had all of the results in one long-table, that would allows us to then compute group summaries as we wish. Here, we can ignore the args column, since we don’t specify any optional arguments to the methods that we need to track.

Taking a look at the results and metadata columns:

head(full_results[[1, "results"]])

##   id observed predicted lower_CI upper_CI training_naive_error
## 1 62 10.18112  9.350246 8.005280 10.69521            0.5820469
## 2 62 10.22001 10.015382 8.646983 11.38378            0.6288405
## 3 62 10.93525 10.021401 8.672140 11.37066            0.6109633
## 4 62 10.80714 10.047511 8.684199 11.41082            0.6140301
## 5 62 10.59376 10.278505 8.947784 11.60923            0.6001467
## 6 62 10.12218 10.234645 8.919308 11.54998            0.5894032

head(full_results[[1, "metadata"]])

## $species_table
## # A tibble: 155 x 3
##       id species class         
##    <int> <fct>   <fct>         
##  1    62 Chinook Actinopterygii
##  2    63 Chinook Actinopterygii
##  3    64 Chinook Actinopterygii
##  4    65 Chinook Actinopterygii
##  5    66 Chinook Actinopterygii
##  6    67 Chinook Actinopterygii
##  7    68 Chinook Actinopterygii
##  8    69 Chinook Actinopterygii
##  9    70 Chinook Actinopterygii
## 10    71 Chinook Actinopterygii
## # … with 145 more rows
## 
## $timename
## [1] "year"

We might want the species information, so let’s join the species_table element of metadata to each results df:

# function to combine elements from the three columns
process_row <- function(results, metadata, dataset, method, args) {
    results %>%
        mutate(dataset = dataset, 
               method = method, 
               args = list(args)) %>%
        left_join(mutate(metadata$species_table, id = as.character(id)), 
                  by = "id")
}

# apply process_row to each dataset, then combine into a single tibble
results <- full_results %>%
    pmap(process_row) %>%
    bind_rows() %>%
    as_tibble()

To compute Mean Absolute Scaled Error, we use the definition from [@Hyndman_2019]:

[q_j = \frac{e_j}{\frac{1}{T-1}\sum_{t = 2}^T |y_t - y_t-1|}]

and since the denominator is already computed for us as training_naive_error, we need only compute observed - predicted to get (e_j).

results <- results %>%
    mutate(error = observed - predicted)

We then need to summarize over each set of predictions:

summary_results <- results %>%
    group_by(dataset, method, id) %>%
    summarize(MASE = mean(abs(error) / training_naive_error), 
              species = first(species), 
              class = first(class))

For each level of class, produce a histogram for frac_correct:

ggplot(data = summary_results, 
       mapping = aes(x = MASE, fill = class)) + 
    facet_grid(method~class, scales = "free_y") + 
    geom_density(position = "stack") + 
    geom_vline(aes(xintercept = 1), linetype = 2) + 
    theme_bw()

## Warning: Removed 48 rows containing non-finite values (stat_density).

DBI::dbDisconnect(db)

weecology/MATSS-forecasting documentation built on Nov. 28, 2020, 10:19 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

weecology/MATSS-forecasting
Analysis and Synthesis of Forecasting Ecological Time Series

reports/forecasting_comparison.md
In weecology/MATSS-forecasting: Analysis and Synthesis of Forecasting Ecological Time Series

Forecasting Method Evaluation

Read in the results

Examine the output structure

Merging results and metadata

Processing results

Plot

Cleanup

R Package Documentation

Browse R Packages

We want your feedback!

weecology/MATSS-forecasting Analysis and Synthesis of Forecasting Ecological Time Series

reports/forecasting_comparison.md In weecology/MATSS-forecasting: Analysis and Synthesis of Forecasting Ecological Time Series

Forecasting Method Evaluation

Read in the results

Examine the output structure

Merging results and metadata

Processing results

Plot

Cleanup

R Package Documentation

Browse R Packages

We want your feedback!

weecology/MATSS-forecasting
Analysis and Synthesis of Forecasting Ecological Time Series

reports/forecasting_comparison.md
In weecology/MATSS-forecasting: Analysis and Synthesis of Forecasting Ecological Time Series