knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  dev = "png",
  # fig.path = "figures/viz-",
  fig.height = 5,
  fig.width = 7
)
The package tscv provides a set of helper functions for time series analysis, forecasting and time series cross-validation. In addition to functions for splitting data and evaluating forecasts, the package contains several visualization functions that are useful for exploratory time series analysis.
This vignette demonstrates selected plotting functions from tscv using hourly day-ahead electricity spot prices.
You can install the development version from GitHub with:
# install.packages("devtools")
devtools::install_github("ahaeusser/tscv")

# Load relevant packages
library(tscv)
library(tidyverse)
library(tsibble)
Sys.setlocale("LC_TIME", "C")
The data set elec_price is a tibble with day-ahead electricity spot prices in [EUR/MWh] from the ENTSO-E Transparency Platform. The data set contains hourly time series data from 2019-01-01 to 2020-12-31 for eight European bidding zones.
In this vignette, we use four bidding zones:
- DE: Germany, including Luxembourg
- FR: France
- NO1: Norway 1, Oslo
- SE1: Sweden 1, Lulea

The visualization functions in tscv work with data in long format. Therefore, we define a context object that identifies the relevant columns:

- series_id: column identifying the individual time series
- value_id: column containing the numeric measurement variable
- index_id: column containing the time index

series_id = "bidding_zone"
value_id = "value"
index_id = "time"

context <- list(
  series_id = series_id,
  value_id = value_id,
  index_id = index_id
)

# Prepare data set
main_frame <- elec_price %>%
  filter(bidding_zone %in% c("DE", "FR", "NO1", "SE1"))

main_frame
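To make the role of the context object concrete, here is a minimal base R sketch with made-up numbers. The toy data frame and the helper series_mean() are hypothetical and not part of tscv; they only illustrate how a function can locate the right columns of a long-format table through the three context entries.

```r
# Toy long-format data: one row per (series, time point) combination.
# All values are made up for illustration; this is not elec_price.
toy <- data.frame(
  time = rep(as.POSIXct("2019-01-01 00:00:00", tz = "UTC") + 3600 * 0:2, times = 2),
  bidding_zone = rep(c("DE", "FR"), each = 3),
  value = c(28.3, 10.1, -4.1, 51.0, 46.3, 42.2)
)

# Same structure as the context object defined in this vignette
context <- list(
  series_id = "bidding_zone",
  value_id = "value",
  index_id = "time"
)

# Hypothetical helper (not part of tscv): look up columns by their role
# instead of hard-coding the column names
series_mean <- function(data, context) {
  tapply(data[[context$value_id]], data[[context$series_id]], mean)
}

zone_means <- series_mean(toy, context)
zone_means
```

Because the helper only refers to context$series_id and context$value_id, the same function would work unchanged on a table with differently named columns, which is the main appeal of passing a context object around.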
Line charts are the most common visualization for time series data. They show how the observed values change over time and are useful for detecting trends, seasonal patterns, level shifts, outliers and periods of high volatility.
The function plot_line() creates line charts from data in long format. The first example creates a faceted plot, with one panel for each bidding zone.
# Example 1 -------------------------------------------------------------------

main_frame %>%
  plot_line(
    x = time,
    y = value,
    color = bidding_zone,
    facet_var = bidding_zone,
    title = "Day-ahead Electricity Spot Price",
    subtitle = "2019-01-01 to 2020-12-31",
    xlab = "Time",
    ylab = "[EUR/MWh]",
    caption = "Data: ENTSO-E Transparency"
  )

# Example 2 -------------------------------------------------------------------

main_frame %>%
  plot_line(
    x = time,
    y = value,
    color = bidding_zone,
    title = "Day-ahead Electricity Spot Price",
    subtitle = "2019-01-01 to 2020-12-31",
    xlab = "Time",
    ylab = "[EUR/MWh]",
    caption = "Data: ENTSO-E Transparency"
  )
The faceted version is useful when the individual time series have different levels or volatility. The combined version is useful for comparing the bidding zones directly in a single panel.
Bar charts can be used to display summary values by category or lag. In this example, we use plot_bar() to visualize the sample partial autocorrelation function.
The partial autocorrelation function measures the relationship between a time series and its lagged values after controlling for the intermediate lags. It is often used as an exploratory tool to identify relevant lag structures in time series models.
First, we estimate the sample partial autocorrelation function using estimate_pacf(). Setting lag_max = 30 computes the partial autocorrelation for lags 1 to 30.
# Estimate sample partial autocorrelation function
corr_pacf <- estimate_pacf(
  .data = main_frame,
  context = context,
  lag_max = 30
)

corr_pacf

# Visualize PACF as correlogram
corr_pacf %>%
  plot_bar(
    x = lag,
    y = value,
    color = sign,
    facet_var = bidding_zone,
    position = "dodge",
    title = "Sample partial autocorrelation function",
    xlab = "Lag",
    ylab = "Correlation",
    caption = "Data: ENTSO-E Transparency"
  )
The resulting correlogram shows the estimated partial autocorrelation by lag and bidding zone. The variable sign indicates whether the absolute value of the estimated partial autocorrelation exceeds the approximate confidence bound used by estimate_pacf().
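For readers who want to see what such a flag involves, the following sketch reproduces the idea with base R on a simulated AR(1) series. It uses stats::pacf() and the common approximate 95% bound of 1.96 / sqrt(n). Whether estimate_pacf() uses exactly this bound is an assumption, and the column name exceeds is made up for illustration.

```r
set.seed(1)

# Simulated AR(1) series, used only as a stand-in for a price series
n <- 500
x <- as.numeric(arima.sim(model = list(ar = 0.7), n = n))

# Sample partial autocorrelation for lags 1 to 30 with base R
pacf_fit <- stats::pacf(x, lag.max = 30, plot = FALSE)

pacf_tbl <- data.frame(
  lag = 1:30,
  value = as.numeric(pacf_fit$acf)
)

# Approximate 95% bound under a white-noise null hypothesis.
# Assumption: estimate_pacf() may use a different level or formula.
bound <- qnorm(0.975) / sqrt(n)

# Flag lags whose absolute partial autocorrelation exceeds the bound
pacf_tbl$exceeds <- abs(pacf_tbl$value) > bound

head(pacf_tbl)
```

For an AR(1) process, the first lag should clearly exceed the bound, while most higher lags should stay inside it. That is the pattern one looks for when reading a correlogram to pick a lag order.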
Distribution plots are useful for understanding the marginal distribution of the observed values. For electricity prices, this is particularly relevant because prices may show skewness, heavy tails, negative values or extreme spikes.
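Skewness and heavy tails can also be summarized numerically before plotting. The sketch below computes moment-based sample skewness and excess kurtosis in base R on simulated data. The simulated values and the added spikes are purely illustrative and are not taken from elec_price.

```r
set.seed(42)

# Simulated right-skewed values with a few artificial spikes (not real prices)
x <- c(rlnorm(1000, meanlog = 3.5, sdlog = 0.4), 200, 250, 300)

# Standardize using the population-style standard deviation (divide by n)
z <- (x - mean(x)) / sqrt(mean((x - mean(x))^2))

# Moment-based skewness (0 for symmetric data) and
# excess kurtosis (0 for the normal distribution)
skewness <- mean(z^3)
excess_kurtosis <- mean(z^4) - 3

c(skewness = skewness, excess_kurtosis = excess_kurtosis)
```

Positive skewness points to a long right tail, and positive excess kurtosis points to tails that are heavier than those of a normal distribution.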
The following examples use histograms, density plots and QQ-plots to explore the distribution of hourly electricity prices across bidding zones.
Histograms show the frequency distribution of the observed values. They are useful for identifying the range, central tendency, skewness and outliers of a time series.
The first example overlays the distributions of the four bidding zones in one plot.
# Example 1 -------------------------------------------------------------------

main_frame %>%
  plot_histogram(
    x = value,
    color = bidding_zone,
    title = "Day-ahead Electricity Spot Price",
    xlab = "[EUR/MWh]",
    ylab = "Frequency",
    caption = "Data: ENTSO-E Transparency"
  )

# Example 2 -------------------------------------------------------------------

main_frame %>%
  plot_histogram(
    x = value,
    color = bidding_zone,
    facet_var = bidding_zone,
    facet_nrow = 1,
    title = "Day-ahead Electricity Spot Price",
    xlab = "[EUR/MWh]",
    ylab = "Frequency",
    caption = "Data: ENTSO-E Transparency"
  )
The faceted histogram separates the bidding zones into individual panels. This makes it easier to inspect the distribution of each time series separately, especially when the distributions overlap in the combined plot.
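The numbers behind a faceted histogram are simply bin counts per series. As a rough illustration, the sketch below computes them in base R for two simulated zones. The data, the zone labels "A" and "B", and the choice of 20 shared bins are all assumptions made for the example; plot_histogram() may bin the data differently.

```r
set.seed(123)

# Simulated long-format data for two hypothetical zones (not real prices)
toy <- data.frame(
  bidding_zone = rep(c("A", "B"), each = 200),
  value = c(rnorm(200, mean = 40, sd = 10), rnorm(200, mean = 25, sd = 5))
)

# Shared bin edges covering the full range, so that panels are comparable
breaks <- seq(floor(min(toy$value)), ceiling(max(toy$value)), length.out = 21)

# One vector of bin counts per zone
counts <- lapply(split(toy$value, toy$bidding_zone), function(v) {
  hist(v, breaks = breaks, plot = FALSE)$counts
})

# Every observation falls into exactly one bin
sapply(counts, sum)
```

Using the same breaks for every series keeps the bars comparable across panels, at the cost of wasting some resolution on the series with the narrower range.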
Density plots provide a smoothed version of the empirical distribution. Compared with histograms, they are often easier to use when comparing several distributions in one figure.
# Example 1 -------------------------------------------------------------------

main_frame %>%
  plot_density(
    x = value,
    color = bidding_zone,
    title = "Day-ahead Electricity Spot Price",
    xlab = "[EUR/MWh]",
    ylab = "Density",
    caption = "Data: ENTSO-E Transparency"
  )

# Example 2 -------------------------------------------------------------------

main_frame %>%
  plot_density(
    x = value,
    color = bidding_zone,
    facet_var = bidding_zone,
    facet_nrow = 1,
    title = "Day-ahead Electricity Spot Price",
    xlab = "[EUR/MWh]",
    ylab = "Density",
    caption = "Data: ENTSO-E Transparency"
  )
The combined density plot highlights differences between bidding zones in the location and spread of prices. The faceted version provides a clearer view of each individual distribution.
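A density curve of this kind can be approximated with a kernel density estimate per series. The following base R sketch uses stats::density() with its default Gaussian kernel and bandwidth on simulated data. Which estimator and bandwidth plot_density() uses internally is not stated here, so treat the settings as assumptions.

```r
set.seed(123)

# Simulated long-format data for two hypothetical zones (not real prices)
toy <- data.frame(
  bidding_zone = rep(c("A", "B"), each = 200),
  value = c(rnorm(200, mean = 40, sd = 10), rnorm(200, mean = 25, sd = 5))
)

# One kernel density estimate per zone (default kernel and bandwidth)
dens <- lapply(split(toy$value, toy$bidding_zone), density)

# Location of the highest point of each estimated density
modes <- sapply(dens, function(d) d$x[which.max(d$y)])

# Riemann-sum check: each estimated density should integrate to about 1
areas <- sapply(dens, function(d) sum(d$y) * diff(d$x[1:2]))

modes
areas
```

Comparing the modes gives a quick numeric counterpart to the differences in location that the combined plot shows visually.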
QQ-plots compare the empirical distribution of the observed values with a theoretical distribution, usually the normal distribution. They are useful for checking whether the data are approximately normally distributed.
For electricity prices, deviations from normality are common because prices can be skewed and may contain extreme values.
# Example 1 -------------------------------------------------------------------

main_frame %>%
  plot_qq(
    x = value,
    color = bidding_zone,
    title = "Day-ahead Electricity Spot Price",
    xlab = "Theoretical Quantile",
    ylab = "Sample Quantile",
    caption = "Data: ENTSO-E Transparency"
  )

# Example 2 -------------------------------------------------------------------

main_frame %>%
  plot_qq(
    x = value,
    color = bidding_zone,
    facet_var = bidding_zone,
    title = "Day-ahead Electricity Spot Price",
    xlab = "Theoretical Quantile",
    ylab = "Sample Quantile",
    caption = "Data: ENTSO-E Transparency"
  )
If the observations were approximately normally distributed, the points in the QQ-plot would lie close to a straight line. Strong deviations from this pattern indicate skewness, heavy tails or outliers.
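The construction of a normal QQ-plot can be sketched in a few lines of base R. The example below builds the quantile pairs by hand for a simulated right-skewed sample and measures how far the largest observation lies above a quartile-based reference line. The simulated data are illustrative only, and the reference line follows the convention of stats::qqline(), which is not necessarily what plot_qq() draws.

```r
set.seed(7)

# Simulated right-skewed sample (not real prices)
x <- rlnorm(500, meanlog = 3.5, sdlog = 0.5)

# Sorted sample quantiles against theoretical normal quantiles,
# the same pairs that stats::qqnorm() computes
sample_q <- sort(x)
theo_q <- qnorm(ppoints(length(x)))

# Reference line through the first and third quartiles, as in stats::qqline()
slope <- unname(diff(quantile(x, c(0.25, 0.75))) / diff(qnorm(c(0.25, 0.75))))
intercept <- unname(quantile(x, 0.25)) - slope * qnorm(0.25)

# Vertical distance of the largest observation from the reference line
upper_gap <- max(sample_q) - (intercept + slope * max(theo_q))
upper_gap
```

A large positive gap in the upper tail is the numeric counterpart of points bending away above the line, which is the typical signature of right skewness or price spikes.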
This vignette demonstrated several visualization functions from tscv:
- plot_line() for time series line charts
- plot_bar() for bar charts, here used to visualize partial autocorrelations
- plot_histogram() for histograms
- plot_density() for density plots
- plot_qq() for QQ-plots

Together, these plots provide a useful starting point for exploratory time series analysis. Line charts help inspect the temporal structure of the data, while distribution plots and correlograms help identify features that may be relevant for modelling and forecasting.