knitr::opts_chunk$set(echo = FALSE, warning = FALSE, message = FALSE)
In this vignette, we will explore the smoothing techniques supported in the Covid19Wastewater package. Smoothing is a widely used technique in data analysis and visualization that helps reduce noise, reveal underlying trends, and highlight important patterns in the data. Covid19Wastewater provides several powerful functions to perform smoothing operations, and this vignette will guide you through their usage and showcase their capabilities.
We will use the long-range Madison Covid data throughout this vignette, occasionally bringing in other data to give a more complete picture of the different smoothing options. This data set has a clear trend but a lot of noise.
library(Covid19Wastewater)
library(dplyr)
library(ggplot2)

data("Example_data", package = "Covid19Wastewater")

smoothing_df <- Example_data %>%
  select(site, date, N1, N2) %>%
  filter(N1 != 0, N2 != 0) %>%
  mutate(N1 = log(N1), N2 = log(N2))

base_plot <- smoothing_df %>%
  ggplot(aes(x = date)) +
  geom_point(aes(y = N1)) +
  facet_wrap(~site) +
  theme(plot.title = element_text(hjust = 0.5),
        axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
  labs(y = "Covid-19 Gene Concentration", x = "Date")

base_plot
Each smoothing function shares the same framework. The first three arguments are the data frame, the name of the column to be smoothed, and the name of the column where the smoothed values are placed. The last argument, "Filter", controls whether some data points are removed before smoothing happens. The remaining arguments are parameters that control how the smoothing is done.
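The argument layout described above can be sketched with a small stand-in function. This is only an illustration in base R: `toy_smooth()` and its `k` parameter are hypothetical and not part of Covid19Wastewater, the moving average stands in for a real smoother, and the way `Filter` is treated here (as the name of a logical column marking rows to drop) is an assumption rather than the package's documented behaviour.

```r
# Hypothetical helper that mimics the shared argument layout.
# It is NOT a Covid19Wastewater function.
toy_smooth <- function(DF, InVar, OutVar, k = 3, Filter = NULL) {
  # Assumption: Filter names a logical column; rows where it is TRUE
  # are dropped before smoothing.
  if (!is.null(Filter)) {
    DF <- DF[!DF[[Filter]], , drop = FALSE]
  }
  # A centred moving average of width k stands in for a real smoother.
  DF[[OutVar]] <- as.numeric(
    stats::filter(DF[[InVar]], rep(1 / k, k), sides = 2)
  )
  DF
}

toy_df <- data.frame(
  date    = as.Date("2021-01-01") + 0:5,
  N1      = c(1, 2, 30, 4, 5, 6),
  flagged = c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)
)

# Smooth every row, then smooth again with the flagged row removed first.
out_all      <- toy_smooth(toy_df, "N1", "N1_smooth")
out_filtered <- toy_smooth(toy_df, "N1", "N1_smooth", Filter = "flagged")
```

In both calls the input column is left untouched and the result is written to a new column, which is why the later examples in this vignette can chain several smoothers onto the same data frame.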
The Savitzky-Golay filter is a widely used smoothing technique that performs local polynomial regression on the data. It provides excellent noise reduction while preserving important features such as peaks and valleys. The Covid19Wastewater package includes the sgolaySmoothMod() function, which allows you to apply this filter to your data. sgolaySmoothMod() has two parameters. "poly" controls the degree of the polynomial fit to the local data, whereas "n" controls the number of points considered "local". By default, "n" is set to "guess", which uses the density of the data to estimate a suitable n.
# Example code demonstrating sgolaySmoothMod() function
sgolay_smooth <- sgolaySmoothMod(smoothing_df, "N1", "N1_sgolay",
                                 poly = 5, n = "guess", Filter = NULL)

sgolay_plot <- base_plot +
  geom_line(aes(y = N1_sgolay, color = "Default Savitzky-Golay Filter"),
            data = sgolay_smooth, linewidth = 1.25)

# Lower polynomial degree
param_sgolay_smooth <- Covid19Wastewater::sgolaySmoothMod(sgolay_smooth, "N1", "N1_sgolay_2",
                                                          poly = 2, n = "guess", Filter = NULL)

alt_plot <- sgolay_plot +
  geom_line(aes(y = N1_sgolay_2, color = "Polynomial 2"),
            data = param_sgolay_smooth, linewidth = 1.25)

# Fixed window of 31 points
param_sgolay_smooth_2 <- Covid19Wastewater::sgolaySmoothMod(sgolay_smooth, "N1", "N1_sgolay_3",
                                                            poly = 5, n = 31, Filter = NULL)

show_plot <- alt_plot +
  geom_line(aes(y = N1_sgolay_3, color = "N 31"),
            data = param_sgolay_smooth_2, linewidth = 1.25)

show_plot +
  theme(plot.title = element_text(hjust = 0.5),
        axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
  labs(y = "Covid-19 Gene Concentration",
       x = "Date",
       title = "Savitzky-Golay Smoothing With Different Parameters",
       color = "Smoothing Parameter")
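To make the roles of the polynomial degree and the window size more concrete, here is a minimal base R sketch of a Savitzky-Golay smoother for evenly spaced data. It is not the package's implementation: `sg_weights()` and `sg_smooth()` are hypothetical names, the window must be an odd number of points, and the first and last few values are left as NA instead of being handled specially.

```r
# Savitzky-Golay weights for the centre point of an odd window of n points,
# using a local least-squares polynomial of degree poly.
sg_weights <- function(n, poly) {
  stopifnot(n %% 2 == 1, poly < n)
  offsets <- seq(-(n - 1) / 2, (n - 1) / 2)
  A <- outer(offsets, 0:poly, `^`)  # design matrix: 1, x, x^2, ...
  # The first row of (A'A)^-1 A' gives the fitted value at offset 0
  # as a weighted sum of the points in the window.
  solve(crossprod(A), t(A))[1, ]
}

# Apply the weights as a centred moving filter; window edges become NA.
sg_smooth <- function(y, n = 7, poly = 2) {
  w <- sg_weights(n, poly)
  # stats::filter() convolves, so the weights are reversed to line up
  # each weight with its offset.
  as.numeric(stats::filter(y, rev(w), sides = 2))
}

# A noisy quadratic as a quick demonstration.
set.seed(42)
x <- 1:50
y <- 0.02 * (x - 25)^2 + rnorm(50, sd = 0.5)
y_smooth <- sg_smooth(y, n = 11, poly = 2)
```

A larger window averages over more points and so removes more noise, while a higher polynomial degree lets the local fit follow sharper peaks. Data that is exactly a polynomial of degree `poly` or lower passes through this filter unchanged away from the edges.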
Loess (locally estimated scatterplot smoothing) is a non-parametric regression technique that fits a smooth curve to the data by locally weighted polynomial regression. It is particularly useful when dealing with complex or nonlinear relationships. The Covid19Wastewater package provides the loessSmoothMod() function to perform loess smoothing on your data. The only parameter is "Span", which controls what portion of the data is used for each local fit. It is set to "guess" by default, in which case the package picks a value it considers suitable based on data density.
# Example code demonstrating loessSmoothMod() function
#sgolay_smooth <- mutate(sgolay_smooth, N1 = exp(N1))
loess_smooth <- loessSmoothMod(DF = sgolay_smooth, InVar = "N1", OutVar = "N1_loess",
                               Span = "guess", Filter = NULL)

# Add the default loess line to the running comparison plot
all_smooth_plot <- sgolay_plot +
  geom_line(aes(y = N1_loess, color = "Default Loess"),
            data = loess_smooth, linewidth = 1.25)

loess_plot <- base_plot +
  geom_line(aes(y = N1_loess, color = "Default Loess"),
            data = loess_smooth, linewidth = 1.25)

param_loess_smooth <- loessSmoothMod(loess_smooth, "N1", "N1_loess_2",
                                     Span = .1, Filter = NULL)

alt_plot <- loess_plot +
  geom_line(aes(y = N1_loess_2, color = "Span .1 Loess"),
            data = param_loess_smooth, linewidth = 1.25)

param_loess_smooth_2 <- loessSmoothMod(loess_smooth, "N1", "N1_loess_3",
                                       Span = .2, Filter = NULL)

show_plot <- alt_plot +
  geom_line(aes(y = N1_loess_3, color = "Span .2 Loess"),
            data = param_loess_smooth_2, linewidth = 1.25)

show_plot +
  theme(plot.title = element_text(hjust = 0.5),
        axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
  labs(y = "Covid-19 Gene Concentration",
       x = "Date",
       title = "Loess Smoothing With Different Parameters",
       color = "Smoothing Parameter")
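The effect of the span can also be seen outside the package with base R's `stats::loess()`. The sketch below uses simulated data, not the Madison data, and the `roughness()` helper is a hypothetical measure introduced only for this comparison. It does not reproduce the package's "guess" logic.

```r
# Simulated noisy signal on an evenly spaced grid.
set.seed(1)
x <- 1:200
y <- sin(x / 30) + rnorm(200, sd = 0.3)
# With real dates, convert first, e.g. x <- as.numeric(date),
# because loess() needs a numeric predictor.

# A small span uses few neighbours per local fit; a large span uses many.
fit_narrow <- predict(loess(y ~ x, span = 0.1))
fit_wide   <- predict(loess(y ~ x, span = 0.75))

# Hypothetical roughness measure: sum of squared second differences.
roughness <- function(v) sum(diff(v, differences = 2)^2)

roughness(fit_narrow)
roughness(fit_wide)
```

The narrow span follows the noise more closely and gives a rougher curve, while the wide span gives a smoother curve that can miss short-lived features.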
Exponential smoothing is a technique for smoothing time series data by assigning exponentially decreasing weights to older observations. It is particularly useful for capturing trends and seasonality in time series data. The expSmoothMod() function has two parameters, "alpha" and "beta", which control how the weights change.
# Example code demonstrating expSmoothMod() function
exp_smooth <- Covid19Wastewater::expSmoothMod(loess_smooth, "N1", "N1_exp", Filter = NULL)

# Add the default exponential line to the running comparison plot
all_smooth_plot <- all_smooth_plot +
  geom_line(aes(y = N1_exp, color = "Default Exponential"),
            data = exp_smooth, linewidth = 1.25)

exp_plot <- base_plot +
  geom_line(aes(y = N1_exp, color = "Default Exponential"),
            data = exp_smooth, linewidth = 1.25)

param_exp_smooth <- Covid19Wastewater::expSmoothMod(exp_smooth, "N1", "N1_exp_2",
                                                    alpha = .3, Filter = NULL)

alt_plot <- exp_plot +
  geom_line(aes(y = N1_exp_2, color = "Alpha .3"),
            data = param_exp_smooth, linewidth = 1.25)

param_exp_smooth_2 <- Covid19Wastewater::expSmoothMod(exp_smooth, "N1", "N1_exp_3",
                                                      beta = .05, Filter = NULL)

show_plot <- alt_plot +
  geom_line(aes(y = N1_exp_3, color = "Beta .05"),
            data = param_exp_smooth_2, linewidth = 1.25)

show_plot +
  labs(y = "Covid-19 Gene Concentration",
       x = "Date",
       title = "Exponential Smoothing With Different Parameters",
       color = "Smoothing Parameter")
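One common form of exponential smoothing with an alpha and a beta parameter is Holt's linear (double exponential) method, where alpha weights the newest observation in the level and beta weights the newest change in the trend. The sketch below assumes that form for illustration only. `holt_smooth()` is a hypothetical function, and expSmoothMod() may differ in its recursion, initialisation, and defaults.

```r
# Hypothetical Holt-style smoother; NOT the package's expSmoothMod().
holt_smooth <- function(y, alpha = 0.5, beta = 0.1) {
  stopifnot(length(y) >= 2)
  n <- length(y)
  level <- numeric(n)
  trend <- numeric(n)
  # Simple initialisation from the first two observations.
  level[1] <- y[1]
  trend[1] <- y[2] - y[1]
  for (t in 2:n) {
    # New level: blend the observation with the previous one-step forecast.
    level[t] <- alpha * y[t] + (1 - alpha) * (level[t - 1] + trend[t - 1])
    # New trend: blend the latest change in level with the previous trend.
    trend[t] <- beta * (level[t] - level[t - 1]) + (1 - beta) * trend[t - 1]
  }
  level
}

# A series with a level shift shows how a small alpha reacts slowly.
step_series <- c(rep(1, 10), rep(5, 10))
slow <- holt_smooth(step_series, alpha = 0.2, beta = 0.05)
fast <- holt_smooth(step_series, alpha = 0.8, beta = 0.05)
```

Because each smoothed value depends only on current and past observations, a small alpha makes the curve trail behind sudden changes in the data, which is one way a lag can arise in this family of smoothers.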
Finally, this plot shows each method with its default parameters on the same dataset. Exponential smoothing shows a systematic lag, and loess with its default settings is a little less stable.
all_smooth_plot