Smoothing

knitr::opts_chunk$set(echo = FALSE, Warning = FALSE, message = FALSE)

Introduction ---------------------------------------------------------------

Overview of Smoothing

In this vignette, we will explore the smoothing techniques supported in the Covid19Wastewater package. Smoothing is a widely used technique in data analysis and visualization that helps reduce noise, reveal underlying trends, and highlight important patterns in the data. Covid19Wastewater provides several powerful functions to perform smoothing operations, and this vignette will guide you through their usage and showcase their capabilities.

Data

We will be using the long range Madison Covid data in the vignettes with the occasional use of other data to show a more complete picture of the different smoothing options.This data set has a clear trend but a ton of noise.

library(Covid19Wastewater)
library(dplyr)
library(ggplot2)

data("Example_data", package = "Covid19Wastewater")

smoothing_df <- Example_data%>%
  select(site, date, N1, N2)%>%
  filter(N1 != 0, N2 != 0)%>%
  mutate(N1 = log(N1), N2 = log(N2))

base_plot <- smoothing_df%>%
  ggplot(aes(x = date))+
  geom_point(aes(y = N1))+
  facet_wrap(~site)+
  theme(plot.title = element_text(hjust = 0.5),
        axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))+
  labs(y = "Covid-19 Gene Concentration",
       x = "Date")

base_plot

Smoothing Techniques --------------------------------------------------------

Each smoothing function shares the same framework. The first three arguments are the dataframe, the name of the column to be smoothed, and the name of the column the smoothing is placed. The last argument "Filter" control if some data points should be removed before the filtering happening. Other arguments are parameters for how the smoothing should be done.

Savitzky-Golay Filter

The Savitzky-Golay filter is a widely used smoothing technique that performs local polynomial regression on the data. It provides excellent noise reduction while preserving important features such as peaks and valleys. The Covid19Wastewater package includes the sgolaySmoothMod() function, which allows you to apply this filter to your data. sgolaySmoothMod() has 2 parameters. "poly" controls the degree of the polynomial fit on the local data whereas "n" control the number of points considered "local". by default its set to "guess" which uses the density of the data to make guesses about the ideal n.

# Example code demonstrating sgolaySmoothMod() function
sgolay_smooth <- sgolaySmoothMod(smoothing_df, "N1", "N1_sgolay", poly=5,n="guess", Filter = NULL)

sgolay_plot <- base_plot + 
  geom_line(aes(y = N1_sgolay, color = "Default Savitzky-Golay Filter"), data = sgolay_smooth, linewidth = 1.25)

param_sgolay_smooth <- Covid19Wastewater::sgolaySmoothMod(sgolay_smooth, "N1", "N1_sgolay_2", poly=2, n = "guess", Filter = NULL)

alt_plot <- sgolay_plot + 
  geom_line(aes(y = N1_sgolay_2, color = "Polynomial 1"), data = param_sgolay_smooth, linewidth = 1.25)


param_sgolay_smooth_2 <- Covid19Wastewater::sgolaySmoothMod(sgolay_smooth, "N1", "N1_sgolay_3", poly = 5, n = 31, Filter = NULL)


show_plot <- alt_plot + 
  geom_line(aes(y = N1_sgolay_3, color = "N 31"), data = param_sgolay_smooth_2, linewidth = 1.25)

show_plot+
  theme(plot.title = element_text(hjust = 0.5),
        axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))+
  labs(y = "Covid-19 Gene Concentration",
       x = "Date",
       title = "Savitzky-Golay Smoothing With Diffrent Parameters",
       color = "Smoothing Parameter"
       )

Loess Smoothing

Loess (locally estimated scatterplot smoothing) is a non-parametric regression technique that fits a smooth curve to the data by locally weighted polynomial regression. It is particularly useful when dealing with complex or nonlinear relationships. The Covid19Wastewater package provides the loessSmoothMod() function to perform loess smoothing on your data. The only parameter is "span" which controls what portion of the data is looked at. This is set to "guess" by default where the package picks what it thinks is optimal based on data density.

# Example code demonstrating loessSmoothMod() function
#sgolay_smooth <- mutate(sgolay_smooth, N1 = exp(N1))
loess_smooth <- loessSmoothMod(DF = sgolay_smooth,
                               InVar = "N1",
                               OutVar ="N1_loess",
                               Span = "guess",
                               Filter = NULL)

all_smooth_plot <- sgolay_plot + geom_line(aes(y = N1_loess, color = "Default Loess"), data = loess_smooth, size = 1.25)

loess_plot <- base_plot + geom_line(aes(y = N1_loess, color = "Default Loess"), data = loess_smooth, size = 1.25)

param_loess_smooth <- loessSmoothMod(loess_smooth, "N1", "N1_loess_2", Span = .1, Filter = NULL)

alt_plot <- loess_plot + geom_line(aes(y = N1_loess_2, color = "Span .1 Loess"), data = param_loess_smooth, size = 1.25)


param_loess_smooth_2 <- loessSmoothMod(loess_smooth, "N1", "N1_loess_3", Span= .2, Filter = NULL)


show_plot <- alt_plot + geom_line(aes(y = N1_loess_3, color = "Span .2 Loess"), data = param_loess_smooth_2, size = 1.25)

show_plot+
  theme(plot.title = element_text(hjust = 0.5),
        axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))+
  labs(y = "Covid-19 Gene Concentration",
       x = "Date",
       title = "Loess Smoothing With Diffrent Parameters",
       color = "Smoothing Parameter"
       )

Exponential Smoothing

Exponential smoothing is a technique used to smooth time series data by assigning exponentially decreasing weights to observations. It is particularly useful for capturing trends and seasonality in time series data.It has two parameters alpha and beta which control the change of the weights.

# Example code demonstrating expSmoothMod() function
exp_smooth <- Covid19Wastewater::expSmoothMod(loess_smooth, "N1", "N1_exp", Filter = NULL)

all_smooth_plot <- all_smooth_plot + 
  geom_line(aes(y = N1_exp, color = "Default Exponential"), data = exp_smooth, size = 1.25)

exp_plot <- base_plot + 
  geom_line(aes(y = N1_exp, color = "Default Exponential"), data = exp_smooth, size = 1.25)

param_exp_smooth <- Covid19Wastewater::expSmoothMod(exp_smooth, "N1", "N1_exp_2", alpha = .3, Filter = NULL)

alt_plot <- exp_plot + geom_line(aes(y = N1_exp_2, color = "Alpha .3"), data = param_exp_smooth, size = 1.25)


param_exp_smooth_2 <- Covid19Wastewater::expSmoothMod(exp_smooth, "N1", "N1_exp_3", beta = .05, Filter = NULL)


show_plot <- alt_plot + geom_line(aes(y = N1_exp_3, color = "Beta .05"), data = param_exp_smooth_2, size = 1.25)

show_plot+
  labs(y = "Covid-19 Gene Concentration",
       x = "Date",
       title = "Exponential Smoothing With Diffrent Parameters",
       color = "Smoothing Parameter"
       )

Everything

Finaly this shows each of the methods with the default parameters on the same dataset. This shows that exponential smoothing has a systemic lag and loess by default is a little less stable.

all_smooth_plot


Try the Covid19Wastewater package in your browser

Any scripts or data that you put into this service are public.

Covid19Wastewater documentation built on Aug. 25, 2023, 1:07 a.m.