This package implements outlier identification and smoothing for non-linear and time series data.
install.packages('devtools')
devtools::install_github('hoangtt1989/NOIS')
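We generate a sine curve with Gaussian noise and perturb it with outliers: 10% of the points, drawn from the second half of the series, are shifted upward by a uniform random amount between 5 and 7.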
library(dplyr)
library(ggplot2)
library(NOIS)
set.seed(123)
npts <- 200
nout <- floor(.1*npts)
xt <- seq(from=0, to=2*pi, length.out=npts)
gaussnoise <- rnorm(npts)
outliers <- sample(floor(npts/2):npts, size=nout)
randpts <- runif(nout, min=5, max=7)
yt <- sin(xt) + gaussnoise
yt[outliers] <- yt[outliers] + randpts
xt_outliers <- xt[outliers]
orig_func <- sin(xt)
data <- data.frame(x = xt, y = yt, true_outlier = xt %in% xt_outliers)
ggplot(data) +
  geom_point(aes(x, y, color = true_outlier))
We use NOIS_fit to detect outliers and estimate the underlying curve. Its inputs include a data.frame
as well as strings specifying which columns contain the "x" and "y" values. See the help documentation for NOIS_fit
for additional information about function inputs.
sine_fit <- NOIS_fit(data, x = 'x', y = 'y', CV_method = 'LOOCV', pool_q = nout)
sine_fit
sine_fit$fit_df
We plot the estimated curve and mark the detected outliers. NOIS correctly detects all of the outliers.
pt <- outlier_plot(sine_fit)
pt
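The claim that every outlier is detected can also be checked numerically rather than by eye. The sketch below is a minimal base R helper, `detection_summary` (a hypothetical name, not part of NOIS), that counts agreements and disagreements between true and detected outlier flags. It is shown on toy logical vectors. Applying it to the fit above would assume that `sine_fit$fit_df` keeps its rows in the same order as `data` and stores the detections in a logical `outlier` column.

```r
# Hypothetical helper (not part of NOIS): compare true and detected outlier flags.
detection_summary <- function(truth, detected) {
  stopifnot(is.logical(truth), is.logical(detected),
            length(truth) == length(detected))
  c(true_positives  = sum(truth & detected),
    false_positives = sum(!truth & detected),
    false_negatives = sum(truth & !detected))
}

# Toy flags for illustration only.
truth    <- c(FALSE, TRUE, TRUE, FALSE, TRUE)
detected <- c(FALSE, TRUE, FALSE, TRUE, TRUE)
res <- detection_summary(truth, detected)
print(res)
```

With the objects from this vignette, the call would look like `detection_summary(data$true_outlier, sine_fit$fit_df$outlier)`, under the assumptions stated above.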
We use the predictive residuals bootstrap to construct confidence bands.
sine_BS <- NOIS_confint(sine_fit, conf_type = 'pred_BS', B = 500, conf_level = .05)
sine_df <- sine_fit$fit_df %>%
  mutate(low_conf = sine_BS$low_predicted, up_conf = sine_BS$up_predicted)
pt2 <- ggplot(sine_df) +
  geom_point(aes(x, y, color = outlier)) +
  geom_line(aes(x, bias_fit)) +
  geom_ribbon(aes(x = x, ymin = low_conf, ymax = up_conf), alpha = .5)
pt2
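For readers unfamiliar with residual bootstrapping, the general idea can be sketched in base R. The example below is not the NOIS_confint implementation: it assumes a loess smoother as a stand-in for the NOIS fit, resamples ordinary centered residuals (not predictive residuals), and uses made-up data and settings (`span`, `B`, the noise level) chosen only for illustration.

```r
# Sketch of a residual bootstrap band around a loess smoother (base R only).
# Illustrative stand-in; this is not how NOIS_confint is implemented.
set.seed(1)
n <- 100
x <- seq(0, 2 * pi, length.out = n)
y <- sin(x) + rnorm(n, sd = 0.3)

fit <- loess(y ~ x, span = 0.5)
fitted_y <- predict(fit)       # fitted values at the observed x
resid_c <- y - fitted_y
resid_c <- resid_c - mean(resid_c)  # center residuals before resampling

B <- 200
boot_fits <- matrix(NA_real_, nrow = B, ncol = n)
for (b in seq_len(B)) {
  # Synthetic response: fitted curve plus residuals resampled with replacement.
  y_star <- fitted_y + sample(resid_c, size = n, replace = TRUE)
  boot_fits[b, ] <- predict(loess(y_star ~ x, span = 0.5))
}

# Pointwise 95% percentile band across the bootstrap curves.
band <- apply(boot_fits, 2, quantile, probs = c(0.025, 0.975))
low_band <- band[1, ]
up_band <- band[2, ]
```

The resulting `low_band` and `up_band` vectors play the same role as `low_conf` and `up_conf` in the plot above and could be drawn with `geom_ribbon` in the same way.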