filter_outlier: Identify and filter outliers

View source: R/filter.R

filter_outlier    R Documentation

Identify and filter outliers

Description

The values of col are passed through a median (i.e. low-pass) filter to obtain .<col>_median. The squared differences between the original and the filtered values are stored in .<col>_eps2, and their mean in .<col>_sigma. A value of col is considered an outlier if its .<col>_eps2 is greater than .<col>_sigma.
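The steps above can be illustrated on a plain numeric vector. The following is a minimal sketch in base R, not the package's implementation: sketch_outliers() is a hypothetical helper, stats::runmed() stands in for the median filter, and the mean is assumed to be a single value taken over the whole vector (the package may use a windowed statistic instead).

```r
# Hypothetical helper for illustration only; not part of trrrj.
sketch_outliers <- function(x, ksize) {
  stopifnot(ksize %% 2 == 1)                       # the kernel size must be odd
  med   <- as.numeric(stats::runmed(x, k = ksize)) # median (low-pass) filter
  eps2  <- (x - med)^2                             # squared difference from the median
  sigma <- mean(eps2)                              # ASSUMPTION: one global mean
  data.frame(
    x       = x,
    median  = med,
    eps2    = eps2,
    sigma   = sigma,
    outlier = eps2 > sigma                         # flagged when eps2 exceeds sigma
  )
}

# A short series with one obvious spike at position 6.
x   <- c(10, 11, 10, 12, 11, 100, 11, 10, 12, 11, 10)
res <- sketch_outliers(x, ksize = 5)

# Mimic fill = TRUE: replace flagged values with the running median.
filled <- ifelse(res$outlier, res$median, res$x)
which(res$outlier)
```

In this sketch only the spike is flagged, and replacing it with the running median corresponds to what fill = TRUE is described as doing.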

Usage

filter_outlier(df, col, ksize, fill, keep = FALSE)

Arguments

df

a data frame of trajectory data

col

the variable to filter for outliers

ksize

the kernel size (must be odd)

fill

whether to substitute the outlier with the median

keep

whether to keep intermediate results

Details

This approach is inspired by (copied from ;-) the filter function in Xavier Olive's Python library traffic.

Value

a data frame in which outliers have been replaced by the median (if fill is TRUE). If keep is TRUE, the intermediate columns .<col>_median, .<col>_eps2, .<col>_sigma and .<col>_outlier are included.

See Also

Other analysis: cumulative_distance(), cumulative_time(), extract_segment(), smooth_positions()

Examples

## Not run: 
library(readr)
library(dplyr)
library(anytime)
library(trrrj)
library(ggplot2)

# read the example flight shipped with the package and parse its timestamps
ifile <- system.file("extdata", "belevingsvlucht.csv", package = "trrrj")
df <- readr::read_csv(ifile) %>%
  mutate(timestamp = anytime::anytime(time, tz = "UTC"))
# replace outliers in barometric altitude by the median (kernel size 17)
df1 <- df %>%
  filter_outlier(col = baroaltitude, ksize = 17, fill = TRUE, keep = FALSE)
# original values in blue, filtered values in red
ggplot() +
  geom_line(data = df,  mapping = aes(x = timestamp, y = baroaltitude), colour = "blue") +
  geom_line(data = df1, mapping = aes(x = timestamp, y = baroaltitude), colour = "red")

## End(Not run)

euctrl-pru/trrrj documentation built on April 15, 2024, 1:24 p.m.