pat_sample: Sample PurpleAir time series data


Source: R/pat_sample.R

Description

A sampling function that accepts a PurpleAir time series (pat) object and reduces its data by randomly selecting a user-chosen number of distinct rows.

If neither sampleSize nor sampleFraction is specified, sampleSize = 5000 is used.
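
As a rough illustration of how the two size arguments might resolve into a row count, the base-R sketch below uses a hypothetical helper, sample_rows(), on a plain data frame standing in for the pat data. It is not the AirSensor implementation: the 5000-row default comes from the text above, while capping the size at the number of available rows and rejecting calls that supply both arguments are assumptions of this sketch.

```r
# Hypothetical helper (not part of AirSensor): draws distinct rows at random,
# resolving sampleSize / sampleFraction as the description suggests.
sample_rows <- function(df, sampleSize = NULL, sampleFraction = NULL) {
  n <- nrow(df)
  if (!is.null(sampleSize) && !is.null(sampleFraction)) {
    stop("supply only one of sampleSize or sampleFraction")  # assumption
  }
  if (is.null(sampleSize) && is.null(sampleFraction)) {
    sampleSize <- 5000                       # documented default
  }
  if (!is.null(sampleFraction)) {
    sampleSize <- round(n * sampleFraction)  # fraction -> row count
  }
  sampleSize <- min(sampleSize, n)           # assumption: cap at available rows
  idx <- sort(sample.int(n, sampleSize))     # distinct row indices, time order kept
  df[idx, , drop = FALSE]
}

# Synthetic stand-in for pat data: 20000 rows at 2-minute resolution
df <- data.frame(
  datetime = seq(as.POSIXct("2019-01-01", tz = "UTC"), by = "2 min", length.out = 20000),
  pm25 = rnorm(20000, mean = 10, sd = 2)
)
nrow(sample_rows(df))                        # 5000 (default size)
nrow(sample_rows(df, sampleFraction = 0.1))  # 2000 (10% of rows)
```

Sorting the sampled indices keeps the subset in time order, which is convenient for time series plots; this page does not state whether pat_sample() itself does the same.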

Usage

pat_sample(
  pat = NULL,
  sampleSize = NULL,
  sampleFraction = NULL,
  setSeed = NULL,
  keepOutliers = FALSE
)

Arguments

pat

PurpleAir time series (pat) object.

sampleSize

Non-negative integer giving the number of rows to choose.

sampleFraction

Fraction of rows to choose (a number between 0 and 1).

setSeed

Integer seed for the random number generator. Supply the same value to reproduce a sample.
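
This page says only that setSeed can be used to reproduce sampling. Assuming it behaves like calling set.seed() before the rows are drawn, the sketch below (built around a hypothetical draw() helper) shows why the same seed selects the same rows.

```r
# Hypothetical helper: optionally seeds R's RNG, then draws distinct indices.
draw <- function(n, size, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)  # assumption: setSeed maps to set.seed()
  sample.int(n, size)
}

a <- draw(10000, 5, seed = 1)
b <- draw(10000, 5, seed = 1)
identical(a, b)  # TRUE: same seed, same rows
```

Seeding this way also resets the global RNG state for the rest of the session. Reproducibility further assumes the same RNG settings: R 3.6.0 changed the default algorithm used by sample(), so a given seed can select different rows under older versions of R.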

keepOutliers

Logical specifying whether to use a graphics-focused sampling algorithm that preserves outliers (see Details).

Details

When keepOutliers = FALSE, simple random sampling is used to produce a statistically representative subsample of the data.
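
To illustrate what a representative random subsample means here, the sketch below draws distinct rows uniformly at random from synthetic normally distributed values (an assumed stand-in for a PM2.5 column, not real PurpleAir data) and compares summary statistics. With a few thousand rows, the sample mean and quantiles should track the full data closely.

```r
set.seed(1)
x <- rnorm(100000, mean = 10, sd = 2)  # synthetic stand-in for a PM2.5 column
s <- x[sample.int(length(x), 5000)]    # uniform random sample of distinct rows

# Compare summary statistics of the full data and the subsample
rbind(
  full   = quantile(x, c(0.05, 0.5, 0.95)),
  sample = quantile(s, c(0.05, 0.5, 0.95))
)
c(full = mean(x), sample = mean(s))
```

Uniform sampling treats every row alike, so rare extreme values survive only in proportion to their frequency; that is the motivation for the keepOutliers = TRUE option.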

When keepOutliers = TRUE, a customized sampling algorithm is used that attempts to produce subsets whose plots are visually identical to plots of the full data. It does this by preserving outliers and sampling only in regions where overplotting is expected.

The process is as follows:

  1. find outliers using seismicRoll::findOutliers()

  2. create a subset containing only the outliers

  3. sample the remaining (non-outlier) data

  4. merge the outliers and the sampled data
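
The four steps above can be sketched in base R. This is not the AirSensor code: seismicRoll::findOutliers() is replaced by a hypothetical find_outliers_sketch() that applies a simple rolling-median/MAD (Hampel-style) test, and its window n and threshold k are illustrative values rather than seismicRoll defaults. The sketch also assumes that outliers count toward sampleSize and that the merged rows are re-sorted by time.

```r
# Hypothetical stand-in for seismicRoll::findOutliers(): flags points that sit
# more than k scaled MADs from a rolling median (a Hampel-style test).
find_outliers_sketch <- function(x, n = 11, k = 5) {
  med <- stats::runmed(x, n, endrule = "median")  # rolling median, odd window n
  half <- n %/% 2
  mad_roll <- vapply(seq_along(x), function(i) {
    w <- max(1, i - half):min(length(x), i + half)
    stats::mad(x[w], center = med[i])             # rolling MAD around the median
  }, numeric(1))
  which(abs(x - med) > k * mad_roll & mad_roll > 0)
}

# Outlier-preserving subsample following steps 1-4 (illustrative only)
sample_keep_outliers <- function(df, col, sampleSize) {
  out_idx  <- find_outliers_sketch(df[[col]])     # 1. find outliers
  outliers <- df[out_idx, , drop = FALSE]         # 2. subset of outliers only
  rest_idx <- setdiff(seq_len(nrow(df)), out_idx)
  n_sample <- min(max(sampleSize - length(out_idx), 0), length(rest_idx))
  pick     <- rest_idx[sample.int(length(rest_idx), n_sample)]
  sampled  <- df[pick, , drop = FALSE]            # 3. sample the remaining data
  merged   <- rbind(outliers, sampled)            # 4. merge
  merged[order(merged$datetime), , drop = FALSE]  # time order for plotting
}

set.seed(42)
df <- data.frame(
  datetime = seq(as.POSIXct("2019-01-01", tz = "UTC"), by = "2 min", length.out = 5000),
  pm25 = rnorm(5000, mean = 10, sd = 1)
)
spikes <- c(500, 2500, 4000)
df$pm25[spikes] <- 80                              # inject obvious outliers
s <- sample_keep_outliers(df, "pm25", sampleSize = 500)
nrow(s)
sum(s$pm25 == 80)                                  # all three spikes survive
```

Because every flagged point is kept regardless of sampleSize, a noisy series with many flagged points could return more rows than requested in this sketch.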

Value

A subset of the given pat object.

Examples

library(AirSensor)

# Dimensions of the full example dataset
example_pat %>%
  pat_extractData() %>%
  dim()

# Dimensions after reproducibly sampling 1000 rows
example_pat %>%
  pat_sample(sampleSize = 1000, setSeed = 1) %>%
  pat_extractData() %>%
  dim()

AirSensor documentation built on March 13, 2021, 1:07 a.m.