pat_sample: Sample PurpleAir time series data

pat_sample    R Documentation

Sample PurpleAir time series data

Description

A sampling function that accepts a PurpleAir time series dataframe and reduces it by randomly selecting distinct rows, with the number of rows chosen by the user.

If both sampleSize and sampleFraction are unspecified, sampleSize = 5000 will be used.

Usage

pat_sample(
  pat = NULL,
  sampleSize = NULL,
  sampleFraction = NULL,
  setSeed = NULL,
  keepOutliers = FALSE
)

Arguments

pat

PurpleAir Timeseries pat object.

sampleSize

Non-negative integer giving the number of rows to choose.

sampleFraction

Fraction of rows to choose.

setSeed

Integer used to seed the random number generator. Supplying the same value makes the sampling reproducible.

keepOutliers

Logical specifying whether to use a graphics-focused sampling algorithm (see Details).

Details

When keepOutliers = FALSE, random sampling is used to provide a statistically representative subsample of the data.
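
The random-sampling case can be illustrated on a plain data frame. The following is a minimal sketch in base R, not the AirSensor implementation: sample_rows() is a hypothetical helper, and three of its behaviors are assumptions made for illustration (sampleSize taking precedence over sampleFraction, the sample size being capped at the number of available rows, and the sampled rows being returned in their original order). Only the default of 5000 rows comes from the documentation above.

```r
# Hypothetical helper illustrating random sampling of distinct rows.
# Not the AirSensor implementation; works on any data frame.
sample_rows <- function(df, sampleSize = NULL, sampleFraction = NULL, setSeed = NULL) {
  n <- nrow(df)

  # Documented default: 5000 rows when neither argument is given
  if (is.null(sampleSize) && is.null(sampleFraction)) {
    sampleSize <- 5000
  }

  # Assumption: sampleSize takes precedence when both are supplied
  if (is.null(sampleSize)) {
    sampleSize <- round(n * sampleFraction)
  }

  # Assumption: never request more rows than exist
  sampleSize <- min(sampleSize, n)

  # Seed the generator only when asked, so results can be reproduced
  if (!is.null(setSeed)) {
    set.seed(setSeed)
  }

  # sample.int() without replacement yields distinct row indices;
  # sorting keeps the rows in their original (time) order
  idx <- sort(sample.int(n, size = sampleSize))
  df[idx, , drop = FALSE]
}

# Toy time series standing in for PurpleAir data
toy <- data.frame(
  datetime = seq(as.POSIXct("2023-01-01", tz = "UTC"), by = "min", length.out = 10000),
  pm25 = runif(10000, 0, 50)
)

a <- sample_rows(toy, sampleSize = 1000, setSeed = 1)
b <- sample_rows(toy, sampleSize = 1000, setSeed = 1)
identical(a, b)   # same seed, same rows
```

Sampling row indices rather than rows keeps the operation cheap and makes it easy to guarantee that no row is selected twice.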

When keepOutliers = TRUE, a customized sampling algorithm is used that attempts to create subsets whose plots are visually identical to plots of the full data. This is accomplished by preserving outliers and only sampling data in regions where overplotting is expected.

The process is as follows:

  1. find outliers using seismicRoll::findOutliers()

  2. create a subset consisting of only outliers

  3. sample the remaining data

  4. merge the outliers and sampled data
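
The four steps above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: sample_keep_outliers() and find_outliers_simple() are hypothetical names, and a simple rolling-median rule stands in for seismicRoll::findOutliers() so that the block runs without extra packages. The sketch also assumes that sampleSize applies only to the non-outlier rows, so the result may contain slightly more than sampleSize rows.

```r
# Stand-in outlier detector (assumption): flags points that lie far from a
# rolling median, measured in units of the residuals' MAD.
# The documented function uses seismicRoll::findOutliers() instead.
find_outliers_simple <- function(x, window = 11, threshold = 6) {
  med <- stats::runmed(x, k = window)   # rolling median; window must be odd
  spread <- stats::mad(x - med)         # robust scale of the residuals
  if (spread == 0) return(integer(0))
  which(abs(x - med) > threshold * spread)
}

# Hypothetical helper following the four documented steps
sample_keep_outliers <- function(df, column, sampleSize, setSeed = NULL) {
  if (!is.null(setSeed)) set.seed(setSeed)

  # 1. find outliers (row indices)
  outlier_idx <- find_outliers_simple(df[[column]])

  # 2. the outlier subset is every row at those indices; all are kept

  # 3. sample the remaining rows without replacement
  rest_idx <- setdiff(seq_len(nrow(df)), outlier_idx)
  n_sample <- min(sampleSize, length(rest_idx))
  sampled_idx <- rest_idx[sample.int(length(rest_idx), size = n_sample)]

  # 4. merge outliers and sampled rows, restoring the original order
  keep <- sort(c(outlier_idx, sampled_idx))
  df[keep, , drop = FALSE]
}

# Toy series: Gaussian noise with three injected spikes
set.seed(42)
toy <- data.frame(
  datetime = seq(as.POSIXct("2023-01-01", tz = "UTC"), by = "min", length.out = 5000),
  pm25 = rnorm(5000, mean = 10, sd = 1)
)
spikes <- c(500, 2500, 4000)
toy$pm25[spikes] <- 100

sub <- sample_keep_outliers(toy, "pm25", sampleSize = 500, setSeed = 1)
all(toy$datetime[spikes] %in% sub$datetime)   # the spikes survive the sampling
```

Because the spikes are always retained, a plot of sub shows the same extremes as a plot of toy, while the dense band of ordinary values is thinned.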

Value

A subset of the given pat object.

Examples

library(AirSensor)

example_pat %>%
  pat_extractData() %>%
  dim()

example_pat %>%
  pat_sample(sampleSize = 1000, setSeed = 1) %>%
  pat_extractData() %>%
  dim()


MazamaScience/AirSensor documentation built on April 28, 2023, 11:16 a.m.