loess_model: Fits loess curve to each station in a data set


View source: R/loess_model.R

Description

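Takes a data set with ID, DATE and SNWD columns (use create_model_variables to build it), fits a loess curve to each station, and adds two columns: loess_upper, the upper bound derived from the loess estimate, and loess_outlier, which is 1 for observations above that bound (by default, more than 3.5 standard deviations above the loess curve).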

Usage

loess_model(
  data,
  span = 0.1,
  nsigma = 3.5,
  family = "symmetric",
  progress = TRUE,
  ...
)

Arguments

data

A data set with at least ID, DATE, and SNWD (use create_model_variables).

span

Smoothing span, passed to msir::loess_sd. Default is 0.1.

nsigma

Number of standard deviations above the loess curve used to set the upper bound. Default is 3.5.

family

Fitting family, passed to msir::loess_sd. Default is "symmetric".

progress

Set to TRUE to show a progress bar. Default is TRUE.

...

Extra parameters passed on to msir::loess_sd.

Details

The data set can contain any number of stations; a separate loess curve is fit for each one. Within a station, each observation's DATE is converted to a zero-based day of the year (DOY), so '1999-1-1' becomes DOY=0 and '2004-1-1' also becomes DOY=0. This collapses each station's data into a single year in which every day holds the observations from all years. msir::loess_sd is then fit with the given span, nsigma, and family parameters. Some stations do not have enough data, or loess fails to converge. In those cases loess_outlier is set to 1 and loess_upper is set to NA for that station.
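The steps above can be sketched for a single station as follows. This is an illustration under stated assumptions, not the package's implementation: base R's stats::loess stands in for the msir fit, one residual standard deviation stands in for the smoothed SD band, and the station data and the helper name fit_station are hypothetical.

```r
# Sketch only: stats::loess replaces the msir fit, and a single residual SD
# replaces the smoothed SD band. All data below are simulated.
set.seed(1)

# Hypothetical single-station data: three years of daily snow depth.
dates <- seq(as.Date("2001-01-01"), as.Date("2003-12-31"), by = "day")
doy   <- as.POSIXlt(dates)$yday   # zero-based day of year: Jan 1 -> 0
snwd  <- 400 + 300 * cos(2 * pi * doy / 365) + rnorm(length(dates), sd = 30)
snwd[100] <- 5000                 # plant one obvious outlier

station <- data.frame(DATE = dates, DOY = doy, SNWD = snwd)

# fit_station is a hypothetical helper, not a function from the package.
fit_station <- function(df, span = 0.1, nsigma = 3.5, family = "symmetric") {
  fit <- tryCatch(
    stats::loess(SNWD ~ DOY, data = df, span = span, family = family),
    error = function(e) NULL
  )
  if (is.null(fit)) {
    # Fallback described in Details: no curve, every row flagged.
    df$loess_upper   <- NA_real_
    df$loess_outlier <- 1
    return(df)
  }
  centre <- stats::predict(fit, newdata = df)
  sigma  <- stats::sd(df$SNWD - centre)
  df$loess_upper   <- centre + nsigma * sigma
  df$loess_outlier <- as.numeric(df$SNWD > df$loess_upper)
  df
}

result <- fit_station(station)
```

Because all years share one DOY axis, the planted value at row 100 is compared against every other observation made on or near that calendar day, whichever year it came from.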

Value

A copy of the data set with loess_upper (for plotting) and loess_outlier (1 for outlier, 0 for not) added.
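Since stations without a usable fit are returned with loess_outlier set to 1 and loess_upper set to NA, it can help to separate those rows from flags that are backed by a fitted curve. A minimal base R sketch, using a small invented data frame in place of real loess_model output:

```r
# Hypothetical output; the values are invented for illustration only.
out <- data.frame(
  ID            = c("A", "A", "A", "B", "B"),
  SNWD          = c(10, 12, 900, 5, 6),
  loess_upper   = c(40, 41, 42, NA, NA),
  loess_outlier = c(0, 0, 1, 1, 1)
)

# Number of flagged observations per station.
flag_counts <- tapply(out$loess_outlier, out$ID, sum)

# Stations where no curve was fit: loess_upper is NA for every row.
no_fit <- names(which(tapply(is.na(out$loess_upper), out$ID, all)))

# Flags that come from an actual fitted upper bound.
fitted_flags <- out[out$loess_outlier == 1 & !is.na(out$loess_upper), ]
```

Here station "B" is flagged only because no curve was available, while the single flag for station "A" reflects an observation above its fitted bound.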

Examples

## Not run: 
# Some test station IDs
test_ids <- c("US1COBO0074", "US1COBO0024",
              "US1COAR0114", "US1COLR0758",
              "USC00050437", "USC00053038",
              "US1COAD0098", "USW00023066",
              "USC00051179", "US1COLR0187",
              "US1CODG0079", "US1CODG0033",
              "US1COBO0266", "USC00052501",
              "US1COPU0080", "USC00056410",
              "US1COLR0520", "US1COLR0157",
              "US1COLR0142", "USC00056012")

# Create a testing data set of Colorado data
test_data <- get_weather_data(test_ids)
test_data <- create_flagged_dataset(test_data)
test_data <- create_model_variables(test_data)
test_data$ID <- as.character(test_data$ID)

# Fit loess models to all of the Colorado testing data
test_data <- loess_model(test_data)

# Create a confusion matrix to assess performance.
# caret::confusionMatrix expects factors with matching levels, so both
# columns are converted here (assumes they hold 0/1 values).
caret::confusionMatrix(factor(test_data$loess_outlier, levels = c(0, 1)),
                       factor(test_data$OUTLIER_FINAL, levels = c(0, 1)),
                       positive = "1")

## End(Not run)

scoutiii/HTSoutliers documentation built on April 4, 2021, 4:47 p.m.