outliers_ghcnd_training_set: Sample of stations for training models.


Description

This data set contains a sample of stations across the contiguous United States. Each sampled station had at least 20 observations. Every station with at least one outlier identified in outliers_ghcnd had all of its outliers included in the data set. A random sample of the other stations was then taken, and 20 non-outliers were randomly sampled from each of those stations. The data set contains 327820 non-outliers and 36132 outliers, a ratio of roughly 9 to 1.
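
The sampling procedure described above can be sketched in base R on a small toy data frame. This is only an illustration, not the package's actual code: the station IDs, values, and outlier flags are made up, and it assumes that "other stations" means stations with no flagged outliers.

```r
set.seed(1)

# Toy stand-in for outliers_ghcnd: 6 hypothetical stations, 30 observations each.
toy <- data.frame(
  ID = rep(paste0("ST", 1:6), each = 30),
  SNWD = round(runif(180, 0, 500)),
  OUTLIER_FINAL = 0L
)
# Flag a few observations as outliers (made up): rows 5 and 12 are ST1, row 40 is ST2.
toy$OUTLIER_FINAL[c(5, 12, 40)] <- 1L

# Step 1: keep every outlier.
outliers <- toy[toy$OUTLIER_FINAL == 1L, ]

# Step 2: randomly pick some of the remaining stations (assumed: those without outliers).
other_ids <- setdiff(unique(toy$ID), unique(outliers$ID))
picked_ids <- sample(other_ids, size = 2)

# Step 3: draw 20 non-outliers from each picked station.
non_outliers <- do.call(rbind, lapply(picked_ids, function(id) {
  rows <- toy[toy$ID == id & toy$OUTLIER_FINAL == 0L, ]
  rows[sample(nrow(rows), 20), ]
}))

# Combine into a training set and inspect the class balance.
training <- rbind(outliers, non_outliers)
table(training$OUTLIER_FINAL)
```

Fixing the number of non-outliers per station keeps any single long-running station from dominating the non-outlier class.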

Usage

outliers_ghcnd_training_set

Format

A data frame with 363952 rows and 30 variables:

ID

station ID

DATE

date of the observation

TYPE

from outliers_ghcnd

SNWD

Snow depth in mm

WESD

water equivalent of snow depth, in tenths of a mm

OUTLIER_FINAL

0 if non-outlier, 1 if outlier

MONTH

the month of the DATE

STATION_AVG

as calculated from create_model_variables

STATION_SD

as calculated from create_model_variables

STATION_MAX

as calculated from create_model_variables

STATION_MIN

as calculated from create_model_variables

STATION_RANGE

as calculated from create_model_variables

YEAR_AVG

as calculated from create_model_variables

YEAR_SD

as calculated from create_model_variables

YEAR_MAX

as calculated from create_model_variables

YEAR_MIN

as calculated from create_model_variables

YEAR_RANGE

as calculated from create_model_variables

MONTH_AVG

as calculated from create_model_variables

MONTH_SD

as calculated from create_model_variables

MONTH_MAX

as calculated from create_model_variables

MONTH_MIN

as calculated from create_model_variables

MONTH_RANGE

as calculated from create_model_variables

MONTH_DENSITY

as calculated from create_model_variables

WEEK_AVG

as calculated from create_model_variables

WEEK_MAX

as calculated from create_model_variables

WEEK_MIN

as calculated from create_model_variables

WEEK_RANGE

as calculated from create_model_variables

PRISM_PPT

as calculated from create_model_variables

PRISM_TMIN

as calculated from create_model_variables

PRISM_TMAX

as calculated from create_model_variables
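
As a quick sanity check, the variable names listed above can be written out in R and compared against a loaded copy of the data. The sketch below is an assumption-laden convenience, not part of the package: `expected_cols` and `missing_cols` are hypothetical names introduced here.

```r
# Variable names as listed in the Format section.
stat_suffix <- c("AVG", "SD", "MAX", "MIN", "RANGE")
expected_cols <- c(
  "ID", "DATE", "TYPE", "SNWD", "WESD", "OUTLIER_FINAL", "MONTH",
  paste0("STATION_", stat_suffix),
  paste0("YEAR_", stat_suffix),
  paste0("MONTH_", stat_suffix), "MONTH_DENSITY",
  paste0("WEEK_", c("AVG", "MAX", "MIN", "RANGE")),
  paste0("PRISM_", c("PPT", "TMIN", "TMAX"))
)
length(expected_cols)  # 30, matching the documented variable count

# Hypothetical helper: report listed names that are missing from a data frame.
missing_cols <- function(df) setdiff(expected_cols, names(df))
```

With the package installed and the data available, `missing_cols(outliers_ghcnd_training_set)` would be expected to return an empty character vector if the documentation and data agree.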

Details

This data set can be used to train classification models, such as a random forest, that predict OUTLIER_FINAL from the other variables.
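
A minimal sketch of such a workflow is shown below. It uses synthetic stand-in data with a few column names borrowed from this data set, a made-up labelling rule, and the randomForest package (assumed to be installed; the block skips the fit otherwise). It is not the model or feature set used by the package authors.

```r
set.seed(42)
n <- 200

# Synthetic stand-in data; values are made up for illustration only.
toy <- data.frame(
  SNWD = runif(n, 0, 1000),
  STATION_AVG = runif(n, 50, 300),
  STATION_SD = runif(n, 10, 100)
)
# Made-up labelling rule: flag depths far above the station average.
toy$OUTLIER_FINAL <- factor(
  as.integer(toy$SNWD > toy$STATION_AVG + 3 * toy$STATION_SD),
  levels = c(0, 1)
)

# Simple 80/20 train/test split.
train_idx <- sample(n, size = 160)
train <- toy[train_idx, ]
test <- toy[-train_idx, ]

# Fit and evaluate only if the randomForest package is available.
if (requireNamespace("randomForest", quietly = TRUE)) {
  fit <- randomForest::randomForest(OUTLIER_FINAL ~ ., data = train, ntree = 100)
  pred <- predict(fit, newdata = test)
  print(table(predicted = pred, actual = test$OUTLIER_FINAL))
}
```

With the real data, the roughly 9 to 1 class imbalance is worth keeping in mind when judging accuracy, since a model that always predicts "non-outlier" would already be right about 90% of the time.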

Source

https://www.ncdc.noaa.gov/ghcn-daily-description


scoutiii/HTSoutliers documentation built on April 4, 2021, 4:47 p.m.