clean_data: Clean and Optionally Aggregate Environmental Data

View source: R/data_cleaning.R

clean_data (R Documentation)

Description

Cleans a data table of environmental measurements by filtering for a specific station, removing duplicates, and optionally aggregating the data on a daily basis using the mean.

Usage

clean_data(env_data, station, aggregate_daily = FALSE)

Arguments

env_data

A data table in long format. Must include columns:

Station

Station identifier for the data.

Komponente

Measured environmental component, e.g. temperature or NO2.

Wert

Measured value.

date

Timestamp as a date-time object (YYYY-MM-DD HH:MM:SS format).

Komponente_txt

Textual description of the component.

station

Character. Name of the station to filter by.

aggregate_daily

Logical. If TRUE, aggregates data to daily mean values. Default is FALSE.

Details

Duplicate rows (by date, Komponente, and Station) are removed. A warning is issued if duplicates are found.
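The duplicate handling described above can be sketched roughly as follows. This is a minimal illustration, not the package's actual implementation: the helper name drop_duplicates and the warning text are hypothetical, and only the key columns (date, Komponente, Station) are taken from this page.

```r
library(data.table)

# Hypothetical helper (an assumption, not part of ubair): removes rows that
# repeat the same date/Komponente/Station combination and warns if any exist.
drop_duplicates <- function(dt, keys = c("date", "Komponente", "Station")) {
  n_dup <- sum(duplicated(dt, by = keys))
  if (n_dup > 0) {
    warning(sprintf("%d duplicate row(s) found and removed.", n_dup))
  }
  # Keep the first occurrence of each key combination
  unique(dt, by = keys)
}

# Small made-up data set in which the first two rows share the same key
dt <- data.table(
  Station = c("DENW094", "DENW094", "DENW094"),
  Komponente = c("NO2", "NO2", "O3"),
  Wert = c(45, 45, 30),
  date = as.POSIXct(
    c("2023-01-01 08:00:00", "2023-01-01 08:00:00", "2023-01-01 09:00:00"),
    tz = "UTC"
  )
)

deduped <- drop_duplicates(dt)  # issues a warning about one duplicate row
```

In this sketch the first occurrence of each key combination is kept; whether clean_data keeps the first or last occurrence is not stated on this page.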

Value

A data.table:

  • If aggregate_daily = TRUE: Contains columns for station, component, day, year, and the daily mean value of the measurements.

  • If aggregate_daily = FALSE: Contains cleaned data with duplicates removed.
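To make the aggregated shape more concrete, a daily mean per station and component could be computed along these lines. This is only a sketch under assumptions: the output column names day and year, and the use of as.Date() and data.table::year() to derive them, are illustrative guesses and may differ from what clean_data actually returns.

```r
library(data.table)

# Made-up measurements: two values on 2023-01-01 and one on 2023-01-02
dt <- data.table(
  Station = "DENW094",
  Komponente = "NO2",
  Wert = c(40, 50, 30),
  date = as.POSIXct(
    c("2023-01-01 08:00:00", "2023-01-01 20:00:00", "2023-01-02 08:00:00"),
    tz = "UTC"
  )
)

# Group by station, component, calendar day and year, then average Wert.
# The grouping column names (day, year) are assumptions for illustration.
daily <- dt[
  ,
  .(Wert = mean(Wert)),
  by = .(Station, Komponente, day = as.Date(date), year = year(date))
]
```

Here the two measurements from 2023-01-01 collapse into a single row, so daily has one row per station, component, and day.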

Examples

# Example data
env_data <- data.table::data.table(
  Station = c("DENW094", "DENW094", "DENW006", "DENW094"),
  Komponente = c("NO2", "O3", "NO2", "NO2"),
  Wert = c(45, 30, 50, 40),
  date = as.POSIXct(c(
    "2023-01-01 08:00:00", "2023-01-01 09:00:00",
    "2023-01-01 08:00:00", "2023-01-02 08:00:00"
  )),
  Komponente_txt = c(
    "Nitrogen Dioxide", "Ozone", "Nitrogen Dioxide", "Nitrogen Dioxide"
  )
)

# Clean data for station DENW094 without aggregation
cleaned_data <- clean_data(env_data, station = "DENW094", aggregate_daily = FALSE)
print(cleaned_data)

ubair documentation built on April 12, 2025, 2:12 a.m.