cmv: Counting Missing Values
In hydroTSM: Time Series Management and Analysis for Hydrological Modelling

cmv	R Documentation

Counting Missing Values

Description

Generic function for counting the percentage/amount of missing values in a zoo object, using a user-defined temporal scale.

Usage

cmv(x, ...)

## Default S3 method:
cmv(x, tscale=c("hourly", "daily", "weekly", "monthly", 
            "quarterly", "seasonal", "annual"),
            out.type=c("percentage", "amount"), dec=3, 
            start="00:00:00", start.fmt= "%H:%M:%S", tz, ...)

## S3 method for class 'zoo'
cmv(x, tscale=c("hourly", "daily", "weekly", "monthly", 
            "quarterly", "seasonal", "annual"),
            out.type=c("percentage", "amount"), dec=3, 
            start="00:00:00", start.fmt= "%H:%M:%S", tz, ...)

## S3 method for class 'data.frame'
cmv(x, tscale=c("hourly", "daily", "weekly", "monthly", 
            "quarterly", "seasonal", "annual"),
            out.type=c("percentage", "amount"), dec=3, 
            start="00:00:00", start.fmt= "%H:%M:%S", tz, 
            dates=1, date.fmt="%Y-%m-%d", ...)

## S3 method for class 'matrix'
cmv(x, tscale=c("hourly", "daily", "weekly", "monthly", 
            "quarterly", "seasonal", "annual"),
            out.type=c("percentage", "amount"), dec=3, 
            start="00:00:00", start.fmt= "%H:%M:%S", tz,
            dates=1, date.fmt="%Y-%m-%d", ...)

Arguments

`x`	zoo, data.frame or matrix object, with the time series to be analysed. Measurements at several gauging stations can be stored in a data.frame or matrix object. In that case, each column of `x` represents the time series measured at one gauging station, and the column names of `x` have to correspond to the ID of each station (starting with a letter).
`tscale`	character with the temporal scale to be used for analysing the missing data. Valid values are:
-) `hourly`: the percentage/amount of missing values will be given for each hour and, therefore, the expected time frequency of `x` must be sub-hourly.
-) `daily`: the percentage/amount of missing values will be given for each day and, therefore, the expected time frequency of `x` must be sub-daily (i.e., hourly or sub-hourly).
-) `weekly`: the percentage/amount of missing values will be given for each week (starting on Monday) and, therefore, the expected time frequency of `x` must be sub-weekly (i.e., daily, hourly or sub-hourly).
-) `monthly`: the percentage/amount of missing values will be given for each month and, therefore, the expected time frequency of `x` must be sub-monthly (i.e., daily, hourly or sub-hourly).
-) `quarterly`: the percentage/amount of missing values will be given for each quarter and, therefore, the expected time frequency of `x` must be sub-quarterly (i.e., monthly, daily, hourly or sub-hourly).
-) `seasonal`: the percentage/amount of missing values will be given for each weather season (see ?time2season) and, therefore, the expected time frequency of `x` must be sub-seasonal (i.e., monthly, daily, hourly or sub-hourly).
-) `annual`: the percentage/amount of missing values will be given for each year and, therefore, the expected time frequency of `x` must be sub-annual (i.e., seasonal, monthly, daily, hourly or sub-hourly).
`dec`	integer indicating the number of decimal places included in the output. It is only used when `out.type=="percentage"`.
`start`	character indicating the starting time used for aggregating sub-daily time series into daily ones. It MUST be provided in the format specified by `start.fmt`. This value is used to define the time when a new day begins (e.g., for some rain gauge stations).
-) All the values of `x` with a time attribute before `start` are considered as belonging to the day before the one indicated in the time attribute of those values.
-) All the values of `x` with a time attribute equal to `start` are considered to be equal to `"00:00:00"` in the output zoo object.
-) All the values of `x` with a time attribute after `start` are considered as belonging to the same day as the one indicated in the time attribute of those values.
It is useful when the daily values start at a time different from `"00:00:00"`. Use with caution. See examples.
`start.fmt`	character indicating the format in which the time is provided in `start`. By default `start.fmt="%H:%M:%S"`. See `format` in `as.POSIXct`.
`tz`	character, with the specification of the time zone used in both `x` and `start`. System-specific (see time zones), but `""` is the current time zone, and `"GMT"` is UTC (Universal Time, Coordinated). See `Sys.timezone` and `as.POSIXct`. If `tz` is missing (the default), it is automatically set to the time zone used in `time(x)`. This argument can be used to force using the local time zone or any other time zone instead of UTC as time zone.
`dates`	numeric, factor, POSIXct or POSIXt object indicating how to obtain the dates and times for each column of `x` (e.g., gauging station). If `dates` is a number, it indicates the index of the column in `x` that stores the dates and times. If `dates` is a factor, it is converted into POSIXct class, using the date format specified by `date.fmt`. If `dates` is already of POSIXct or POSIXt class, this function verifies that its number of elements is equal to the number of elements in `x`.
`date.fmt`	character indicating the format in which the dates are stored in `dates`. By default `date.fmt="%Y-%m-%d"`. See `format` in `as.Date`. ONLY required when `class(dates)=="factor"` or `class(dates)=="numeric"`.
`out.type`	character indicating how the missing values should be returned for each temporal scale. Valid values are:
-) `percentage`: the missing values are returned as a real value, representing the percentage of missing values in each temporal scale.
-) `amount`: the missing values are returned as an integer value, representing the absolute amount of missing values in each temporal scale.
`...`	further arguments passed to or from other methods.
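
The day-assignment rule described for `start` can be illustrated with a minimal base-R sketch. It is not the hydroTSM implementation; the time stamps, the `"09:00:00"` start time and the variable names (`start.offset`, `shifted`, `day`) are assumptions chosen only for illustration.

```r
# Base-R sketch of the day-assignment rule (not the hydroTSM source).
# Hypothetical hourly time stamps and a hypothetical start of "09:00:00".
tz    <- "UTC"
times <- as.POSIXct(c("2021-03-02 08:00:00", "2021-03-02 09:00:00",
                      "2021-03-02 10:00:00"), tz = tz)
start <- "09:00:00"

# Convert 'start' into seconds after midnight
hms          <- as.numeric(strsplit(start, ":")[[1]])
start.offset <- hms[1] * 3600 + hms[2] * 60 + hms[3]

# Shifting every time stamp back by the offset moves 'start' to 00:00:00,
# so the calendar date of the shifted stamp is the day the value belongs to
shifted <- times - start.offset
day     <- as.Date(shifted, tz = tz)
```

In this sketch the 08:00 value falls on the previous day (2021-03-01), while the 09:00 and 10:00 values stay on 2021-03-02, which matches the three cases listed for `start`.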

Details

The amount of missing values in each temporal scale is computed just by counting the amount of NAs in each hour / day / week / month / quarter / season / year, while the percentage of missing values in each temporal scale is computed by dividing the previous number by the total number of data elements in each hour / day / week / month / quarter / season / year.
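
The counting described above can be sketched in base R for the monthly scale. This is only an illustration under assumed data (a hypothetical two-month daily series), not the code used by `cmv`; the names `month.id`, `na.amount` and `na.ratio` are hypothetical, and the ratio is left unscaled.

```r
# Base-R sketch of the counting described above (not the hydroTSM source).
# Hypothetical daily series: 59 values covering Jan-Feb 2021.
dates  <- seq(as.Date("2021-01-01"), as.Date("2021-02-28"), by = "day")
values <- as.numeric(seq_along(dates))
values[c(2, 5, 40)] <- NA           # two NAs in January, one in February

month.id <- format(dates, "%Y-%m")  # grouping key for the monthly scale

# Amount: number of NAs in each month
na.amount <- tapply(is.na(values), month.id, sum)

# Ratio: NAs divided by the total number of elements in each month
na.ratio <- round(na.amount / tapply(values, month.id, length), 3)
```

Here `na.amount` is 2 for January and 1 for February, and `na.ratio` is 0.065 (2/31) and 0.036 (1/28), respectively.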

This function was developed to allow the selective removal of values when aggregating from a high temporal resolution into a lower temporal resolution (e.g., from hourly to daily or from daily to monthly), using any of the temporal aggregation functions available in this package (e.g., hourly2daily, daily2monthly).
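
One possible way to use such counts for selective removal is sketched below in base R. It does not call the package's aggregation functions; the data, the 10% tolerance (`max.ratio`) and all variable names are assumptions made for illustration only.

```r
# Base-R sketch of selective removal during aggregation (not the hydroTSM source).
# Hypothetical daily series covering Jan-Feb 2021 and a hypothetical 10% tolerance.
dates  <- seq(as.Date("2021-01-01"), as.Date("2021-02-28"), by = "day")
values <- rep(1, length(dates))
values[1:10] <- NA                  # 10 of the 31 January values are missing
values[40]   <- NA                  # 1 of the 28 February values is missing

month.id  <- format(dates, "%Y-%m")
max.ratio <- 0.10                   # assumed tolerance for missing data

# Share of missing values in each month
na.ratio <- tapply(is.na(values), month.id, mean)

# Monthly totals ignoring NAs, then discard the months with too many gaps
monthly <- tapply(values, month.id, sum, na.rm = TRUE)
monthly[na.ratio > max.ratio] <- NA
```

With these assumed data, January (about 32% missing) is set to NA, while February (about 3.6% missing) keeps its total of 27.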

Value

a zoo object with the percentage/amount of missing values for each temporal scale selected by the user.

Author(s)

Mauricio Zambrano-Bigiarini, mzb.devel@gmail

Examples

######################
## Ex1: Loading the DAILY precipitation data at SanMartino (25567 daily values)
data(SanMartinoPPts)
x <- SanMartinoPPts

## Transforming into NA the 10% of values in 'x'
n           <- length(x)
n.nas       <- round(0.1*n, 0)
na.index    <- sample(1:n, n.nas)
x[na.index] <- NA

# Getting the percentage of NAs in 'x' for each week (starting on Monday);
# 'out.type' is not given, so its first value ("percentage") is used
cmv(x, tscale="weekly")

# Getting the percentage of NAs in 'x' for each month
cmv(x, tscale="monthly")

# Getting the percentage of NAs in 'x' for each quarter
cmv(x, tscale="quarterly")

# Getting the percentage of NAs in 'x' for each weather season
cmv(x, tscale="seasonal")

# Getting the percentage of NAs in 'x' for each year
cmv(x, tscale="annual")
######################
## Ex2: Loading the time series of HOURLY streamflows for the station 
## Karamea at Gorge (52579 hourly values)
data(KarameaAtGorgeQts)
x <- KarameaAtGorgeQts

## Transforming into NA the 10% of values in 'x'
n           <- length(x)
n.nas       <- round(0.1*n, 0)
na.index    <- sample(1:n, n.nas)
x[na.index] <- NA

# Getting the percentage of NAs in 'x' for each day
cmv(x, tscale="daily")

# Getting the percentage of NAs in 'x' for each weather season
cmv(x, tscale="seasonal")

hydroTSM documentation built on Nov. 4, 2024, 5:07 p.m.