rPearson: Pearson correlation coefficient
In hydroGOF: Goodness-of-Fit Functions for Comparison of Simulated and Observed Hydrological Time Series

rPearson

R Documentation

Pearson correlation coefficient

Description

Pearson correlation coefficient between sim and obs, with treatment of missing values.

Usage

rPearson(sim, obs, ...)

## Default S3 method:
rPearson(sim, obs, fun=NULL, ..., 
             epsilon.type=c("none", "Pushpalatha2012", "otherFactor", "otherValue"), 
             epsilon.value=NA)

## S3 method for class 'matrix'
rPearson(sim, obs, na.rm=TRUE, fun=NULL, ..., 
             epsilon.type=c("none", "Pushpalatha2012", "otherFactor", "otherValue"), 
             epsilon.value=NA)

## S3 method for class 'data.frame'
rPearson(sim, obs, na.rm=TRUE, fun=NULL, ..., 
             epsilon.type=c("none", "Pushpalatha2012", "otherFactor", "otherValue"), 
             epsilon.value=NA)

## S3 method for class 'zoo'
rPearson(sim, obs, na.rm=TRUE, fun=NULL, ..., 
             epsilon.type=c("none", "Pushpalatha2012", "otherFactor", "otherValue"), 
             epsilon.value=NA)

Arguments

`sim`	numeric, zoo, matrix or data.frame with simulated values
`obs`	numeric, zoo, matrix or data.frame with observed values
`na.rm`	a logical value indicating whether 'NA' should be stripped before the computation proceeds. When an 'NA' value is found at the i-th position in `obs` OR `sim`, the i-th value of `obs` AND `sim` are removed before the computation.
`fun`	function to be applied to `sim` and `obs` in order to obtain transformed values thereof before computing this goodness-of-fit index. The first argument MUST BE a numeric vector with any name (e.g., `x`), and additional arguments are passed using `...`.
`...`	arguments passed to `fun`, in addition to the mandatory first numeric vector.
`epsilon.type`	argument used to define a numeric value to be added to both `sim` and `obs` before applying `fun`. It is was designed to allow the use of logarithm and other similar functions that do not work with zero values. Valid values of `epsilon.type` are: 1) `"none"`: `sim` and `obs` are used by `fun` without the addition of any numeric value. This is the default option. 2) `"Pushpalatha2012"`: one hundredth (1/100) of the mean observed values is added to both `sim` and `obs` before applying `fun`, as described in Pushpalatha et al. (2012). 3) `"otherFactor"`: the numeric value defined in the `epsilon.value` argument is used to multiply the the mean observed values, instead of the one hundredth (1/100) described in Pushpalatha et al. (2012). The resulting value is then added to both `sim` and `obs`, before applying `fun`. 4) `"otherValue"`: the numeric value defined in the `epsilon.value` argument is directly added to both `sim` and `obs`, before applying `fun`.
`epsilon.value`	numeric value to be added to both `sim` and `obs` when `epsilon.type="otherValue"`.

Details

It is a wrapper to the cor function.

The Pearson correlation coefficient (PCC) is a correlation coefficient that measures linear correlation between two sets of data.

It is the ratio between the covariance of two variables and the product of their standard deviations; thus, it is essentially a normalized measurement of the covariance, such that the result always has a value between -1 and 1. As with covariance itself, the measure can only reflect a linear correlation of variables, and ignores many other types of relationships or correlations.

The correlation coefficient ranges from -1 to 1. An absolute value of exactly 1 implies that a linear equation describes the relationship between sim and obs perfectly, with all data points lying on a line. The correlation sign is determined by the regression slope: a value of +1 implies that all data points lie on a line for which sim increases as obs increases, and vice versa for -1. A value of 0 implies that there is no linear dependency between the variables.

Value

Pearson correlation coefficient between sim and obs.

If sim and obs are matrixes, the returned value is a vector, with the Pearson correlation coefficient between each column of sim and obs.

Note

obs and sim has to have the same length/dimension

The missing values in obs and sim are removed before the computation proceeds, and only those positions with non-missing values in obs and sim are considered in the computation

Author(s)

Mauricio Zambrano Bigiarini <mzb.devel@gmail.com>

References

https://en.wikipedia.org/wiki/Pearson_correlation_coefficient

Pearson, K. (1920). Notes on the history of correlation. Biometrika, 13(1), 25-45. doi:10.2307/2331722.

Schober, P.; Boer, C.; Schwarte, L.A. (2018). Correlation coefficients: appropriate use and interpretation. Anesthesia and Analgesia, 126(5), 1763-1768. doi:10.1213/ANE.0000000000002864.

Examples

##################
# Example 1: basic ideal case
obs <- 1:10
sim <- 1:10
rPearson(sim, obs)

obs <- 1:10
sim <- 2:11
rPearson(sim, obs)

##################
# Example 2: 
# Loading daily streamflows of the Ega River (Spain), from 1961 to 1970
data(EgaEnEstellaQts)
obs <- EgaEnEstellaQts

# Generating a simulated daily time series, initially equal to the observed series
sim <- obs 

# Computing the 'rPearson' for the "best" (unattainable) case
rPearson(sim=sim, obs=obs)

##################
# Example 3: rPearson for simulated values equal to observations plus random noise 
#            on the first half of the observed values. 
#            This random noise has more relative importance for ow flows than 
#            for medium and high flows.
  
# Randomly changing the first 1826 elements of 'sim', by using a normal distribution 
# with mean 10 and standard deviation equal to 1 (default of 'rnorm').
sim[1:1826] <- obs[1:1826] + rnorm(1826, mean=10)
ggof(sim, obs)

rPearson(sim=sim, obs=obs)

##################
# Example 4: rPearson for simulated values equal to observations plus random noise 
#            on the first half of the observed values and applying (natural) 
#            logarithm to 'sim' and 'obs' during computations.

rPearson(sim=sim, obs=obs, fun=log)

# Verifying the previous value:
lsim <- log(sim)
lobs <- log(obs)
rPearson(sim=lsim, obs=lobs)

##################
# Example 5: rPearson for simulated values equal to observations plus random noise 
#            on the first half of the observed values and applying (natural) 
#            logarithm to 'sim' and 'obs' and adding the Pushpalatha2012 constant
#            during computations

rPearson(sim=sim, obs=obs, fun=log, epsilon.type="Pushpalatha2012")

# Verifying the previous value, with the epsilon value following Pushpalatha2012
eps  <- mean(obs, na.rm=TRUE)/100
lsim <- log(sim+eps)
lobs <- log(obs+eps)
rPearson(sim=lsim, obs=lobs)

##################
# Example 6: rPearson for simulated values equal to observations plus random noise 
#            on the first half of the observed values and applying (natural) 
#            logarithm to 'sim' and 'obs' and adding a user-defined constant
#            during computations

eps <- 0.01
rPearson(sim=sim, obs=obs, fun=log, epsilon.type="otherValue", epsilon.value=eps)

# Verifying the previous value:
lsim <- log(sim+eps)
lobs <- log(obs+eps)
rPearson(sim=lsim, obs=lobs)

##################
# Example 7: rPearson for simulated values equal to observations plus random noise 
#            on the first half of the observed values and applying (natural) 
#            logarithm to 'sim' and 'obs' and using a user-defined factor
#            to multiply the mean of the observed values to obtain the constant
#            to be added to 'sim' and 'obs' during computations

fact <- 1/50
rPearson(sim=sim, obs=obs, fun=log, epsilon.type="otherFactor", epsilon.value=fact)

# Verifying the previous value:
eps  <- fact*mean(obs, na.rm=TRUE)
lsim <- log(sim+eps)
lobs <- log(obs+eps)
rPearson(sim=lsim, obs=lobs)

##################
# Example 8: rPearson for simulated values equal to observations plus random noise 
#            on the first half of the observed values and applying a 
#            user-defined function to 'sim' and 'obs' during computations

fun1 <- function(x) {sqrt(x+1)}

rPearson(sim=sim, obs=obs, fun=fun1)

# Verifying the previous value, with the epsilon value following Pushpalatha2012
sim1 <- sqrt(sim+1)
obs1 <- sqrt(obs+1)
rPearson(sim=sim1, obs=obs1)

hydroGOF documentation built on Nov. 4, 2024, 5:08 p.m.