# CalcStatCategorical: Calculate some categorical verification measures for... In mccreigh/rwrfhydro: R tools for the WRF Hydro Model

## Description

`CalcStatCategorical` takes a data.table or data.frame with two columns, observation and model/forecast, and computes categorical verification measures for either categorical or continuous variables.

## Usage

```r
CalcStatCategorical(DT, obsCol, modCol, obsMissing = NULL, modMissing = NULL,
  threshold = NULL, category = c("YES", "NO"), groupBy = NULL,
  obsCondRange = c(-Inf, Inf), modCondRange = c(-Inf, Inf),
  statList = c("H", "FAR", "CSI"))
```

## Arguments

• `DT` : A data.table or data.frame containing two columns: the observation (truth) and the model/forecast.

• `obsCol` : Character: name of the observation column.

• `modCol` : Character: name of the model/forecast column.

• `obsMissing` : Numeric/Character vector: all values that denote missing data in the observation.

• `modMissing` : Numeric/Character vector: all values that denote missing data in the model/forecast.

• `threshold` : Numeric vector: define it if the variables are numeric and the categorical statistics should be calculated for different cutoff/threshold values.

• `category` : Vector with two elements. At this time only a 2 by 2 contingency table is supported. Should be defined if the variable is actually categorical and `threshold` is NULL.

• `groupBy` : Character vector: names of the columns in `DT` by which the statistics should be grouped.

• `obsCondRange` : Numeric vector with two elements (DEFAULT = c(-Inf, Inf)). The values are used as the lower and upper bounds on the observation when calculating conditional statistics. If conditioning on only one tail, leave the other bound as -Inf or Inf. For example, if only values greater than 2 are of interest, use obsCondRange = c(2, Inf).

• `modCondRange` : Numeric vector with two elements (DEFAULT = c(-Inf, Inf)). The values are used as the lower and upper bounds on the model/forecast when calculating conditional statistics. If conditioning on only one tail, leave the other bound as -Inf or Inf. For example, if only values greater than 2 are of interest, use modCondRange = c(2, Inf).

• `statList` : Character vector: names of the statistics to calculate.
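To make the `threshold` and `obsCondRange` arguments concrete, here is a minimal base-R sketch of how numeric pairs could be restricted to a conditioning range and then turned into YES/NO events. It does not call `CalcStatCategorical`. The helper `to_event`, the strict `>` comparison and the inclusive range bounds are assumptions for illustration, since this page does not state how values equal to the threshold or to the range limits are treated.

```r
# Hypothetical helper (not part of rwrfhydro): flag values exceeding a threshold.
# Assumption: an event ("YES") means value > threshold.
to_event <- function(x, threshold) {
  ifelse(x > threshold, "YES", "NO")
}

obs <- c(1.0, 2.5, 3.0, 0.5, 4.2, 2.0)
mod <- c(0.8, 3.1, 1.9, 0.7, 5.0, 2.6)

# Conditioning in the spirit of obsCondRange = c(2, Inf): keep only pairs whose
# observation lies inside the range (bounds assumed inclusive here).
obsCondRange <- c(2, Inf)
keep <- obs >= obsCondRange[1] & obs <= obsCondRange[2]

obs_event <- to_event(obs[keep], threshold = 2.5)
mod_event <- to_event(mod[keep], threshold = 2.5)

# 2 by 2 contingency table of the retained pairs
tab <- table(obs = factor(obs_event, levels = c("YES", "NO")),
             mod = factor(mod_event, levels = c("YES", "NO")))
print(tab)
```

Fixing the factor levels keeps the table 2 by 2 even when one of the categories does not occur in the retained pairs.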

## Details

The calculated statistics are the following:

• a : Hits in contingency table (both observation and forecast say YES)

• b : False alarms in contingency table (observation says NO while forecast says YES)

• c : Misses in contingency table (observation says YES while forecast says NO)

• d : Correct rejections in contingency table (both observation and forecast say NO)

• n : Total number of pairs = a+b+c+d

• s : Base rate = (a+c)/n

• r : Forecast rate = (a+b)/n

• B : Frequency bias = (a+b)/(a+c)

• H : Hit rate = a/(a+c)

• F : False alarm rate = b/(b+d)

• FAR : False alarm ratio = b/(a+b)

• PC : Proportion Correct = (a+d)/n

• CSI : Critical Success Index = a/(a+b+c)

• GSS : Gilbert Skill Score = (a-ar)/(a+b+c-ar), where ar = (a+b)(a+c) /n is the expected a for a random forecast with the same r and s

• HSS : Heidke Skill Score = (a+d-ar-dr)/(n-ar-dr), where dr = (b+d)(c+d)/n

• PSS : Peirce Skill Score = (a*d-b*c)/((b+d)*(a+c))

• CSS : Clayton Skill Score = a/(a+b)-c/(c+d)

• DSS : Doolittle Skill Score = (a*d-b*c)/sqrt((a+b)*(c+d)*(a+c)*(b+d))

• LOR : Log of Odds Ratio = log(a*d/(b*c))

• ORSS : Odds Ratio Skill Score = (a*d-b*c)/(a*d+b*c)

• EDS : Extreme Dependency Score = 2*log((a+c)/n)/log(a/n)

• SEDS : Symmetric Extreme Dependency Score = log(ar/a)/log(a/n)

• SEDI : Symmetric Extremal Dependence Index = (log(F)-log(H)+log(1-H)-log(1-F))/(log(F)+log(H)+log(1-H)+log(1-F)), where H = a/(a+c) is the hit rate and F = b/(b+d) is the false alarm rate

For more information, refer to Forecast Verification: A Practitioner's Guide in Atmospheric Science, Jolliffe and Stephenson, 2012.
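As a worked illustration of the definitions above, the following base-R sketch counts the four contingency cells for a small YES/NO data set and evaluates a few of the listed scores directly from the formulas. It is an independent hand calculation, not the package's implementation, and the variable name `cc` (used for the misses, c) is only a naming choice for this sketch.

```r
# Categorical toy data: 25 observed events followed by 25 non-events,
# with a forecast that alternates YES/NO
obs <- c(rep("YES", 25), rep("NO", 25))
mod <- rep(c("YES", "NO"), 25)

# Contingency table cells
a  <- sum(obs == "YES" & mod == "YES")  # hits
b  <- sum(obs == "NO"  & mod == "YES")  # false alarms
cc <- sum(obs == "YES" & mod == "NO")   # misses (cc to avoid confusion with base::c)
d  <- sum(obs == "NO"  & mod == "NO")   # correct rejections
n  <- a + b + cc + d

# Basic measures
H   <- a / (a + cc)        # hit rate
FAR <- b / (a + b)         # false alarm ratio
CSI <- a / (a + b + cc)    # critical success index

# Skill scores relative to a random forecast with the same r and s
ar  <- (a + b) * (a + cc) / n   # expected hits
dr  <- (b + d) * (cc + d) / n   # expected correct rejections
GSS <- (a - ar) / (a + b + cc - ar)
HSS <- (a + d - ar - dr) / (n - ar - dr)
PSS <- (a * d - b * cc) / ((b + d) * (a + cc))

print(round(c(H = H, FAR = FAR, CSI = CSI, GSS = GSS, HSS = HSS, PSS = PSS), 3))
```

For this data set the cells are a = 13, b = 12, c = 12, d = 13, so the forecast is barely better than chance and the skill scores come out close to zero. Ratios such as LOR or the log-based scores become undefined when a cell in a denominator or inside a logarithm is zero, which is worth checking before interpreting them.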

## Value

A data.frame containing the statistics requested in `statList`.

## See Also

Other modelEvaluation: `CalcModPerfMulti`, `CalcModPerf`, `CalcNoahmpFluxes`, `CalcNoahmpWatBudg`, `CalcStatCont`

## Examples

```r
## Not run: 
# for categorical data
ExampleDF <- data.frame(obs=c(rep("YES",25), rep("NO", 25)), mod=rep(c("YES","NO"),25))
stat <- CalcStatCategorical(DT = ExampleDF, obsCol = "obs", modCol = "mod",
                            category = c("YES","NO"))

# for categorical data with more than one experiment
ExampleDF <- data.frame(obs=c(rep("YES",25), rep("NO", 25)), mod=rep(c("YES","NO"),25),
                        Experiment = c(rep(c("1","2","3"),16),"1","2"))
stat <- CalcStatCategorical(DT = ExampleDF, obsCol = "obs", modCol = "mod",
                            category = c("YES","NO"), groupBy="Experiment")

# for continuous data with different threshold values
ExampleDF <- data.frame(obs=rnorm(10000, 100, 10), mod=rnorm(10000, 100, 10))
stat <- CalcStatCategorical(DT = ExampleDF, obsCol = "obs", modCol = "mod",
                            threshold = c(60,70,80,90,100,110, 120, 130, 140))

# for continuous data with different threshold values and more than one experiment
ExampleDF <- data.frame(obs=rnorm(10000, 100, 10), mod=rnorm(10000, 100, 10),
                        Experiment=rep(c("Model1","Model2"),5000))
stat <- CalcStatCategorical(DT = ExampleDF, obsCol = "obs", modCol = "mod",
                            threshold = c(60,70,80,90,100,110, 120, 130, 140),
                            groupBy = "Experiment")

## End(Not run)
```
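For readers who want to see what grouping by a column amounts to, the sketch below reproduces the idea behind `groupBy = "Experiment"` in base R using `split` and `sapply`. It is only an approximation of the concept: the helper `basic_scores` is hypothetical, it computes just H, FAR and CSI, and the layout of the result is not meant to match the data.frame returned by `CalcStatCategorical`.

```r
# Hypothetical helper (not part of rwrfhydro): H, FAR and CSI for one group
basic_scores <- function(df) {
  a  <- sum(df$obs == "YES" & df$mod == "YES")  # hits
  b  <- sum(df$obs == "NO"  & df$mod == "YES")  # false alarms
  cc <- sum(df$obs == "YES" & df$mod == "NO")   # misses
  c(H = a / (a + cc), FAR = b / (a + b), CSI = a / (a + b + cc))
}

ExampleDF <- data.frame(obs = c(rep("YES", 25), rep("NO", 25)),
                        mod = rep(c("YES", "NO"), 25),
                        Experiment = c(rep(c("1", "2", "3"), 16), "1", "2"))

# Split the rows by the grouping column, score each piece, and transpose so
# that each row corresponds to one experiment
byExperiment <- t(sapply(split(ExampleDF, ExampleDF$Experiment), basic_scores))
print(byExperiment)
```

Each group gets its own contingency table, so small groups can produce unstable or undefined scores even when the pooled statistics look reasonable.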

mccreigh/rwrfhydro documentation built on May 12, 2018, 3:08 a.m.