CalcStatCategorical: Calculate some categorical verification measures for...
In mccreigh/rwrfhydro: R tools for the WRF Hydro Model


CalcStatCategorical takes a data.table or data.frame with an observation column and a model/forecast column and computes categorical verification measures for either categorical or continuous variables.

CalcStatCategorical(DT, obsCol, modCol, obsMissing = NULL,
  modMissing = NULL, threshold = NULL, category = c("YES", "NO"),
  groupBy = NULL, obsCondRange = c(-Inf, Inf), modCondRange = c(-Inf,
  Inf), statList = c("H", "FAR", "CSI"))

`DT`	A data.table or data.frame containing two columns: the observation (truth) and the model/forecast.
`obsCol`	Character: name of the observation column.
`modCol`	Character: name of the model/forecast column.
`obsMissing`	Numeric/Character vector: defining all the missing values in the observation
`modMissing`	Numeric/Character vector: defining all the missing values in the model/forecast
`threshold`	Numeric vector: define it if you have numeric variables and want to calculate the categorical statistics for different cutoff/threshold values.
`category`	Vector with two elements. At this time only a 2 by 2 contingency table is supported. Should be defined if the variable is actually categorical and threshold is NULL.
`groupBy`	Character vector: names of the columns in `DT` by which the statistics should be grouped.
`obsCondRange`	Numeric vector: containing two elements (DEFAULT = c(-Inf,Inf)). Values are used as the lower and upper boundary for the observation in calculating conditional statistics. If conditioning on only one tail, leave the other value as -Inf or Inf. For example, if interested only in values greater than 2, then obsCondRange = c(2, Inf)
`modCondRange`	Numeric vector: containing two elements (DEFAULT = c(-Inf,Inf)). Values are used as the lower and upper boundary for the model/forecast in calculating conditional statistics. If conditioning on only one tail, leave the other value as -Inf or Inf. For example, if interested only in values greater than 2, then modCondRange = c(2, Inf)
`statList`	Character vector: list of all the statistics you are interested in.
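
As a rough illustration of what `threshold` implies, the sketch below turns numeric observation/forecast pairs into YES/NO categories at one cutoff and cross-tabulates them in base R. The helper name `dichotomize` and the sample values are hypothetical, and the strict `>` comparison is an assumption; the package may treat values equal to the threshold differently.

```r
# Hypothetical helper, not part of rwrfhydro: label values above a cutoff as "YES".
# Assumption: strict exceedance (value > threshold); the package may use >= instead.
dichotomize <- function(x, threshold) {
  factor(ifelse(x > threshold, "YES", "NO"), levels = c("YES", "NO"))
}

# Made-up observation/forecast pairs, for illustration only
obs <- c(95, 102, 110, 88, 120, 101)
mod <- c(99, 105, 97, 90, 125, 100)

# 2 by 2 contingency table at a cutoff of 100: rows are forecast, columns are observation
tab <- table(mod = dichotomize(mod, 100), obs = dichotomize(obs, 100))
print(tab)
```

Repeating this for each element of a threshold vector would give one contingency table, and hence one set of statistics, per cutoff.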

The calculated statistics are the following:

a : Hits in contingency table (both observation and forecast say YES)
b : False alarm in contingency table (observation says NO while forecast says YES)
c : Misses in contingency table (observation says YES while forecast says NO)
d : Correct rejection in contingency table (both observation and forecast say NO)
n : Total number of pairs = a+b+c+d
s : Base rate = (a+c)/n
r : Forecast rate = (a+b)/n
B : Frequency bias = (a+b)/(a+c)
H : Hit rate = a/(a+c)
F : False alarm rate = b/(b+d)
FAR : False alarm ratio = b/(a+b)
PC : Proportion Correct = (a+d)/n
CSI : Critical Success Index = a/(a+b+c)
GSS : Gilbert Skill Score = (a-ar)/(a+b+c-ar), where ar = (a+b)(a+c) /n is the expected a for a random forecast with the same r and s
HSS : Heidke Skill Score = (a+d-ar-dr)/(n-ar-dr), where dr = (b+d)(c+d)/n
PSS : Peirce Skill Score = (a*d-b*c)/((b+d)*(a+c))
CSS : Clayton Skill Score = a/(a+b)-c/(c+d)
DSS : Doolittle Skill Score = (a*d-b*c)/sqrt((a+b)*(c+d)*(a+c)*(b+d))
LOR : Log of Odds Ratio = log(a*d/(b*c))
ORSS : Odds Ratio Skill Score = (a*d-b*c)/(a*d+b*c)
EDS : Extreme Dependency Score = 2*log((a+c)/n)/log(a/n)
SEDS : Symmetric Extreme Dependency Score = log(ar/a)/log(a/n)
SEDI : Symmetric Extremal Dependence Index = (log(b/(b+d))-log(a/(a+c))+log(1-a/(a+c))-log(1-b/(b+d)))/(log(b/(b+d))+log(a/(a+c))+log(1-a/(a+c))+log(1-b/(b+d)))
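
The formulas above can be checked by hand from the four contingency counts. The following is a minimal base R sketch for a subset of the scores; the function name `categorical_scores` and the counts are made up for illustration and are not part of rwrfhydro.

```r
# Illustrative helper (hypothetical name, not exported by rwrfhydro):
# computes a subset of the listed scores from the four contingency counts.
categorical_scores <- function(a, b, c, d) {
  n  <- a + b + c + d
  ar <- (a + b) * (a + c) / n   # expected hits for a random forecast
  dr <- (b + d) * (c + d) / n   # expected correct rejections for a random forecast
  list(
    n   = n,
    B   = (a + b) / (a + c),                        # frequency bias
    H   = a / (a + c),                              # hit rate
    F   = b / (b + d),                              # false alarm rate
    FAR = b / (a + b),                              # false alarm ratio
    PC  = (a + d) / n,                              # proportion correct
    CSI = a / (a + b + c),                          # critical success index
    GSS = (a - ar) / (a + b + c - ar),              # Gilbert skill score
    HSS = (a + d - ar - dr) / (n - ar - dr),        # Heidke skill score
    PSS = (a * d - b * c) / ((b + d) * (a + c))     # Peirce skill score
  )
}

# Made-up counts: 30 hits, 10 false alarms, 20 misses, 40 correct rejections
scores <- categorical_scores(a = 30, b = 10, c = 20, d = 40)
str(scores)
```

For these counts H = 0.6, F = 0.2, FAR = 0.25 and CSI = 0.5, and PSS equals H - F = 0.4, which is a quick consistency check on the Peirce formula.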

For more information, refer to Forecast Verification: A Practitioner's Guide in Atmospheric Science, Jolliffe and Stephenson, 2012.

A data.frame containing all the statistics requested in statList.

Other modelEvaluation: CalcModPerfMulti, CalcModPerf, CalcNoahmpFluxes, CalcNoahmpWatBudg, CalcStatCont

## Not run: 

# for categorical data
ExampleDF <- data.frame(obs = c(rep("YES", 25), rep("NO", 25)),
                        mod = rep(c("YES", "NO"), 25))
stat <- CalcStatCategorical(DT = ExampleDF, obsCol = "obs",
                            modCol = "mod", category = c("YES", "NO"))

# for categorical data with more than one experiment
ExampleDF <- data.frame(obs = c(rep("YES", 25), rep("NO", 25)),
                        mod = rep(c("YES", "NO"), 25),
                        Experiment = c(rep(c("1", "2", "3"), 16), "1", "2"))
stat <- CalcStatCategorical(DT = ExampleDF, obsCol = "obs", modCol = "mod",
                            category = c("YES", "NO"), groupBy = "Experiment")

# for continuous data with different threshold values
ExampleDF <- data.frame(obs = rnorm(10000, 100, 10), mod = rnorm(10000, 100, 10))
stat <- CalcStatCategorical(DT = ExampleDF, obsCol = "obs", modCol = "mod",
                            threshold = c(60, 70, 80, 90, 100, 110, 120, 130, 140))

# for continuous data with different threshold values and more than one experiment
ExampleDF <- data.frame(obs = rnorm(10000, 100, 10), mod = rnorm(10000, 100, 10),
                        Experiment = rep(c("Model1", "Model2"), 5000))
stat <- CalcStatCategorical(DT = ExampleDF, obsCol = "obs", modCol = "mod",
                            threshold = c(60, 70, 80, 90, 100, 110, 120, 130, 140),
                            groupBy = "Experiment")

## End(Not run)
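
To sanity-check the first categorical example without the package, the contingency counts and the three default statistics (H, FAR, CSI) can be worked out directly in base R. This is only a hand calculation from the definitions in the details list; it does not assume anything about the column layout that CalcStatCategorical returns, and the variable names are chosen here for illustration.

```r
# Same data as the first categorical example
obs <- c(rep("YES", 25), rep("NO", 25))
mod <- rep(c("YES", "NO"), 25)

# Contingency counts, following the definitions of a, b, c, d
a      <- sum(obs == "YES" & mod == "YES")  # hits
b      <- sum(obs == "NO"  & mod == "YES")  # false alarms
c_miss <- sum(obs == "YES" & mod == "NO")   # misses (named c_miss to avoid masking c())
d      <- sum(obs == "NO"  & mod == "NO")   # correct rejections

# Default statList entries
H   <- a / (a + c_miss)
FAR <- b / (a + b)
CSI <- a / (a + b + c_miss)

print(c(a = a, b = b, c = c_miss, d = d))
print(c(H = H, FAR = FAR, CSI = CSI))
```

The counts come out as a = 13, b = 12, c = 12, d = 13, so H = 0.52 and FAR = 0.48, close to what a forecast with no skill would give.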