View source: R/calcStatCategorical.R
CalcStatCategorical
CalcStatCategorical takes a data.table or data.frame with two columns,
observation and model/forecast, and computes a set of categorical verification measures
for either categorical or continuous variables.
DT | A data.table or data.frame containing two columns: the observation (truth) and the model/forecast.
obsCol | Character: name of the observation column.
modCol | Character: name of the model/forecast column.
obsMissing | Numeric/Character vector: all the values that denote missing data in the observation.
modMissing | Numeric/Character vector: all the values that denote missing data in the model/forecast.
threshold | Numeric vector: define it if you have numeric variables and want to calculate the categorical statistics for different cutoff/threshold values.
category | Vector with two elements. At this time only a 2 by 2 contingency table is supported. Should be defined if the variable is actually categorical and threshold is NULL.
groupBy | Character vector: Name of all the columns in
obsCondRange | Numeric vector containing two elements (DEFAULT = c(-Inf,Inf)). The values are used as the lower and upper bounds on the observation when calculating conditional statistics. If conditioning on only one tail, leave the other value as -Inf or Inf. For example, if only values greater than 2 are of interest, then obsCondRange = c(2, Inf).
modCondRange | Numeric vector containing two elements (DEFAULT = c(-Inf,Inf)). The values are used as the lower and upper bounds on the model/forecast when calculating conditional statistics. If conditioning on only one tail, leave the other value as -Inf or Inf. For example, if only values greater than 2 are of interest, then modCondRange = c(2, Inf).
statList | Character vector: list of all the statistics you are interested in.
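The roles of threshold and the *Missing arguments can be pictured with a small base R sketch. It rests on two assumptions that this page does not confirm: that an event ("YES") is declared when a value exceeds the threshold, and that the listed missing values are recoded to NA before comparison. toEvent is a hypothetical helper written for illustration, not a function of the package.

```r
# Hypothetical helper (not part of the package): convert numeric values
# into YES/NO events. Assumes "YES" means the value exceeds the threshold.
toEvent <- function(x, threshold, missing = NULL) {
  x[x %in% missing] <- NA              # recode sentinel values as NA
  ifelse(x > threshold, "YES", "NO")   # NA stays NA
}

obs <- c(1.0, 3.5, -999, 2.2, 5.0)     # -999 marks a missing observation
mod <- c(0.5, 4.0, 3.0, 2.5, 1.0)

obsEvent <- toEvent(obs, threshold = 2, missing = -999)
modEvent <- toEvent(mod, threshold = 2)

# Keep only the pairs where both observation and forecast are available
ok <- !is.na(obsEvent) & !is.na(modEvent)
pairs <- data.frame(obs = obsEvent[ok], mod = modEvent[ok])
```

With more than one threshold, the same conversion would simply be repeated once per cutoff value.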
The calculated statistics are the following:
a : Hits in contingency table (both observation and forecast say YES)
b : False alarm in contingency table (observation says NO while forecast says YES)
c : Misses in contingency table (observation says YES while forecast says NO)
d : Correct rejection in contingency table (both observation and forecast say NO)
n : Total number of pairs = a+b+c+d
s : Base rate = (a+c)/n
r : Forecast rate = (a+b)/n
B : Frequency bias = (a+b)/(a+c)
H : Hit rate = a/(a+c)
F : False alarm rate = b/(b+d)
FAR : False alarm ratio = b/(a+b)
PC : Proportion Correct = (a+d)/n
CSI : Critical Success Index = a/(a+b+c)
GSS : Gilbert Skill Score = (a-ar)/(a+b+c-ar), where ar = (a+b)*(a+c)/n is the expected a for a random forecast with the same r and s
HSS : Heidke Skill Score = (a+d-ar-dr)/(n-ar-dr), where dr = (b+d)(c+d)/n
PSS : Peirce Skill Score = (a*d-b*c)/((b+d)*(a+c))
CSS : Clayton Skill Score = a/(a+b)-c/(c+d)
DSS : Doolittle Skill Score = (a*d-b*c)/sqrt((a+b)*(c+d)*(a+c)*(b+d))
LOR : Log of Odds Ratio = log(a*d/(b*c))
ORSS : Odds Ratio Skill Score = (a*d-b*c)/(a*d+b*c)
EDS : Extreme Dependency Score = 2*log((a+c)/n)/log(a/n) - 1
SEDS : Symmetric Extreme Dependency Score = log(ar/a)/log(a/n)
SEDI : Symmetric Extremal Dependence Index = (log(b/(b+d))-log(a/(a+c))+log(1-a/(a+c))-log(1-b/(b+d)))/(log(b/(b+d))+log(a/(a+c))+log(1-a/(a+c))+log(1-b/(b+d)))
For more information refer to Forecast Verification: A Practitioner's Guide in Atmospheric Science, Jolliffe and Stephenson, 2012.
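As a worked check of the formulas listed above, the following base R sketch builds the 2 by 2 counts for the categorical example data used on this page and evaluates a few of the scores by hand. It does not call CalcStatCategorical, and the variable names are illustrative only.

```r
# Same data as the first categorical example on this page
obs <- c(rep("YES", 25), rep("NO", 25))
mod <- rep(c("YES", "NO"), 25)

# Contingency table counts
hits        <- sum(obs == "YES" & mod == "YES")  # a = 13
falseAlarms <- sum(obs == "NO"  & mod == "YES")  # b = 12
misses      <- sum(obs == "YES" & mod == "NO")   # c = 12
correctRej  <- sum(obs == "NO"  & mod == "NO")   # d = 13
n <- hits + falseAlarms + misses + correctRej    # 50

# A few of the listed scores
H   <- hits / (hits + misses)                    # hit rate = 0.52
F   <- falseAlarms / (falseAlarms + correctRej)  # false alarm rate = 0.48
PC  <- (hits + correctRej) / n                   # proportion correct = 0.52
PSS <- (hits * correctRej - falseAlarms * misses) /
  ((falseAlarms + correctRej) * (hits + misses)) # equals H - F = 0.04

# Gilbert Skill Score: subtract the hits expected from a random forecast
ar  <- (hits + falseAlarms) * (hits + misses) / n
GSS <- (hits - ar) / (hits + falseAlarms + misses - ar)
```

Note that PSS reduces to H - F, which is a convenient sanity check on any hand calculation.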
A data.frame containing all the statistics requested in statList.
Other modelEvaluation: CalcModPerfMulti, CalcModPerf, CalcNoahmpFluxes, CalcNoahmpWatBudg, CalcStatCont
## Not run:
# for categorical data
ExampleDF <- data.frame(obs = c(rep("YES", 25), rep("NO", 25)), mod = rep(c("YES", "NO"), 25))
stat <- CalcStatCategorical(DT = ExampleDF, obsCol = "obs",
                            modCol = "mod", category = c("YES", "NO"))

# for categorical data with more than one experiment
ExampleDF <- data.frame(obs = c(rep("YES", 25), rep("NO", 25)), mod = rep(c("YES", "NO"), 25),
                        Experiment = c(rep(c("1", "2", "3"), 16), "1", "2"))
stat <- CalcStatCategorical(DT = ExampleDF, obsCol = "obs", modCol = "mod",
                            category = c("YES", "NO"), groupBy = "Experiment")

# for continuous data with different threshold values
ExampleDF <- data.frame(obs = rnorm(10000, 100, 10), mod = rnorm(10000, 100, 10))
stat <- CalcStatCategorical(DT = ExampleDF, obsCol = "obs", modCol = "mod",
                            threshold = c(60, 70, 80, 90, 100, 110, 120, 130, 140))

# for continuous data with different threshold values and more than one experiment
ExampleDF <- data.frame(obs = rnorm(10000, 100, 10), mod = rnorm(10000, 100, 10),
                        Experiment = rep(c("Model1", "Model2"), 5000))
stat <- CalcStatCategorical(DT = ExampleDF, obsCol = "obs", modCol = "mod",
                            threshold = c(60, 70, 80, 90, 100, 110, 120, 130, 140), groupBy = "Experiment")
## End(Not run)
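The grouped examples suggest that statistics are computed separately for each level of the grouping column. The sketch below illustrates that idea in base R for the hit rate only, under the assumption that this is what groupBy does; it does not call CalcStatCategorical, and hitRate is a hypothetical helper.

```r
# Same data as the grouped categorical example on this page
ExampleDF <- data.frame(obs = c(rep("YES", 25), rep("NO", 25)),
                        mod = rep(c("YES", "NO"), 25),
                        Experiment = c(rep(c("1", "2", "3"), 16), "1", "2"),
                        stringsAsFactors = FALSE)

# Hypothetical helper: hit rate H = a/(a+c) for one subset of the data
hitRate <- function(df) {
  hits   <- sum(df$obs == "YES" & df$mod == "YES")
  misses <- sum(df$obs == "YES" & df$mod == "NO")
  hits / (hits + misses)
}

# Split the pairs by experiment and evaluate the score within each group
byExperiment <- sapply(split(ExampleDF, ExampleDF$Experiment), hitRate)
```

The result is a named numeric vector with one hit rate per experiment.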