auc (R Documentation)
Calculates the area under a curve (integral) using the trapezoid rule. With auc.mc, several Monte Carlo methods can be applied to obtain error terms that estimate the interpolation error of the integration.
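For reference, the composite trapezoid rule sums, over each pair of adjacent points, the interval width times the mean of the two y values. The base R sketch below illustrates only that rule. trapezoid_area is a hypothetical helper written for illustration, not a function of this package, and it ignores thresh and dens.

```r
# Hypothetical helper (not part of the package): composite trapezoid rule.
trapezoid_area <- function(x, y) {
  stopifnot(length(x) == length(y), length(x) >= 2)
  ord <- order(x)                 # integrate along increasing x
  x <- x[ord]
  y <- y[ord]
  # width of each interval times the mean height of its two end points
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

# Exact for straight lines: the area under y = 2x on [0, 10] is 100.
trapezoid_area(0:10, 2 * (0:10))

# The 2-hourly methane data used in the example section
x <- seq(0, 24, by = 2)
y <- c(12.3, 14.7, 17.3, 13.2, 8.5, 7.7, 6.4, 3.2, 19.8, 22.3, 24.7, 15.6, 17.4)
trapezoid_area(x, y)   # 336.5 (mg CH4 m^-2 if x is in hours and y in mg CH4 m^-2 h^-1)
```

Because the rule assumes straight lines between neighbouring points, it is exact for piecewise linear data and only approximate otherwise, which is the interpolation error auc.mc tries to quantify.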
auc(x, y, thresh = NULL, dens = 100, sort.x = TRUE)
auc.mc(x, y, method = "leave out", lo = 2, it = 100, ...)
x
: Numerical vector giving the x coordinates of the points of the line (curve).
y
: Numerical vector giving the y coordinates of the points of the line (curve). One can calculate the integral of a fitted line by giving a vector to …
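thresh
: Threshold below which area is not calculated. Can be used to deal with unrealistically low flux data. By default (thresh = NULL), no threshold is applied and the area below zero is subtracted from the area above zero.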
dens
: By default, the data density is artificially increased by adding 100 data points between each pair of adjacent data points. These additional points are calculated by linear interpolation along x and y. When a threshold is set, this procedure increases the accuracy of the result. Setting …
sort.x
: By default the vectors in …
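method
: Specifies how the interpolation error should be estimated. Available methods are "leave out", "bootstrap", "sorted bootstrap", "constrained bootstrap", "jackknife", and "jack-validate" (described below).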
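lo
: When estimating the interpolation error with "leave out", the number of data points omitted in each run; with "jack-validate", the maximum number of data points omitted. Defaults to 2.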
it
: Number of iterations to run (default 100) when using …
...
: Any arguments passed through to auc.
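The interplay of dens and thresh can be illustrated with a small base R sketch. densify() and trapezoid_area() are hypothetical helpers, not package functions, and raising values below the threshold up to the threshold (pmax) is an assumption based on the comment in the example at the end of this page ("the negative emissions are set to -0.5"); the package's exact thresholding may differ. Linear densification leaves the plain trapezoid area unchanged, but once values are clipped it places the clipping close to the true crossing point.

```r
trapezoid_area <- function(x, y) sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)

# Hypothetical helper: insert `dens` linearly interpolated points
# between every pair of adjacent (x-sorted) points.
densify <- function(x, y, dens = 100) {
  n <- length(x)
  xs <- unlist(lapply(seq_len(n - 1), function(i)
    head(seq(x[i], x[i + 1], length.out = dens + 2), -1)))
  xs <- c(xs, x[n])
  list(x = xs, y = approx(x, y, xout = xs)$y)
}

# A line crossing zero at x = 0.5; the exact area above a threshold of 0 is 0.25.
x <- c(0, 1)
y <- c(-1, 1)
thresh <- 0

# Assumed thresholding: values below thresh are raised to thresh
coarse <- trapezoid_area(x, pmax(y, thresh))     # 0.5: the crossing point is missed
d <- densify(x, y, dens = 100)
fine <- trapezoid_area(d$x, pmax(d$y, thresh))   # close to 0.25

# Without a threshold, densification does not change the trapezoid area
stopifnot(abs(trapezoid_area(d$x, d$y) - trapezoid_area(x, y)) < 1e-12)
```

With 100 added points the clipping error is confined to the one short segment that contains the crossing, which is why a larger dens improves accuracy only when a threshold is set.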
During integration, the underlying assumption is that values can be interpolated linearly between adjacent data points. In many cases this is questionable. To estimate the linear interpolation error from the data at hand, one may use Monte Carlo resampling methods. In auc.mc the following approaches are available:
leave out
: In each run, lo data points are randomly omitted. This is straightforward, but the number of data points left out (lo) is arbitrary, so the error terms estimated with this approach may be hard to defend.
bootstrap
: Data are bootstrapped (sampled with replacement), so some data points may be repeated whereas others are omitted. Because of the random sampling, the order of data points changes, which may be unwanted with time series and may produce greatly exaggerated error terms. This reordering only takes effect if sort.x = FALSE.
sorted bootstrap
: Same as before, but ordering along x after bootstrapping may cure some of the problems caused by the changed order. However, because of repeated data points, time series that span several seasons and show distinct seasonality may still be misrepresented.
constrained bootstrap
: Same as before, but after ordering, repeated data points are omitted. This is equivalent to leaving out a random number of measurements in each run. With sampling with replacement, the expected number of omitted points is n(1 - 1/n)^n with n = length(x), roughly 0.37n for larger n.
jackknife
: auc is calculated for all possible combinations of length(x) - 1 data points, i.e., each point is left out once. Because there are only length(x) such combinations, the number of replicates can be quite low for short series.
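jack-validate
: auc is calculated for all possible combinations of (length(x) - lo) to (length(x) - 1) data points, i.e., for every way of leaving out between 1 and lo points; with n = length(x) this gives choose(n, 1) + ... + choose(n, lo) replicates. This partly cures the arbitrariness problem of the leave out approach and produces stable summary statistics.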
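The mechanics of these resampling schemes can be sketched in base R. The function names below are hypothetical and this is not the package's implementation; each replicate is integrated with a plain trapezoid rule, without thresh or dens.

```r
# Hypothetical helper: trapezoid rule on x-sorted data
trapezoid_area <- function(x, y) {
  o <- order(x)
  x <- x[o]
  y <- y[o]
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

# "leave out": drop lo randomly chosen points in each of it runs
leave_out_areas <- function(x, y, lo = 2, it = 100) {
  n <- length(x)
  replicate(it, {
    keep <- sample.int(n, n - lo)
    trapezoid_area(x[keep], y[keep])
  })
}

# "constrained bootstrap": resample with replacement, then drop repeated points
constrained_boot_areas <- function(x, y, it = 100) {
  n <- length(x)
  replicate(it, {
    keep <- unique(sample.int(n, n, replace = TRUE))
    trapezoid_area(x[keep], y[keep])
  })
}

# "jack-validate": every subset that omits 1 to lo points; lo = 1 gives the jackknife
jack_validate_areas <- function(x, y, lo = 2) {
  n <- length(x)
  unlist(lapply((n - lo):(n - 1), function(k)
    apply(combn(n, k), 2, function(keep) trapezoid_area(x[keep], y[keep]))))
}

x <- seq(0, 24, by = 2)
y <- c(12.3, 14.7, 17.3, 13.2, 8.5, 7.7, 6.4, 3.2, 19.8, 22.3, 24.7, 15.6, 17.4)

set.seed(1)
summary(leave_out_areas(x, y, lo = 2, it = 100))   # changes with the seed
summary(constrained_boot_areas(x, y, it = 100))    # changes with the seed
length(jack_validate_areas(x, y, lo = 1))          # 13 jackknife replicates
length(jack_validate_areas(x, y, lo = 3))          # 13 + 78 + 286 = 377 replicates

# Average number of points omitted by the constrained bootstrap with n = 13;
# the expected value is 13 * (12/13)^13, about 4.59
mean(replicate(10000, 13 - length(unique(sample.int(13, 13, replace = TRUE)))))
```

Enumerating subsets with combn() is what makes the jackknife and jack-validate results repeatable, whereas the random schemes depend on the seed and on it.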
auc returns a numeric value giving the area under the curve. Its unit is the product of the units of x and y.
auc.mc returns a numeric vector containing the auc values of the it runs (for jackknife and jack-validate, of all evaluated combinations); summary statistics can be calculated from it as needed. Because of the random sampling, means and medians are not stable between runs for most of the methods. jackknife and jack-validate produce repeatable results; for leave out, stability depends on n (length(x)) and it.
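How the replicate vector is condensed into an error term is left to the user. One possible sketch uses leave-one-out (jackknife) replicates together with the standard jackknife standard-error formula; this is a common statistical choice, not something the package prescribes, and trapezoid_area is a hypothetical helper.

```r
trapezoid_area <- function(x, y) sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)

x <- seq(0, 24, by = 2)   # hours
y <- c(12.3, 14.7, 17.3, 13.2, 8.5, 7.7, 6.4, 3.2, 19.8, 22.3, 24.7, 15.6, 17.4)  # mg CH4 m^-2 h^-1
full <- trapezoid_area(x, y)   # mg CH4 m^-2 (units of x times units of y)

# Jackknife replicates: leave each point out once
n <- length(x)
reps <- vapply(seq_len(n), function(i) trapezoid_area(x[-i], y[-i]), numeric(1))

# Standard jackknife standard error
jack_se <- sqrt((n - 1) / n * sum((reps - mean(reps))^2))

summary(reps)
quantile(reps, c(0.025, 0.975))
100 * jack_se / full   # standard error as a percentage of the full-data area
```

Note that leaving out the first or last point also shortens the integration range, so such replicates mix interpolation error with a change of interval.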
Gerald Jurasinski, gerald.jurasinski@uni-rostock.de
trapz, integrate
library(flux)  ## provides auc() and auc.mc()
## Construct a data set (imagine 2-hourly GHG emission data
## (methane) measured during a day).
## The emission vector (data in mg CH4 m^-2 h^-1) as a time series.
ghg <- ts(c(12.3, 14.7, 17.3, 13.2, 8.5, 7.7, 6.4, 3.2, 19.8, 22.3, 24.7, 15.6, 17.4),
  start = 0, end = 24, frequency = 0.5)
## Have a look at the emission development.
plot(ghg)
## Calculate what has been emitted that day,
## assuming that emissions develop linearly between
## measurements.
auc(time(ghg), ghg)
## Test some of the auc.mc approaches.
## "leave out" is the default
auc.rep <- auc.mc(time(ghg), ghg)
## mean and median are well below the original value
summary(auc.rep)
## results for "bootstrap" are unstable (run several times)
auc.rep <- auc.mc(time(ghg), ghg, "boot")
summary(auc.rep)
## results for "jack-validate" are stable (run several times)
auc.rep <- auc.mc(time(ghg), ghg, "jack-val", lo = 3)
summary(auc.rep)
## The effect of thresh:
## shift the data so that we have negative emissions (immissions)
ghg <- ghg - 10
## See the difference
plot(ghg)
abline(h = 0)
## With thresh = NULL the negative emissions are subtracted
## from the positive emissions
auc(time(ghg), ghg)
## With thresh = -0.5 the negative emissions are set to -0.5
## and only the emissions >= -0.5 count.
auc(time(ghg), ghg, thresh = -0.5)