MultiFit: Multiple fits in ModelDataComp: Model-Data Comparison

Description

Fits bivariate or multivariate regressions between a response variable and one or several predictor variables based on multiple fitting methods.

Usage

```r
MultiFit(x, y, fits = c("lm", "quantreg", "poly2", "poly3", "spline", "gam"),
  xout = NULL, excl.quantile = c(0, 1), fit.quantile = NULL, ...)
```

Arguments

• `x`: predictor variables: a vector for bivariate fits, or a matrix or data.frame for multivariate fits.

• `y`: vector of the response variable.

• `fits`: one or several fitting methods to use. Possible options are "lm", "quantreg", "poly2", "poly3", "spline", "gam", "rf" and "logistic".

• `xout`: vector or data.frame of predictor values for which fits should be returned. If NULL, fits are returned along a sequence of x values, which allows plotting 2D surfaces when there are two predictor variables (see examples). If xout=x, fits are returned for the same x values that were used for fitting.

• `excl.quantile`: lower and upper quantiles outside of which x and y values are excluded before the fits are computed. For example, with excl.quantile=c(0, 0.9) all x and y values above the 0.9 quantile are excluded from fitting.

• `fit.quantile`: if set to a value between 0 and 1, the fit is made to that quantile, which allows quantile regression. In this case `SelectQuantiles` is first used to select, along the range of x, only the values that are around the specified quantile. If NULL (the default), all values are used.

• `...`: further arguments (not used).
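The effect of `excl.quantile` can be approximated in base R. The sketch below is not the package's implementation: it assumes that a pair is dropped whenever its x or its y value lies outside the chosen quantile range, and the helper name `exclude_by_quantile` is hypothetical.

```r
# Hypothetical helper (not part of ModelDataComp): keep only the pairs whose
# x and y values both lie within the given lower and upper quantiles.
exclude_by_quantile <- function(x, y, excl.quantile = c(0, 1)) {
  qx <- quantile(x, probs = excl.quantile, na.rm = TRUE)
  qy <- quantile(y, probs = excl.quantile, na.rm = TRUE)
  keep <- x >= qx[1] & x <= qx[2] & y >= qy[1] & y <= qy[2]
  data.frame(x = x[keep], y = y[keep])
}

set.seed(1)
x <- runif(1000, -3, 3)          # predictor variable
y <- 0.5 * x + rnorm(1000)       # response variable

# Drop all pairs with x or y above the respective 0.9 quantile
sub <- exclude_by_quantile(x, y, excl.quantile = c(0, 0.9))
nrow(sub)                        # fewer rows than length(x)
```

With the default `excl.quantile = c(0, 1)` the limits are the minimum and maximum of each variable, so no pair is removed.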

Details

The following fitting methods are implemented:

• "lm": (multiple) linear regression based on `lm`: lm(y ~ x)

• "quantreg": quantile regression to the median based on `rq`: rq(y ~ x, tau=0.5)

• "poly2": 2nd-order polynomial regression based on `lm`: lm(y ~ poly(x, degree=2))

• "poly3": 3rd-order polynomial regression based on `lm`: lm(y ~ poly(x, degree=3))

• "spline": smoothing spline based on `smooth.spline`: smooth.spline(x, y). This method only works for bivariate fits.

• "gam": generalized additive models using spline smoothing based on `gam`: gam(y ~ s(x))

• "rf": random forest based on `randomForest`: randomForest(y ~ x). This method is not computed by default because it can be computationally expensive.

• "logistic": multiplicative logistic functions based on `FitLogistic`: FitLogistic(x, y). This method is not computed by default because it can be computationally expensive.

In addition, ensemble statistics such as the mean, median, standard deviation and percentiles are computed across the results of the chosen fitting methods.
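The idea of fitting several methods and summarising them as an ensemble can be sketched with base R alone. This is a minimal illustration, not the code used by `MultiFit`: it covers only the methods available in the stats package ("lm", "poly2", "poly3", "spline"), and the column names `ensMedian` and `ensSd` are assumptions chosen for this example (only `ensMean` appears in the examples of this page).

```r
set.seed(42)
x <- runif(200, -3, 3)                           # predictor variable
y <- 0.5 * x + 0.3 * x^2 + rnorm(200, 0, 0.5)    # response variable

# Sequence of x values at which all fits are evaluated
xout <- seq(min(x), max(x), length.out = 50)
newdata <- data.frame(x = xout)

# Fit each method separately
m_lm    <- lm(y ~ x)
m_poly2 <- lm(y ~ poly(x, degree = 2))
m_poly3 <- lm(y ~ poly(x, degree = 3))
m_spl   <- smooth.spline(x, y)

# One column of predictions per method
fits <- data.frame(
  lm     = predict(m_lm, newdata),
  poly2  = predict(m_poly2, newdata),
  poly3  = predict(m_poly3, newdata),
  spline = predict(m_spl, xout)$y
)

# Ensemble statistics across methods, computed row by row
ens <- data.frame(
  x         = xout,
  ensMean   = rowMeans(fits),
  ensMedian = apply(fits, 1, median),
  ensSd     = apply(fits, 1, sd)
)
head(ens)
```

A large `ensSd` at a given x value indicates that the methods disagree there, which typically happens near the edges of the data range.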

Author(s)

Matthias Forkel <[email protected]> [aut, cre]

References

No reference.

See Also

`FitLogistic`

Examples

```r
library(ModelDataComp)

# bivariate example
x <- runif(1000, -3, 3) # predictor variable
y <- 0.5 * x + 1 / exp(-0.4 * x) * rnorm(1000, 1, 1) # response variable
ScatterPlot(x, y)
fit <- MultiFit(x, y, fits=c("lm", "quantreg", "poly2", "poly3", "spline", "gam", "rf", "logistic"))
summary(fit)
cols <- piratepal("basel") # colour palette; piratepal() comes from the yarrr package
matplot(fit$x, fit[,2:11], type="l", add=TRUE, lty=1, col=cols, lwd=2)
legend("topleft", colnames(fit)[2:11], lty=1, col=cols, lwd=2)

# same example but exclude very high values (> quantile 0.9) from fitting
fit1 <- MultiFit(x, y, excl.quantile=c(0, 0.9))
lines(fit1$x, fit1$ensMean, type="l", lty=1, col="purple", lwd=3)

# to compare fitted with original values compute
# fits at original predictor variables (xout=x)
fit <- MultiFit(x, y, fits=c("poly3", "gam"), xout=x)
df <- data.frame(sim=c(fit$poly3, fit$gam), obs=rep(y, 2),
  groups=rep(c("poly3", "gam"), each=length(y)))
of <- ObjFct(df$sim, df$obs, df$groups)
plot(of, which="RMSE")
ScatterPlot(df$sim, df$obs, df$groups, objfct=TRUE)
TaylorPlot(df$sim, df$obs, df$groups)

# bivariate example with fit to a certain quantile
ScatterPlot(x, y)
fit <- MultiFit(x, y, fit.quantile=0.9, fits=c("spline", "gam", "poly3", "rf"))
matplot(fit$x, fit[,2:5], type="l", add=TRUE, lty=1, col=cols, lwd=2)
legend("topleft", colnames(fit)[2:5], lty=1, col=cols, lwd=2)

# example with two predictor variables
a <- runif(1000, -3, 3) # 1st predictor variable
b <- runif(1000, 0, 2) # 2nd predictor variable
y <- 1.2 * b + 1 / exp(-0.4 * a) * rnorm(1000, 1, 0.2) # response variable
plot(a, y)
plot(b, y)
fit <- MultiFit(x=data.frame(a, b), y, xout=NULL)
# plot the ensemble mean as a 2D surface (matches the plot title)
image(x=unique(fit$a), y=unique(fit$b), z=matrix(fit$ensMean, sqrt(nrow(fit))), main="ensMean")

## as 3D plot:
#require(rgl)
#with(data.frame(a, b), plot3d(a, b, y))
#with(fit, surface3d(unique(a), unique(b), ensMean, alpha=0.2, col="red"))

# example with three predictor variables
a <- runif(1000, -3, 3) # 1st predictor variable
b <- runif(1000, 0, 2) # 2nd predictor variable
c <- rnorm(1000, 1, 1) # 3rd predictor variable
y <- 1.2 * b + 1 / exp(-0.4 * a) * c # response variable
x <- data.frame(a, b, c)
fit <- MultiFit(x, y, fits=c("poly2", "rf"), xout=x)
ObjFct(fit$rf, y)
ObjFct(fit$poly2, y)
```