# mReg: Multiple regression In berryFunctions: Function Collection Related to Plotting and Hydrology

## Description

Multiple regression fitting various function types including e.g. linear, cubic, logarithmic, exponential, power, reciprocal. Quick way to find out what function type fits the data best. Plots data and fitted functions and adds a legend with the functions (or their types=structure) sorted by R squared. Returns the fitted functions with their parameters and R^2 values in a data.frame.

## Usage

 ```1 2 3 4 5 6 7 8``` ```mReg(x, y = NULL, data = NULL, Poly45 = FALSE, exp_4 = FALSE, xf = deparse(substitute(x)), yf = deparse(substitute(y)), ncolumns = 9, plot = TRUE, add = FALSE, nbest = 12, R2min, selection = NULL, digits = 2, extend = 0.4, xlim = extendrange(x, f = extend), ylim = extendrange(y, f = extend), xlab = xf, ylab = yf, las = 1, lwd = rep(1, 12), lty = rep(1, 12), col = NULL, pcol = par("col"), pch = 16, legend = TRUE, legargs = NULL, legendform = "nameform", quiet = FALSE, ...) ```

## Arguments

 `x` Vector with x coordinates or formula (like y~x), the latter is passed to `model.frame` `y` Vector with y values. DEFAULT: NULL (to enable x to be a formula) `data` data.frame in which formula is applied. DEFAULT: NULL `Poly45` Logical. Should 4th and 5th degree polynomials also be fitted? DEFAULT: FALSE, as the formulas are very long. `exp_4` Logical. Return 4-parametric exponential distribution fits (via `exp4p`) in the output table? (only best fit is plotted). exp_4par_ini has the initial values of exponential fitting with the data relocated to first quadrant. The others are optimized with the methods of `optim`. DEFAULT: FALSE `xf` Character. x name for Formula. DEFAULT: substitute(x) before replacing zeros in x and y `yf` Ditto for y `ncolumns` Number of columns in output. Set lower to avoid overcrowding the console. DEFAULT: 9 `plot` Logical. plot data and fitted functions? DEFAULT: TRUE `add` Logical. add lines to existing plot? DEFAULT: FALSE `nbest` Integer. Number of best fitting functions to be plotted (console output table always has all). DEFAULT: 12 `R2min` Numerical. Minimum Rsquared value for function type to be plotted. Suggestion: 0.6 (2/3 of variation of y is explained by function of x). DEFAULT: empty `selection` Integers of functions to be plotted, assigned as in list in section "note". DEFAULT: NULL, meaning all `digits` Integer. number of significant digits used for rounding formula parameters and R^2 displayed. DEFAULT: 2 `extend` Numerical. Extention of axis ranges (proportion of range). DEFAULT: 0.4 `xlim` Numerical vector with two values, defining the x-range of the lines to be plotted. DEFAULT: extended range(x) `ylim` Ditto for Y-axis `xlab` Character. default labels for axis labeling and for formulas. DEFAULT: substitute(x) before replacing zeros in x and y `ylab` Ditto for y axis. `las` Integer in 0:4. label axis style. See `par`. DEFAULT: 1 `lwd` Numerical of length 12. line width for lines. DEFAULT: rep(1,12) `lty` Numerical of length 12. line type. DEFAULT: rep(1,12) `col` Numerical of length 12. line colors. DEFAULT: NULL, means they are specified internally `pcol` Color used for the data-points themselves. DEFAULT: par('col') `pch` Integer or single character. Point CHaracter for the data points. See `par`. DEFAULT: 16 `legend` Logical. Add legend to plot? DEFAULT: TRUE `legargs` List. List of arguments passed to `legend`. Will overwrite internal defaults. DEFAULT: NULL `legendform` One of 'full', 'form', 'nameform' or 'name'. Complexity (and length) of legend in plot. See Details. DEFAULT: 'nameform' `quiet` Suppress warnings about value removal (NAs, smaller 0, etc)? DEFAULT: FALSE `...` Further graphical parameters passed to plot

## Details

legendform : example
full : 7.8*x + 6.31
form : a*x+b
nameform : linear a*x+b
name : linear

full can be quite long, especially with Poly45=TRUE!

## Value

data.frame with rounded R squared, formulas, and full R^2 and parameters for further use. Rownames are the names (types) of function. Sorted decreasingly by R^2

## warning

A well fitting function does NOT imply correct causation!
A good fit does NOT mean that you describe the bahaviour of a system adequatly!
Extrapolation can be DANGEROUS!
Always extrapolate to see if a function fits the expected results there as well.
Avoid overfitting: Poly45 will often yield good results (in terms of R^2), but can be way overfitted. And outside the range of values, they act wildly.

## Note

If you're adjusting the appearance (lwd, lty, col) of single lines, set parameters in the following order:
# 1 linear a*x + b
# 2 quadratic (parabola) a*x^2 + b*x + c
# 3 kubic a*x^3 + b*x^2 + c*x + d
# 4 Polynom 4th degree a*x^4 + b*x^3 + c*x^2 + d*x + e
# 5 Polynom 5 a*x^5 + b*x^4 + c*x^3 + d*x^2 + e*x + f
# 6 logarithmic a*log(x) + b
# 7 exponential a*e^(b*x)
# 8 power/root a*x^b
# 9 reciprocal a/x + b
# 10 rational 1 / (a*x + b)
# 11 exponential 4 Param a*e^(b*(x+c)) + d

Negative values are not used for regressions containing logarithms; with warning.
exp_4par was originally developed for exponential temperature decline in a cup of hot water.

## Author(s)

Berry Boessenkool, [email protected], Dec 2012, updated April and Aug 2013, sept 2015

## References

Listed here: http://rclickhandbuch.wordpress.com/rpackages

`glm`, `lm`, `optim`
 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141``` ```set.seed(12) x <- c(runif(100,0,3), runif(200, 3, 25)) # random from uniform distribution y <- 12.367*log10(x)+7.603+rnorm(300) # random from normal distribution plot(x,y, xlim=c(0,40)) mReg(x,y) # warning comes from negative y-values (suppress with quiet=TRUE) # Formula specification: mReg(Volume~Height, data=trees) # NA management x[3:20] <- NA mReg(x,y) # Passing arguments to legend: mReg(x,y, pch=1, legargs=list(x="bottomright", cex=0.7), legendform="form") mReg(x,y, col=rainbow2(11)) mReg(x,y, extend=0.2) # less empty space around data points mReg(x,y, nbest=4) # only 4 distributions plotted mReg(x,y, legargs=list(x=7, y=8, bty="o", cex=0.6)) # Legend position as coordinates ## Not run: # Excluded from Rcmd check (opening external devices) View(mReg(x,y, Poly45=TRUE, exp_4=TRUE, plot=FALSE)) # exp_4: fit more distributions ## End(Not run) # optim methods often yield different results, so be careful using this. # I might insert a possibility to specify initial values for optim. # 4 Parameters allow several combinations to yield similarly good results! plot( 0:10, 3.5*exp(0.8*( 0:10 + 2 )) + 15 , type="l") lines(0:10, 18*exp(0.8*( 0:10 - 2.5e-05)) - 5, col=2) # okay, different dataset: x <- c(1.3, 1.6, 2.1, 2.9, 4.4, 5.7, 6.6, 8.3, 8.6, 9.5) y <- c(8.6, 7.9, 6.6, 5.6, 4.3, 3.7, 3.2, 2.5, 2.5, 2.2) mReg(x,y, legargs=list(cex=0.7, x="topright"), main="dangers of extrapolation") points(x,y, cex=2, lwd=2) # Polynomial fits are good within the data range, but, in this case obviously, # be really careful extrapolating! If you know that further data will also be low, # add another point to test differences: mReg(c(x,11,13,15), c(y,2,2,2), xf="myX", yf="myY", Poly45=TRUE, legendform="name") points(x,y, cex=2, lwd=2) # The Polynomials are still very good: they have 5 to 6 Parameters, after all! # Poly45 is set to FALSE by default to avoid such overfitting. mReg(x,y, pcol=8, ncol=0) # no return to console # only plot a subset: best n fits, minimum fit quality, or user selection mReg(x,y, pcol=8, ncol=2, nbest=4) mReg(x,y, pcol=8, ncol=2, R2min=0.7) mReg(x,y, pcol=8, ncol=2, selection=c(2,5,8)) # selecting the fifth degree polynomial activates Poly45 (in the output table) # Add to axisting plot: plot(x,y, xlim=c(0,40)) mReg(x,y, add=TRUE, lwd=12:1/2, ncol=0) # lwd, lty can be vectors of length 12, specifying each line separately. # Give those in fix order (see section notes), not in best-fit order of the legend. # The order is Polynomial(1:5), log, exp, power, reciprocal, rational, exp_4_param # color has to be a vector of 12 # opposedly, lwd and lty are repeated 12 times, if only one value is given # One more dataset: j <- c(5,8,10,9,13,6,2) ; k <- c(567,543,587,601,596,533,512) # Inset from margin of plot region: mReg(j,k, legargs=list(x="bottomright", inset=.05, bty="o"), legendform="name") # Legend forms mReg(j,k, legargs=list(x="bottomright"), legendform="name") mReg(j,k, legargs=list(x="bottomright"), legendform="form") mReg(j,k, legargs=list(x="bottomright"), legendform="nameform") mReg(j,k, legargs=list(x="bottomright"), legendform="full") ## Not run: # Excluded from Rcmd check (long computing time) # The question that got me started on this whole function... # exponential decline of temperature of a mug of hot chocolate tfile <- system.file("extdata/Temp.txt", package="berryFunctions") temp <- read.table(tfile, header=TRUE, dec=",") head(temp) plot(temp) temp <- temp[-20,] # missing value - rmse would complain about it x <- temp\$Minuten y <- temp\$Temp mReg(x,y, exp_4=TRUE, selection=11) # y=49*e^(-0.031*(x - 0 )) + 25 correct, judged from the model: # Temp=T0 - Te *exp(k*t) + Te with T0=73.76, Tend=26.21, k=-0.031 # optmethod="Nelder-Mead" # y=52*e^(-0.031*(x + 3.4)) + 26 wrong x <- seq(1, 1000, 1) y <- (x+22)/(x+123) # can't find an analytical solution so far. Want to check out nls mReg(x, y, legargs=list(x="right")) ## End(Not run) # Solitaire Results. According to en.wikipedia.org/wiki/Klondike_(solitaire): # Points=700000/Time + Score # I recorded my results as an excuse to play this game a lot. sfile <- system.file("extdata/solitaire.txt", package="berryFunctions") solitaire <- read.table(sfile, header=TRUE) mReg(solitaire\$Time, solitaire\$Points) # and yes, reciprocal ranks highest! Play Fast! mReg(solitaire\$Time, solitaire\$Bonus, xlim=c(50,200), extend=0, nbest=3) sol <- unique(na.omit(solitaire[c("Time","Bonus")])) sol sol\$official <- round(700000/sol\$Time/5)*5 mReg(sol\$Time, sol\$Bonus, extend=0, selection=9, col=rep(4,10), legendform="full") plot(sol\$Time, sol\$official-sol\$Bonus, type="l") # multivariate regression should be added, too: sfile <- system.file("extdata/gelman_equation_search.txt", package="berryFunctions") mv <- read.table(sfile, header=TRUE) sfile <- system.file("extdata/mRegProblem.txt", package="berryFunctions") x <- read.table(sfile, header=TRUE)\$x y <- read.table(sfile, header=TRUE)\$y mReg(x,y, digits=6) # all very equal x2 <- x-min(x) mReg(x2,y, digits=6) # Formulas are wrong if digits is too low!! #mReg(x2,y, legendform="full") # Zero and NA testing (to be moved to unit testing someday...) mReg(1:10, rep(0,10)) mReg(1:10, c(rep(0,9),NA)) mReg(1:10, rep(NA,10)) mReg(rep(1,10), 1:10) mReg(rep(0,10), 1:10) mReg(c(rep(0,9),NA), 1:10) mReg(rep(NA,10), 1:10) mReg(1:10, rep(0,10), quiet=TRUE) mReg(1:10, c(rep(0,9),NA), quiet=TRUE) mReg(1:10, rep(NA,10), quiet=TRUE) mReg(rep(1,10), 1:10, quiet=TRUE) mReg(rep(0,10), 1:10, quiet=TRUE) mReg(c(rep(0,9),NA), 1:10, quiet=TRUE) mReg(rep(NA,10), 1:10, quiet=TRUE) ```