Regression: Regression Analysis
In lessR: Less Code, More Results

Regression

R Documentation

Regression Analysis

Description

Abbreviation: reg, reg_brief

Provides a regression analysis with extensive output, including graphics, from a single, simple function call with many default settings, each of which can be re-specified. The computations are obtained from the R function lm and related R regression functions. The outputs of these functions are re-arranged and collated.

By default the data exists as a data frame with the default name of d, or specify explicitly with the data option. Specify the model in the function call as an R formula, that is, for a basic model, the response variable followed by a tilde, followed by the list of predictor variables, each pair separated by a plus sign, such as reg(Y ~ X1 + X2).

Output is generated into distinct segments by topic, organized and displayed in sequence by default. When the output is assigned to an object, such as r in r <- reg(Y ~ X), the full or partial output can be accessed for later analysis and/or viewing. A primary such analysis is with knitr for dynamic report generation, run from R directly or from within RStudio. The input instructions to knitr are written comments and interpretation with embedded R code, called R~Markdown. Doing a knitr analysis is to "knit" these comments and subsequent output together so that the R output is embedded in the resulting document – either html, pdf or Word – by default with explanation and interpretation. Generate a complete R~Markdown file with filetype (.Rmd) from the Rmd option. Simply specify the option with a file name in quotes, then run the Regression analysis to create the markdown file. Open the newly created .Rmd file in RStudio and click the knit button to create a formatted document that consists of the statistical results plus interpretative comments. See the sections arguments, value and examples for more information.

Usage

Regression(my_formula, data=d, filter=NULL,
         digits_d=NULL,

         Rmd=NULL, Rmd_browser=TRUE, 
         Rmd_format=c("html", "word", "pdf", "odt", "none"),
         Rmd_data=NULL, Rmd_custom=NULL, Rmd_dir=path.expand("~/reg"),
         Rmd_labels=FALSE,
         results=getOption("results"), explain=getOption("explain"),
         interpret=getOption("interpret"), code=getOption("code"), 

         text_width=120, brief=getOption("brief"), show_R=FALSE,
         plot_errors,

         n_res_rows=NULL, res_sort=c("cooks","rstudent","dffits","off"),
         n_pred_rows=NULL, pred_sort=c("predint", "off"),
         subsets=NULL, best_sub=c("adjr2", "Cp"), cooks_cut=1,

         scatter_coef=TRUE, mod=NULL, mod_transf=c("center", "z", "none"), 

         X1_new=NULL, X2_new=NULL, X3_new=NULL, X4_new=NULL,
         X5_new=NULL, X6_new=NULL,

         kfold=0, seed=NULL,
         new_scale=c("none", "z", "center", "0to1", "robust"),
         scale_response=FALSE,

         quiet=getOption("quiet"), bubble_plot=NULL,
         graphics=TRUE, size=NULL, pdf=FALSE, width=6.5, height=6.5,
         refs=FALSE,

         n_cat=getOption("n_cat"),

         fun_call=NULL, ...)

reg(...)
reg_brief(..., brief=TRUE)

Arguments

`my_formula`	Standard R `formula` for specifying a model. For example, for a response variable named Y and two predictor variables, X1 and X2, specify the corresponding linear model as Y ~ X1 + X2.
`data`	The default name of the data frame that contains the data for analysis is `d`, otherwise explicitly specify. If knitting and rendering the generated R~Markdown for an interpretative output as specified by the `Rmd` parameter, then this data frame must first be read by the `lessR` function `Read`.
`filter`	A logical expression that specifies a subset of rows of the data frame to analyze.
`digits_d`	For the Basic Analysis, it provides the number of decimal digits, set by default to at least 2 or the largest number of digits in the values of the response variable plus 1.

`Rmd`	File name for the automatically generated R Markdown file, if specified. The file type is .Rmd, a simple text file that can be edited with any text editor, including RStudio to generate custom output.
`Rmd_browser`	If `html` format for Rmd rendering, then automatically open output in a browser.
`Rmd_format`	Format of one or more rendered R Markdown file formats, expressed in any combination of uppercase and lowercase letters. Default is `"html"`, with a browser view automatic, or `"word"`, `"odt"`, `"pdf"` (if LaTeX is available), or `"none"`. Requires `pandoc` installed, such as from `RStudio`.
`Rmd_data`	The default file reference of the data file when running the generated R Markdown file is the last data file as read by `Read` (with the unabbreviated version of the function name). To refer to a different file to read specify the path name or web URL of the file.
`Rmd_custom`	Vector of input text sections in the Rmd file for which to convert.
`Rmd_dir`	Directory where custom input text files are located for the `Rmd` option.
`Rmd_labels`	Label each section of the markdown output according to the name of its input file.
`results`	By default TRUE. If set to `FALSE` the results are not provided in the R Markdown document, relying upon the interpretations. Can set globally with `style(results=FALSE)`.
`explain`	By default TRUE. If set to `FALSE` the explanations are not provided in the R Markdown document. Can set globally with `style(explain=FALSE)`.
`interpret`	By default TRUE. If set to `FALSE` the interpretations of the results are not provided in the R Markdown document. Can set globally with `style(interpret=FALSE)`.
`code`	By default TRUE. If set to `FALSE` the R code that generates the results is not provided in the R Markdown file. Can set globally with `style(code=FALSE)`.

`text_width`	Width of the text output at the console.
`brief`	If set to `TRUE`, reduced text output. Can change system default with `style` function.
`show_R`	Display the R instructions that yielded the `lessR` output, albeit without the additional formatting of the results such as combining output of different functions into a table.
`plot_errors`	For a one-predictor model, plot the line segment that joins each point to the regression line, illustrating the size of the residuals.

`n_res_rows`	Default is 20, which lists the first 20 rows of data sorted by the specified sort criterion. To disable residuals, specify a value of 0. To view the output for all observations, specify a value of `"all"`.
`res_sort`	Default is `"cooks"`, for specifying Cook's distance as the sort criterion for the display of the rows of data and associated residuals. Other values are `"rstudent"` for externally Studentized residuals, `"dffits"` for dffits and `"off"` to not sort the rows of data.
`n_pred_rows`	Default is 3, which lists prediction intervals only for the first, middle and last 3 rows of data, unless there are 25 or less rows of data when all rows are displayed. To disable prediction intervals, specify a value of 0. To see the output for all rows of data, specify a value of `"all"`.
`pred_sort`	Default is `"predint"`, which sorts the rows of data and associated intervals by the lower bound of each prediction interval. Turn off this sort by specifying a value of `"off"`.
`subsets`	Default is to produce the analysis of the fit based on adjusted R-squared for all possible model subsets of size 10 for each number of predictors, from the `leaps` package. Set to `FALSE` to turn off. Defaults lists a maximum of the first 50 values. Specify an integer to change the maximum.
`best_sub`	Criterion for selecting best subsets of predictor variables, with default of `"adjr2"` or choose Mallow's `"Cp"` statistic.
`cooks_cut`	Cutoff value of Cook's Distance at which observations with a larger value are flagged in red and labeled in the resulting scatterplot of Residuals and Fitted Values. Default value is 1.0.

`scatter_coef`	Display the correlation coefficients in the upper triangle of the scatterplot matrix.
`mod`	Declare one continuous (numeric) predictor variable a moderator variable in a two predictor model.
`mod_transf`	Applies when `mod` specified, rescales the predictor variables, with default `"center"`, and options of `"z"` for standardize and `"none"`.

`X1_new`	Values of the first listed numeric predictor variable for which forecasted values and corresponding prediction intervals are calculated.
`X2_new`	Values of the second listed numeric predictor variable for which forecasted values and corresponding prediction intervals are calculated.
`X3_new`	Values of the third listed numeric predictor variable for which forecasted values and corresponding prediction intervals are calculated.
`X4_new`	Values of the fourth listed numeric predictor variable for which forecasted values and corresponding prediction intervals are calculated.
`X5_new`	Values of the fifth listed numeric predictor variable for which forecasted values and corresponding prediction intervals are calculated.
`X6_new`	Values of the sixth listed numeric predictor variable for which forecasted values and corresponding prediction intervals are calculated.

`kfold`	Number of K-fold cross-validations. If conducted, only the cross-validation output shown.
`seed`	Parameter `kfold` generates random partitions, folds, of data. Set the seed to an integer to recover the same random partitions on subsequent runs.
`new_scale`	Transform numeric predictor variables with more than two unique values to the specified metric before conducting the analysis, `"z"`, `"center"`, `"0to1"`, or `"robust"`. Applies to `kfold` separately to the training and testing folds as well to avoid data leakage.
`scale_response`	If doing a rescale with `new_scale`, by default do not scale the response variable, or set to `TRUE` to rescale along with the predictor variables.

`quiet`	If set to `TRUE`, no text output. Can change system default with `style` function.
`graphics`	Produce graphics. Default is `TRUE`. In `knitr` can be useful to set to `FALSE` so that `regPlot` can be used to place the graphics within the output file.
`size`	Size of plotted points.
`pdf`	If `TRUE`, then graphics are written to pdf files.
`width`	Width of the pdf file in inches.
`height`	Height of the pdf file in inches.
`bubble_plot`	For a single predictor variable, by default, plot as bubbles only when the single predictive variable and the target variable have less than or equal to 12 unique values. Otherwise, set `TRUE` or `FALSE`.
`refs`	If `TRUE`, then list the references for R and the packages used from which functions were used to generate the output.
`n_cat`	Number of categories, specifies the largest number of unique, equally spaced integer values of a variable for which the variable will be analyzed as categorical instead of continuous. Default is 0. Use to specify that such variables are to be analyzed as categorical, a kind of informal R factor. [deprecated]: Best to convert a categorical integer variable to a factor.
`fun_call`	Function call. Used internally with `knitr` to pass the function call when obtained from the abbreviated function call `reg`. Not usually invoked by the user.
`...`	Other parameter values for R function `lm` which provides the core computations.

Details

OVERVIEW
The purpose of Regression is to combine the following function calls into one, as well as provide ancillary analyses such as as graphics, organizing output into tables and sorting to assist interpretation of the output, as well as generate R Markdown to run through knitr, such as with RStudio, to provide extensive interpretative output.

The basic analysis successively invokes several standard R functions beginning with the standard R function for estimation of a linear model, lm. The output of the analysis of lm is stored in the object lm.out, available for further analysis in the R environment upon completion of the Regression function. By default reg automatically provides the analyses from the standard R functions, summary, confint and anova, with some of the standard output modified and enhanced. The correlation matrix of the model variables is obtained with cor function. The residual analysis invokes fitted, resid, rstudent, and cooks.distance functions. The option for prediction intervals calls the standard R function predict, once with the argument interval="confidence" and once with interval="prediction". The lessR Density function provides the histogram and density plots for the residuals and the ScatterPlot function provides the scatter plots of the residuals with the fitted values and of the data for the one-predictor model. The pairs function provides the scatterplot matrix of all the variables in the model. Thomas Lumley's leaps package contains the leaps function that provides the analysis of the fit of all possible model subsets.

INPUT DATA FRAME
The name d is by default provided by the Read function included in this package for reading and displaying information about the data in preparation for analysis. If all the variables in the model are not in the same data frame, the analysis will not complete. Specify the name of the data frame for analysis with the data option if the name is not the default d.

The filter parameter subsets rows (cases) of the input data frame according to a logical expression. Use the standard R operators for logical statements as described in Logic such as & for and, | for or and ! for not, and use the standard R relational operators as described in Comparison such as == for logical equality != for not equals, and > for greater than. See the Examples.

TEXT OUTPUT
The output is produced in pieces by topic (see values below), automatically collated by default in the final output. But the pieces are available for later reference if the output of the function is directed toward an object, such as r in r <- reg(Y ~ X). This is especially useful if the pieces are accessed within knitr or individual pieces are displayed at the console.

The text output is organized to provide the most relevant information while at the same time minimizing the total amount of output, particularly for analyses with large numbers of observations (rows of data), the display of which is by default restricted to only the most interesting or representative observations in the analyses of the residuals and predicted values. Additional economy can be obtained by invoking the brief=TRUE option, or run reg_brief, which limits the analysis to just the basic analysis of the estimated coefficients and fit, and if X1_new, etc. are requested, the relevant rows of forecasted values:w .

R MARKDOWN
An R~Markdown file ready for knitting and rendering into one of several formats can be obtained by specifying a value for Rmd. For the specified file name, the directory to which the file is written is displayed on the console text output, and the file type .Rmd is automatically appended to the specified name if it is not included in the specification.

To access the same data file for the regression analysis from running Regression from the R console, and that accomplished by knitting the generated R~Markdown, first read the data into R with the lessR Read function. That function stores the name of the last data file read so that it can be accessed via R as the markdown is knit and then rendered into the specified format. The default rendering is to HTML, but other formats can be specified with Rmd_format.

The output from Rmd is conceptually partitioned into five parts: results, explanations of the results, interpretations of the results, documentation o the code, and the code itself. By default all available output is generated but the flags results, explain, interpret, document, code can be set to FALSE to reduce the output. The options can be specified in a specific function all or set globally, such as with options(explain=FALSE). Turning off all five flags leaves just the outline of the potential output and a bare minimum of results.

Both any existing variable labels and variable units are included in the output to the R~Markdown file. Any variable units set as a dollar, are set as USD dollars and cents in the output, displayed with a dollar sign.

The default analysis provides as text output to the console the model's parameter estimates and corresponding hypothesis tests and confidence intervals, goodness of fit indices, the ANOVA table, correlation matrix of the model's variables, analysis of residuals and influence as well as the confidence and prediction intervals for each observation in the model. Also provided, for multiple regression models, collinearity analysis of the predictor variables and adjusted R-squared for the corresponding models defined by each possible subset of the predictor variables.

The Markdown is produced from input files, one for each section of the rendered document. Find the default files and their names at:\ system.file("Rmd/reg/", package="lessR")\ The Rmd_dir option specifies a location for custom input files. The Rmd_custom parameter specifies which default files should be replaced by custom files, anywhere from any one of them to all eight.

DECIMAL DIGITS
The number of decimal digits displayed on the output is, by default, the maximum number of decimal digits for all the data values of the response variable. Or, this value can be explicitly specified with the digits_d parameter.

Visualizations
Three default graphs are provided. When running R by itself, by default the graphs are written to separate graphics windows (which may overlap each other completely, in which case move the top graphics windows). Or, the pdf option may be invoked to save the graphs to a single pdf file called regOut.pdf. Within RStudio the graphs are successively written to the Plots window. Within knitr from RStudio the graphics will all appear by default at the beginning of the output. Or set to graphics=FALSE, and generate them individually with the accompanying function regPlot at the desired location within the file.

1. A histogram of the residuals includes the superimposed normal and general density plots from the Density function included in this lessR package. The overlapping density plots, which both overlap the histogram, are filled with semi-transparent colors to enhance readability.

2. A scatterplot of the residuals with the fitted values is also provided from the ScatterPlot function included in this package. The point corresponding to the largest value of Cook's distance, regardless of its size, is plotted in red and labeled and the corresponding value of Cook's distance specified in the subtitle of the plot. Also by default all points with a Cook's distance value larger than 1.0 are plotted in red, a value that can be specified to any arbitrary value with the cooks_cut option. This scatterplot also includes the lowess curve.

3. For models with a single predictor variable, a scatterplot of the data is produced, which also includes the regression line and corresponding confidence and prediction intervals. As with the density histogram plot of the residuals and the scatterplot of the fitted values and residuals, the scatterplot includes a colored background with grid lines. For multiple regression models, a scatterplot matrix of the variables in the model with the lowess best-fit line of each constituent scatterplot is produced. If the scatter_coef option is invoked, each scatterplot in the upper-diagonal of the correlation matrix is replaced with its correlation coefficient.

RESIDUAL ANALYSIS
By default the residual analysis lists the data and fitted value for each observation as well as the residual, Studentized residual, Cook's distance and dffits, with the first 20 observations listed and sorted by Cook's distance. The res_sort option provides for sorting by the Studentized residuals or not sorting at all. The n_res_rows option provides for listing these rows of data and computed statistics statistics for any specified number of observations (rows). To turn off the analysis of residuals, specify n_res_rows=0.

PREDICTION INTERVALS
The output for the confidence and prediction intervals includes a table with the data and fitted value for each observation, the lower and upper bounds for the confidence interval and the prediction interval, and the wide of the prediction interval. The observations are sorted by the lower bound of each prediction interval. If there are 25 or more observations then the information for only the first three, the middle three and the last three observations is displayed. To turn off the analysis of prediction intervals, specify n_pred_rows=0, which also removes the corresponding intervals from the scatterplot produced with a model with exactly one predictor variable, yielding just the scatterplot and the regression line.

The data for the default analysis of the prediction intervals is for the values of the predictor variables for each observation, that is, for each row of the data. New values of the predictor variables can be specified for the calculation of the prediction intervals by providing values for the options X1_new for the values of the first listed predictor variable in the model, X2_new for the second listed predictor variable, and so forth for up to five predictor variables, and all predictor variables are numeric. To provide these values, use functions such as seq for specifying a sequence of values and c for specifying a vector of values. For multiple regression models, all combinations of the specified new values for all of the predictor variables are analyzed.

RELATIONS AMONG THE VARIABLES
By default the correlation matrix of all the variables in the model is displayed, and, for multiple regression models, collinearity analysis is provided. Also provided are the first 50 models with the largest R squared adjusted from each possible model from an analysis of all possible subsets of the predictor variables. This all subsets analysis requires the leaps function from the leaps package. These contributed packages are automatically loaded if available. To turn off the all possible sets option, set subsets=FALSE.

RECODE PREDICTOR VARIABLES
The new_scale parameter provides for recoding the values of the predictor variables according to several different transformations: "z", "center", "0to1", or "robust". The later is a robust version of classic standardization in which the mean is replaced by the median and the standard deviation by the IQR. All numeric predictor variables with more than two values are standardized.

So any numeric variable with more than two values that is a categorical variable should be first converted to an R factor. If there are some numeric predictor variables that should not be standardized, such as an interaction term with centered variables that define the interaction, then the rescaling should be done separately, such as with base~R function scale or lessR rescale.

ANCOVA
If there are two predictor variables, one categorical and one continuous, an analysis of covariance is performed. The resulting scatterplot is of the continuous response variable and predictor variable, at each level of the categorical variable. To address the unbalanced ANOVA design, the Type~II sums of squares are reported for each effect. The regression model for each level of the categorical variable are displayed.

A categorical variable is defined as either an R factor or a non-numeric variable. If numeric and categorical, then explicitly define the categorical variable as a factor.

MODERATOR VARIABLE
For two predictor models, one of the predictor variables can be entered into the analysis as a moderator variable with the mod parameter. By default the two predictor variables are centered, so their means become zero. Then a third variable is entered into the model, the interaction of the two centered variables, computed by multiplication of their respective values, row by row. The potential interaction is visually displayed by plotting response Y against predictor X, at three different values of continuous W: the mean and 1 standard deviation above and below the mean.

For predictor variable, X, second predictor as a potential moderator, W, and response Y, enter the following R input.
reg(Y ~ X + W, mod=W)
From this, with now centered variables X and Y, the following multiple regression model is automatically defined.
Y^ = b0 + bx(X) + bw(W) + bxw(XW)
From that model, the functions sets the moderator variable W to each of the three constant values, Wc, and solves for the given value Wc to visually plot the potential interaction.

INVOKED R OPTIONS
The options function is called to turn off the stars for different significance levels (show.signif.stars=FALSE), to turn off scientific notation for the output (scipen=30), and to set the width of the text output at the console to 120 characters. The later option can be re-specified with the text_width option. After Regression is finished with a normal termination, the options are re-set to their values before the Regression function began executing.

COLOR THEME
A color theme for all the colors can be chosen for a specific plot with the colors option. Or, the color theme can be changed for all subsequent graphical analysis with the lessR function style. The default color theme is lightbronze, but a gray scale is available by removing the bronze background, such as with style(window_fill="white") or with "gray". Other themes are available as explained in style.

VARIABLE LABELS
If variable labels exist, then the corresponding variable label is by default listed as the label for the horizontal axis and on the text output. For more information, see Read.

Value

The output can optionally be returned and saved into an R object, otherwise it simply appears at the console. The components of this object are redesigned in lessR version 3.3 into (a) pieces of text that form the readable output and (b) a variety of statistics. The readable output are character strings such as tables amenable for viewing and interpretation. The statistics are numerical values amenable for further analysis, such as to be referenced in a subsequent knitr document. The motivation of these two types of output is to facilitate knitr documents, as the name of each piece, preceded by the name of the saved object followed by a dollar sign, can be inserted into the knitr document (see examples).

TEXT OUTPUT
out_background: variables in the model, rows of data and retained
out_estimates: estimated coefficients, hypothesis tests and confidence intervals
out_fit: fit indices; st dev of residuals; R-sq with adj and PRESS versions
out_anova: analysis of variance
out_cor: correlations among all variables in the model
out_collinear: collinearity analysis
out_subsets: R squared adjusted for all (or many) possible subsets
out_residuals: residuals
out_predict: analysis of residuals and influence
out_ref: references if selected on the Regression function call
out_Rmd: lists the name and location of the generated Rmd file
out_plots: list of plots generated if more than one
out_suggest: list of suggested other analyses

Separated from the rest of the text output are the major headings, which can be not included with custom collations of the output. out_title_bck: BACKGROUND
out_title_basic: BASIC ANALYSIS
out_title_rel: RELATIONS AMONG THE VARIABLES
out_title_res: ANALYSIS OF RESIDUALS AND INFLUENCE
out_title_pred: FORECASTING ERROR

STATISTICS
call: function call that generated the analysis
formula: model formula that specifies the model
vars: vector of variable names in the model
n.vars: number of variables in the model
n.obs: number of rows of data submitted for analysis
n.keep: number of rows of data retained in the analysis
coefficients: estimated regression coefficients
sterrs: standard errors of the estimated coefficients
tvalues: t-values of the estimated coefficients for null of 0
pvalues: p-values from the t-tests of the estimated coefficients
cilb: lower bound of 95% confidence interval of estimate
ciub: upper bound of 95% confidence interval of estimate
anova_model: model df, ss, ms, F-value and p-value
anova_residual: residual df, ss and ms
anova_total: total df, ss and ms
se: standard deviation of the residuals
resid_range: 95% range of normally distributed fitted residuals
Rsq: R-squared
Rsqadj: adjusted R-squared
PRESS: PRESS sum of squares
RsqPRESS: PRESS R-squared
m_se: K-fold average of the standard deviation of residuals. m_MSE: K-fold average of the MSE. m_Rsq: K-fold average of R-squared. cor: correlation matrix of all variables in the model
tolerances: tolerance of each predictor variable for collinearity analysis
VIF: variance inflation factor for each predictor variable
resid.max: five largest values of the residuals on which the output is sorted
pred_min_max: Rows with the smallest and largest prediction intervals
residuals: residuals
fitted: fitted values
cooks.distance: Cook's distance
model: data retained for the analysis
terms: terms specified for the analysis

Although not typically needed for analysis, if the regression output is assigned to an object named, for example, r, then the complete contents of the object can be viewed directly with the unclass function, here as unclass(r). Invoking the class function on the saved object reveals a class of out_all. The class of each of the text pieces of output is out.

Author(s)

David W. Gerbing (Portland State University; gerbing@pdx.edu)

References

Lumley, T., leaps function from the leaps package.

Gerbing, D. W. (2023). R Data Analysis without Programming: Explanation and Interpretation, 2nd edition, Chapters 11-13, NY: Routledge.

Gerbing, D. W. (2021). Enhancement of the Command-Line Environment for use in the Introductory Statistics Course and Beyond, Journal of Statistics and Data Science Education, 29(3), 251-266, https://www.tandfonline.com/doi/abs/10.1080/26939169.2021.1999871.

Xie, Y. (2013). Dynamic Documents with R and knitr, Chapman & Hall/CRC The R Series.

Examples

# read internal data set
d <- rd("Reading", quiet=TRUE)
# do not need all this data, so take only 30% to reduce CPU time
d <- Subset(random=.3)

# one-predictor regression
# Provide all default analyses including scatterplot etc.
# Can abbreviate Regression with reg
Regression(Reading ~ Verbal)
# Provide only the brief analysis on standardized variables
#  with 3-fold cross-validations
reg_brief(Reading ~ Verbal, new_scale="z", kfold=3)

# Access the pieces of output, here in an object named \code{r}
r <- reg(Reading ~ Verbal + Absent + Income)
# Display all output at the console in the standard sequence
r
# list the names of all the saved components
names(r)
# Display just the estimated coefficients and their inferential analysis
r$out_estimates



# Generate an R markdown file with the option: Rmd
# Output file here will be read.Rmd, a simple text file that can
#   be edited with any text editor including RStudio from which it
#   can be knit to generate dynamic output to a Word document,
#   pdf file or html file, as well as automatically rendered
# Here knit into an html file, but do not display
#reg(Reading ~ Verbal + Absent, Rmd="read", Rmd_browser=FALSE)

# generate interpretative R markdown file and render Word and odt
#reg(Reading ~ Verbal + Absent, Rmd="eg", Rmd_format=c("word", "odt"))

# just for incomes > 100000 and less than 5 days absent 
Regression(Reading ~ Verbal, filter=(Income > 100 & Absent < 5))

# standardize
Regression(Reading ~ Verbal, new_scale="z")


# Multiple regression model
# Save the three output plots as pdf files 4 inches square
#Regression(Reading ~ Verbal + Absent + Income, pdf=TRUE,
#   width=4, height=4)

# Compare nested models
# Reduced model:  Reading ~ Verbal
# Full model:  Reading ~ Verbal + Income + Absent
Nest(Reading, Verbal, c(Income, Absent))


# Specify new values of the predictor variables to calculate
#  forecasted values and the corresponding prediction intervals
# Specify an input data frame other than d, see help(mtcars)
Regression(mpg ~ hp + wt, data=mtcars,
  X1_new=seq(50,350,50), X2_new=c(2,3))

# Indicator (dummy) variable
#d <- Read("Employee", quiet=TRUE)
#reg(Salary ~ Dept)

lessR documentation built on June 8, 2025, 10:35 a.m.