ez.lms: lm(y~x+covar), for many y and/or many x

View source: R/stats.R

ez.lmsR Documentation

lm(y~x+covar), for many y and/or many x

Description

lm(y~x+covar), for many y and/or many x

Usage

ez.lms(
  df,
  y,
  x,
  covar = NULL,
  by = NULL,
  report = T,
  model = c("lm", "lmrob", "lmRob", "rlm"),
  view = F,
  plot = F,
  pmethods = c("bonferroni", "fdr"),
  cols = 3,
  point.size = 10,
  point.shape = 16,
  lab.size = 18,
  text.size = 16,
  error = T,
  pe = F,
  ...
)

ez.regressions(
  df,
  y,
  x,
  covar = NULL,
  by = NULL,
  report = T,
  model = c("lm", "lmrob", "lmRob", "rlm"),
  view = F,
  plot = F,
  pmethods = c("bonferroni", "fdr"),
  cols = 3,
  point.size = 10,
  point.shape = 16,
  lab.size = 18,
  text.size = 16,
  error = T,
  pe = F,
  ...
)

Arguments

df

a data frame. Internally go through dropna –> ez.2value –> scale
because of this, this function is not friendly to factors with 3 or more levels, although the default lm can handel factors nicely (yet complicatedly–eg, not easy to test significance)
by design, ez.2value to convert factors with 2 levels, such as sex, this generates the same results as lm by default, see note and example below for more
scale to get standarized beta as effect size for comparison across different variables

y

compatible with ez.selcol, or y could be cmd from ez.scatterplot where not necessary to specify x, covar, by. y~x+covar|1,4by. cmd format can only handel 1 y, 1 x

x

compatible with ez.selcol

covar

NULL=no covar, compatible with ez.selcol

by

NULL. if specified with a factor col, do the model on all subjects, and groups in that col separately. In this case returns a list of result data frame

report

print results (in APA format)

model

vector c('lm', 'lmrob', 'lmRob', 'rlm'), robustbase::lmrob–MM-type Estimators; robust::lmRob–automatically chooses an appropriate algorithm. one or more, 'lm' will always be included internally, even if not specified

view

call View(result)

plot

T/F, the black dash line is bonferroni p = 0.05 (again for tests only with a non-NA p values), the grey black dash is uncorrected p = 0.05

pmethods

c('bonferroni','fdr'), type p.adjust.methods for all methods. This correction is based on the total number of tests that successfully generate p values

cols

number of columns for multiplot. NULL=auto calculate

error

whether show error message when error occurs (also result will have an empty row when error occurs)

...

additional parameters to the specified model. if more than one model specified, ... may not be OK for all models, because diff models have diff parameters

Value

an invisible data frame or list of data frame (if many y and many x)
beta: standardized beta coefficients (simple or multiple regression) are the estimates resulting from a regression analysis that have been standardized
so that the variances of dependent and independent variables are 1.
Therefore, standardized coefficients refer to how many standard deviations a dependent variable will change,
per standard deviation increase in the predictor variable.
For simple regression (1 y ~ 1 x), the value of the standardized coefficient (beta) equals the correlation coefficient (r) (beta=r).
For multiple regression (with covar), the value of the standardized coefficient (beta) is close to semi-partial correlation
According to jerry testing, scale() or not for x,y or covar, does not change p values for predictors, although intercept would differ

dof: from F-statistic
residualization: say, y ~ x + a + b, first x ~ a + b is residualized and then y ~ x (ie, semi-partial). If no covar, y ~ x, although labelled residualized in result data frame, actually non-residualized, because nothing to residualize
as far as semi-partial concerned, y ~ x + a + b, x ~ y + a + b are two different models. ppcor::spcor.test(iris[,1],iris[,2],iris[,c(3,4)]) != ppcor::spcor.test(iris[,2],iris[,1],iris[,c(3,4)])
but for partial correlation, partial(y,x) is the same as partial(x,y), ppcor::pcor.test(iris[,1],iris[,2],iris[,c(3,4)]) = ppcor::pcor.test(iris[,2],iris[,1],iris[,c(3,4)])
On a related note, y ~ x + a + b, generates the same results as y ~ b + a + x (the order does not matter).
For coding purpose, I put x as x, and (a,b) as cov. stdbeta, p returned refer to x in MLR (multiple linear regression), values referring to a,b were discarded during the process.

++++++++++!!!important note!!!++++++++++
"n" | "n.lmrob" | "n.lmRob" | "n.rlm"
"stdbeta" | "stdbeta.lmrob" | "stdbeta.lmRob" | "stdbeta.rlm"
"p" | "p.lmrob" | "p.lmRob" | "p.rlm"
"r2" | "r2.lmrob" | "r2.lmRob" | "r2.rlm" (NA)
"r.spartial" | "r.spartial.lmrob" | "r.spartial.lmRob" | "r.spartial.rlm" (NA)
"p.spartial" | "p.spartial.lmrob" | "p.spartial.lmRob" | "p.spartial.rlm"
"r.partial" | "r.partial.lmrob" | "r.partial.lmRob" | "r.partial.rlm" (NA)
"p.partial" | "p.partial.lmrob" | "p.partial.lmRob" | "p.partial.rlm"

Summary: The absolute value of the semipartial correlation of X with Y is always less than or equal to that of the partial correlation of X with Y.
MLR p = ppcor::pcor.test ~=(>?) p.partial = (y.residual, x.residual)
ppcor::spcor.test ~=(>?) p.spartial = (y, x.residual)
? might be larger, because after residualized, no covar, so dof is larger ?

the stdbeta, p(.lm), p.lmrob etc in result data frame refer to stdbeta, p value for x in MLR, which are plotted when plot=T. the bestp is also selected based on this p value

According to my own demo (see examples iris), MLR p is the same as p values from ppcor::pcor.test and SPSS Analyze->Correlate->Partial! also very close to the p value from manual calculation with correlation(y.residual, x.residual). partial r the same in all cases.

the r.spartial, p.spartial refers to semi-partial correlation (r.spartial is the same as ppcor::spcor.test result, p.spartial very close to ppcor::spcor.test result.

no column named r, r(.lm), r.lmrob etc in the result data frame
r2.rlm or r.spartial.rlm are NA, but p values are available, because I do not know how to get them from rlm yet

Note

To keep consistent with other R functions (eg, lm which converts numeric/non-numeric factor to values starting from 0), set start.at=0 in ez.2value(), then factor(1:2)->c(0,1), factor(c('girl','boy'))->c(1,0) # the level order is boy,girl
in lm() the coding (0,1) vs.(1,2) does not affect slope, but changes intercept (but a coding from 1,2->1,3 would change slope–interval difference matters)

Examples

y=ez.zresidize(iris,'Sepal.Length','Petal.Width',scale=F)
x=ez.zresidize(iris,'Sepal.Width','Petal.Width',scale=F)
z=data.frame(Sepal.Length=y$Sepal.Length,Sepal.Width=x$Sepal.Width)
ez.lms(z,'Sepal.Length','Sepal.Width')
ppcor::pcor.test(iris$Sepal.Length,iris$Sepal.Width,iris$Petal.Width)
ppcor::spcor.test(iris$Sepal.Length,iris$Sepal.Width,iris$Petal.Width)

y = c(1,2,3,4,5,6)
x = c(2,4,15,20,25,36)
z=factor(c('m','m','m','f','f','f'))

lm(y~x+z) %>% summary
lm(y~x+ez.2value(z)) %>% summary
lm(y~x+ez.2value(z,start.at=2)) %>% summary
# same beta, p but diff Intercept


y = c(1,2,3,4,5,6)
x = c(2,4,15,20,25,36)
f=factor(c('m','m','u','u','f','f'))
lm(y~x+f) %>% summary
lm(y~x+ez.2value(f)) %>% summary
lm(y~x+ez.2value(f,start.at=2)) %>% summary
# the first is diff from the rest, changes p value


y = c(1,2,3,4,5,6)
x = c(2,4,15,20,25,36)
z=c(0,0,0,1,1,1)
zz=c(-1,-1,-1,1,1,1)
zzz=c(-3,-3,-3,1,1,1)
lm(y~x+z) %>% summary
lm(y~x+scale(z)) %>% summary
lm(y~x+zz) %>% summary
lm(y~x+zzz) %>% summary()
# again, scale and different coding strategy only change intercept and beta(beta?, that is the purpose!), but not p

jerryzhujian9/zmisc documentation built on March 9, 2024, 12:49 a.m.