Version Note: Up-to-date with v0.3.0
knitr::opts_chunk$set(message = FALSE, warning = FALSE, comment = NA)
old.hooks <- fansi::set_knit_hooks(knitr::knit_hooks)
library(psycModel)
TLDR:
1) It is a beginner-friendly R package for statistical analysis in social science.
2) Tired of manually writing all variables in a model? You can use dplyr::select() syntax for all models
3) Fitting models, plotting, checking goodness of fit, and model assumption violations all in one place.
4) Beautiful and easy-to-read output. Check out this example now.
Supported models:
1. Linear regression (which also covers ANOVA and ANCOVA), generalized linear regression.
2. Linear mixed effect model (or HLM to be more specific), generalized linear mixed effect model.
3. Confirmatory and exploratory factor analysis.
4. Simple mediation analysis.
5. Reliability analysis.
6. Correlation, descriptive statistics (e.g., mean, SD).
At its core, this package lets people analyze their data with one simple function call. For example, when you run a linear regression, you need to fit the model, check the goodness of fit (e.g., R²), check the model assumptions, and plot the interaction (if one is included). Without this package, you need several packages to do these steps. Additionally, if you are an R beginner, you probably don't know where to find all of those packages. This package has done that work for you, so you can do everything with one simple function call.
Another good example is CFA. The most common (and probably the only) option for fitting a CFA in R is lavaan. lavaan has its own syntax. It is very versatile and powerful, but you do need to spend some time learning it. That may not be worth the time for people who just want to run a quick and simple CFA model. In this package, it is as simple as cfa_summary(data, x1:x3), and you get the model summary, the fit measures, and a nice-looking path diagram. The same logic applies to HLM, since lme4 / nlme also have their own syntax that you need to learn.
Moreover, I made fitting the model even simpler by using the dplyr::select() syntax. Traditionally, if you want to fit a linear regression model, the syntax looks like this: lm(y ~ x1 + x2 + x3 + x4 + ... + xn, data). Now the syntax is much shorter and more intuitive: lm_model(y, x1:xn, data). You can even replace x1:xn with everything(). I also wrote this very short article that teaches people how to use the dplyr::select() syntax (it is not comprehensive, and it is not intended to be).
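To illustrate what a column-range selection saves you from typing, here is a minimal base-R sketch that expands a range of columns into the equivalent long-form formula. It is not the package's implementation: `subset(select = ...)` stands in for the dplyr::select()-style range, and the iris columns are chosen only for illustration.

```r
# Base-R sketch: turn a column range into a regression formula.
# `subset(select = a:b)` stands in for the dplyr::select()-style range here.
predictors <- names(subset(iris, select = Sepal.Width:Petal.Width))

# Build the long-form formula: response ~ x1 + x2 + ... + xn
model_formula <- reformulate(predictors, response = "Sepal.Length")

# Fit with plain lm() using the generated formula
fit <- lm(model_formula, data = iris)
print(model_formula)
print(coef(fit))
```

The range `Sepal.Width:Petal.Width` picks up every column between the two names, which is the same idea as writing x1:xn instead of listing each predictor.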
Finally, I made the output in R much more readable. The default output from R, to be frank, looks ugly. I spent a lot of time making sure the output of this package looks good (see below for examples). I am sure you will see how big the improvement is.
integrated_model_summary is the integrated function for linear regression and generalized linear regression. It first fits the model using lm_model or glm_model, then passes the fitted model object to model_summary, which produces model estimates and assumption checks. If interaction terms are included, they are passed to the relevant interaction_plot function for plotting (the package currently does not support interaction plots for generalized linear regression).
Additionally, you can request assumption_plot and simple_slope (both default to FALSE). Requesting assumption_plot produces a panel of graphs that lets you visually inspect the model assumptions (in addition to testing them statistically). simple_slope is another powerful way to probe the interaction further. It shows the slope estimate at the mean of the moderator and at 1 SD above and below it. For example, suppose you hypothesize that socioeconomic status (SES) moderates the effect of teacher experience on education quality. Then simple_slope shows the slope estimate of teacher experience on education quality at the mean level of SES and at +1/-1 SD. Additionally, it produces a Johnson-Neyman plot that shows at which levels of the moderator the slope estimate is predicted to be non-significant.
lm_model_summary(
  data = iris,
  response_variable = Sepal.Length,
  predictor_variable = tidyselect::everything(),
  two_way_interaction_factor = c(Sepal.Width, Petal.Width),
  model_summary = TRUE,
  interaction_plot = TRUE,
  assumption_plot = TRUE,
  simple_slope = TRUE,
  plot_color = TRUE
)
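To make the simple-slope idea concrete, here is a minimal base-R sketch of the arithmetic behind it: with an interaction term, the slope of the focal predictor at a given moderator value is the focal coefficient plus the interaction coefficient times that value. This is an illustration under simplifying assumptions (a two-predictor model, with Petal.Width treated as the moderator), not the package's own code, and it omits standard errors and the Johnson-Neyman interval.

```r
# Moderated regression on iris: does Petal.Width moderate the
# Sepal.Width -> Sepal.Length slope? (Illustrative variable roles only.)
fit <- lm(Sepal.Length ~ Sepal.Width * Petal.Width, data = iris)
b <- coef(fit)

# Moderator values: mean - 1 SD, mean, mean + 1 SD
m <- mean(iris$Petal.Width)
s <- sd(iris$Petal.Width)
moderator_levels <- c("-1 SD" = m - s, "mean" = m, "+1 SD" = m + s)

# Simple slope of the focal predictor at each moderator level:
# slope = b_focal + b_interaction * moderator
simple_slopes <- b[["Sepal.Width"]] +
  b[["Sepal.Width:Petal.Width"]] * moderator_levels
print(simple_slopes)
```

A quick sanity check on this logic: the slope at the mean should match the Sepal.Width coefficient you get after mean-centering the moderator and refitting.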
This is the multilevel variation of integrated_model_summary. It works exactly the same way as integrated_model_summary, except that you specify the non_random_effect_factors (i.e., level-2 factors) and the random_effect_factors (i.e., level-1 factors) instead of predictor_variable.
lme_multilevel_model_summary(
  data = popular,
  response_variable = popular,
  random_effect_factors = extrav,
  non_random_effect_factors = c(sex, texp),
  three_way_interaction_factor = c(extrav, sex, texp),
  graph_label_name = c("popular", "extraversion", "sex", "teacher experience"), # change interaction plot label
  id = class,
  model_summary = TRUE,
  interaction_plot = TRUE,
  assumption_plot = FALSE, # you can try setting this to TRUE
  simple_slope = FALSE, # you can try setting this to TRUE
  plot_color = TRUE
)
This can be used to compare models. Every type of model comparison supported by performance::compare_performance() is supported, since this is just a wrapper for that function.
fit1 <- lm_model(
  data = popular,
  response_variable = popular,
  predictor_var = c(sex, extrav),
  quite = TRUE
)
fit2 <- lm_model(
  data = popular,
  response_variable = popular,
  predictor_var = c(sex, extrav),
  two_way_interaction_factor = c(sex, extrav),
  quite = TRUE
)
compare_fit(fit1, fit2)
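For readers who want to see the kind of comparison involved without any extra packages, here is a rough base-R sketch that puts a few common fit indices side by side for two nested models. It only approximates the idea; it is not what compare_fit or performance::compare_performance() actually computes, and the iris models are stand-ins for the example above.

```r
# Two nested models: main effects only vs. main effects plus interaction
fit1 <- lm(Sepal.Length ~ Sepal.Width + Petal.Width, data = iris)
fit2 <- lm(Sepal.Length ~ Sepal.Width * Petal.Width, data = iris)

# Collect a few common fit indices side by side
comparison <- data.frame(
  model = c("fit1", "fit2"),
  AIC   = c(AIC(fit1), AIC(fit2)),
  BIC   = c(BIC(fit1), BIC(fit2)),
  R2    = c(summary(fit1)$r.squared, summary(fit2)$r.squared)
)
print(comparison)

# F-test for the added interaction term
print(anova(fit1, fit2))
```

Lower AIC/BIC and a significant F-test would both point toward keeping the interaction term.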
The CFA model is fitted using lavaan::cfa(). You can pass multiple factors (in the example below, x1, x2, x3 represent one factor, x4, x5, x6 represent another factor, and so on). It shows the fit measures, the factor loadings, and goodness of fit based on cut-off criteria (you should review the literature for the cut-off criteria, as the recommendations are subject to change). Additionally, it shows a nice-looking path diagram.
cfa_summary(
  data = lavaan::HolzingerSwineford1939,
  x1:x3,
  x4:x6,
  x7:x9
)
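For comparison, this is roughly the lavaan model syntax that the three item ranges correspond to. The sketch below builds that syntax from named item groups and fits it only if lavaan is installed. `build_cfa_syntax` and the factor names F1 to F3 are hypothetical, introduced only for illustration; how the package constructs its model internally is not shown here.

```r
# Hypothetical helper: build lavaan model syntax from named item groups.
build_cfa_syntax <- function(factors) {
  rhs <- vapply(factors, paste, character(1), collapse = " + ")
  paste(sprintf("%s =~ %s", names(factors), rhs), collapse = "\n")
}

# F1-F3 are placeholder factor names for the three item ranges
model <- build_cfa_syntax(list(
  F1 = paste0("x", 1:3),
  F2 = paste0("x", 4:6),
  F3 = paste0("x", 7:9)
))
cat(model, "\n")

# Fit the model only if lavaan is available
if (requireNamespace("lavaan", quietly = TRUE)) {
  fit <- lavaan::cfa(model, data = lavaan::HolzingerSwineford1939)
  print(lavaan::fitMeasures(fit, c("cfi", "rmsea", "srmr")))
}
```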
The EFA model is fitted using psych::fa(). It first finds the optimal number of factors. Then it shows the factor loadings, uniqueness, and complexity of the latent factors (loadings below 0.4 are hidden for readability). You can additionally request a post-hoc CFA model based on the EFA model.
efa_summary(lavaan::HolzingerSwineford1939,
  starts_with("x"), # x1, x2, x3 ... x9
  post_hoc_cfa = TRUE # run a post-hoc CFA
)
Measurement invariance is fitted using lavaan::cfa(). It uses the multi-group confirmatory factor analysis approach. You can request metric or scalar invariance by specifying the invariance_level. This is mainly to save time: if you have a large model, it doesn't make sense to fit an unnecessary scalar invariance model when you are only interested in metric invariance.
measurement_invariance(
  x1:x3,
  x4:x6,
  x7:x9,
  data = lavaan::HolzingerSwineford1939,
  group = "school",
  invariance_level = "scalar" # you can change this to metric
)
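As background, the multi-group CFA approach in lavaan is usually written as a sequence of increasingly constrained models compared with a likelihood-ratio test. The sketch below follows that common pattern; the factor names F1 to F3 are placeholders, and the package's own implementation may differ in its details. The models are fitted only if lavaan is installed.

```r
# Equality constraints typically associated with each invariance level
# (values for lavaan's `group.equal` argument)
constraints <- list(
  metric = "loadings",
  scalar = c("loadings", "intercepts")
)

# Placeholder factor names; items follow the example above
model <- "
  F1 =~ x1 + x2 + x3
  F2 =~ x4 + x5 + x6
  F3 =~ x7 + x8 + x9
"

if (requireNamespace("lavaan", quietly = TRUE)) {
  hs <- lavaan::HolzingerSwineford1939
  # Configural: same structure in each group, no equality constraints
  fit_configural <- lavaan::cfa(model, data = hs, group = "school")
  # Metric: loadings constrained equal across groups
  fit_metric <- lavaan::cfa(model, data = hs, group = "school",
                            group.equal = constraints$metric)
  # Scalar: loadings and intercepts constrained equal across groups
  fit_scalar <- lavaan::cfa(model, data = hs, group = "school",
                            group.equal = constraints$scalar)
  print(lavaan::lavTestLRT(fit_configural, fit_metric, fit_scalar))
}
```

Stopping at the metric model skips the scalar fit entirely, which is the time saving described above.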
Currently, the package only supports simple mediation with covariates. You can try to fit a multi-group mediation by specifying the group argument, but, honestly, I don't know whether that is the correct way to implement it. If you want more complicated mediation models, I highly recommend using the mediation package. Eventually, I will probably switch to using it in this package.
mediation_summary(
  data = lmerTest::carrots,
  response_variable = Preference,
  mediator = Sweetness,
  predictor_variable = Crisp,
  control_variable = Age:Income
)
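If the terms are unfamiliar, simple mediation decomposes the total effect of a predictor into a direct effect and an indirect effect through the mediator. The base-R sketch below shows that decomposition on simulated data with two regressions. It is a conceptual illustration only: the data and variable names are made up, there are no covariates, and it does not reproduce how the package estimates or tests the indirect effect.

```r
# Simulated data with a known mediation structure (X -> M -> Y)
set.seed(123)
n <- 200
x <- rnorm(n)
m <- 0.5 * x + rnorm(n)
y <- 0.4 * m + 0.2 * x + rnorm(n)
dat <- data.frame(x, m, y)

# Path a: predictor -> mediator
a_path <- coef(lm(m ~ x, data = dat))[["x"]]

# Path b and direct effect c': outcome on mediator and predictor together
fit_y  <- lm(y ~ x + m, data = dat)
b_path <- coef(fit_y)[["m"]]
direct <- coef(fit_y)[["x"]]

# Total effect c: outcome on predictor alone
total <- coef(lm(y ~ x, data = dat))[["x"]]

# Indirect effect is the product of paths a and b
indirect <- a_path * b_path
print(c(indirect = indirect, direct = direct, total = total))
```

With ordinary least squares on the same sample, the total effect equals the direct effect plus the indirect effect, which is a handy check on the arithmetic.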
It first determines whether your items are unidimensional or multidimensional. If they are unidimensional, it computes the alpha and a single-factor CFA model. If they are multidimensional, it computes the alpha and the omega. It also provides descriptive statistics. Here is an example for unidimensional items:
reliability_summary(data = lavaan::HolzingerSwineford1939, cols = x1:x3)
Here is an example for multidimensional items:
reliability_summary(data = lavaan::HolzingerSwineford1939, cols = x1:x9)
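For reference, Cronbach's alpha can be computed by hand from the item variances and the variance of the total score. The sketch below does this in base R on the built-in `attitude` data, treating its seven columns as items of one scale purely for illustration. It is not the package's implementation and does not cover omega or the dimensionality check.

```r
# Treat the seven rating columns of `attitude` as items of one scale
# (an illustrative assumption, not a validated scale).
items <- attitude
k <- ncol(items)

# Variance of each item and of the summed total score
item_variances <- sapply(items, var)
total_variance <- var(rowSums(items))

# Cronbach's alpha:
# k / (k - 1) * (1 - sum(item variances) / variance of total score)
alpha <- k / (k - 1) * (1 - sum(item_variances) / total_variance)
print(alpha)
```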
There isn't much to say about correlation except that you can request different types of correlation based on the data structure. On the backend, I use the correlation package for this.
cor_test(iris, where(is.numeric))
It puts together a nice table of some descriptive statistics and the correlations. Nothing fancy.
descriptive_table(iris, cols = where(is.numeric)) # all numeric columns
If you want to produce this formatted output in R Markdown, call this function to see the most up-to-date advice.
knit_to_Rmd()
This concludes my brief discussion of the package. There are some additional functions (like cfa_groupwise) that probably have fewer use cases. You can check what they do by entering ?cfa_groupwise. Anyway, that's it. I hope you enjoy the package, and please let me know if you have any feedback. If you like it, please consider giving it a star on GitHub. Thank you.