title: 'rempsyc: Convenience functions for psychology' tags: - R - psychology - statistics - visualization authors: - name: Rémi Thériault orcid: 0000-0003-4315-6788 affiliation: 1 affiliations: - name: "Department of Psychology, Université du Québec à Montréal, Québec, Canada" index: 1 date: "2023-03-13" bibliography: paper.bib output: #rticles::joss_article md_document: preserve_yaml: TRUE variant: "markdown_strict" journal: JOSS link-citations: yes csl: apa.csl
{rempsyc} is an R package of convenience functions that make the analysis-to-publication workflow faster, easier, and less error-prone. It enables the creation of publication-ready APA (American Psychological Association) tables exportable to Word (via {flextable}) and easily customizable APA-compliant plots (via {ggplot2}). It makes it easy to run statistical tests, check assumptions, and automate various tasks common in psychology research and social sciences more broadly.
There are many reasons to use R [@base2021] for analyzing and reporting data from research studies, such as being compatible with the ideals of open science [@quintana2020]. However, R has a major downside for novices: its steep learning curve due to its programmatic interface, in contrast to perhaps more user-friendly point-and-click software. Of course, this flexibility is also a strength, as the R community can and does come together to produce packages that make using R increasingly easier and more user-friendly [e.g., the easystats ecosystem @easystatsPackage]. The {rempsyc} package (Really Easy Methods for Psychology) contributes to this momentum by providing convenience functions that remove as much friction as possible between your script and your manuscript (in particular, if you are using Microsoft Word).
There are mainly three things that go into a manuscript: text, tables, and figures. {rempsyc} does not generate publication-ready text summarizing analyses; for this, see the {report} package [@reportPackage]. Instead, {rempsyc} focuses on the production of publication-ready tables and figures. Below, I go over a few quick examples of those.
Many researchers using R still copy-paste the values from the R console to their manuscript, or retype them manually. Yet, this approach increases the risks of copy-paste and retyping errors so common in psychology. This problem is not trivial given that according to some estimates, up to 50% of articles in psychology have at least one statistical error [@nuijten2016prevalence]. Ideally, one should be able to format the table directly in R, and to export it to Word directly.
Formatting a table properly in R is already a tedious and time-consuming task, but fortunately several packages take care of this step [e.g., the {broom} or {report} packages, @reportPackage; @broom2022, and there are several others]. Exporting these formatted tables to Microsoft Word remains a challenge however. Some packages do export to Word [e.g., the {apaTables} package @stanley2018reproducible], but their formatting is often rigid especially when using analyzes or table formats that are not supported by default.
{rempsyc} solves this problem by allowing maximum flexibility: you
manually create the data frame exactly the way you want, and then only
use the nice_table()
function on the resulting data frame.
nice_table()
works on any data frame, even non-statistical ones like
mtcars
.
One of its main benefits however is the automatic formatting of
statistical symbols and its integration with other packages. We can for
example create a {broom} table and then apply nice_table()
on it. It
suits particularly well the pipe workflow.
library(rempsyc)
lm(mpg ~ cyl + wt * hp, mtcars) |>
broom::tidy(conf.int = TRUE) |>
nice_table(broom = "lm")
{width=60%}
We can do the same with a {report} table.
stats.table <- lm(mpg ~ cyl + wt * hp, mtcars) |>
report::report() |>
as.data.frame()
nice_table(stats.table)
{width=80%}
The {report} package provides quite comprehensive tables, so one may
request an abbreviated table with the 'short'
argument. For
convenience, it is also possible to highlight significant results for
better visual discrimination, using the 'highlight'
argument.^[This argument
can be used logically, as 'TRUE'
or 'FALSE'
, but can also be provided with a numeric value representing the
cut-off threshold for the p value] Once
satisfied with the table, we can add a title and note.
my_table <- nice_table(
stats.table, short = TRUE, highlight = 0.001,
title = c("Table 1", "A Pretty Regression Model"),
note = c("The data was extracted from the 1974 Motor Trend US magazine.",
"Greyed rows represent statistically significant differences, p < .001."))
my_table
{width=80%}
One can then easily save the resulting table to Word with
flextable::save_as_docx()
, specifying the object name and desired
path.
flextable::save_as_docx(my_table, path = "my_table.docx")
Additionally, tables created with nice_table()
are {flextable} objects
[@flextable2022], and can be modified as
such.^[A great resource for this is
the {flextable} e-book: https://ardata-fr.github.io/flextable-book/]
{rempsyc} also provides its own set of functions to prepare statistical
tables before they can be fed to nice_table()
and saved to Word.
nice_t_test(data = mtcars,
response = c("mpg", "disp", "drat"),
group = "am",
warning = FALSE) |>
nice_table()
{width=70%}
nice_contrasts(data = mtcars,
response = c("mpg", "disp"),
group = "cyl",
covariates = "hp") |>
nice_table(highlight = .001)
{width=80%}
data <- lapply(mtcars, scale)
model1 <- lm(mpg ~ disp + wt * hp, data)
model2 <- lm(qsec ~ drat + wt * hp, data)
my.models <- list(model1, model2)
nice_lm(my.models) |>
nice_table(highlight = TRUE)
{width=80%}
nice_lm_slopes(my.models, predictor = "wt", moderator = "hp") |>
nice_table()
{width=80%}
It is also possible to export a colour-coded correlation matrix to
Microsoft Excel. The cormatrix_excel()
function has several benefits
over conventional approaches. The base R cor()
function for example
does not use rounded values and the console is impractical for large
matrices. One may manually round values and export it to a .csv
file,
which is an improvement but still unsatisfying.
The {apaTables} package [@stanley2018reproducible] allows exporting the correlation matrix to Word in an APA format, and in many cases this already meets the formal requirements of APA style. However, the Word format is not suitable for large matrices, as it will often spread beyond the document’s margin limits.
Another approach is to export the matrix to an image, like the {correlation} package does [@correlationpackage].^[Exporting the correlation matrix to an image through the {correlation} package also requires the {see} package [@seepackage]] For very small matrices, this works extremely well, and the colour is an immense help to quickly identify which correlations are strong or weak, positive or negative, and significant or non-significant. Again, however, this does not work so well for large matrices because labels might overlap or navigating the large figure becomes difficult.
When the goal is more exploratory in nature, and one has large matrices, it can be beneficial to export them to Excel. {rempsyc} combines the idea of using a coloured correlation matrix from the {correlation} package with the idea of exporting to Excel using {openxlsx2} [@openxlsx2package].
{rempsyc} also provides some usability improvements, like freezing the first row and column so as to be able to easily see which variables correlate with which other variables, regardless of how far or deep those variables are located within the matrix.
The colour represents the strength of the correlation, whereas the stars represent different significance thresholds for the p value is.^[For convenience, colours are only used when the corresponding p value is at least smaller than .05] The exact p values are provided in a second tab for reference purposes, so all information is readily available in just one function call.
cormatrix_excel(data = infert,
filename = "cormatrix1",
select = c("age", "parity", "induced", "case", "spontaneous",
"stratum", "pooled.stratum"))
Preparing figures according to APA style, having them look good, and being able to save them in high-resolution with the proper ratios is often challenging. Working with {ggplot2} [@ggplot2package] provides tremendous flexibility, but an unintended consequence is that doing even trivial operations can at times be daunting.
This is why {rempsyc} setups a few default plot types, ready to be saved
to your preferred format (.pdf
, .tiff
, or .png
).
nice_violin(data = ToothGrowth,
group = "dose",
response = "len",
xlabels = c("Low", "Medium", "High"),
comp1 = 1,
comp2 = 3,
has.d = TRUE,
d.y = 30)
{width=60%}
For an example of such use in publication, see @theriault2021swapping.
One can easily save the resulting figure with ggplot2::ggsave()
,
specifying the desired file name, extension, and resolution.
ggplot2::ggsave('nice_violinplothere.pdf', width = 7, height = 7,
unit = 'in', dpi = 300)
Recommended dimensions for saving {rempsyc} figures is 7 inches wide and
7 inches high at 300 dpi, which makes sure that the resolution is high
enough even if saving to non-vector graphics formats like .png
. That
said, scalable vector graphics formats like .pdf
or .eps
are still
recommended for high-resolution submissions to scientific journals.
Figures are {ggplot2} objects [@ggplot2package], and can be modified as such.
nice_scatter(data = mtcars,
predictor = "wt",
response = "mpg",
group = "cyl",
has.confband = TRUE)
{width=60%}
nice_scatter(data = mtcars,
predictor = "wt",
response = "mpg",
has.confband = TRUE,
has.r = TRUE,
has.p = TRUE) +
ggplot2::geom_hline(yintercept = mean(mtcars$mpg), colour = "black",
linewidth = 1.4, linetype = "dashed") +
ggplot2::annotate("text", x = 3.5, y = 22, size = 7,
label = paste("Mean mpg =", round(mean(mtcars$mpg), 2)))
{width=60%}
For an example of such use in publication, see @krol2020self.
For psychologists using the Inclusion of Other in the the Self Scale
[@aron1992inclusion], it can be useful to
interpolate the original discrete scores (1 to 7) into a group average
representation of the conceptual self-other overlap. For example,
assuming the group mean is 3.5 on the 1 to 7 scale, overlap_circle()
will draw a 25% area overlap from interpolation:
overlap_circle(3.5)
{width=40%}
For an example of such use in publication, see @theriault2021swapping.
When comes time to test assumptions of a linear model, the best option
is the check_model()
function from easystats’ {performance} package,
which allows direct visual evaluation of assumptions [@performancepackage].
Indeed, visual
assessment of diagnostic plots is recommended over statistical tests
since they are overpowered in large samples and underpowered in small
samples [@kozak2018s].
That said, if for whatever reason one wants to check objective asumption
tests for a linear model, rempsyc makes this easy with the
nice_assumptions()
function, which provide p values for normality
(Shapiro–Wilk), homoscedasticity (Breusch–Pagan) and autocorrelation of
residuals (Durbin–Watson) in one call.
nice_normality()
makes it easy to visually check normality in the case
of categorical predictors (i.e., when using groups), through a
combination of quantile-quantile plots, density plots, and histograms.
nice_normality(data = iris,
variable = "Sepal.Length",
group = "Species",
shapiro = TRUE,
histogram = TRUE,
title = "Density (Sepal Length)")
Similarly for univariate outliers using median absolute deviations from the median [MAD, @leys2013outliers].
plot_outliers(airquality,
group = "Month",
response = "Ozone")
{width=60%}
Univariate outliers based on the median/MAD can also be simply requested
with find_mad()
.^[Once one has identified outliers, it is also possible to
winsorize them with the winsorize_mad()
function.]
find_mad(airquality, names(airquality), criteria = 3)
## 8 outlier(s) based on 3 median absolute deviations for variable(s):
## Ozone, Solar.R, Wind, Temp, Month, Day
##
## Outliers per variable:
##
## $Ozone
## Row Ozone_mad
## 1 30 3.218284
## 2 62 3.989131
## 3 99 3.488081
## 4 101 3.025573
## 5 117 5.261028
## 6 121 3.333911
##
## $Wind
## Row Wind_mad
## 1 9 3.049871
## 2 48 3.225825
Homoscedasticity can also be checked numerically with nice_var()
or
visually with nice_varplot()
.
nice_var(data = iris,
variable = names(iris[1:4]),
group = "Species") |>
nice_table()
nice_varplot(data = iris,
variable = "Sepal.Length",
group = "Species")
{width=70%}
Finally, with the idea of making the analysis workflow easier in mind,
{rempsyc} also provides a few other utility functions. nice_na()
allows reporting item-level missing values per scale, as well as
participant’s maximum number of missing items by scale, as per
recommendations [@parent2013handling].
extract_duplicates()
creates a data frame of only observations with a
duplicated ID or participant number, so they can be investigated more
thoroughly. best_duplicate()
allows to follow-up on this investigation
and only keep the “best” duplicate, meaning those with the fewer number
of missing values, and in case of ties, the first one.
nice_reverse()
permits the automatic reverse-coding of scores so
common for psychology questionnaires, provided the minimum and maximum
score values are known.
There are other functions that the reader can explore at their leisure on the package official website. However, hopefully, this overview has given the reader a gentle introduction to this package.
The {rempsyc} package is licensed under the GNU General Public License
(GPL v3.0). It is available on CRAN, and can be installed using
install.packages("rempsyc")
. The full tutorial website can be accessed
at: https://rempsyc.remi-theriault.com/. All code is open-source and
hosted on GitHub, and bugs can be reported at
https://github.com/rempsyc/rempsyc/issues/.
I would like to thank Jay Olson, Hugues Leduc, Björn Büdenbender, and Charles-Étienne Lavoie for statistical or technical advice that helped inform some functions of this package and/or useful feedback on this manuscript. I would also like to acknowledge funding from the Social Sciences and Humanities Research Council of Canada.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.