{rempsyc} is an R package of convenience functions that make the analysis-to-publication workflow faster, easier, and less error-prone. It enables the creation of publication-ready APA (American Psychological Association) tables exportable to Word (via {flextable}) and easily customizable APA-compliant plots (via {ggplot2}). It makes it easy to run statistical tests, check assumptions, and automate various tasks common in psychology research and social sciences more broadly.
There are many reasons to use R [@base2021] for analyzing and reporting data from research studies, such as being compatible with the ideals of open science [@quintana2020]. However, R has a major downside for novices: its steep learning curve due to its programmatic interface, in contrast to perhaps more user-friendly point-and-click software. Of course, this flexibility is also a strength, as the R community can and does come together to produce packages that make using R increasingly easier and more user-friendly [e.g., the easystats ecosystem @easystatsPackage]. The {rempsyc} package (Really Easy Methods for Psychology) contributes to this momentum by providing convenience functions that remove as much friction as possible between your script and your manuscript (in particular, if you are using Microsoft Word).
There are mainly three things that go into a manuscript: text, tables, and figures. {rempsyc} does not generate publication-ready text summarizing analyses; for this, see the {report} package [@reportPackage]. Instead, {rempsyc} focuses on the production of publication-ready tables and figures. Below, I go over a few quick examples of those.
Many researchers using R still copy-paste the values from the R console to their manuscript, or retype them manually. Yet, this approach increases the risks of copy-paste and retyping errors so common in psychology. This problem is not trivial given that according to some estimates, up to 50% of articles in psychology have at least one statistical error [@nuijten2016prevalence]. Ideally, one should be able to format the table directly in R, and to export it to Word directly.
Formatting a table properly in R is already a tedious and time-consuming task, but fortunately several packages take care of this step [e.g., the {broom} or {report} packages, @reportPackage; @broom2022, and there are several others]. Exporting these formatted tables to Microsoft Word remains a challenge however. Some packages do export to Word [e.g., the {apaTables} package @stanley2018reproducible], but their formatting is often rigid especially when using analyzes or table formats that are not supported by default.
{rempsyc} solves this problem by allowing maximum flexibility: you manually
create the data frame exactly the way you want, and then only use the
nice_table()
function on the resulting data frame. nice_table()
works on
any data frame, even non-statistical ones like mtcars
.
flextable::set_flextable_defaults(background.color = "white") library(rempsyc)
One of its main benefits however is the automatic formatting of statistical
symbols and its integration with other packages. We can for example create
a {broom} table and then apply nice_table()
on it. It suits particularly
well the pipe workflow.
library(rempsyc) lm(mpg ~ cyl + wt * hp, mtcars) |> broom::tidy(conf.int = TRUE) |> nice_table(broom = "lm")
{width=60%}
We can do the same with a {report} table.
stats.table <- lm(mpg ~ cyl + wt * hp, mtcars) |> report::report() |> as.data.frame() nice_table(stats.table)
{width=80%}
The {report} package provides quite comprehensive tables, so one may request an
abbreviated table with the 'short'
argument. For convenience, it is also
possible to highlight significant results for better visual discrimination,
using the 'highlight'
argument.^[This argument can be used logically, as 'TRUE'
or 'FALSE'
, but can also be provided with a numeric value representing the
cut-off threshold for the p value] Once satisfied with the table, we can add
a title and note.
my_table <- nice_table( stats.table, short = TRUE, highlight = 0.001, title = c("Table 1", "A Pretty Regression Model"), note = c("The data was extracted from the 1974 Motor Trend US magazine.", "Greyed rows represent statistically significant differences, p < .001.")) my_table
{width=80%}
One can then easily save the resulting table to Word with flextable::save_as_docx()
,
specifying the object name and desired path.
flextable::save_as_docx(my_table, path = "my_table.docx")
Additionally, tables created with nice_table()
are {flextable} objects
[@flextable2022], and can be modified as such.^[A great resource for this is
the {flextable} e-book: https://ardata-fr.github.io/flextable-book/]
{rempsyc} also provides its own set of functions to prepare statistical tables
before they can be fed to nice_table()
and saved to Word.
nice_t_test(data = mtcars, response = c("mpg", "disp", "drat"), group = "am", warning = FALSE) |> nice_table()
{width=60%}
nice_contrasts(data = mtcars, response = c("mpg", "disp"), group = "cyl", covariates = "hp") |> nice_table(highlight = .001)
{width=80%}
data <- lapply(mtcars, scale) model1 <- lm(mpg ~ disp + wt * hp, data) model2 <- lm(qsec ~ drat + wt * hp, data) my.models <- list(model1, model2) nice_lm(my.models) |> nice_table(highlight = TRUE)
{width=80%}
nice_lm_slopes(my.models, predictor = "wt", moderator = "hp") |> nice_table()
{width=80%}
It is also possible to export a colour-coded correlation matrix to Microsoft
Excel. The cormatrix_excel()
function has several benefits over conventional
approaches. The base R cor()
function for example does not use rounded values
and the console is impractical for large matrices. One may manually round values
and export it to a .csv
file, which is an improvement but still unsatisfying.
The {apaTables} package [@stanley2018reproducible] allows exporting the correlation matrix to Word in an APA format, and in many cases this already meets the formal requirements of APA style. However, the Word format is not suitable for large matrices, as it will often spread beyond the document's margin limits.
Another approach is to export the matrix to an image, like the {correlation} package does [@correlationpackage].^[Exporting the correlation matrix to an image through the {correlation} package also requires the {see} package [@seepackage]] For very small matrices, this works extremely well, and the colour is an immense help to quickly identify which correlations are strong or weak, positive or negative, and significant or non-significant. Again, however, this does not work so well for large matrices because labels might overlap or navigating the large figure becomes difficult.
When the goal is more exploratory in nature, and one has large matrices, it can be beneficial to export them to Excel. {rempsyc} combines the idea of using a coloured correlation matrix from the {correlation} package with the idea of exporting to Excel using {openxlsx2} [@openxlsx2package].
{rempsyc} also provides some usability improvements, like freezing the first row and column so as to be able to easily see which variables correlate with which other variables, regardless of how far or deep those variables are located within the matrix.
The colour represents the strength of the correlation, whereas the stars represent different significance thresholds for the p value is.^[For convenience, colours are only used when the corresponding p value is at least smaller than .05] The exact p values are provided in a second tab for reference purposes, so all information is readily available in just one function call.
cormatrix_excel(data = infert, filename = "cormatrix1", select = c("age", "parity", "induced", "case", "spontaneous", "stratum", "pooled.stratum"))
Preparing figures according to APA style, having them look good, and being able to save them in high-resolution with the proper ratios is often challenging. Working with {ggplot2} [@ggplot2package] provides tremendous flexibility, but an unintended consequence is that doing even trivial operations can at times be daunting.
This is why {rempsyc} setups a few default plot types, ready to be
saved to your preferred format (.pdf
, .tiff
, or .png
).
nice_violin(data = ToothGrowth, group = "dose", response = "len", xlabels = c("Low", "Medium", "High"), comp1 = 1, comp2 = 3, has.d = TRUE, d.y = 30)
For an example of such use in publication, see @theriault2021swapping.
One can easily save the resulting figure with ggplot2::ggsave()
, specifying
the desired file name, extension, and resolution.
ggplot2::ggsave('nice_violinplothere.pdf', width = 7, height = 7, unit = 'in', dpi = 300)
Recommended dimensions for saving {rempsyc} figures is 7 inches wide and 7
inches high at 300 dpi, which makes sure that the resolution is high enough even
if saving to non-vector graphics formats like .png
. That said, scalable vector
graphics formats like .pdf
or .eps
are still recommended for high-resolution
submissions to scientific journals.
Figures are {ggplot2} objects [@ggplot2package], and can be modified as such.
nice_scatter(data = mtcars, predictor = "wt", response = "mpg", group = "cyl", has.confband = TRUE) nice_scatter(data = mtcars, predictor = "wt", response = "mpg", has.confband = TRUE, has.r = TRUE, has.p = TRUE) + ggplot2::geom_hline(yintercept = mean(mtcars$mpg), colour = "black", linewidth = 1.4, linetype = "dashed") + ggplot2::annotate("text", x = 3.5, y = 22, size = 7, label = paste("Mean mpg =", round(mean(mtcars$mpg), 2)))
For an example of such use in publication, see @krol2020self.
For psychologists using the Inclusion of Other in the the Self Scale
[@aron1992inclusion], it can be useful to interpolate the original discrete
scores (1 to 7) into a group average representation of the conceptual self-other
overlap. For example, assuming the group mean is 3.5 on the 1 to 7 scale, overlap_circle()
will draw a 25% area overlap from interpolation:
overlap_circle(3.5)
For an example of such use in publication, see @theriault2021swapping.
When comes time to test assumptions of a linear model, the best option is
the check_model()
function from easystats' {performance} package, which
allows direct visual evaluation of assumptions [@performancepackage]. Indeed,
visual assessment of diagnostic plots is recommended over statistical tests
since they are overpowered in large samples and underpowered in small samples
[@kozak2018s].
That said, if for whatever reason one wants to check objective asumption tests
for a linear model, rempsyc makes this easy with the nice_assumptions()
function, which provide p values for normality (Shapiro--Wilk), homoscedasticity
(Breusch--Pagan) and autocorrelation of residuals (Durbin--Watson) in one call.
nice_normality()
makes it easy to visually check normality in the case of
categorical predictors (i.e., when using groups), through a combination of
quantile-quantile plots, density plots, and histograms.
nice_normality(data = iris, variable = "Sepal.Length", group = "Species", shapiro = TRUE, histogram = TRUE, title = "Density (Sepal Length)")
Similarly for univariate outliers using median absolute deviations from the median [MAD, @leys2013outliers].
plot_outliers(airquality, group = "Month", response = "Ozone")
Univariate outliers based on the median/MAD can also be simply requested with
find_mad()
.^[Once one has identified outliers, it is also possible to
winsorize them with the winsorize_mad()
function.]
find_mad(airquality, names(airquality), criteria = 3)
Homoscedasticity can also be checked numerically with nice_var()
or visually
with nice_varplot()
.
nice_var(data = iris, variable = names(iris[1:4]), group = "Species") |> nice_table()
nice_varplot(data = iris, variable = "Sepal.Length", group = "Species")
Finally, with the idea of making the analysis workflow easier in mind,
{rempsyc} also provides a few other utility functions. nice_na()
allows
reporting item-level missing values per scale, as well as participant’s maximum
number of missing items by scale, as per recommendations [@parent2013handling].
extract_duplicates()
creates a data frame of only observations with a
duplicated ID or participant number, so they can be investigated more
thoroughly. best_duplicate()
allows to follow-up on this investigation and
only keep the "best" duplicate, meaning those with the fewer number of missing
values, and in case of ties, the first one.
nice_reverse()
permits the automatic reverse-coding of scores so common for
psychology questionnaires, provided the minimum and maximum score values are
known.
There are other functions that the reader can explore at their leisure on the package official website. However, hopefully, this overview has given the reader a gentle introduction to this package.
The {rempsyc} package is licensed under the GNU General Public License (GPL
v3.0). It is available on CRAN, and can be installed using
install.packages("rempsyc")
. The full tutorial website can be accessed at:
https://rempsyc.remi-theriault.com/. All code is open-source and hosted on
GitHub, and bugs can be reported at https://github.com/rempsyc/rempsyc/issues/.
I would like to thank Jay Olson, Hugues Leduc, Björn Büdenbender, and Charles-Étienne Lavoie for statistical or technical advice that helped inform some functions of this package and/or useful feedback on this manuscript. I would also like to acknowledge funding from the Social Sciences and Humanities Research Council of Canada.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.