knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library("papaja")
Here, we provide a brief overview of issues to consider when implementing a new method for apa_print(), a convenience function to facilitate reporting of results in accordance with APA reporting guidelines.
If you consider adding a new method, please review our brief contributing guidelines and code of conduct.
If you are reporting the results of a statistical analysis that is not yet supported by apa_print(), you probably have a good motivation and possibly prior work to build on.
If you are just looking for a way to contribute, take a look at the open issues for inspiration.
apa_print() is a generic, meaning it can, in principle, work on any output object with a class that is specific enough to purposefully extract the results of the analysis.
For example, objects of class htest, as returned by t.test(), cor.test(), prop.test(), etc., are named lists that follow a loose convention about the named objects they contain.
t_test_example <- t.test(extra ~ group, data = sleep)
class(t_test_example)
str(t_test_example)
Hence, if we pass an htest object to apa_print(), the function expects there to be named elements in the list, such as statistic, estimate, or p.value.
These expectations are reflected in the workings of the apa_print.htest() method.
Objects of less specific classes, such as list or data.frame, cannot be supported because we cannot make any useful assumptions about their structure.
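The role of the class in method dispatch can be sketched in base R. The generic describe_result() and its methods below are hypothetical names invented for this illustration and are not part of papaja; the sketch only shows why a method for htest can rely on named elements while a plain list cannot be handled.

```r
# Hypothetical generic, for illustration only (not a papaja function).
describe_result <- function(x, ...) UseMethod("describe_result")

# The htest method can rely on the loose naming convention of htest objects.
describe_result.htest <- function(x, ...) {
  stopifnot(!is.null(x$statistic), !is.null(x$p.value))
  list(
    statistic_name = names(x$statistic),
    statistic = unname(x$statistic),
    df = unname(x$parameter),
    p.value = x$p.value
  )
}

# Without a specific class there is nothing to rely on, so we fail informatively.
describe_result.default <- function(x, ...) {
  stop("No method for objects of class '", class(x)[1], "'.", call. = FALSE)
}

t_res <- describe_result(t.test(extra ~ group, data = sleep))
str(t_res)
```

Calling describe_result(list(a = 1)) falls through to the default method and stops with an error, which mirrors the reasoning about list and data.frame objects.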
Objects returned by apa_print() are of class apa_results, a named list with four elements:
papaja:::init_apa_results()
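For readers without papaja at hand, the shape of such an object can be approximated in base R. This is a minimal sketch: new_results_skeleton() is a hypothetical name, the element names are the four discussed in this document, and the exact class vector used by papaja is an assumption.

```r
# Hypothetical constructor; the class vector is assumed, not taken from papaja.
new_results_skeleton <- function() {
  structure(
    list(
      estimate = NULL,
      statistic = NULL,
      full_result = NULL,
      table = NULL
    ),
    class = c("apa_results", "list")
  )
}

skeleton <- new_results_skeleton()
names(skeleton)
```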
To illustrate how apa_results objects are populated, let's look at the output of apa_print.lm().
# Data from Dobson (1990), p. 9.
ctl <- c(4.17, 5.58, 5.18, 6.11, 4.50, 4.61, 5.17, 4.53, 5.33, 5.14)
trt <- c(4.81, 4.17, 4.41, 3.59, 5.87, 3.83, 6.03, 4.89, 4.32, 4.69)
group <- gl(2, 10, 20, labels = c("Ctl", "Trt"))
weight <- c(ctl, trt)
lm_fit <- lm(weight ~ group)
lm_fit_apa <- apa_print(lm_fit)
The estimate element of the returned apa_results list itself contains a named list with estimated parameters (in this case, regression coefficients) and corresponding confidence intervals for the model.
The names of the list correspond to the names of the predictors.
lm_fit_apa$estimate
The estimate list may contain additional elements, such as the list modelfit, which contains quantitative estimates of the model fit.
The statistic element of the returned apa_results list contains a named list with the same structure as estimate.
Instead of parameter estimates, statistic contains the corresponding inferential test statistics, such as significance tests or Bayesian model comparisons.
lm_fit_apa$statistic
Note that the statistic list lacks elements for the information criteria AIC and BIC.
Because no inferential test statistics on the information criteria are available, it is fine to simply drop those elements.
The full_result element is a named list that simply combines the results of estimate and statistic for convenience in reporting.
lm_fit_apa$full_result
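How the three lists relate can be sketched in base R. The example below is a simplified illustration under stated assumptions: it uses a different model (mpg ~ wt on mtcars), builds the strings with sprintf() rather than papaja's typesetting functions, and omits the p value and confidence interval that papaja would include.

```r
fit <- lm(mpg ~ wt, data = mtcars)
coefs <- coef(summary(fit))

# List names chosen by hand here; papaja derives them from the model terms.
term_index <- c(Intercept = 1, wt = 2)

# One typeset string per term for the estimates ...
estimate <- lapply(term_index, function(i) {
  sprintf("$b = %.2f$", coefs[i, "Estimate"])
})

# ... and one per term for the test statistics.
statistic <- lapply(term_index, function(i) {
  sprintf("$t(%d) = %.2f$", df.residual(fit), coefs[i, "t value"])
})

# full_result combines both lists element by element.
full_result <- Map(
  function(est, stat) paste(est, stat, sep = ", "),
  estimate,
  statistic
)

full_result$wt
```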
Finally, the table element contains a data.frame of class apa_results_table that summarizes the results.
In essence, this is a regular data.frame that follows the column-naming conventions used in broom but allows for prettier printing of variable labels.
lm_fit_apa$table
For more complex analyses, table may contain a named list of apa_results_table objects.
We use tinylabels to set variable labels.
These variable labels are attributes attached to each column and contain a typeset label for the respective column.
# library("tinylabels")
letters
variable_label(letters) <- "Letters of the alphabet"
variable_label(letters)
letters
str(letters)
lm_fit_apa$table$statistic
Variable labels are automatically used by apa_table() and plotting functions from the apa_factorial_plot() family to create sensible default labels.
If a label is enveloped in $, it may contain LaTeX math syntax, which is automatically converted to R expressions using latex2exp for plotting.
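The idea of a label stored as a column attribute can be illustrated without tinylabels. In this rough sketch, set_label(), get_label(), and is_math_label() are hypothetical helpers, and the attribute name "label" is an assumption made for the example.

```r
# Hypothetical helpers; the attribute name "label" is assumed for illustration.
set_label <- function(x, label) {
  attr(x, "label") <- label
  x
}
get_label <- function(x) attr(x, "label", exact = TRUE)

# A label counts as math if it is enveloped in "$".
is_math_label <- function(label) grepl("^\\$.*\\$$", label)

cars_labelled <- mtcars[, c("mpg", "wt")]
cars_labelled$mpg <- set_label(cars_labelled$mpg, "Miles per gallon")

# The label travels with the column inside the data.frame.
get_label(cars_labelled$mpg)

is_math_label("$R^2$")
is_math_label("Term")
```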
Any new apa_print() method should output an object of this basic structure.
apa_results objects do not contain numeric information.
Rather, the numeric information has been processed for printing in accordance with APA guidelines.
There are several papaja functions to facilitate the typesetting.
apa_num() is a flexible general-purpose function that wraps formatC() and can be used to round, set decimal as well as thousands separators, or remove leading zeros.
x <- rnorm(3) * 1e4
apa_num(x)
apa_num(x, digits = 3, big.mark = ".", decimal.mark = ",")
apa_num(Inf)
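To make the idea of wrapping formatC() concrete, here is a minimal base R sketch. format_num() is a hypothetical function written for this illustration; it borrows the argument name gt1 from the examples in this document but does not reproduce apa_num()'s full behavior (for instance, its handling of Inf).

```r
# Hypothetical wrapper around formatC(); not the apa_num() implementation.
format_num <- function(x, digits = 2, gt1 = TRUE,
                       big.mark = ",", decimal.mark = ".") {
  out <- formatC(
    x,
    digits = digits,
    format = "f",
    big.mark = big.mark,
    decimal.mark = decimal.mark
  )
  # Values that cannot exceed 1 are reported without a leading zero.
  if (!gt1) out <- sub("^(-?)0", "\\1", out)
  out
}

format_num(1234.567)
format_num(c(0.456, -0.456), gt1 = FALSE)
```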
apa_p() is a wrapper for apa_num() that sets appropriate defaults to report p values in accordance with APA guidelines.
apa_p(c(0.0001, 0.05, 0.99999))
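The conventions involved can be sketched in base R: three decimals, no leading zero, and very small or very large values reported as bounds. format_p() is a hypothetical stand-in, and the exact cutoffs are assumptions of this sketch rather than a description of apa_p().

```r
# Hypothetical stand-in for p value typesetting; cutoffs are assumed.
format_p <- function(p) {
  stopifnot(is.numeric(p), all(p >= 0 & p <= 1))
  out <- sub("^0", "", formatC(p, digits = 3, format = "f"))
  out[p < .001] <- "< .001"
  out[p > .999] <- "> .999"
  out
}

format_p(c(0.0001, 0.05, 0.99999))
```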
The internal function apa_df() is geared towards typesetting degrees of freedom.
apa_df(c(12, 12.485))
apa_df(12L)
Finally, apa_interval() can be used to typeset interval estimates.
apa_interval(rnorm(2), conf.int = 0.95, interval_type = "CI")
Again, there are two wrappers that set appropriate defaults to typeset frequentist confidence intervals and Bayesian highest-density intervals.
apa_confint(rnorm(2), conf.int = 0.95)
apa_hdint(rnorm(2), conf.int = 0.95)
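What typesetting an interval amounts to can be sketched in a few lines of base R. format_interval() is a hypothetical helper; it produces plain text, whereas papaja's functions may emit additional markup, so the output format here is an assumption.

```r
# Hypothetical helper: typeset two bounds as "<level>% <type> [lower, upper]".
format_interval <- function(x, level = 0.95, type = "CI", digits = 2) {
  stopifnot(is.numeric(x), length(x) == 2)
  bounds <- formatC(sort(x), digits = digits, format = "f")
  sprintf("%s%% %s [%s, %s]", format(100 * level), type, bounds[1], bounds[2])
}

format_interval(c(0.5, 0.1))
format_interval(c(0.1, 0.5), type = "HDI")
```

A wrapper that fixes type to "CI" or "HDI" would then play the role that the two convenience functions play above.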
When creating named lists from terms, the term names should use _ as a separator and be valid R names.
Adhering to these conventions ensures that apa_results can conveniently be indexed using the $ operator.
To facilitate the generation of list names, papaja provides the internal function sanitize_terms().
mod_terms <- c("(Intercept)", "Factor A", "Factor B", "Factor A:Factor B", "scale(Factor A)")
sanitize_terms(mod_terms, standardized = TRUE)
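The naming convention itself can be illustrated with a small base R helper. to_list_names() is hypothetical and deliberately simpler than sanitize_terms(); in particular, it does not implement the standardized argument, so its output should not be expected to match papaja's.

```r
# Hypothetical helper: turn term labels into valid R names with "_" separators.
to_list_names <- function(terms) {
  out <- gsub("[^[:alnum:]]+", "_", terms) # runs of other characters become "_"
  out <- gsub("^_+|_+$", "", out)          # drop leading and trailing "_"
  make.names(out)                          # guard against remaining invalid names
}

term_labels <- c("(Intercept)", "Factor A", "Factor A:Factor B", "scale(Factor A)")
list_names <- to_list_names(term_labels)
list_names

# Valid names allow indexing with `$`.
results <- setNames(as.list(seq_along(list_names)), list_names)
results$Factor_A_Factor_B
```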
While these sanitized terms are well suited to name R objects, they are not ideal for reporting.
To facilitate typesetting term names for reporting, there is another internal function, beautify_terms().
beautify_terms(mod_terms, standardized = TRUE)
As with lm objects, the objects returned by an analysis function often do not contain all information necessary to populate the lists described above.
For example, to obtain inferential statistics it may be necessary to call summary().
npk_aov <- aov(yield ~ block + N * P * K, npk)
npk_aov
summary(npk_aov)
This is why there are usually multiple apa_print() methods that are called in sequence to make the function both flexible and convenient.
For convenience, apa_print.aov() calls summary() with its default arguments and passes the result on to apa_print.summary.aov().
papaja:::apa_print.aov
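The chaining pattern can be sketched with a hypothetical generic, report(), in base R. The method names and the returned columns are assumptions of this illustration; only the delegation from the aov method to the summary.aov method mirrors the approach described above.

```r
# Hypothetical generic, for illustration only (not a papaja function).
report <- function(x, ...) UseMethod("report")

# The method for the summary object does the actual work.
report.summary.aov <- function(x, ...) {
  tab <- x[[1]]
  data.frame(
    term = trimws(rownames(tab)),
    statistic = tab[["F value"]],
    p.value = tab[["Pr(>F)"]]
  )
}

# The method for the model object only summarizes and delegates.
report.aov <- function(x, ...) report(summary(x), ...)

npk_aov <- aov(yield ~ block + N * P * K, npk)
npk_report <- report(npk_aov)
npk_report
```

Users can pass either the fitted model or its summary and obtain the same result, while the extraction logic lives in a single method.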
This approach also ensures that a variety of object types are supported while minimizing code redundancy.
The internals of apa_print() rely heavily on broom, a package to conveniently restructure the output of analysis functions into tidy data.frames.
The objects are often processed using broom::tidy(), and broom::glance() if necessary, before being modified further to create the contents of the table element.
Once the results table has been assembled, numeric values have been typeset, and variable labels have been assigned, glue_apa_results() can be used to create an apa_results object according to the above specifications.
Consider the following example of an lm object.
First, we tidy() and glance() the object to obtain tidy results.
We then typeset all "special" numerical results, that is, all results that would not be typeset appropriately by applying apa_num() with its default settings.
Moreover, we combine the separate columns for lower and upper confidence interval bounds into one column, conf.int, which contains the complete confidence interval.
lm_fit <- lm(mpg ~ cyl + wt, mtcars)

# Tidy and typeset output
library("broom")
tidy_lm_fit <- tidy(lm_fit, conf.int = TRUE)
tidy_lm_fit$p.value <- apa_p(tidy_lm_fit$p.value)
tidy_lm_fit$conf.int <- unlist(apa_confint(tidy_lm_fit[, c("conf.low", "conf.high")]))
str(tidy_lm_fit)

glance_lm_fit <- glance(lm_fit)
glance_lm_fit$r.squared <- apa_num(glance_lm_fit$r.squared, gt1 = FALSE)
glance_lm_fit$p.value <- apa_p(glance_lm_fit$p.value)
glance_lm_fit$df <- apa_df(glance_lm_fit$df)
glance_lm_fit$df.residual <- apa_df(glance_lm_fit$df.residual)
str(glance_lm_fit)
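For readers who want to see what this step produces without installing broom, the following base R sketch assembles a comparable table by hand. The column names follow the broom conventions mentioned in this document; the object name tidy_like and the sprintf()-based interval format are assumptions of the sketch.

```r
lm_fit <- lm(mpg ~ cyl + wt, mtcars)

coefs <- coef(summary(lm_fit))
ci <- confint(lm_fit, level = 0.95)

# One row per model term, with broom-style column names.
tidy_like <- data.frame(
  term = rownames(coefs),
  estimate = coefs[, "Estimate"],
  std.error = coefs[, "Std. Error"],
  statistic = coefs[, "t value"],
  p.value = coefs[, "Pr(>|t|)"],
  conf.low = ci[, 1],
  conf.high = ci[, 2],
  row.names = NULL
)

# Combine the two bounds into a single character column.
tidy_like$conf.int <- sprintf(
  "[%.2f, %.2f]",
  tidy_like$conf.low,
  tidy_like$conf.high
)

str(tidy_like)
```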
Next, we typeset the remaining numeric columns and assign informative variable labels:
tidy_lm_fit <- apa_num(tidy_lm_fit)
variable_labels(tidy_lm_fit) <- c(
  term = "Term"
  , estimate = "$b$"
  # Close the math environment so that the label is enveloped in "$"
  , statistic = paste0("$t(", glance_lm_fit$df.residual, ")$")
  , p.value = "$p$"
  , conf.int = "95% CI"
)

glance_lm_fit <- apa_num(glance_lm_fit)
variable_labels(glance_lm_fit) <- c(
  r.squared = "$R^2$"
  , statistic = "$F$"
  , p.value = "$p$"
  , AIC = "$\\mathrm{AIC}$"
)
Now we can use glue_apa_results() to create the output object.
In doing so, we use the internal function construct_glue() to automatically determine the correct "glue" of the reporting string.
Let's first examine the glue.
papaja:::construct_glue(tidy_lm_fit, "estimate")
The character string contains a combination of text and to-be-evaluated R code enveloped in << and >>.
All variable names (e.g., estimate) are assumed to be columns of x (here tidy_lm_fit) or any additional object passed to glue_apa_results() via the ... argument.
svl() is a function that returns a column's variable label but, by default, removes the math-environment tags ($), as these are not needed here.
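The mechanics of such a glue string can be illustrated with a small base R interpolator. Everything below is a hypothetical stand-in: fill_template() is not how papaja evaluates its glue, strip_math() only mimics the label behavior described above, and the values in row are made-up placeholders rather than results of the model.

```r
# Hypothetical stand-in for svl(): return a label without its surrounding "$".
strip_math <- function(label) gsub("^\\$|\\$$", "", label)

# Hypothetical helper: replace every <<code>> in `template` by the result of
# evaluating `code` with the elements of `data` available as variables.
fill_template <- function(template, data) {
  matches <- gregexpr("<<.*?>>", template, perl = TRUE)
  code <- regmatches(template, matches)[[1]]
  values <- vapply(
    code,
    function(chunk) {
      expr <- parse(text = gsub("^<<|>>$", "", chunk))
      as.character(eval(expr, data))
    },
    character(1),
    USE.NAMES = FALSE
  )
  regmatches(template, matches) <- list(values)
  template
}

# Placeholder values for illustration only.
row <- list(estimate = "-1.51", conf.int = "[-2.36, -0.66]", label = "$b$")

filled <- fill_template(
  "$<<strip_math(label)>> = <<estimate>>$, 95% CI <<conf.int>>",
  row
)
filled
```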
lm_results <- glue_apa_results(
  x = tidy_lm_fit
  , est_glue = papaja:::construct_glue(tidy_lm_fit, "estimate")
  , stat_glue = papaja:::construct_glue(tidy_lm_fit, "statistic")
  , term_names = sanitize_terms(tidy_lm_fit$term)
)
lm_results
If we need to add additional information to this output, we can use add_glue_to_apa_results().
This function takes an existing output and adds new strings to a specific sublist.
So, let's add some model fit information to the output.
add_glue_to_apa_results(
  .x = glance_lm_fit
  , container = lm_results
  , sublist = "modelfit"
  , est_glue = c(
    r2 = "$<<svl(r.squared)>> = <<r.squared>>$"
    , aic = ""
  )
  , stat_glue = c(
    r2 = papaja:::construct_glue(glance_lm_fit, "statistic")
    , aic = "$<<svl(AIC)>> = <<AIC>>$"
  )
)
A final issue to consider is that users may pass inappropriate input to apa_print().
To ensure that we return correct output or informative error messages, we need input validation.
Currently, papaja relies on the internal function validate() for this.
in_paren <- TRUE
papaja:::validate(in_paren, check_class = "logical", check_length = 1)
Please use either validate() or perform input validation using the assertthat package.
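The kind of check involved can be sketched in base R. check_arg() is a hypothetical helper that borrows the argument names check_class and check_length from the call above; it is not the validate() implementation and covers only these two checks.

```r
# Hypothetical helper: stop with an informative message if `x` has the wrong
# class or length; return TRUE invisibly otherwise.
check_arg <- function(x, check_class = NULL, check_length = NULL) {
  name <- deparse(substitute(x))
  if (!is.null(check_class) && !inherits(x, check_class)) {
    stop(
      sprintf("The parameter '%s' must be of class '%s'.", name, check_class),
      call. = FALSE
    )
  }
  if (!is.null(check_length) && length(x) != check_length) {
    stop(
      sprintf("The parameter '%s' must be of length %d.", name, check_length),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

in_paren <- TRUE
check_arg(in_paren, check_class = "logical", check_length = 1)
```

Calling check_arg("yes", check_class = "logical") stops with a message that names the offending argument, which is the behavior a new method should aim for.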