knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )

The `summarize`

function in `dplyr`

, especially when combined with `group_by`

and `across`

, provides powerful tools for exploring data using summary statistics.
The `psyntur`

package provides some wrappers to these tools to allow data exploration, albeit of a limited kind, to be done quickly and easily.
We explore some of these functions in this vignette.

Load the `psyntur`

functions and data sets with the usual `library`

command.

```
library(psyntur)
```

`describe`

We can use the `describe`

function in `psyntur`

.
The first argument to `describe`

should be the data frame.
Subsequent arguments should be named arguments of summary statistics functions, like `mean`

, `median`

, etc., applied to any variables in the data frame.
For example, using the `faithfulfaces`

data frame, we can obtain the arithmetic mean and standard deviation of the `faithful`

variable as follows.

describe(data = faithfulfaces, avg = mean(faithful), stdev = sd(faithful))

We can apply the same or different functions to the same or different variables.

describe(data = faithfulfaces, avg_faith = mean(faithful), avg_trust = mean(trustworthy), sd_trust = sd(trustworthy))

We can obtain the summary statistics for the chosen variables for each group of a third variable using a `by`

variable.

describe(data = faithfulfaces, by = face_sex, avg = mean(faithful), stdev = sd(faithful))

The `by`

argument may be a vector of variables.
In this case, the chosen variables are grouped by the combination of the `by`

variables.
For example, in the following we group the `time`

variable in `vizverb`

by both `task`

and `response`

.

describe(vizverb, by = c(task, response), avg = mean(time), median = median(time), iqr = IQR(time), stdev = sd(time) )

It would be tedious and repetitive to use `describe`

as above if wanted to apply the same set of summary statistic functions to a set of variables.
Instead, we can use `describe_across`

.
For example, to calculate the mean, median, standard deviation to two variables, `trustworthy`

and `faithful`

, in the `faithfulfaces`

data set, we can do the following.

describe_across(faithfulfaces, variables = c(trustworthy, faithful), functions = list(avg = mean, median = median, stdev = sd) )

Note that the data frame that is returned is in a wide format.
We can pivot this to a longer format by saying `pivot = TRUE`

.

describe_across(faithfulfaces, variables = c(trustworthy, faithful), functions = list(avg = mean, median = median, stdev = sd), pivot = TRUE )

We can use the `by`

variable to calculate the summary statistics for each subgroup corresponding to each value of the `by`

variable, as in the following example.

describe_across(faithfulfaces, variables = c(trustworthy, faithful), functions = list(avg = mean, median = median, stdev = sd), by = face_sex, pivot = TRUE )

As in the case of `describe`

, the `by`

argument can be a vector of variables.

`_xna`

When variable have `NA`

values, most summary statistics function will, by default, return `NA`

.
To illustrate this, we can modify `faithfulfaces`

to contain `NA`

's for the `faithful`

variable.

faithfulfaces_na <- faithfulfaces %>% dplyr::mutate(faithful = ifelse(faithful > 6, NA, faithful))

Now, if we try one of the above `describe`

or `describe_aross`

functions with the `faithful`

variable, we will obtain corresponding `NA`

values.

describe_across(faithfulfaces_na, variables = c(trustworthy, faithful), functions = list(avg = mean, median = median, stdev = sd), by = face_sex, pivot = TRUE )

Of course, if we set `na.rm = TRUE`

in any or all of the summary functions, we will remove the `NA`

values before the statistics are calculated.
This is relatively easy to do with `describe`

, as in the following example.

describe(data = faithfulfaces, by = face_sex, avg = mean(faithful, na.rm = T), stdev = sd(faithful, na.rm = T))

However, for `describe`

across, we pass in a list of functions, and so to set `na.rm = T`

, we can to create `purrr`

style anonymous functions calling the summary statistic function with `na.rm = T`

, as in the following example.

library(purrr) describe_across(faithfulfaces_na, variables = c(trustworthy, faithful), functions = list(avg = ~mean(., na.rm = T), median = ~median(., na.rm = T), stdev = ~sd(., na.rm = T)), by = face_sex, pivot = TRUE )

Anonymous function like this are not very transparent for those new to R, and the resulting function looks quite complex.

In order to avoid using code like `~mean(., na.rm = T)`

, for a number of commonly used summary statistic functions (`sum`

, `mean`

, `median`

, `var`

, `sd`

, `IQR`

), we have made counterparts where `na.rm`

is set to `TRUE`

by default.
These functions have the same name as the original with the suffix `_xna`

(but `IQR`

is `iqr_xna`

, not `IQR_xna`

).
As such, we can do the following.

describe_across(faithfulfaces_na, variables = c(trustworthy, faithful), functions = list(avg = mean_xna, median = median_xna, stdev = sd_xna), by = face_sex, pivot = TRUE )

**Any scripts or data that you put into this service are public.**

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.