permutations: Permutation sampling
In tidymodels/rsample: General Resampling Infrastructure

Permutation sampling

Description

A permutation sample is the same size as the original data set and is made by permuting (shuffling) one or more columns. In the resulting analysis samples, the unselected columns stay in their original order while the selected columns are permuted into a random order. Unlike other sampling functions in rsample, there is no assessment set, and calling assessment() on a permutation split will throw an error.
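The following is a base-R sketch of what a single permutation sample looks like. The helper `permute_columns()` is hypothetical and is not part of rsample. It assumes, for illustration, that all selected columns are reordered with one shared row index and that every other column is left untouched.

```r
# Hypothetical helper (not rsample's implementation): build one
# permutation sample by shuffling only the columns named in `cols`.
permute_columns <- function(data, cols) {
  # One random row order, shared by all selected columns (assumption)
  idx <- sample(nrow(data))
  data[cols] <- lapply(data[cols], function(column) column[idx])
  data
}

set.seed(1)
perm <- permute_columns(mtcars, "mpg")

# Same size as the original data; only mpg has been reordered
dim(perm)
head(perm[, c("mpg", "cyl")])
```

This sketch returns the shuffled data frame directly, rather than a split object from which an analysis set is extracted.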

Usage

permutations(data, permute = NULL, times = 25, apparent = FALSE, ...)

Arguments

`data`	A data frame.
`permute`	One or more columns to shuffle. This argument supports tidyselect selectors. Multiple expressions can be combined with `c()`. Variable names can be used as if they were positions in the data frame, so expressions like `x:y` can be used to select a range of variables. See the tidyselect `language` help topic for more details.
`times`	The number of permutation samples.
`apparent`	A logical. Should an extra resample be added where the analysis set is the original data set?
`...`	These dots are for future extensions and must be empty.

Details

The `apparent` argument adds an extra "resample" whose analysis set is the original, unpermuted data set. Permutation-based resampling is especially helpful for computing the distribution of a statistic (e.g., a t-statistic) under the null hypothesis. This forms the basis of a permutation test, which in its exact form computes the test statistic under all possible permutations of the data; using a finite number of random permutations (`times`) approximates that reference distribution.
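Enumerating every permutation of the 32 rows of `mtcars` is infeasible, so in practice a permutation test draws random shuffles instead. The sketch below is a base-R Monte Carlo version using the same `hp ~ vs` comparison as the Examples on this page. Shuffling `hp` breaks any association with `vs`, so the shuffled t-statistics approximate the null distribution. The number of permutations (1000), the seed, and the `+ 1` correction in the p-value are illustrative choices, not rsample defaults.

```r
set.seed(123)
times <- 1000

# Observed statistic on the original ("apparent") data
observed <- t.test(hp ~ vs, data = mtcars)$statistic

# Statistic recomputed after shuffling hp, which breaks any hp-vs link
null_stats <- vapply(seq_len(times), function(i) {
  shuffled <- mtcars
  shuffled$hp <- sample(shuffled$hp)
  unname(t.test(hp ~ vs, data = shuffled)$statistic)
}, numeric(1))

# Two-sided p-value; the +1 terms count the observed data as one permutation
p_value <- (sum(abs(null_stats) >= abs(observed)) + 1) / (times + 1)
p_value
```

With rsample, the shuffled statistics would instead come from the analysis sets of `permutations()` splits, as the last example on this page does with `map_dbl()`.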

Value

A tibble with classes permutations, rset, tbl_df, tbl, and data.frame. The results include a column `splits` holding the data split objects and a character column `id` holding the resample identifier.

Examples

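library(rsample)
library(purrr)

# Two permutation samples, shuffling only mpg
permutations(mtcars, mpg, times = 2)
# Same, plus an "apparent" resample holding the original data
permutations(mtcars, mpg, times = 2, apparent = TRUE)

# Shuffle every column whose name starts with "c" (cyl and carb)
resample1 <- permutations(mtcars, starts_with("c"), times = 1)
resample1$splits[[1]] |> analysis()

# t-statistics for hp ~ vs across 10 permutations plus the apparent sample
resample2 <- permutations(mtcars, hp, times = 10, apparent = TRUE)
map_dbl(resample2$splits, function(x) {
  t.test(hp ~ vs, data = analysis(x))$statistic
})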

tidymodels/rsample documentation built on June 12, 2025, 9:40 p.m.