shuffle_bites: Shuffle bite data by sorting the intervals at individual...
In gvegayon/biteme:


View source: R/shuffle.r

Shuffle bite data by sorting the intervals at individual level

shuffle_bites(data, ...)

## Default S3 method:
shuffle_bites(data, locations = NULL, ...)

## S3 method for class 'data.frame'
shuffle_bites(data, locations = NULL, coerce = TRUE, return.raw = FALSE, ...)

`data`	A two-column integer matrix. The first column holds the times, while the second column holds the individual ids.
`...`	Further arguments passed to the method.
`locations`	A list of length `length(unique(data[,2]))` holding, for each individual, the row locations of that individual's observations in `data`. For advanced use only (see Details).
`coerce`	Logical scalar. When `TRUE` it makes sure that the first two columns of the data are integer. You can skip this if you know that the columns are integer (runs faster).
`return.raw`	Logical scalar. When `TRUE` it returns only the first two columns of the shuffled data, which also runs faster.

When `locations` is not provided, the algorithm generates the locations list by default. The list should have one element per individual, and each element should be an integer vector with the zero-based positions (values in 0:(nrow(data) - 1)) of the rows in `data` where that individual's observations are located. Supplying this parameter avoids sorting and finding those positions every time the algorithm is called, which can make it significantly faster.

lapply(sort(unique(data[, 2])), function(x) which(data[, 2] == x) - 1L)
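To make the expected structure concrete, here is a minimal base R sketch that applies the expression above to a small, made-up matrix with two individuals. The data values and the name `locs` are assumptions chosen for illustration only.

```r
# Made-up bite data: column 1 holds times, column 2 holds individual ids
dat <- cbind(
  time = c(0L, 2L, 1L, 5L, 4L, 9L),
  ids  = c(1L, 2L, 1L, 2L, 1L, 2L)
)

# One list element per individual (ids in increasing order), each holding
# the zero-based row positions of that individual's observations
locs <- lapply(sort(unique(dat[, 2])), function(x) which(dat[, 2] == x) - 1L)

str(locs)
```

Going by the usage shown above, a list built once this way could presumably be reused across repeated calls as `shuffle_bites(dat, locations = locs)`, so the positions are not recomputed on every call.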

A shuffled version of the data

# Checking unbiasedness -----------------------------------------------------

# In this example we permute bite data that has only 6 possible permutations

# Function to encode the shuffle
shuffle_wrap <- function(dat) {
  paste0(shuffle_bites(dat)[,1], collapse="")
}

# Fake data. This has only 6 possible permutations
dat <- cbind(
  time = c(0, 1, 3, 6),
  ids  = rep(1, 4)
)

# Tabulating and plotting the permutations
n <- 5e4
set.seed(111224)
ans <- replicate(n, shuffle_wrap(dat))
ans <- table(ans)/n

ans
# ans
#    0136    0146    0236    0256    0346    0356
# 0.16842 0.16622 0.16618 0.16794 0.16566 0.16558

# Plotting the distribution
barplot(ans)
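The six outcomes in the table above correspond to the 3! = 6 orderings of the three intervals (1, 2, 3) between the four bite times. The following base R sketch illustrates that idea for a single individual. It is not the package's implementation: `permute_intervals` is a hypothetical helper, and it assumes the shuffle keeps the first time fixed and rebuilds the remaining times from the reordered intervals.

```r
# Hypothetical helper (not part of biteme): permute the intervals between
# consecutive times of one individual and rebuild the times from the first one
permute_intervals <- function(times) {
  gaps <- diff(times)
  # Index with sample.int() so that a single gap is not expanded to 1:gap
  gaps <- gaps[sample.int(length(gaps))]
  c(times[1], times[1] + cumsum(gaps))
}

set.seed(1)
times <- c(0, 1, 3, 6)
shuffled <- permute_intervals(times)

# Collect the distinct encoded outcomes over many draws
outcomes <- sort(unique(replicate(
  2000, paste0(permute_intervals(times), collapse = "")
)))
outcomes
```

Under these assumptions the first and last times never change, and only the order of the gaps between them does, which is why the example data can produce exactly six distinct sequences.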