Generating Small, Medium, and Large Datasets

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Overview

This vignette demonstrates how to use the {samplezoo} package to generate datasets of varying sizes (small, medium, and large) with variables from multiple probability distributions.

Each dataset contains:

library(samplezoo)

Generate a small dataset (i.e., 100 rows)

data_small <- samplezoo("small")
head(data_small)

Generate a medium-sized dataset (i.e., 1,000 rows)

data_medium <- samplezoo("medium")
head(data_medium)

Generate a large dataset (i.e., 10,000 rows)

data_large <- samplezoo("large")
head(data_large)
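To confirm the sizes described above, the three datasets can be built in one pass and their row counts compared. This is a minimal sketch that assumes samplezoo() returns a data frame (as the head() calls suggest); the names sizes, datasets, and row_counts are illustrative and not part of the package.

```r
library(samplezoo)

# Build one dataset per size label used in this vignette.
sizes <- c("small", "medium", "large")
datasets <- lapply(sizes, samplezoo)
names(datasets) <- sizes

# Number of rows in each dataset (assumes a data-frame-like return value).
row_counts <- sapply(datasets, nrow)
print(row_counts)

# Column names and types of the small dataset.
str(datasets$small)
```

Inspecting str() output is a quick way to see which variables the package generates without relying on a fixed column list.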

Adding Variation or Ensuring Reproducibility with set.seed()

Call set.seed() immediately before generating random data. Reusing the same seed reproduces the same dataset, while changing the seed produces a different but equally repeatable draw.

Reproducibility

# Same seed before each call, so both calls generate the same data
set.seed(123)
data_large <- samplezoo("large")
head(data_large)
set.seed(123)
data_large <- samplezoo("large")
head(data_large)

Variation

# Different seeds, so the two calls generate different data
set.seed(123)
data_large <- samplezoo("large")
head(data_large)
set.seed(456)
data_large <- samplezoo("large")
head(data_large)
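The effect of the seed can also be checked programmatically rather than by eye. The sketch below uses base R's rnorm() as a stand-in for a samplezoo() call, on the assumption that samplezoo() draws from R's standard random number generator; the helper draw() is hypothetical and exists only for this illustration.

```r
# Hypothetical helper: set the seed, then draw n standard normal values.
# rnorm() stands in for any generator that uses R's RNG.
draw <- function(seed, n = 5) {
  set.seed(seed)
  rnorm(n)
}

# Same seed twice: the two draws are identical.
same_seed <- identical(draw(123), draw(123))

# Different seeds: the draws differ.
diff_seed <- identical(draw(123), draw(456))

print(c(same_seed = same_seed, diff_seed = diff_seed))
```

The same identical() comparison could be applied to two samplezoo() results stored under different names, which avoids overwriting the first dataset before it can be compared.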



samplezoo documentation built on April 4, 2025, 1:16 a.m.