generate_X_cat: Simulate a normal data set S = (X, X_{cat}) that includes...

View source: R/generate_normal.R

Description

Creates a toy data set S = (X, X_{cat}). The columns of X are sampled independently from Gaussian distributions, with column i drawn from N(μ_i, σ_i^2), i.e. with mean μ_i and standard deviation σ_i. The columns of X_{cat} are categorical, each sampled with replacement from a given number of categories (indexed by integers). The result has dimension n \times (p_1 + p_2), where n is the number of data points to be specified, p_1 is the number of columns in X and p_2 is the number of columns in X_{cat}.
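The sampling scheme described above can be sketched in base R. This is an illustrative re-implementation, not the package source: the function name sketch_X_cat, the column names, and the input checks are assumptions made for this example only.

```r
# Hypothetical sketch of the sampling scheme; NOT the package's own code.
sketch_X_cat <- function(n = 100, mu = rep(0, 10), sigma = rep(1, 10),
                         no_of_cat = c(4, 5)) {
  # Assumed checks: one sigma per mu, and no negative standard deviations.
  stopifnot(length(mu) == length(sigma), all(sigma >= 0))

  # Column i of X: n independent draws from N(mu[i], sigma[i]^2).
  X <- mapply(function(m, s) rnorm(n, mean = m, sd = s), mu, sigma)
  X <- as.data.frame(matrix(X, nrow = n))
  names(X) <- paste0("X", seq_along(mu))

  # Column j of X_cat: n draws with replacement from 1..no_of_cat[j],
  # stored as a factor with one level per category.
  X_cat <- lapply(no_of_cat, function(k) {
    factor(sample.int(k, n, replace = TRUE), levels = seq_len(k))
  })
  names(X_cat) <- paste0("X_cat", seq_along(no_of_cat))

  # S = (X, X_cat): an n x (p_1 + p_2) data frame.
  cbind(X, as.data.frame(X_cat))
}

set.seed(1)
S <- sketch_X_cat(n = 40, mu = 1:6, sigma = rep(1, 6), no_of_cat = c(2, 3, 5))
dim(S)
```

Fixing the factor levels to seq_len(k) keeps every category in the factor even if it happens not to be drawn; whether generate_X_cat does the same is not stated in this documentation.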

Usage

generate_X_cat(
  n = 100,
  mu = rep(0, 10),
  sigma = rep(1, 10),
  no_of_cat = c(4, 5)
)

Arguments

n

The desired number of data points in the data set.

mu

A p_1-dimensional vector of means μ_i, one for each column of X.

sigma

A p_1-dimensional vector of non-negative standard deviations σ_i, one for each column of X.

no_of_cat

A p_2-dimensional vector where the entries indicate the number of categories desired for each column of X_{cat}.

Value

A data frame S = (X, X_{cat}) of dimension n \times (p_1 + p_2). With the default arguments, n = 100, p_1 = 10 and p_2 = 2: the ten columns of X are each sampled from N(0, 1), and two categorical columns of X_{cat} (with 4 and 5 categories) are appended. The columns of X_{cat} are factors.
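As a quick check of the dimensions, p_1 and p_2 follow from the lengths of the argument vectors. The snippet below only does this arithmetic for the default arguments and does not call the package; the variable names p1 and p2 are chosen for this example.

```r
# Default arguments as listed under Usage.
n         <- 100
mu        <- rep(0, 10)
no_of_cat <- c(4, 5)

p1 <- length(mu)         # number of Gaussian columns in X
p2 <- length(no_of_cat)  # number of categorical columns in X_cat

expected_dim <- c(n, p1 + p2)
expected_dim
# → 100 12
```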

Examples

generate_X_cat()

generate_X_cat(n = 40, mu = 1:6, sigma = rep(1, 6), no_of_cat = c(2,3,5))

lamke07/stat545lamke07 documentation built on Dec. 21, 2021, 8:49 a.m.