GendataLDA: Generate simulation data (Categorical based on linear...


GendataLDA    R Documentation

Generate simulation data (categorical response based on a linear discriminant analysis model)

Description

Simulates a dataset that can be used to test feature screening for ultrahigh-dimensional discriminant analysis. The simulation is based on the balanced scenarios in Example 3.1 of Cui et al. (2015). The simulated dataset has p numerical predictors X and a categorical response Y.
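The kind of data described above can be sketched in base R. The following is only an illustration under stated assumptions, not the package's implementation: the helper name simulate_lda_sketch, the mean shift of 3, and the rule that class r shifts only predictor r are all assumptions made for this example.

```r
# Hypothetical helper, not part of MFSIS: a sketch of an LDA-type simulation.
simulate_lda_sketch <- function(n, p, R = 3, shift = 3) {
  stopifnot(p >= R)
  # Balanced classes: each label 1..R is drawn with probability 1/R.
  y <- sample(seq_len(R), size = n, replace = TRUE)
  # Independent standard normal noise for every predictor.
  x <- matrix(rnorm(n * p), nrow = n, ncol = p)
  # Assumed mean structure: for a subject in class r, only predictor r is shifted.
  idx <- cbind(seq_len(n), y)
  x[idx] <- x[idx] + shift
  list(X = x, Y = factor(y, levels = seq_len(R)))
}

set.seed(1)
sim <- simulate_lda_sketch(n = 100, p = 200, R = 3)
dim(sim$X)    # n rows, p columns
table(sim$Y)  # class counts
```

Under these assumptions only the first R predictors carry information about Y, which is the situation a screening procedure is meant to detect.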

Usage

GendataLDA(
  n,
  p,
  R = 3,
  error = c("gaussian", "t", "cauchy"),
  style = c("balanced", "unbalanced")
)

Arguments

n

Number of subjects in the dataset to be simulated. This also equals the number of rows in the simulated dataset, because each row is assumed to represent a different independent and identically distributed subject.

p

Number of predictor variables (covariates) in the simulated dataset. These covariates will be the features screened by model-free procedures.

R

A positive integer giving the number of outcome categories for the multinomial (categorical) outcome Y. The default is 3.

error

The distribution of the error term. Choose "gaussian" for normally distributed errors, "t" for errors from a t distribution with 2 degrees of freedom, or "cauchy" for errors from a Cauchy distribution.
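The three error options correspond to standard base R generators. A minimal sketch follows; the helper draw_errors is hypothetical and may not match how GendataLDA draws its errors internally.

```r
# Hypothetical helper: draw an n x p matrix of error terms.
draw_errors <- function(n, p, error = c("gaussian", "t", "cauchy")) {
  # match.arg() picks the first choice when the argument is left at its default.
  error <- match.arg(error)
  e <- switch(error,
    gaussian = rnorm(n * p),       # standard normal
    t        = rt(n * p, df = 2),  # t distribution with 2 degrees of freedom
    cauchy   = rcauchy(n * p)      # standard Cauchy
  )
  matrix(e, nrow = n, ncol = p)
}

set.seed(1)
eps <- draw_errors(50, 10, error = "t")
dim(eps)
```

The t distribution with 2 degrees of freedom and the Cauchy distribution are heavy-tailed, so they are useful for checking how robust a screening procedure is to outliers.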

style

The balance among the categories of the categorical response: "balanced" or "unbalanced".
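The difference between the two styles can be illustrated with class probabilities. In this sketch the helper class_probs is hypothetical, and the unbalanced probabilities use an assumed form in which the largest class is twice as likely as the smallest; the probabilities that GendataLDA actually uses may differ.

```r
# Hypothetical helper: class probabilities for R categories.
class_probs <- function(R, style = c("balanced", "unbalanced")) {
  style <- match.arg(style)
  stopifnot(R >= 2)
  r <- seq_len(R)
  if (style == "balanced") {
    rep(1 / R, R)                           # every class equally likely
  } else {
    2 * (1 + (r - 1) / (R - 1)) / (3 * R)   # assumed form: max prob = 2 * min prob
  }
}

set.seed(1)
probs <- class_probs(3, style = "unbalanced")
y <- sample(seq_len(3), size = 100, replace = TRUE, prob = probs)
table(y)
```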

Value

A list containing the simulated data.

Author(s)

Xuewei Cheng xwcheng@hunnu.edu.cn

References

Cui, H., Li, R., & Zhong, W. (2015). Model-free feature screening for ultrahigh dimensional discriminant analysis. Journal of the American Statistical Association, 110(510), 630-641.

Examples

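library(MFSIS)
n <- 100  # number of subjects (rows)
p <- 200  # number of predictors
R <- 3    # number of outcome categories
# Gaussian errors with balanced categories
data <- GendataLDA(n, p, R, error = "gaussian", style = "balanced")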

MFSIS documentation built on June 22, 2024, 9:42 a.m.