models: Generate data from Gaussian, logistic and Poisson models.

View source: R/models.R

Description

Generate data from the Gaussian, logistic and Poisson models used in the simulation studies of Tian, Y., & Feng, Y. (2023).

Usage

models(
  family = c("gaussian", "binomial", "poisson"),
  type = c("all", "source", "target"),
  cov.type = 1,
  h = 5,
  K = 5,
  n.target = 200,
  n.source = rep(100, K),
  s = 5,
  p = 500,
  Ka = K
)

Arguments

family

response type. Can be "gaussian", "binomial" or "poisson". Default = "gaussian".

  • "gaussian": Gaussian distribution.

  • "binomial": logistic regression model. When family = "binomial", the generated responses in both target and source data are 0/1.

  • "poisson": Poisson distribution. When family = "poisson", the generated responses in both target and source data are non-negative counts.
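As a quick check of the response types listed above, the sketch below draws small target-only samples for the binomial and Poisson families and inspects their ranges. It assumes the glmtrans package is installed; the reduced sizes (n.target = 50, p = 20) are arbitrary choices to keep the call fast, not recommended settings.

```r
library(glmtrans)

set.seed(1)
# Small target-only samples; n.target and p are reduced for speed
bin <- models("binomial", type = "target", n.target = 50, p = 20)
poi <- models("poisson", type = "target", n.target = 50, p = 20)

# Binomial responses are expected to be 0/1
y.bin <- as.vector(bin$target$y)
# Poisson responses are expected to be non-negative counts
y.poi <- as.vector(poi$target$y)

print(table(y.bin))
print(summary(y.poi))
```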

type

the type of generated data. Can be "all", "source" or "target".

  • "all": generate a list with a target data set of size n.target and K source data sets whose sizes are given by n.source.

  • "source": generate a list with K source data sets whose sizes are given by n.source.

  • "target": generate a list with a single target data set of size n.target.

cov.type

the type of covariates. Can be 1 or 2 (numerical). If it equals 1, the predictors are generated from the distribution used in Section 4.1.1 (Ah-Trans-GLM) of the latest version of Tian, Y., & Feng, Y. (2023). If it equals 2, the predictors are generated from the distribution used in Section 4.1.2 (When transferable sources are unknown). Default = 1.

h

measures the deviation (ℓ1-norm) of each transferable source coefficient from the target coefficient. Default = 5.
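Following the description of h above, a source counts as h-transferable when the ℓ1 distance between its coefficient and the target coefficient is at most h. The base-R sketch below illustrates that check on hand-made coefficient vectors. The helper l1.dev is hypothetical, and the signal value 0.5 and the Rademacher-style perturbations are illustrative assumptions, not necessarily what models() does internally.

```r
# Hypothetical helper: l1 distance between a source coefficient and the target one
l1.dev <- function(w, beta) sum(abs(w - beta))

p <- 500; s <- 5; h <- 5
# Illustrative target coefficient with s non-zero entries (0.5 is an assumed value)
beta <- c(rep(0.5, s), rep(0, p - s))

set.seed(2)
# Illustrative transferable source: shift each coordinate by +/- h/(2p),
# giving a total l1 deviation of h/2
w.trans <- beta + sample(c(-1, 1), p, replace = TRUE) * h / (2 * p)
# Illustrative non-transferable source: shift each coordinate by +/- 2h/p,
# giving a total l1 deviation of 2h
w.far <- beta + sample(c(-1, 1), p, replace = TRUE) * 2 * h / p

# A source counts as h-transferable when its l1 deviation is at most h
print(c(transferable = l1.dev(w.trans, beta) <= h,
        far = l1.dev(w.far, beta) <= h))
```

This prints TRUE for the first vector and FALSE for the second. According to the Value section, models() returns only predictors and responses, not the coefficients, so this check cannot be applied to its output directly.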

K

the number of source data sets. Default = 5.

n.target

the sample size of the target data. Should be a positive integer. Default = 200.

n.source

the sample size of each source data set. Should be a vector of length K. Default is a K-vector with all elements equal to 100.

s

the number of non-zero components in the target coefficient, which controls the sparsity of the target problem. Default = 5.

p

the dimension of the data (number of predictors). Default = 500.

Ka

the number of transferable sources. Should be an integer between 0 and K. Default = K.
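The call below sketches how several arguments can be overridden at once, assuming glmtrans is installed. It requests K = 3 source data sets with unequal sizes, of which only the first Ka = 2 are transferable, and reduces p to 100 for speed; all values are arbitrary. Because n.source must have length K, it is passed explicitly here; if omitted, its default rep(100, K) is evaluated with the supplied K.

```r
library(glmtrans)

set.seed(3)
D <- models(
  family = "gaussian",
  type = "all",
  K = 3,                      # three source data sets
  Ka = 2,                     # the first two are h-transferable
  h = 5,                      # l1 deviation budget for transferable sources
  n.target = 80,
  n.source = c(60, 100, 120), # one size per source, length K
  s = 5,
  p = 100
)

# Sample size of each source data set and shape of the target predictors
sizes <- sapply(D$source, function(d) nrow(d$x))
print(dim(D$target$x))
print(sizes)
```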

Value

a list of data sets whose structure depends on the value of type.

  • type = "all": a list of two components named "target" and "source", storing the target and source data, respectively. The source component is itself a list of K data sets, of which the first Ka are h-transferable and the remaining K - Ka are h-nontransferable. The target data set and each source data set have components "x" and "y", holding the predictors and responses, respectively.

  • type = "source": a list with a single component "source". This component is a list of K data sets, of which the first Ka are h-transferable and the remaining K - Ka are h-nontransferable. Each source data set has components "x" and "y", holding the predictors and responses, respectively.

  • type = "target": a list with a single component "target". This component is a list with components "x" and "y", holding the predictors and responses of the target data, respectively.
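A sketch of consuming the returned list, assuming glmtrans is installed: since the first Ka source data sets are the h-transferable ones, they can be selected by index and stacked row-wise with the target data. This naive pooling is only an illustration of indexing into the result, not a method provided by the package; sizes are reduced (p = 100) for speed.

```r
library(glmtrans)

set.seed(4)
K <- 4; Ka <- 2
D <- models("binomial", type = "all", K = K, Ka = Ka,
            n.target = 100, n.source = rep(150, K), p = 100)

# Top-level components for type = "all"
print(names(D))

# Indices of the transferable sources (the first Ka)
trans <- seq_len(Ka)

# Stack target and transferable-source predictors and responses row-wise
x.pool <- do.call(rbind, c(list(D$target$x), lapply(D$source[trans], `[[`, "x")))
y.pool <- unlist(c(list(D$target$y), lapply(D$source[trans], `[[`, "y")))

print(dim(x.pool))
print(length(y.pool))
```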

References

Tian, Y., & Feng, Y. (2023). Transfer learning under high-dimensional generalized linear models. Journal of the American Statistical Association, 118(544), 2684-2697.

See Also

glmtrans.

Examples

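set.seed(0, kind = "L'Ecuyer-CMRG")

# target data plus K = 5 source data sets, binomial responses
D.all <- models("binomial", type = "all")
# target data only
D.target <- models("binomial", type = "target")
# the K source data sets only
D.source <- models("binomial", type = "source")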


glmtrans documentation built on April 4, 2025, 12:32 a.m.