gen_structured_model: Simulation Scenario from Bhatnagar et al. (2018+) ggmix paper
In sahirbhatnagar/penfam: Variable Selection in Linear Mixed Models for SNP Data


Function that generates data for the different simulation studies presented in the accompanying paper. This function requires the popkin and bnpsd packages to be installed.

gen_structured_model(
  n,
  p_design,
  p_kinship,
  k,
  s,
  Fst,
  b0,
  nPC = 10,
  eta,
  sigma2,
  geography = c("ind", "1d", "circ"),
  percent_causal,
  percent_overlap,
  train_tune_test = c(0.6, 0.2, 0.2)
)

`n`	number of observations to simulate
`p_design`	number of variables in X_test, i.e., the design matrix
`p_kinship`	number of variables in X_kinship, i.e., the matrix used to calculate kinship
`k`	number of intermediate subpopulations.
`s`	the desired bias coefficient, which specifies sigma indirectly. Required if sigma is missing
`Fst`	The desired final FST of the admixed individuals. Required if sigma is missing
`b0`	the true intercept parameter
`nPC`	number of principal components to include in the design matrix used for regression adjustment for population structure via principal components. This matrix is used as the input in a standard lasso regression routine, where there are no random effects.
`eta`	the true eta parameter, which must satisfy `0 < eta < 1`
`sigma2`	the true sigma2 parameter
`geography`	the type of geography used for simulating the kinship matrix. "ind" is independent populations where every individual is actually unadmixed, "1d" is a 1D geography and "circ" is a circular geography. Default: "ind". See the functions in the `bnpsd` package for details on how this data is actually generated.
`percent_causal`	percentage of `p_design` that is causal. Must satisfy 0 ≤ percent_causal ≤ 1. The true regression coefficients are generated from a standard normal distribution.
`percent_overlap`	this represents the percentage of causal SNPs that will also be included in the calculation of the kinship matrix
`train_tune_test`	the proportions of the sample size used for training, tuning parameter selection, and testing. Default is a 60/20/20 split
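
As a rough illustration of how the proportion-type arguments translate into counts, the base R sketch below picks causal SNPs, draws their coefficients from a standard normal distribution, and splits the individuals into three sets. It is not the package's internal code: the SNP names, the rounding rule, and the use of `sample()` are assumptions made only for this example.

```r
# Illustrative sketch only; not the package's internal code.
set.seed(1)

n <- 100
p_design <- 50
percent_causal <- 0.10
train_tune_test <- c(0.6, 0.2, 0.2)

# Hypothetical SNP names for the design matrix columns
snp_names <- paste0("X", seq_len(p_design))

# Number of causal SNPs implied by percent_causal (rounding is an assumption)
n_causal <- round(p_design * percent_causal)
causal <- sample(snp_names, n_causal)
not_causal <- setdiff(snp_names, causal)

# True coefficients: standard normal for causal SNPs, zero otherwise
beta <- setNames(rep(0, p_design), snp_names)
beta[causal] <- rnorm(n_causal)

# Split the n individuals into training, tuning and test sets
n_train <- round(n * train_tune_test[1])
n_tune <- round(n * train_tune_test[2])
shuffled <- sample(seq_len(n))
train_ind <- shuffled[seq_len(n_train)]
tune_ind <- shuffled[n_train + seq_len(n_tune)]
test_ind <- shuffled[-seq_len(n_train + n_tune)]
```

With these settings, 5 of the 50 SNPs are causal and the individuals are split 60/20/20.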

The kinship is estimated using the popkin function from the popkin package. This function will multiply that kinship matrix by 2 to give the expected covariance matrix, which is subsequently used in the linear mixed models.
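
The factor of 2 can be seen on a small hand-built example. The sketch below uses a toy kinship matrix for two pairs of full siblings (a stand-in for the popkin estimate, which is not computed here) and doubles it. The variance decomposition involving `eta` and `sigma2` is an assumption about how those two parameters enter the model; it is not stated on this page.

```r
# Illustrative sketch only; the kinship matrix is hand-built, not estimated.
set.seed(2)

# Toy kinship for 4 outbred individuals: two pairs of full siblings.
# Self-kinship is 0.5, full-sibling kinship is 0.25, unrelated is 0.
kinship <- matrix(0, 4, 4)
diag(kinship) <- 0.5
kinship[1, 2] <- kinship[2, 1] <- 0.25
kinship[3, 4] <- kinship[4, 3] <- 0.25

# Twice the kinship gives the expected covariance matrix
K <- 2 * kinship

# Assumed variance decomposition (not confirmed by this page):
# Var(y) = eta * sigma2 * K + (1 - eta) * sigma2 * I
eta <- 0.1
sigma2 <- 1
V <- eta * sigma2 * K + (1 - eta) * sigma2 * diag(4)

# Draw one response vector with covariance V via the Cholesky factor
L <- chol(V)                       # upper triangular, V = t(L) %*% L
y <- drop(crossprod(L, rnorm(4)))  # t(L) %*% z has covariance V
```

After doubling, the diagonal of `K` is 1 and sibling pairs have covariance 0.5, which is the scale expected of a relatedness matrix in a linear mixed model.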

A list with the following elements

ytrain: simulated response vector for training set
ytune: simulated response vector for tuning parameter selection set
ytest: simulated response vector for test set
xtrain: simulated design matrix for training set
xtune: simulated design matrix for tuning parameter selection set
xtest: simulated design matrix for testing set
xtrain_lasso: simulated design matrix for training set for lasso model. This is the same as xtrain, but also includes the nPC principal components
xtune_lasso: simulated design matrix for tuning parameter selection set for lasso model. This is the same as xtune, but also includes the nPC principal components
xtest_lasso: simulated design matrix for testing set for lasso model. This is the same as xtest, but also includes the nPC principal components
causal: character vector of the names of the causal SNPs
beta: the vector of true regression coefficients
kin_train: 2 times the estimated kinship for the training set individuals
kin_tune_train: The covariance matrix between the tuning set and the training set individuals
kin_test_train: The covariance matrix between the test set and training set individuals
Xkinship: the matrix of SNPs used to estimate the kinship matrix
not_causal: character vector of the non-causal SNPs
PC: the principal components for population structure adjustment
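
The relationship between the plain and `_lasso` design matrices described above can be sketched in base R as follows. The toy genotype matrices are made up, and `prcomp()` on the kinship SNPs is only a stand-in: this page does not say how the package computes its principal components.

```r
# Illustrative sketch only; prcomp() is an assumed stand-in for the
# package's own principal component calculation.
set.seed(3)

n_train <- 20
p_design <- 8
p_kinship <- 30
nPC <- 3

# Toy genotype matrices coded 0/1/2 (individuals in rows)
xtrain <- matrix(rbinom(n_train * p_design, size = 2, prob = 0.3),
                 nrow = n_train,
                 dimnames = list(NULL, paste0("X", seq_len(p_design))))
Xkinship <- matrix(rbinom(n_train * p_kinship, size = 2, prob = 0.3),
                   nrow = n_train)

# First nPC principal components of the kinship SNPs
pca <- prcomp(Xkinship, center = TRUE, scale. = FALSE)
PC <- pca$x[, seq_len(nPC), drop = FALSE]

# Lasso design matrix: original SNP columns followed by the PCs
xtrain_lasso <- cbind(xtrain, PC)
dim(xtrain_lasso)
```

The resulting matrix has `p_design + nPC` columns, so a standard lasso fit on it adjusts for population structure through the appended components rather than through a random effect.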

admix_prop_1d_linear

admixed <- gen_structured_model(n = 100,
                                p_design = 50,
                                p_kinship = 5e2,
                                geography = "1d",
                                percent_causal = 0.10,
                                percent_overlap = "100",
                                k = 5, s = 0.5, Fst = 0.1,
                                b0 = 0, nPC = 10,
                                eta = 0.1, sigma2 = 1,
                                train_tune_test = c(0.8, 0.1, 0.1))
names(admixed)