fit.risk.model.par: Resample families based on the risk model


View source: R/TriadSim_functions.R

Description

This function selects families based on a prespecified risk model. It can simulate a homogeneous scenario or a stratified scenario with two subpopulations. When e.fr is given rather than left at the default NA, the risk model can involve exposure main effects as well as gene-by-exposure interaction. The function is parallelized; by default the number of cores is the ceiling of half the total number of CPU cores.

Usage

fit.risk.model.par(
  n.ped,
  brks,
  target.snp,
  fam.pos,
  mom.tar,
  dad.tar,
  kid.tar,
  pathways,
  betas.e0,
  e.fr = NA,
  betas.e,
  pop1.frac = NA,
  rate.beta = NA,
  is.case = TRUE,
  qtl = FALSE,
  out.put.file = NA,
  no_cores = NA
)

Arguments

n.ped

is an integer giving the number of trios to be simulated

brks

is a matrix of integers showing where the chromosomal breaks are to take place for each individual in the simulated trios.

target.snp

is a vector of integers showing the row number of the target SNPs in the .bim file.

fam.pos

is a matrix showing, for each simulated trio, the chromosomal segment from which each target SNP is selected.

mom.tar

is a matrix containing the genotypes of the target SNPs in the mothers of the original data for simulations of a homogeneous population. For simulations under population stratification it is a list of two matrices, each containing the mothers' target SNP genotypes in one of the two subpopulations.

dad.tar

is a matrix containing the genotypes of the target SNPs in the fathers of the original data for simulations of a homogeneous population. For simulations under population stratification it is a list of two matrices, each containing the fathers' target SNP genotypes in one of the two subpopulations.

kid.tar

is a matrix containing the genotypes of the target SNPs in the children, stacked on top of those of the complements, from the original data for simulations of a homogeneous population. For simulations under population stratification it is a list of two matrices, each containing the children's and complements' target SNP genotypes in one of the two subpopulations.

pathways

is a list of vectors of integers. Each vector denotes the SNPs involved in a particular pathway. E.g., list(1:4,5:8) denotes two pathways: SNPs 1-4 are in the first pathway and SNPs 5-8 are in the second.
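As a small illustration of the pathway list, the sketch below maps each pathway to rows of the .bim file. It assumes the integers in pathways index positions within target.snp, which matches the Examples section (eight target SNPs, pathways 1:4 and 5:8) but is not stated explicitly above; pathway.rows is a hypothetical name.

```r
# Target SNP rows and pathways as used in the Examples section.
tar.snp <- c(21, 118, 121, 140, 155, 168, 218, 383)
pwy <- list(1:4, 5:8)

# Assumption: pathway entries are positions within tar.snp,
# so each pathway can be translated to .bim row numbers.
pathway.rows <- lapply(pwy, function(idx) tar.snp[idx])
print(pathway.rows)
```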

betas.e0

is a vector of doubles giving the beta coefficients of the logit risk model for the unexposed individuals. The length of the vector should be 1 + number_of_risk_pathway. The first number is a function of the disease prevalence in unexposed individuals who do not carry any copies of the risk pathways. The numbers after that give the log odds ratios for carrying one/two copies of each risk pathway compared with those who do not carry any copies of the pathway in the unexposed group. E.g., c(-6.4, 0.5, 1) means the baseline disease prevalence is exp(-6.4)/(1+exp(-6.4)), the log OR for carrying at least one copy of the first pathway is 0.5, and that for carrying at least one copy of the second pathway is 1.
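The arithmetic behind c(-6.4, 0.5, 1) can be checked with a short sketch. It assumes the pathway effects add on the logit scale and uses a hypothetical 0/1 carrier indicator per pathway; how the package itself decides whether a trio carries a pathway is not shown here.

```r
# Coefficients from the example: intercept, then one log OR per pathway.
betas.e0 <- c(-6.4, 0.5, 1)

# Hypothetical carrier indicators (1 = carries the pathway, 0 = does not).
carrier <- rbind(
  none   = c(0, 0),
  first  = c(1, 0),
  second = c(0, 1),
  both   = c(1, 1)
)

# Linear predictor on the logit scale, then the inverse logit via plogis().
logit.risk <- betas.e0[1] + carrier %*% betas.e0[-1]
risk <- plogis(logit.risk)[, 1]
print(signif(risk, 3))
```

The "none" row reproduces the baseline prevalence exp(-6.4)/(1+exp(-6.4)) quoted above.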

e.fr

is a number between 0 and 1 giving the exposure prevalence. With the default NA, no exposure is involved in the risk model.

betas.e

is a vector of doubles giving the beta coefficients of the logit risk model for the exposed individuals. The length of the vector should be 1 + number_of_risk_pathway. The first number is a function of the disease prevalence in exposed individuals who do not carry any copies of the risk pathways. The numbers after that give the log odds ratios for carrying one/two copies of each risk pathway compared with those who do not carry any copies of the pathway in the exposed group.
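Because betas.e0 and betas.e describe the same logit model in the unexposed and exposed groups, their difference can be read as the exposure main effect and the gene-by-exposure interaction on the log odds scale. The sketch below illustrates this; the betas.e values are hypothetical and chosen only for the example.

```r
betas.e0 <- c(-6.4, 0.5, 1)  # unexposed group (values from the betas.e0 example)
betas.e  <- c(-6.0, 1.2, 1)  # exposed group (hypothetical values)

# Exposure main effect: log OR of exposure among non-carriers.
exposure.main <- betas.e[1] - betas.e0[1]

# Gene-by-exposure interaction: change in each pathway log OR under exposure.
interaction.logor <- betas.e[-1] - betas.e0[-1]

print(exposure.main)
print(interaction.logor)
```

With these values only the first pathway interacts with exposure; the second pathway has the same log OR in both groups.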

pop1.frac

is a double number between 0 and 1 which gives the fraction of subpopulation 1 out of the two subpopulations for a population stratification scenario.

rate.beta

is a double number giving the log OR of disease prevalence in population 2 over that in population 1.
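To make the log OR interpretation concrete, the sketch below shifts a baseline intercept by rate.beta and recovers the same value from the two resulting prevalences. The intercept and rate.beta values are hypothetical, and applying rate.beta as an additive shift of the intercept is an assumption about the model, not something documented here.

```r
b0 <- -6.4        # hypothetical baseline intercept for subpopulation 1
rate.beta <- 0.5  # hypothetical log OR, population 2 versus population 1

prev1 <- plogis(b0)              # baseline prevalence in subpopulation 1
prev2 <- plogis(b0 + rate.beta)  # assumed baseline prevalence in subpopulation 2

# The log odds ratio between the two prevalences equals rate.beta.
odds <- function(p) p / (1 - p)
log.or <- log(odds(prev2) / odds(prev1))
print(c(prev1 = prev1, prev2 = prev2, log.or = log.or))
```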

is.case

is a boolean variable. When is.case = TRUE, case-parents trios will be simulated. Otherwise, control-parents trios will be simulated.

qtl

is a boolean variable denoting whether a quantitative trait (qtl=TRUE) or a binary trait (qtl=FALSE) is to be simulated. For a binary trait only affected families will be kept. The default value is qtl=FALSE.

out.put.file

is a character string giving the base file name for the output files. When a non-default value is given, the function will write the following files to the designated directory: a file with a name ending in "exp.txt" containing the exposure data, when exposure is involved in the risk model; a file with a name ending in "pop.txt" containing information on subpopulation membership, when the simulation involves a stratified scenario; and a file with a name ending in "pheno.txt" containing the quantitative trait phenotype, when a quantitative trait is involved. When out.put.file is the default value NA, the names of these three files are exposure.txt, population.txt, and phenotype.txt.

no_cores

is an integer specifying the number of CPU cores to use for parallelization. With the default NA, the ceiling of half the total number of CPU cores is used.
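The default core count described above can be reproduced with the parallel package. This is a minimal sketch of the stated rule, not the package's own code; the fallback to one core when detection fails is an added assumption.

```r
library(parallel)

# detectCores() can return NA on some platforms; fall back to 1 (assumption).
n.detected <- detectCores()
if (is.na(n.detected)) n.detected <- 1L

# Ceiling of half of the total number of CPU cores.
no_cores <- ceiling(n.detected / 2)
print(no_cores)
```

Passing an explicit value such as no_cores = 2, as in the Examples section, avoids depending on the detected hardware.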

Value

The function returns a list of five elements. The first is a matrix of integers giving the families (in terms of row number) selected for each simulated trio and each chromosomal segment. The second is a matrix giving the genotypes on the target SNPs in the simulated trios. The third is relevant only when the risk model involves exposure; it is a vector of 0's and 1's giving the exposure status of each simulated trio. The fourth is relevant only in simulations of stratified scenarios; it is a vector of 1's and 2's giving the subpopulation membership of each simulated trio. The fifth is relevant only in simulations of a quantitative trait; it is a vector of doubles giving the phenotype values.
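The positional layout of the returned list can be unpacked as sketched below. The object mock.fit is a hypothetical stand-in built by hand so the snippet runs without the package; its dimensions and the NA placeholders for the elements that are not relevant are assumptions, not the function's documented output.

```r
# Hypothetical stand-in for a result with 3 trios and 2 chromosomal segments.
mock.fit <- list(
  matrix(c(5L, 9L, 2L, 7L, 1L, 4L), nrow = 3),  # selected family rows
  matrix(0L, nrow = 3, ncol = 8),               # target SNP genotypes
  c(0, 1, 1),                                   # exposure status
  NA,                                           # subpopulation (not stratified)
  NA                                            # phenotype (binary trait)
)

# Elements are accessed by position, in the order described above.
selected.families <- mock.fit[[1]]
target.genotypes  <- mock.fit[[2]]
exposure.status   <- mock.fit[[3]]

print(dim(selected.families))
print(table(exposure.status))
```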

Examples

tar.snp <- c(21, 118, 121, 140, 155, 168, 218, 383) 
found.brks <- get.brks(N.brk=3,n.ped=1000, snp.all2, tar.snp,rcmb.rate=NA)
breaks <- found.brks[[1]]
family.position <- found.brks[[2]] 
betas <- c(-6.4, 3.2, 5.8)
pwy <- list(1:4,5:8)
m.file <- file.path(system.file(package = "TriadSim"),'extdata/pop1_4chr_mom')
f.file <- file.path(system.file(package = "TriadSim"),'extdata/pop1_4chr_dad')
k.file <- file.path(system.file(package = "TriadSim"),'extdata/pop1_4chr_kid')
# the preloaded data frame snp.all2 contains the data frame read from the corresponding .bim file.
target.geno <- get.target.geno(c(m.file,f.file,k.file), tar.snp,snp.all2)
mom.target <- target.geno[[1]]
dad.target <- target.geno[[2]]
kid.target <- target.geno[[3]]
## Not run:  
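# betas is used for both the unexposed (betas.e0) and exposed (betas.e)
# coefficients; with e.fr=NA no exposure is involved in the risk model.
fitted.model <- fit.risk.model.par(n.ped=1000, brks=breaks, target.snp=tar.snp,
  fam.pos=family.position, mom.tar=mom.target, dad.tar=dad.target,
  kid.tar=kid.target, pathways=pwy, betas.e0=betas, e.fr=NA, betas.e=betas,
  pop1.frac=NA, rate.beta=NA, no_cores=2)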

## End(Not run)

TriadSim documentation built on Sept. 9, 2021, 1:06 a.m.