R/TriadSim_functions.R
In TriadSim: Simulating Triad Genomewide Genotypes

Documented in fit.risk.model.par get.brks get.target.geno glue.chr.segment.par pick_target.snp TriadSim

#' Pick target SNPs in the pathway
#' 
#' 
#' The target SNPs in the pathway can be picked by users manually or use this facility function.
#' It helps pick the set of target SNPs in the pathway(s) based on a 
#' desired allele frequency. If picked manually, the target SNPs should be in the order from the smallest to the largest.
#' @param input.plink.file is a vector of two character strings for the file names of the mother's
#' and  father's plink base filenames with the necessary path to the directory. The plink files are in bed format and three files with 
#' extensions .bed .bim and .fam are expected for each parent's genotypes. In addition the allele 
#' frequnecy files generated by PLINK (base filename with .frq extension) are expected to be in the
#' same directory as the .bed file.
#' @param fr.desire is a double number giving the desired frequency of the target SNPs. 
#' @param n.snp is an integer giving the number of target SNPs to be picked.
#'
#' @return The function returns a list of two: first element is  the SNPs read from the .bim file now with 
#' allele frequncies merged and the second is the row numbers of the target SNPs selected
#' among all SNPs in the .bim file.
#' @export
#' @importFrom  utils read.table 
#' @examples 
#' m.file <- file.path(system.file(package = "TriadSim"),'extdata/pop1_4chr_mom')
#' f.file <- file.path(system.file(package = "TriadSim"),'extdata/pop1_4chr_dad')
#' picked.target <- pick_target.snp(c(m.file,f.file),0.05, 8)
#' cat('Target SNPs picked:',picked.target[[2]],'\n')
pick_target.snp <- function(input.plink.file, fr.desire="double", n.snp="integer") {

    if (!file.exists(paste(input.plink.file[1],'frq',sep='.'))) stop(paste(input.plink.file[1],'.frq',
                                                                         " is needed for selecting target SNPs",sep=''))
  if (!file.exists(paste(input.plink.file[2],'frq',sep='.'))) stop(paste(input.plink.file[2],'.frq',
                                                                         " is needed for selecting target SNPs",sep=''))
  
  read.table(paste(input.plink.file[1],'frq',sep='.'),header=T,as.is=T)->fr.mom
  read.table(paste(input.plink.file[2],'frq',sep='.'),header=T,as.is=T)->fr.dad
  fr<- fr.mom
  fr$MAF <- (fr.mom$MAF+fr.dad$MAF)/2
  
  read.table(paste(input.plink.file[1],'bim',sep='.'),as.is=T)->snp
  snp$ord <- 1:nrow(snp)
  snp.all2 <- merge(snp,fr,by.x='V2',by.y='SNP')
  snp.all2<- snp.all2[order(snp.all2$ord),]
  .mar <- 0.005
  snp.in.fr <- which(abs(snp.all2$MAF-fr.desire)<.mar)
  
  while(length(snp.in.fr)< n.snp+2) {.mar <- .mar+0.001; snp.in.fr <- which(abs(snp.all2$MAF-fr.desire)<.mar)}
  target.snp <- snp.in.fr[seq(1,length(snp.in.fr),length(snp.in.fr)/(n.snp+2))][2:(n.snp+1)]
  print(target.snp)
  snp.all2 <- snp.all2[,c('ord','V2','CHR','V4','A1','A2','MAF','NCHROBS')]
  colnames(snp.all2) <- c('ord','RS','CHR','POS','A1','A2','MAF','NCHROBS')
 # write(target.snp, ncolumns=length(target.snp),file='target_snp.txt')
  return(list(snp.all2, target.snp))
}

#' Getting genotypes of the target SNPs 
#' 
#' 
#' This function read out the genotypes of the selected target SNPs from the orignal data set (the data set on
#' which simulation is based). 
#'
#' @param input.plink.file is a vector of three character strings for the file names of the mother's
#' father's and child's plink base filenames with the necessary path to the directory. The plink files are in bed format and three files with 
#' extensions .bed .bim and .fam are expected for each individual's genotypes. The mothers, fathers, and 
#' childredn must be from the same set of trio families even though the ordering of the families can be different
#' for the three sets of data.
#' @param target.snp is a vector of integers showing the row number of the target SNPs in the .bim file
#' @param snp.all2 is a dataframe containing list of SNPs in PLINK .bim format. Only the second column is used which contains
#' the rs number of the SNPs. The colname name of the second column has to be "V2".
#'
#' @return A list of three matrices is returned. The three matrices are the observed genotypes of the mothers from family 1 to family n repeated twice,
#' genotypes of the fathers from family 1 to family n repeated twice and genotypes of children from family 1 to n followed by (stacking on top of) 
#' genotypes of the complements at the target SNPs.
#' @export
#'
#' @examples tar.snp <- c(21, 118, 121, 140, 155, 168, 218, 383) 
#' m.file <- file.path(system.file(package = "TriadSim"),'extdata/pop1_4chr_mom')
#' f.file <- file.path(system.file(package = "TriadSim"),'extdata/pop1_4chr_dad')
#' k.file <- file.path(system.file(package = "TriadSim"),'extdata/pop1_4chr_kid')
#' # the preloaded data frame snp.all2 contains the data frame read from the corresponding .bim file.
#' \dontrun{ 
#' target.geno <- get.target.geno(c(m.file,f.file,k.file), tar.snp,snp.all2)
#' }
get.target.geno <- function(input.plink.file, target.snp,snp.all2) {
  geno.mom <- snpStats::read.plink(paste(input.plink.file[1],'bed',sep='.'),select.snps= snp.all2[target.snp,'RS'], na.strings = ("-9"))
  geno.dad <- snpStats::read.plink(paste(input.plink.file[2],'bed',sep='.'),select.snps= snp.all2[target.snp,'RS'], na.strings = ("-9"))
  geno.kid <- snpStats::read.plink(paste(input.plink.file[3],'bed',sep='.'),select.snps= snp.all2[target.snp,'RS'], na.strings = ("-9"))
  
  mom <- geno.mom$genotype@.Data
  mom <- mom[order(rownames(mom)),]
  dad <- geno.dad$genotype@.Data
  dad <- dad[order(rownames(dad)),]
  kid <- geno.kid$genotype@.Data
  kid <- kid[order(rownames(kid)),]
  
  mom <- matrix(as.numeric(mom),byrow=F,ncol=ncol(mom))
  dad <- matrix(as.numeric(dad),byrow=F,ncol=ncol(dad))
  kid <- matrix(as.numeric(kid),byrow=F,ncol=ncol(kid))
  mom[mom==0] <- NA; mom <- mom-1
  dad[dad==0] <- NA; dad <- dad-1
  kid[kid==0] <- NA; kid <- kid-1
  
  ### check allele coding if different from mom's coding flip back
  kid[,geno.kid$map$allele.1!=geno.mom$map$allele.1] <- 2- kid[,geno.kid$map$allele.1!=geno.mom$map$allele.1]
  dad[,geno.dad$map$allele.1!=geno.mom$map$allele.1] <- 2- dad[,geno.dad$map$allele.1!=geno.mom$map$allele.1]
  
  
  na.sel <- sum(is.na(mom)|is.na(dad)|is.na(kid))
  mom[na.sel] <-0
  dad[na.sel] <-0
  kid[na.sel] <-0
  comp <- mom+dad-kid
  
  mom[!(comp%in%c(0,1,2))] <- 0
  dad[!(comp%in%c(0,1,2))] <- 0
  kid[!(comp%in%c(0,1,2))] <- 0
  comp[!(comp%in%c(0,1,2))] <- 0
  mom <- 2-mom
  dad <- 2-dad
  kid <- 2-kid
  comp <- 2-comp
  ### stack case trio on top of comp trio
  mom <- rbind(mom,mom)
  dad <- rbind(dad,dad)
  kid <- rbind(kid, comp)
  n.gwas <- nrow(kid)
  
  mom.tar <- mom
  dad.tar <- dad
  kid.tar <- kid
  return(list(mom.tar,dad.tar, kid.tar))
}

#' Picking chromosomal breaking points
#' 
#' 
#' The breaking points at each chromosome can be picked manually or use this function. 
#' When a data frame containing the recombination rates (rcmb.rate) is provided the function
#' tends to pick the breaking points at recombination hotspots.
#'
#' @param N.brk is an integer giving the number of breaks to be picked for each chromosome.
#' @param n.ped is an integer giving the number of trios to be simulated
#' @param snp.all2 is a dataframe containing the list of SNPs in PLINK .bim format. Two columns of the dataframe
#' is used: column 1 with column name "V1" containing the chromosome number and column 4 with column name "V4"
#' containing the chromosomal postion of the SNPs.
#' @param target.snp is a vector of integers showing the row number of the target SNPs in the .bim file.
#' @param rcmb.rate the default value is NA. rcmb.rate is a dataframe containing the recombination rates at each SNP. The ordering of the SNPs should 
#' be identical to that of snp.all2. It contains 4 columns with column names 'CHR','RS','POS',and 'RATE with the 
#' corresponding values for "the chromosomal number", "SNP rs number", "chromosomal position",
#' and "recombination rate". The recombination rate represents the maximum recombination rate in the chromosomal
#' region between the current SNP and the SNP above (or the first basepair of the chromosome for the first SNP on a chromosome).
#' When no rcmb.rate is provided the function will pick the breaking points randomly where keeping the breaking
#' points in between target SNPs. An example recombination rate data frame "rcmb.rate" is already loaded with the package.
#' @param same.brk is an indicator variable to denote whethere the same set of breaking points will be used for all simulated triads
#'
#' @return A list of two elements is returned. The first one is a matrix of integers showing where 
#' the chromosomal breaks is to take place for each individuals in the simulated trios. The second one
#' is a matrix showing the chromosomal segments out of which each target SNP is selected for each simulated trio.
#' @export
#' @importFrom stats quantile
#'
#' @examples 
#' tar.snp <- c(21, 118, 121, 140, 155, 168, 218, 383) 
#' found.brks <- get.brks(N.brk=3,n.ped=1000, snp.all2, tar.snp,rcmb.rate=NA)
#' breaks <- found.brks[[1]]
#' family.pos <- found.brks[[2]] 
#' 
#' 
get.brks <- function(N.brk,n.ped, snp.all2, target.snp,rcmb.rate=NA,same.brk=FALSE){
  ### target SNPs must be sorted from smallest to largest
  if (sum(order(target.snp)!=1:length(target.snp))>0)  stop("Target SNPs are not in the correct order of smallest to largest")
  ## two breaking points on each chr so 3 segments
  n.ped <- n.ped * 1.2
  n.chr <- length(table(snp.all2$CHR)) 
  snp.per.chr <- rep(NA,n.chr)
  for (i in 1:n.chr) {
    if (i==1) snp.per.chr [i] <- length(which(target.snp<cumsum(table(snp.all2$CHR))[1])) else 
      snp.per.chr [i] <- length(which(target.snp<cumsum(table(snp.all2$CHR))[i] & target.snp>cumsum(table(snp.all2$CHR))[i-1]))
  }
  #sort(c(0,cumsum(table(snp.all2$V1)), target.snp))[(c(0,cumsum(snp.per.chr))[i]):(c(0,cumsum(snp.per.chr))[i+1]+1)+1]
  brks <- NULL
  brk.ends <- NULL
  for (i in 1:n.chr) {
    ## if >=2 target SNPs on the chromosome if more target SNPs on chr the number of breaks will be >N.brk for that chr
    ## put breaks in between target.SNPs else put break ends evenly
    if (snp.per.chr[i]>1) {
      if (i==1) {brk0 <- c(1,target.snp[1]-1 )} else {first.pos <- cumsum(table(snp.all2$CHR))[i-1]+1
      last.pos <- target.snp[which(target.snp> cumsum(table(snp.all2$CHR))[i-1])[1]]-1
      brk0 <- c(first.pos,last.pos )} ## fragment before the first SNP
      brk1 <- c(target.snp[cumsum(snp.per.chr)[i]]+1,cumsum(table(snp.all2$CHR))[i] ) ## fragment after the last SNP
      if ((snp.per.chr[i]) < N.brk) brk.ends <- rbind(brk.ends,brk0)
      for (k in 2:snp.per.chr[i]) {
        brk.ends<- rbind(brk.ends,c(target.snp[c(0,cumsum(snp.per.chr))[i]+k-1],target.snp[c(0,cumsum(snp.per.chr))[i]+k]))
      } 
      if ((snp.per.chr[i]-1) < N.brk) brk.ends <- rbind(brk.ends,brk1)
    } else {brk <-  c(1,ceiling(quantile(1:table(snp.all2$CHR)[i],(1:(N.brk))/(N.brk))))
    if (i>1) addon <- cumsum(table(snp.all2$CHR))[i-1] else addon <-0
    for (k in 2:length(brk))  brk.ends <- rbind(brk.ends, c(brk[k-1],brk[k]-1) + addon)
    }
  }
  
  if (length(rcmb.rate)>1 && sum(snp.all2$POS==rcmb.rate$POS)==nrow(snp.all2)) {
    brks <- NULL
    for (i in 1:nrow(brk.ends)) {
      rate1 <- rcmb.rate[brk.ends[i,1]:(brk.ends[i,2]-1),'RATE']
      brks <- cbind(brks, sample(brk.ends[i,1]:(brk.ends[i,2]-1), size=n.ped,prob= rate1,replace=T))}
      brks <- cbind(rep(0,n.ped),brks,matrix(rep(cumsum(table(snp.all2$CHR)),n.ped),byrow=T,nrow=n.ped))   
  } else  {
    brks <- apply(brk.ends,1,function(v){return(sample((v[1]-1):v[2],n.ped,replace=T))})
    brks <- cbind(rep(0,n.ped),brks,matrix(rep(cumsum(table(snp.all2$CHR)),n.ped),byrow=T,nrow=n.ped)) }  
  
  ## request more brks than needed so that the ones with identical breaking points for different segment can be removed
  brks <- t(apply(brks,1,sort))
  n.ped <- n.ped/1.2
  brks <- brks[apply(brks,1,function(v){sum(duplicated(v))})==0,][1:n.ped,]
  ## find position in terms of the chromosome segments (which segment) in sel.fam the fam to be used for each target pos
  fam.pos <- NULL
  for (k in 1:length(target.snp)) {
    fam.pos <- cbind(fam.pos,apply(brks < target.snp[k],1,sum))
  }
  if (same.brk) {
    brks <- matrix(rep(brks[1,],nrow(brks)),byrow=T,nrow=nrow(brks))
    fam.pos <- matrix(rep(fam.pos[1,],nrow(fam.pos)),byrow=T,nrow=nrow(fam.pos))
    print(brks[1,])
  #  write(brks[1,],ncolumns=ncol(brks),file='same_breaks.txt')
  }
  colnames(brks) <- NULL
  return(list(brks,fam.pos))
}

#' Resample families based on the risk model
#' 
#' This function selects families based on the prespecified risk model. It can simulate a homogenous scenario
#' or a stratified scenario with two subpopulations. When e.fr is given rather than the default NA
#' the risk model can involve exposure main effects as well as gene by exposure interation. This function is parallelized
#' and the default number of cores for parallelization is set as the ceiling of half of the total number of CPU cores.
#'
#' @param n.ped is an integer giving the number of trios to be simulated
#' @param brks a matrix of integers showing where the chromosomal breaks is to take place for 
#' each individual in the simulated trios.
#' @param target.snp is a vector of integers showing the row number of the target SNPs in the .bim file.
#' @param fam.pos is a matrix showing the chromosomal segments out of which is each target SNP 
#' selected for each simulated trio.
#' @param mom.tar is a matrix containing genotypes of the target SNPs in the mothers of the original data 
#' for simulations of a homogenous population. For simulations under population stratification it is 
#' a list of two matrices each containing genotypes of the mothers' target SNP genotypes in one of the two subpopulations.
#' @param dad.tar is a matrix containing the genotypes of the target SNPs in the fathers of the original data
#' for simulations of a homogenous population. For simulations under population stratification it is 
#' a list of two matrices each containing fathers' target SNP genotypes in one of the two subpopulations.
#' @param kid.tar is a matrix with containing genotypes of the target SNP in the children stacking on top of the complements of the original data
#' for simulations of a homogenous population. For simulations under population stratification it is 
#' a list of two matrices each containing children's and complements' target SNP genotypes in one of the two subpopulations.
#' @param pathways is a list of vectors of integers. Each vector of integers denotes the SNPs involved in a particular pathway. E.g. list(1:4,5:8) denote that there
#' are two pathways. SNPs 1-4 are in the first pathway and SNPs 5-8 are in the second.
#' @param betas.e0 is a vector of doubles giving the beta coefficients of the logit risk model for the unexposed individuals. The length of the vector
#' should be 1+ number_of_risk_pathway. The first number is a function of the disease prevalence in the
#'  unexposed individual who does not carry any copies of the risk pathway. The numbers after that gives the odds ratios 
#' for carrying one/two copies of the risk pathways comparing to those who do not carry any copies of the pathways in the unxposed group. e.g., c(-6.4, 0.5,1)
#' means the baseline disease prevalence is exp(-6.4)/(1+exp(-6.4)) and the log OR for carrying at least one copy of the first pathway is 0.5 and
#' that for carrying at least one copy of the second pathway is 1.
#' @param e.fr is a double number between 0 and 1 which gives the exposure prevalence.
#' @param betas.e is a vector of doubles giving the beta coefficients of the logit risk model for the exposed individuals. The length of the vector
#' should be 1+ number_of_risk_pathway. The first number is a function the disease prevalence in the
#' exposed individual who does not carry any copies of the risk pathway. The numbers after that gives the odds ratios 
#' for carrying one/two copies of the risk pathways comparing to those who do not carry any copies of the pathways in the exposed group.
#' @param pop1.frac is a double number between 0 and 1 which gives the fraction of subpopulation 1 out of the two subpopulations for a population stratification scenario.
#' @param rate.beta is a double number giving the log OR of disease prevalence in population 2 over that in population 1.
#' @param is.case is a boolean variable. When is.case = TRUE case-parents trios will be simulated.Otherwise, control-parents trios will be simulated.
#' @param qtl is a boolean variable denoting whether a quantitative trait (qtl=TRUE) or a binary trait (qtl=FALSE) is to be simulated. For a binary trait only affected 
#' families will be kept. The default value is qtl=FALSE.
#' @param out.put.file is a character string giving the base file name for the output file. When a non-default value is given the fucntion will write the following files to 
#' the designated directory: 
#' a file with name ending with "exp.txt" containing the exposure data when exposure is involved in the risk model.
#' a file with name ending with "pop.txt" containing information on subpopulation membership when the simulation involves a stratified scenario.
#' a file with name ending with "pheno.tx" containing quantitative trait phenotype when a quantitative trait is involved.
#' When out.put.file is the default value NA the file names for the above three files are: exposure.txt, population.txt, phenotype.txt.
#' @param no_cores is an integer which specifies the number of CPU cores to be parallelized.
#' @return The function returns a list of five elements. The first one is a matrix of integers giving the families (in terms of row number) 
#' selected for each simulated trio and each chromosomal segment. The second one is a matrix giving the genotypes on the 
#' target SNPs in the simulated trio. The third one is relevant only when exposure is involved. It is a vector of 0's and 1's giving the exposure status of each simulated trio when the risk model involves
#' exposure. The fourth element is relevant only in simulations of stratified scenarios. It is a vector of 1's and 2's giving the memebership of the subpopulation groups of each simulated trio. 
#' The fifth element is relevant only in simulations of a quantitative trait. It is a vector of doubles giving the phenotype 
#' values for simulations of a quantitative trait.
#' @export
#' @importFrom foreach %dopar%
#' @importFrom stats rbinom
#' @importFrom stats rnorm
#'
#' @examples 
#' tar.snp <- c(21, 118, 121, 140, 155, 168, 218, 383) 
#' found.brks <- get.brks(N.brk=3,n.ped=1000, snp.all2, tar.snp,rcmb.rate=NA)
#' breaks <- found.brks[[1]]
#' family.position <- found.brks[[2]] 
#' betas <- c(-6.4, 3.2, 5.8)
#' pwy <- list(1:4,5:8)
#' m.file <- file.path(system.file(package = "TriadSim"),'extdata/pop1_4chr_mom')
#' f.file <- file.path(system.file(package = "TriadSim"),'extdata/pop1_4chr_dad')
#' k.file <- file.path(system.file(package = "TriadSim"),'extdata/pop1_4chr_kid')
#' # the preloaded data frame snp.all2 contains the data frame read from the corresponding .bim file.
#' target.geno <- get.target.geno(c(m.file,f.file,k.file), tar.snp,snp.all2)
#' mom.target <- target.geno[[1]]
#' dad.target <- target.geno[[2]]
#' kid.target <- target.geno[[3]]

#' \dontrun{ 
#' fitted.model <- fit.risk.model.par(n.ped=1000,brks=breaks,target.snp=tar.snp, 
#' fam.pos=family.position,mom.tar=mom.target,dad.tar=dad.target, kid.tar=kid.target,  
#' pathways=pwy,betas, e.fr=NA, betas,pop1.frac= NA,rate.beta=NA,no_cores=2)
#' }

fit.risk.model.par <- function(n.ped,brks,target.snp,fam.pos, mom.tar,dad.tar, kid.tar, pathways, betas.e0, e.fr=NA, betas.e,pop1.frac=NA,rate.beta= NA,is.case=TRUE, qtl=FALSE,out.put.file=NA,no_cores=NA) {
  two.pop <- length(mom.tar)==2
  brks.org <- brks
  fam.pos.org <- fam.pos
  mom.tar.org <- mom.tar
  dad.tar.org <- dad.tar
  kid.tar.org <- kid.tar
  betas.e0.org <- betas.e0
  betas.e.org <- betas.e
  e.fr.org <- e.fr
  if (length(betas.e0) != 1+length(pathways) ) stop("Number of beta coefficients does not match number of pathways")
  
  sim.fam <- function(i) {
    aff <- 0
    while (aff==0) {
      ### decide which population first ###
      pop <- NA
      if (two.pop)   {pop <- sample(c(1,2),1,prob=c(pop1.frac,1-pop1.frac))
      #brks <- brks.org[[pop]]
      #fam.pos <- fam.pos.org[[pop]]
      mom.tar <- mom.tar.org[[pop]]
      dad.tar <- dad.tar.org[[pop]]
      kid.tar <- kid.tar.org[[pop]]
      betas.e0 <- betas.e0.org; betas.e0[1] <- betas.e0[1] + (pop==2)*rate.beta
      if (!is.na(e.fr[1])) {betas.e <- betas.e.org; betas.e[1] <- betas.e[1] + (pop==2)*rate.beta}
      e.fr <- e.fr.org[pop]
      }
      n.gwas <- nrow(mom.tar)
      n.seg <- ncol(brks)-1
      n.ped <- nrow(fam.pos)
      e <- 0 ## use betas.e0 when e=0 or  when no exposure in the risk model
      if (!is.na(e.fr[1])) {
        if (length(betas.e) != 1+length(pathways) ) stop("Number of beta coefficients does not match number of pathways")
        e <- sample(c(0,1),1,prob=c(1-e.fr,e.fr))
      }
      if (e==0) betas <- betas.e0 else  betas <- betas.e  
      
      brk <- brks[i,]
      sel.fam <- sample(1:n.gwas,n.seg,replace=T)
      ### for each target SNP which fragment was used -> get the corresponding sel.fam for target.snp 1 to n.snp
      pathway.m <- mom.tar[cbind(sel.fam[fam.pos[i,]],1:length(target.snp))]
      pathway.f <- dad.tar[cbind(sel.fam[fam.pos[i,]],1:length(target.snp))]
      pathway.c <- kid.tar[cbind(sel.fam[fam.pos[i,]],1:length(target.snp))]
      
      X<- NULL
      for (p in 1:length(pathways)) {pathway1 <- pathways[[p]];
      X<- c(X,1*(sum(pathway.c[pathway1]>0)==length(pathway1)))}
      linear.pred <- betas[1] + sum(X*betas[-1])
      if (qtl) linear.pred <- linear.pred + rnorm(1)
      p_case =  exp(linear.pred)/(1+exp(linear.pred))
      aff <-rbinom(1,1, p_case)
      if (!is.case)  aff <- 1-aff
      if (qtl) aff <- 1
    }
    c(i,e,pop, sel.fam, pathway.m,pathway.f,pathway.c,linear.pred)
  }
  if (is.na(no_cores)) no_cores <- ceiling(parallel::detectCores()/2)
  cl<-parallel::makeCluster(no_cores)
  doParallel::registerDoParallel(cl)
  i <- NULL
  sim.all <- foreach::foreach(i = 1:n.ped, 
                              .combine = rbind, 
                              .export = c('n.ped','mom.tar','dad.tar', 'kid.tar', 'betas.e0', 'e.fr', 'betas.e')) %dopar%  
    sim.fam(i)
  # Finish
  on.exit(parallel::stopCluster(cl))
  foreach::registerDoSEQ()
  pathway.all <- sim.all[,-(1:(ncol(brks)+2))]
  pheno <- pathway.all[,ncol(pathway.all)]
  pathway.all <- pathway.all[,-ncol(pathway.all)]
  n.snp <- ncol(pathway.all)/3
  pathway.all <- rbind(pathway.all[,1:n.snp],pathway.all[,1:n.snp+n.snp],pathway.all[,1:n.snp+2*n.snp])
  sel.fam.all  <- sim.all[,1:(ncol(brks)-1)+3]
  e.vec <- sim.all[,2]
  pop.vec <- sim.all[,3]
 
  if (!is.na(e.fr[1])) if (is.na(out.put.file)) write(e.vec,ncolumns=1,file=file.path(tempdir(),'/exposure.txt')) else write(e.vec,ncolumns=1,file=paste(out.put.file,'_exposure.txt'))
  if (two.pop)  if (is.na(out.put.file)) write(pop.vec,ncolumns=1,file=file.path(tempdir(),'/population.txt')) else write(pop.vec,ncolumns=1,file=paste(out.put.file,'_population.txt'))
  if (qtl)  if (is.na(out.put.file)) write(pheno,ncolumns=1,file=file.path(tempdir(),'/phenotype.txt')) else write(pheno,ncolumns=1,file=paste(out.put.file,'_phenotype.txt'))
  
  return(list(sel.fam.all, pathway.all,e.vec,pop.vec,pheno))
}

#' Splicing chromosomal segments
#' 
#' 
#' This function splices the triad chromosomal segments into "complete" trios. The spliced trio sets are written into separate plink
#' files chromosome by chromosome. It is parallelized and if no no_cores value is given the ceiling of half of the total number
#' of CPUs available will be used in the parallelization.
#'
#' @param input.plink.file for simulations of homogenous population, it is a vector of three character strings for the base filenames of the mother's
#' father's and child's plink base filenames. The plink files are in bed format and in the same folder three files with 
#' extensions .bed .bim and .fam are expected for each individual's genotypes. The mothers, fathers, and 
#' childredn must be from the same set of trio families even though the ordering of the families can be different
#' for the three sets of data. For simulations under population stratification it is a list of two vectors. Each vector is a vector 
#' of three character strings for the base filenames as described above.The two vectors correspond to the two subpopulations.
#' @param out.put.file is a character string giving the base file name for the output file. Genotypes on different chromosomes are output to different files.
#' The final file name also contains information on chromosome number. E.g., for a base filename "trio" and for chromosome 1 the final file name is "trio1sim".
#' @param brks is a matrix of integers showing where the chromosomal breaks is to take place for each individual in the simulated trios.
#' @param sel.fam.all is a matrix of integer giving the families (in terms of row number) selected for each chromosomal segment and each simulated trio.
#' @param snp.all2 is a dataframe containing the list of SNPs in PLINK .bim format. Two columns of the dataframe
#' is used: column 1 with column name "V1" containing the chromosome number and column 2 with column name "V2" containing the rs number of the SNPs.
#' @param pathway.all is a matrix giving the genotypes on the pathway SNPs in the simulated trio.
#' @param target.snp is a vector of integers showing the row number of the target SNPs in the .bim file.
#' @param pop.vec is a vector of 1's and 2's giving the subpopulation group of each simulated trio. This parameter is relevant only for stratified scenarios.
#' @param no_cores is an integer which specifies the number of CPU cores to be parallelized.
#' @param flip is a boolean indicating whether the mother's and the father's genotypes will be swapped to wipe out potential maternal effects in the orignal data.
#'
#' @return This function does not return values. Instead it writes PLINK files into the designated directory. Each set of PLINK files contains genotype
#' data for one chromosome for all trios. The first one third of the rows are genotypes of the mothers'. The second one third are those of the fathers' and the last one 
#' third are the children's. 
#' @export
#' @importFrom foreach %dopar%
#' @importFrom methods new
#'
#' @examples 
#' tar.snp <- c(21, 118, 121, 140, 155, 168, 218, 383) 
#' found.brks <- get.brks(N.brk=3,n.ped=1000, snp.all2, tar.snp,rcmb.rate=NA)
#' breaks <- found.brks[[1]]
#' family.position <- found.brks[[2]] 
#' betas <- c(-6.4, 3.2, 5.8)
#' pwy <- list(1:4,5:8)
#' m.file <- file.path(system.file(package = "TriadSim"),'extdata/pop1_4chr_mom')
#' f.file <- file.path(system.file(package = "TriadSim"),'extdata/pop1_4chr_dad')
#' k.file <- file.path(system.file(package = "TriadSim"),'extdata/pop1_4chr_kid')
#' # the preloaded data frame snp.all2 contains the data frame read from the corresponding .bim file.
#' target.geno <- get.target.geno(c(m.file,f.file,k.file), tar.snp,snp.all2)
#' mom.target <- target.geno[[1]]
#' dad.target <- target.geno[[2]]
#' kid.target <- target.geno[[3]]

#'  \dontrun{
#' fitted.model <- fit.risk.model.par(n.ped=1000,brks=breaks,target.snp=tar.snp, 
#' fam.pos=family.position,mom.tar=mom.target,dad.tar=dad.target, kid.tar=kid.target,  
#' pathways=pwy,betas, e.fr=NA, betas,pop1.frac= NA,rate.beta=NA,no_cores=2)
#' sel.fam <- fitted.model[[1]]
#' sim.pathway.geno <-  fitted.model[[2]]
#' glue.chr.segment.par(c(m.file,f.file,k.file),file.path(tempdir(),'trio'), breaks,sel.fam,
#'                      snp.all2,sim.pathway.geno,target.snp,pop.vec=NA,no_cores=1,flip=TRUE) 
#' }
#' 

### parallel by pedigree ###
glue.chr.segment.par <- function(input.plink.file,out.put.file, brks,sel.fam.all,snp.all2,pathway.all,target.snp,pop.vec=NA,no_cores=NA,flip=TRUE) {
  if (sum(order(target.snp)!=1:length(target.snp))>0)  stop("Target SNPs are no in the correct order of smallest to largest")
  n.ped <- nrow(brks)
  sel.fam.all.org <- sel.fam.all
  chr.brk <- c(0,which(brks[1,]%in%cumsum(table(snp.all2$CHR))))
  risk.chr <-  sapply(target.snp,function(v){sum(v>cumsum(table(snp.all2$CHR)))+1})
  chr.pos <- target.snp - c(0,cumsum(table(snp.all2$CHR)))[risk.chr]
  
  input.plink.file.org <- input.plink.file
  if (length(input.plink.file)==2) pop <- 2 else pop <- 1
  
  ### randomly switch mother/father to remove maternal effects in the original data #########
  mf.flip <- matrix(rbinom(n.ped*(ncol(brks)-1),1,0.5),ncol=ncol(brks)-1,nrow=n.ped)
  if (!flip)   mf.flip <- matrix(0,ncol=ncol(brks)-1,nrow=n.ped)
  
  splice.chr <- function(i) {
    
    mom1 <- NULL
    dad1 <- NULL
    kid1 <- NULL
    
    brks1 <- brks[,(chr.brk[one.chr]+1):chr.brk[one.chr+1]]
    sel.fam.all1 <- sel.fam.all[,chr.brk[one.chr]:(chr.brk[one.chr+1]-1)]
    if (one.chr>1) {brks1 <- brks1- brks[1,chr.brk[one.chr]];brks1<-cbind(rep(0,n.ped),brks1)}
    for (k in 1:(ncol(brks1)-1)){
      if (mf.flip[i,k]==1) {
        mom1 <- as.numeric(c(mom1,dad[sel.fam.all1[i,k],(brks1[i,k]+1):brks1[i,k+1]]))
        dad1 <- as.numeric(c(dad1,mom[sel.fam.all1[i,k],(brks1[i,k]+1):brks1[i,k+1]]))
      } else {
        mom1 <- as.numeric(c(mom1,mom[sel.fam.all1[i,k],(brks1[i,k]+1):brks1[i,k+1]]))
        dad1 <- as.numeric(c(dad1,dad[sel.fam.all1[i,k],(brks1[i,k]+1):brks1[i,k+1]]))
      }
      kid1 <- as.numeric(c(kid1,kid[sel.fam.all1[i,k],(brks1[i,k]+1):brks1[i,k+1]]))
    }
    return(c(mom1,dad1,kid1))
  }
  
  for (one.chr in 1:length(unique(snp.all2$CHR)) ) {
    mom.all <- NULL
    dad.all <- NULL
    kid.all <- NULL
    n.ped.pop1 <-0
    for (p in 1:pop) {
      chr <- unique(snp.all2$CHR)[one.chr]
      if (pop==1) {
        geno.mom <- snpStats::read.plink(paste(input.plink.file[1],'bed',sep='.'),select.snps= snp.all2[snp.all2$CHR==chr,'RS'], na.strings = ("-9"))
        geno.dad <- snpStats::read.plink(paste(input.plink.file[2],'bed',sep='.'),select.snps= snp.all2[snp.all2$CHR==chr,'RS'], na.strings = ("-9"))
        geno.kid <- snpStats::read.plink(paste(input.plink.file[3],'bed',sep='.'),select.snps= snp.all2[snp.all2$CHR==chr,'RS'], na.strings = ("-9"))
      } else {
        geno.mom <- snpStats::read.plink(paste(input.plink.file[[p]][1],'bed',sep='.'),select.snps= snp.all2[snp.all2$CHR==chr,'RS'], na.strings = ("-9"))
        geno.dad <- snpStats::read.plink(paste(input.plink.file[[p]][2],'bed',sep='.'),select.snps= snp.all2[snp.all2$CHR==chr,'RS'], na.strings = ("-9"))
        geno.kid <- snpStats::read.plink(paste(input.plink.file[[p]][3],'bed',sep='.'),select.snps= snp.all2[snp.all2$CHR==chr,'RS'], na.strings = ("-9"))
      }
      mom <- geno.mom$genotype@.Data
      mom <- mom[order(rownames(mom)),]
      dad <- geno.dad$genotype@.Data
      dad <- dad[order(rownames(dad)),]
      kid <- geno.kid$genotype@.Data
      kid <- kid[order(rownames(kid)),]
      geno.mom$genotype <-NULL
      
   
      mom <- matrix(as.numeric(mom),byrow=F,ncol=ncol(mom))
      dad <- matrix(as.numeric(dad),byrow=F,ncol=ncol(mom))
      kid <- matrix(as.numeric(kid),byrow=F,ncol=ncol(mom))
      mom[mom==0] <- NA; mom <- mom-1
      dad[dad==0] <- NA; dad <- dad-1
      kid[kid==0] <- NA; kid <- kid-1
      
      ### check allele coding if different from mom's coding flip back
      kid[,geno.kid$map$allele.1!=geno.mom$map$allele.1] <- 2- kid[,geno.kid$map$allele.1!=geno.mom$map$allele.1]
      dad[,geno.dad$map$allele.1!=geno.mom$map$allele.1] <- 2- dad[,geno.dad$map$allele.1!=geno.mom$map$allele.1]
      rm(geno.dad,geno.kid)
      gc()     
 ### anyone in a triad is missing set whole triad missing
 #     na.sel <- sum(is.na(mom)|is.na(dad)|is.na(kid))
 #     mom[na.sel] <-NA
 #     dad[na.sel] <-NA
 #     kid[na.sel] <-NA
### non-mendelian families
      triad45 <- c("222","212","211","122","121","201","021","112","111","110","101","100","011","010","000","NA22",
                   "NA12","NA11","NA22","NA21","NA01","NA21","NA12","NA11","NA10","NA01","NA00","NA11","NA10","NA00",
                   "2NA2","2NA2","2NA1","1NA2","1NA1","2NA1","0NA1","1NA2","1NA1","1NA0","1NA1","1NA0","0NA1","0NA0","0NA0",
                   "00NA","01NA","02NA","10NA","11NA","12NA","20NA","21NA","22NA","NANA2","NANA2","NANA1","NANA2","NANA1","NANA1","NANA1","NANA2","NANA1","NANA0","NANA1","NANA0",
                   "NANA1","NANA0","NANA0","2NANA","2NANA","2NANA","1NANA","1NANA","2NANA","0NANA","1NANA","1NANA",
                   "1NANA","1NANA","1NANA","0NANA","0NANA","0NANA","NA2NA","NA1NA","NA1NA","NA2NA","NA2NA","NA0NA",
                   "NA2NA","NA1NA","NA1NA","NA1NA","NA0NA","NA0NA","NA1NA","NA1NA","NA0NA","NANANA")
      
      mfc <- paste(mom,dad,kid,sep='' )
      mfc <- matrix(!(mfc%in%triad45),ncol=ncol(mom),byrow = F)
 
      ##if there is nonmendelian zero out the triads for that SNP
      mom[mfc] <-NA
      dad[mfc] <-NA
      kid[mfc] <-NA 
      comp <- mom+dad-kid
      
      mom <- 2-mom
      dad <- 2-dad
      kid <- 2-kid
      comp <- 2-comp
      ### stack case trio on top of comp trio
      mom.all <- rbind(mom.all,mom,mom)
      rm(mom);gc()
      dad.all <- rbind(dad.all,dad,dad)
      rm(dad);gc()
      kid.all <- rbind(kid.all, kid, comp)
      rm(kid,comp);gc()
      
      if (p==1 & pop==2) n.ped.pop1 <- nrow(mom.all)
    }
    mom <- mom.all; rm(mom.all);gc()
    dad <- dad.all; rm(dad.all);gc()
    kid <- kid.all; rm(kid.all);gc()
    #  brks <- brks.org
    if (pop==2) sel.fam.all[pop.vec==2,] <- sel.fam.all.org[pop.vec==2,] + n.ped.pop1
    
    moms<- NULL
    dads <- NULL
    kids <- NULL
    
    if (is.na(no_cores)) no_cores <- ceiling(parallel::detectCores()/2)
    cl<-parallel::makeCluster(no_cores)
    doParallel::registerDoParallel(cl)
    i <- NULL
    geno.mfc <-  foreach::foreach(i = 1:n.ped, 
                                  .combine = rbind) %dopar%  
      splice.chr(i)
    # Finish
    #getDoParWorkers()  
    (parallel::stopCluster(cl))
     foreach::registerDoSEQ()
     
    rm(mom,dad,kid) ; gc()
    n.snp <- ncol(geno.mfc)/3
    geno.all <- geno.mfc[,1:n.snp]; 
    geno.all <- rbind(geno.all,geno.mfc[,1:n.snp+n.snp]);
    geno.all <- rbind(geno.all,geno.mfc[,1:n.snp+n.snp*2]); 
    
    ### due to fill in zero's at the creating target snp genotypes to maximize sample size, spliced data wil be different. Fill in with pathway.all results ##
    geno.all[,chr.pos[risk.chr==one.chr]] <- pathway.all[,risk.chr==one.chr]
    ## genotyping coded as 1, 2, 3 1 being rarest
    geno.all <- 3-geno.all
    
    fam.all <- data.frame(rep(1:n.ped,3),paste(rep(1:n.ped,3),rep(c(3,2,1),each=n.ped),sep="_"),rep(c(0,0,3),each=n.ped),rep(c(0,0,2),each=n.ped),rep(c(2,1,0),each=n.ped),rep(c(1,1,2),each=n.ped))
    colnames(fam.all) <- c('PID','UPN','MID','FID','Gender','Aff')
    rownames(geno.all) <- fam.all$UPN
    colnames(geno.all) <- colnames(geno.mom$genotype@.Data)
    geno.all[is.na(geno.all)] <- 0
    geno.all <- new('SnpMatrix',geno.all) 
    snpStats::write.plink(paste(out.put.file,chr,sep=''), snp.major = TRUE, geno.all,
                          pedigree=fam.all$PID, id=fam.all$UPN, father=fam.all$FID, mother=fam.all$MID, sex=fam.all$Gender, phenotype=fam.all$Aff,
                          chromosome=geno.mom$map$chromosome, genetic.distance=geno.mom$map$cM, position=geno.mom$map$position, allele.1=geno.mom$map$allele.1, allele.2=geno.mom$map$allele.2,
                          na.code = 0)
    rm(geno.all,geno.mom);gc();
  }
  }

#' Simulation main function
#'
#' TriadSim can simulate genotypes for case-parent triads, case-control, and quantitative trait samples with realistic linkage
#' diequilibrium structure and allele frequency distribution. For studies of epistasis one can simulate models that involve 
#' specific SNPs at specific sets of loci, which we will refer to as "pathways". TriadSim generates genotype data by resampling 
#' triad genotypes from existing data. It takes genotypes in PLINK format as the input files.
#' 
#' @param input.plink.file gives the filenames (as well as the path) of the source data used for resampling. The input files are in PLINK format.
#' For simulations of a homogenous population, it is a vector of three character strings for the base filenames of the mother's
#' father's and child's PLINK files. The PLINK files are in bed format and three files with 
#' extensions .bed .bim and .fam are expected for each individual's genotypes. The mothers, fathers, and 
#' children must be from the same set of triad families even though the ordering of the families can be different
#' for the three sets of data. For simulations under population stratification it is a list of two vectors. Each vector is a vector 
#' of three character strings giving the base filenames for the PLINK files as described above.The two vectors correspond to the 
#' two subpopulations.
#' @param out.put.file is a character string giving the pathway to and the base filename of the output file. The names of the final output files 
#' also contain information on chromosome number. E.g., for a base filename "trio" and for chromosome 1 the final filenames for the PLILK files
#' are "trio1.bim","trio1.bed" and "trio1.fam".
#' @param fr.desire is a double number giving the desired frequency of the target SNPs. 
#' @param pathways is a list of vectors of integers. Each vector of integers denotes the SNPs involved in a particular pathway. E.g. list(1:4,5:8)
#' @param n.ped is an integer giving the number of trios to be simulated
#' @param N.brk is an integer giving the number of breaks to be picked for each chromosome.
#' @param target.snp is a vector of integers showing the row number of the target SNPs in the .bim file.
#' @param P0 gives the baseline disease prevalence in the unexposed individuals with 0 copies of the risk pathways.
#' @param is.OR is a boolean varialbe denoting wether the input risk parameters are odds ratios. It is TRUE when the input risks are odds ratios.
#' @param risk.exposure is a double giving the relative risk (or odds ratio, if is.OR=TRUE) of the exposure main effect.
#' @param risk.pathway.unexposed is a vector of doubles giving the relative risk (or odds ratio, if is.OR=TRUE) of each risk pathways in the unexposed individuals 
#' with the risk of unexposed individuals who carry no copies of the pathways as a reference.For scenarios that do not involve exposure the value 
#' of this vector is for all individuals.
#' @param risk.pathway.exposed is a vector of doubles giving the relative risk (or odds ratio, if is.OR=TRUE) of each risk pathways in the exposed individuals. 
#' with the risk of exposed individuals who carry no copies of the pathways as a reference. For scenarios that do not involve exposure the value 
#' of this vector is not used.
#' @param is.case is a boolean variable. When is.case = TRUE case-parents trios will be simulated.Otherwise, control-parents trios will be simulated.
#' @param e.fr is a double number between 0 and 1 which gives the exposure prevalence.
#' @param pop1.frac is a double number between 0 and 1 which gives the fraction of population 1 for a population stratification scenario.
#' @param P0.ratio gives the ratio of the baseline disease prevalence in the second subpopulation to that of the first subpopulation.
#' @param rcmb.rate the default value is NA. rcmb.rate is a dataframe containing the recombination rates at each SNP. The ordering of the SNPs (in rows) should 
#' be identical to that of snp.all2. It has 4 columns with the column names 'CHR','RS','POS', and 'RATE' representing "the chromosomal number", 
#' "SNP rs number", "chromosomal position", and "recombination rate", respectively. The recombination rate represents the maximum recombination 
#' rate in the chromosomal region between the current SNP and the SNP above (or the first basepair of the chromosome for the first SNP on a 
#' chromosome).When no rcmb.rate is provided the function will pick the breaking points randomly.
#' @param no_cores is an integer which specifies the number of CPU cores to parallelized.contain values
#' @param qtl is a boolean variable denoting whether a quantitative trait (qtl=TRUE) or a binary trait (qtl=FALSE) is to be simulated. For a binary trait only affected 
#' families will be kept. The default value is qtl=FALSE.
#' @param same.brk is an indicator variable to denote whether the same set of breaking points will be used for all simulated triads. The default value is FALSE.
#' @param flip is an indicator variable denoting whether the mother's and the father's genotypes will be swapped to wipe out potential maternal 
#' effects in the orignal data. The default value is TRUE.
#'
#' @return this function simulates genotypes of parent-offspring triads and writes PLINK files into the designated directory. Genotypes on each
#' chromosome will be written into a separate set of PLINK files. In each set of PLINK files genotypes of the mothers, fathers, and children
#' are stacked on top of each other. The first third of the rows are genotypes of the mothers'.The seond third are those of the fathers' and 
#' the last third are those of the children's. 
#' The following files are also generated under specific scenarios:
#' a file with name ending with "exp.txt" containing the exposure data when exposure is involved in the risk model.
#' a file with name ending with "pop.txt" containing information on subpopulation membership when the simulation involves a stratified scenario.
#' a file with name ending with "pheno.tx" containing quantitative trait phenotype when a quantitative trait is involved.

#' @export
#'
#' @examples 
#' m.file <- file.path(system.file(package = "TriadSim"),'extdata/pop1_4chr_mom')
#' f.file <- file.path(system.file(package = "TriadSim"),'extdata/pop1_4chr_dad')
#' k.file <- file.path(system.file(package = "TriadSim"),'extdata/pop1_4chr_kid')
#' input.plink.file <- c(m.file, f.file, k.file)

#' \dontrun{TriadSim(input.plink.file, file.path(tempdir(),'triad'), fr.desire=0.05,pathways=list(1:4,5:8),
#'        n.ped=1000, N.brk=3, target.snp=NA,P0=0.001,is.OR=FALSE,risk.exposure= 1,
#'        risk.pathway.unexposed=c(1.5, 2), risk.pathway.exposed=c(1.5, 2), is.case=TRUE, e.fr=NA, 
#'        pop1.frac=NA, P0.ratio=1, rcmb.rate, no_cores=1)}

TriadSim <- function(input.plink.file, out.put.file, fr.desire,pathways,n.ped, N.brk, target.snp=NA,P0, is.OR,
         risk.exposure, risk.pathway.unexposed, risk.pathway.exposed, is.case=TRUE, e.fr=NA, pop1.frac=NA,P0.ratio=1,
         rcmb.rate=NA, no_cores=NA,qtl=FALSE,same.brk=FALSE,flip=TRUE) {
### converting RR to OR, outputlogOR
    RR2OR <- function(RR, P0) {
    if (max(RR*P0) > 1) stop("Risk >1")
    OR <- RR*(1-P0)/(1-RR*P0)
    return(log(OR)) }
  
    beta0 <- log(P0/(1-P0))
  if (!is.OR) { 
    betas.e0 <- c(beta0, RR2OR(risk.pathway.unexposed,P0))
    if (!is.na(e.fr[1])) betas.e1 <- c(beta0 + RR2OR(risk.exposure,P0), RR2OR(risk.pathway.exposed,P0))  else betas.e1 <- NA
  } else {
    betas.e0 <- c(beta0, log(risk.pathway.unexposed))
    if (!is.na(e.fr[1])) betas.e1 <- c(beta0 +log(risk.exposure), log(risk.pathway.exposed)) else betas.e1 <- NA
  }
  
  if (P0*P0.ratio > 1) stop("The baseline disease prevalence in the second subpopulation cannot be above one.")
  rate.beta <- log((P0*P0.ratio)/(1-P0*P0.ratio)) - beta0
  if (length(input.plink.file)==2) two.pop <- T else two.pop <- F
  if (two.pop) file1 <- input.plink.file[[1]][1:2] else file1 <- input.plink.file[1:2]
  ### pick target SNPs if not already given. For stratified scenario only allele frequency in the first subpopulation determine SNP selection
  if (is.na(target.snp[1])) {
    n.snp <- length(unique(unlist(pathways)))
    pick.snp <- pick_target.snp(file1, fr.desire,n.snp) 
    snp.all2 <- pick.snp[[1]]
    target.snp <- pick.snp[[2]]
  } else if (sum(order(target.snp)!=1:length(target.snp))>0)  stop("Target SNPs are not in the correct order of smallest to largest")
  ### get target SNP genotypes 
  if (!two.pop) {
    tar.geno <- get.target.geno(input.plink.file, target.snp,snp.all2)
    mom.tar <- tar.geno[[1]]
    dad.tar <- tar.geno[[2]]
    kid.tar <- tar.geno[[3]]
  } else {
    tar.geno <- get.target.geno(input.plink.file[[1]], target.snp,snp.all2)
    mom1.tar <- tar.geno[[1]]
    dad1.tar <- tar.geno[[2]]
    kid1.tar <- tar.geno[[3]]
    tar.geno <- get.target.geno(input.plink.file[[2]], target.snp,snp.all2)
    mom2.tar <- tar.geno[[1]]
    dad2.tar <- tar.geno[[2]]
    kid2.tar <- tar.geno[[3]]
    
    mom.tar <- list(mom1.tar, mom2.tar)
    dad.tar <- list(dad1.tar, dad2.tar)
    kid.tar <- list(kid1.tar, kid2.tar)
  }
  ### get breaking points
  brks <- get.brks(N.brk,n.ped, snp.all2,target.snp,rcmb.rate,same.brk)
  fam.pos <- brks[[2]]
  brks <- brks[[1]]
  ### fit risk (either case or control trios)
  fit.risk <- fit.risk.model.par(n.ped,brks,target.snp,fam.pos, mom.tar,dad.tar, kid.tar, pathways, betas.e0,e.fr, betas.e1,pop1.frac, rate.beta,is.case, qtl,out.put.file,no_cores) 
  sel.fam.all <- fit.risk[[1]]
  pathway.all <- fit.risk[[2]]
  e.vec <- fit.risk[[3]]
  pop.vec <- fit.risk[[4]]
  ### splice chromosomal fragments together
  glue.chr.segment.par(input.plink.file,out.put.file, brks,sel.fam.all,snp.all2,pathway.all,target.snp,pop.vec,no_cores,flip)
}
Any scripts or data that you put into this service are public.
TriadSim documentation built on Sept. 9, 2021, 1:06 a.m.
rdrr.io home R language documentation Run R code online
CRAN packages Bioconductor packages R-Forge packages GitHub packages
Note that we can't provide technical support on individual packages. You should contact the package authors for that.
TriadSim
Simulating Triad Genomewide Genotypes

R/TriadSim_functions.R
In TriadSim: Simulating Triad Genomewide Genotypes

Defines functions TriadSim glue.chr.segment.par fit.risk.model.par get.brks get.target.geno pick_target.snp

Documented in fit.risk.model.par get.brks get.target.geno glue.chr.segment.par pick_target.snp TriadSim

Try the TriadSim package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

TriadSim Simulating Triad Genomewide Genotypes

R/TriadSim_functions.R In TriadSim: Simulating Triad Genomewide Genotypes

Defines functions TriadSim glue.chr.segment.par fit.risk.model.par get.brks get.target.geno pick_target.snp

Documented in fit.risk.model.par get.brks get.target.geno glue.chr.segment.par pick_target.snp TriadSim

Try the TriadSim package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

TriadSim
Simulating Triad Genomewide Genotypes

R/TriadSim_functions.R
In TriadSim: Simulating Triad Genomewide Genotypes