R/zooroh_model.R

Defines functions zoomodel

Documented in zoomodel

setClass(Class = "zmodel",
         representation(typeModel = "character", mix_coef = "vector", krates = "vector",
                        err = "numeric", seqerr = "numeric", XM = "matrix", typeClass ="character")
)

is.zmodel <- function (x)
{
  res <- (is(x,"zmodel") & validObject(x))
  return(res)
}

#'Define the model for the RZooRoH
#'
#'Help the user to create a model for RZooRoH, including default parameters. The
#'output is a zmodel object necessary to run RZooRoH.
#'
#'@param predefined Logical (TRUE or FALSE) to define whether rates of HBD and
#'  non-HBD-classes will be estimated by the model ("kl" model) or whether the
#'  rates of these classes are fixed and pre-defined by the user ("mixkl"
#'  model). The default value is "predefined = TRUE".
#'
#'@param K The number of layers with the nested 1R model implemented in 2022,
#'  this represents thus the number of HBD classes. By default, K is set to 10
#'  but this is not optimal for all data sets. Hence, we recommend to the users
#'  to select their own value. If K is set to 1 and rates are estimated, RZooRoH
#'  will use the same rate for the HBD and the non-HBD class (so-called 1R
#'  model).
#'
#'@param mix_coef The starting value for the mixing coefficients in each layer.
#'  The mixing coefficients determine the frequency of the segments from
#'  different classes, they determine the probability to start a new HBD segment
#'  in a given layer (after reaching that layer after a coancestry change). The
#'  mixing coefficients can also be interpreted as the rate of inbreeding in
#'  their respective layer and should take values between 0 and 1. The function
#'  expects K mixing coefficients (e.g. the number of layers). The default
#'  values are 0.01 for HBD classes (smaller values may be better for large
#'  values of K). In case the parameters are not estimated (e.g. when running
#'  the forward-backward or the Viterbi algorithm alone), these are the mixing
#'  coefficients used by the RZooRoH model. Is it possible to force some mixing
#'  coefficient to take the same value with the step option. In that case,
#'  the expected number of mixing coefficients might be different.
#'
#'@param base_rate is a integer used to define the rates of successive layers or
#'  HBD classes (see krates below). This parameter is most useful when using a
#'  model with predefined rates. The rate of each HBD class will be equal to the
#'  base_rate raised to the exponent k (the class number). The non-HBD class
#'  will have the same rate as the last HBD class. For instance, with a
#'  base_rate of 2 and four layers, we have the following rates: 2, 4, 8, 16 and
#'  16. Similarly, with a base_rate of 10 and three layers, we have 10, 100,
#'  1000 and 1000. With this method, more HBD classes are defined for more
#'  recent ancestors (for which we have more information to estimate R) and less
#'  for ancient HBD classes (it doesn't make sense to try to distinguish R =
#'  1000 from R = 1010). In addition, since the expected length of HBD segments
#'  is expected to be 1/R, the ratio between successive expected HBD lengths
#'  remains the same. This ratio also determines the ability of the model to
#'  distinguish segments from distinct classes. By keeping the ratio constant,
#'  the aptitude to discriminate between HBD classes is also constant. In
#'  addition, this method allows to cover a wide range of generations in the
#'  past, with more emphasis on recent ancestors. The default value for the
#'  base_rate is 2.
#'
#'@param krates Is an array with a rate in each layer. The function expects K
#'  positive rates. These rates are parameters of the exponential distribution
#'  that together with the distance in centimorgans defines the probability to
#'  end a HBD segments between two markers. Each layer / HBD class has a
#'  distinct rate. Therefore, the expected length of HBD classes is defined by
#'  the rates. The expected length is equal to 1/R. These krates are associated
#'  with the age of the common ancestor of the HBD segment. The rate is
#'  approximately equal to the size of the inbreeding loop (twice the number of
#'  generations to the common ancestor) when the map is given in Morgans. This
#'  is not an exact dating but an approximation, in simplified conditions. By
#'  default, the rates are defined by the base_rate parameter (2, 4, 8, 16,
#'  ...).
#'
#'@param err Indicates the error term, the probability to observe an
#'  heterozygous genotype in a HBD segment. The genotype could be heterozygous
#'  due to a mutation occuring on the path to the common ancestor. It can also
#'  be associated with a genotype calling error or a technical error. In case GP
#'  or GL formats are used (with genotyped probabilities or phred scores) or
#'  when an AD format is used (based on read counts), this error term still
#'  represents the probability to observe an heterozygous genotype in a HBD
#'  segment. When an heterozygous genotype was called with a probability equal
#'  to 1.00, this heterozygosity in an HBD track might be associated to a
#'  mutation or to errors not accounted for by the model used to estimate the
#'  genotype probabilities (e.g., GATK). The emission probability to observe a
#'  heterozygous genotype in an HBD class will never go below the error term.
#'  The default value is 0.001.
#'
#'@param seqerr This parameters is used only with the AD format. In the AD
#'  format the user gives the number of reads for both alleles. A simple model
#'  is then used to estimate the genotype probabilities based on the read
#'  counts. In that model, the seqerr represents the probability to have a
#'  sequencing error in one read. The default value is 0.001.
#'
#'@param step Logical (TRUE or FALSE). This allows to use a step function to
#'  model the mixing coefficients. First a model with many layers is defined.
#'  For example, one layer for each even number (50 layers from 2 to 100 for
#'  example). Although such a model allows a finer resolution, it requires the
#'  estimation of many parameters (probably without enough information to
#'  estimate precisely all mixing coefficients). Therefore, we propose a
#'  stepfunction that forces mixing coefficients to be constant for several
#'  consecutive layers (by blocks of layers). When this option is used, K
#'  represents the total number of fitted layers, all their rates must be
#'  defined. Regarding the mixing coefficients, one mixing coefficient must be
#'  defined per step (block). An incidence matrix relating the full set of
#'  mixing coefficients from HBD classes from all layers to the steps or blocks
#'  must be provided.
#'
#'@param XM Is an incidence matrix relating the full set of mixing coefficients
#'  to the reduced set of mixing coefficients (corresponding to steps or
#'  blocks).
#'
#'@param HBDclass By default, this value is set to "SingleRate". This means that
#'  the layer is defined by a single rate (like an average or global rate for
#'  that layer). With the option "Interval", one rate is first defined for every
#'  past generation. If we assume discrete past generations (1, 2, 3, 4, 5,
#'  ...), the corresponding rates would approximately be equal to 2, 4, 6, ...
#'  We insist that this is a crude approximation. However, it allows easier
#'  interpretation of the results, for instance by understanding better how many
#'  generations are approximately captured per layer. One layer corresponds then
#'  to multiple "generations" and is defined as an interval going from
#'  generations g1 to g2. In this approach, we use only discrete generations and
#'  start at generation 1. The values in the krates vector correspond then to
#'  the last generation in each layer. For example, setting krates to
#'  (10,20,50,100) would define four layers from generations 1 to 10, 11 to 20,
#'  21 to 50 and 51 to 100. To obtain the corresponding rates, these values are
#'  multiplied by two (inside the function). The rates used by ZooRoH would then go
#'  from 2 to 20, 22 to 40, 42 to 100 and 102 to 200.
#'
#'@return The function return an object that defines a model for RZooRoH and
#'  including the following elements: zmodel@@typeModel equal to "kl", "mixkl" or
#'  "step_mixkl" according to the selected model, zmodel@@mix_coef an array with
#'  mixing coefficients, zmodel@@krates an array with the rates of the HBD
#'  classes, zmodel@@err the parameter defining the probability to observe an
#'  heterozygous genotype in an HBD class, and zmodel@@seqerr the parameter
#'  defining the probability of sequencing error per read, zmodel@@XM the
#'  incidence matrix relating individual layers to blocks or steps,
#'  zmodel@@typeClass indicating the type of HBD classes,
#'
#'@examples
#'
#'# To define a the default model, with 10 layers (10 HBD and 1 non-HBD class)
#'# and with pre-defined rates for HBD classes with a base of 2 (2, 4, 8, ...):
#'
#'Mix10L <- zoomodel()
#'
#'# To see the parameters of the defined model, just type:
#'
#'Mix10L
#'
#'# To define a model with four layers and using a base of 10 to define
#'# rates (10, 100, 1000, ...):
#'
#'Mix4L <- zoomodel(K=4,base=10)
#'
#'# To define a model with two classes, with estimation of rates for HBD classes
#'# and starting with a rate 10:
#'
#'my.mod1R <- zoomodel(predefined=FALSE,K=1,krates=c(10))
#'
#'
#'@export

zoomodel <- function(predefined = TRUE, K = 10, mix_coef = rep(0, K),
                         base_rate = 2, krates = rep(0, K), err = 0.001, seqerr = 0.001,
                         step = FALSE, XM = rep(0,K), HBDclass = "SingleRate") {

  mymod <- new("zmodel")
  mymod@typeModel = "kl"
  if(predefined){mymod@typeModel = "mixkl"}
  if(!predefined){mymod@typeModel = "kl"}
  if(predefined & step){mymod@typeModel = "step_mixkl"}

  if(!step){mymod@XM <- matrix(1,K,1)}
  if(step){mymod@XM = XM}
  mymod@typeClass = HBDclass
  if(HBDclass != "SingleRate" & HBDclass != "Interval"){stop(paste("Unknown HBDclass type ::",HBDclass,"\n",sep=""))}

  for (i in 1:length(mix_coef)){
    if(mix_coef[i]<0 | mix_coef[i]>1){
      print("Mixing coefficients should take values between 0 and 1 !")
      print("We will use default values")
      mix_coef=rep(0,K)
    }
  }

  ##### K = number of layers or number of classes
  if(!step){
    if(length(mix_coef) != K){print("Length of mix_coeff does not match K !")}
    if(sum(mix_coef) == 0){
      mymod@mix_coef <- c(rep(0.01,(K)))
    } else {
      mymod@mix_coef = mix_coef
    }
  }

    if(sum(krates == 0)){
      mymod@krates <- array(0,K)
      for (i in 1:K){mymod@krates[i] = base_rate**i}
    } else {
      mymod@krates = krates
      if(length(krates[krates < 0])){print("Problem: negative rates for HBD classes !!!")}
      if(length(krates) != K){print("Length of krates does not match K !")}
    }


  ##### K = number of layers, the number of mixing coefficients should be equal to ncol(XM)
  if(step){
    if(sum(XM)==0){print("You must provide a design matrix with the step option !!!")}
    if(sum(XM)!=0){
      K2=ncol(XM)
      if(length(mix_coef)!=K2){
        print("The number of mixing coefficients must match the number of column from the design matrix")}
      if(nrow(XM)!=K){
        print("The number of rows from the design matrix must match the number of layers")}
    }
    if(sum(mix_coef) == 0){
      mymod@mix_coef <- c(rep(0.01,(K)))
    } else {
      mymod@mix_coef = mix_coef
    }
  }

  mymod@err = err
  mymod@seqerr = seqerr

  return(mymod)
}

Try the RZooRoH package in your browser

Any scripts or data that you put into this service are public.

RZooRoH documentation built on June 8, 2025, 9:32 p.m.