Nothing
setClass(Class = "zmodel",
representation(typeModel = "character", mix_coef = "vector", krates = "vector",
err = "numeric", seqerr = "numeric", XM = "matrix", typeClass ="character")
)
is.zmodel <- function (x)
{
res <- (is(x,"zmodel") & validObject(x))
return(res)
}
#'Define the model for the RZooRoH
#'
#'Help the user to create a model for RZooRoH, including default parameters. The
#'output is a zmodel object necessary to run RZooRoH.
#'
#'@param predefined Logical (TRUE or FALSE) to define whether rates of HBD and
#' non-HBD-classes will be estimated by the model ("kl" model) or whether the
#' rates of these classes are fixed and pre-defined by the user ("mixkl"
#' model). The default value is "predefined = TRUE".
#'
#'@param K The number of layers with the nested 1R model implemented in 2022,
#' this represents thus the number of HBD classes. By default, K is set to 10
#' but this is not optimal for all data sets. Hence, we recommend to the users
#' to select their own value. If K is set to 1 and rates are estimated, RZooRoH
#' will use the same rate for the HBD and the non-HBD class (so-called 1R
#' model).
#'
#'@param mix_coef The starting value for the mixing coefficients in each layer.
#' The mixing coefficients determine the frequency of the segments from
#' different classes, they determine the probability to start a new HBD segment
#' in a given layer (after reaching that layer after a coancestry change). The
#' mixing coefficients can also be interpreted as the rate of inbreeding in
#' their respective layer and should take values between 0 and 1. The function
#' expects K mixing coefficients (e.g. the number of layers). The default
#' values are 0.01 for HBD classes (smaller values may be better for large
#' values of K). In case the parameters are not estimated (e.g. when running
#' the forward-backward or the Viterbi algorithm alone), these are the mixing
#' coefficients used by the RZooRoH model. Is it possible to force some mixing
#' coefficient to take the same value with the step option. In that case,
#' the expected number of mixing coefficients might be different.
#'
#'@param base_rate is a integer used to define the rates of successive layers or
#' HBD classes (see krates below). This parameter is most useful when using a
#' model with predefined rates. The rate of each HBD class will be equal to the
#' base_rate raised to the exponent k (the class number). The non-HBD class
#' will have the same rate as the last HBD class. For instance, with a
#' base_rate of 2 and four layers, we have the following rates: 2, 4, 8, 16 and
#' 16. Similarly, with a base_rate of 10 and three layers, we have 10, 100,
#' 1000 and 1000. With this method, more HBD classes are defined for more
#' recent ancestors (for which we have more information to estimate R) and less
#' for ancient HBD classes (it doesn't make sense to try to distinguish R =
#' 1000 from R = 1010). In addition, since the expected length of HBD segments
#' is expected to be 1/R, the ratio between successive expected HBD lengths
#' remains the same. This ratio also determines the ability of the model to
#' distinguish segments from distinct classes. By keeping the ratio constant,
#' the aptitude to discriminate between HBD classes is also constant. In
#' addition, this method allows to cover a wide range of generations in the
#' past, with more emphasis on recent ancestors. The default value for the
#' base_rate is 2.
#'
#'@param krates Is an array with a rate in each layer. The function expects K
#' positive rates. These rates are parameters of the exponential distribution
#' that together with the distance in centimorgans defines the probability to
#' end a HBD segments between two markers. Each layer / HBD class has a
#' distinct rate. Therefore, the expected length of HBD classes is defined by
#' the rates. The expected length is equal to 1/R. These krates are associated
#' with the age of the common ancestor of the HBD segment. The rate is
#' approximately equal to the size of the inbreeding loop (twice the number of
#' generations to the common ancestor) when the map is given in Morgans. This
#' is not an exact dating but an approximation, in simplified conditions. By
#' default, the rates are defined by the base_rate parameter (2, 4, 8, 16,
#' ...).
#'
#'@param err Indicates the error term, the probability to observe an
#' heterozygous genotype in a HBD segment. The genotype could be heterozygous
#' due to a mutation occuring on the path to the common ancestor. It can also
#' be associated with a genotype calling error or a technical error. In case GP
#' or GL formats are used (with genotyped probabilities or phred scores) or
#' when an AD format is used (based on read counts), this error term still
#' represents the probability to observe an heterozygous genotype in a HBD
#' segment. When an heterozygous genotype was called with a probability equal
#' to 1.00, this heterozygosity in an HBD track might be associated to a
#' mutation or to errors not accounted for by the model used to estimate the
#' genotype probabilities (e.g., GATK). The emission probability to observe a
#' heterozygous genotype in an HBD class will never go below the error term.
#' The default value is 0.001.
#'
#'@param seqerr This parameters is used only with the AD format. In the AD
#' format the user gives the number of reads for both alleles. A simple model
#' is then used to estimate the genotype probabilities based on the read
#' counts. In that model, the seqerr represents the probability to have a
#' sequencing error in one read. The default value is 0.001.
#'
#'@param step Logical (TRUE or FALSE). This allows to use a step function to
#' model the mixing coefficients. First a model with many layers is defined.
#' For example, one layer for each even number (50 layers from 2 to 100 for
#' example). Although such a model allows a finer resolution, it requires the
#' estimation of many parameters (probably without enough information to
#' estimate precisely all mixing coefficients). Therefore, we propose a
#' stepfunction that forces mixing coefficients to be constant for several
#' consecutive layers (by blocks of layers). When this option is used, K
#' represents the total number of fitted layers, all their rates must be
#' defined. Regarding the mixing coefficients, one mixing coefficient must be
#' defined per step (block). An incidence matrix relating the full set of
#' mixing coefficients from HBD classes from all layers to the steps or blocks
#' must be provided.
#'
#'@param XM Is an incidence matrix relating the full set of mixing coefficients
#' to the reduced set of mixing coefficients (corresponding to steps or
#' blocks).
#'
#'@param HBDclass By default, this value is set to "SingleRate". This means that
#' the layer is defined by a single rate (like an average or global rate for
#' that layer). With the option "Interval", one rate is first defined for every
#' past generation. If we assume discrete past generations (1, 2, 3, 4, 5,
#' ...), the corresponding rates would approximately be equal to 2, 4, 6, ...
#' We insist that this is a crude approximation. However, it allows easier
#' interpretation of the results, for instance by understanding better how many
#' generations are approximately captured per layer. One layer corresponds then
#' to multiple "generations" and is defined as an interval going from
#' generations g1 to g2. In this approach, we use only discrete generations and
#' start at generation 1. The values in the krates vector correspond then to
#' the last generation in each layer. For example, setting krates to
#' (10,20,50,100) would define four layers from generations 1 to 10, 11 to 20,
#' 21 to 50 and 51 to 100. To obtain the corresponding rates, these values are
#' multiplied by two (inside the function). The rates used by ZooRoH would then go
#' from 2 to 20, 22 to 40, 42 to 100 and 102 to 200.
#'
#'@return The function return an object that defines a model for RZooRoH and
#' including the following elements: zmodel@@typeModel equal to "kl", "mixkl" or
#' "step_mixkl" according to the selected model, zmodel@@mix_coef an array with
#' mixing coefficients, zmodel@@krates an array with the rates of the HBD
#' classes, zmodel@@err the parameter defining the probability to observe an
#' heterozygous genotype in an HBD class, and zmodel@@seqerr the parameter
#' defining the probability of sequencing error per read, zmodel@@XM the
#' incidence matrix relating individual layers to blocks or steps,
#' zmodel@@typeClass indicating the type of HBD classes,
#'
#'@examples
#'
#'# To define a the default model, with 10 layers (10 HBD and 1 non-HBD class)
#'# and with pre-defined rates for HBD classes with a base of 2 (2, 4, 8, ...):
#'
#'Mix10L <- zoomodel()
#'
#'# To see the parameters of the defined model, just type:
#'
#'Mix10L
#'
#'# To define a model with four layers and using a base of 10 to define
#'# rates (10, 100, 1000, ...):
#'
#'Mix4L <- zoomodel(K=4,base=10)
#'
#'# To define a model with two classes, with estimation of rates for HBD classes
#'# and starting with a rate 10:
#'
#'my.mod1R <- zoomodel(predefined=FALSE,K=1,krates=c(10))
#'
#'
#'@export
zoomodel <- function(predefined = TRUE, K = 10, mix_coef = rep(0, K),
base_rate = 2, krates = rep(0, K), err = 0.001, seqerr = 0.001,
step = FALSE, XM = rep(0,K), HBDclass = "SingleRate") {
mymod <- new("zmodel")
mymod@typeModel = "kl"
if(predefined){mymod@typeModel = "mixkl"}
if(!predefined){mymod@typeModel = "kl"}
if(predefined & step){mymod@typeModel = "step_mixkl"}
if(!step){mymod@XM <- matrix(1,K,1)}
if(step){mymod@XM = XM}
mymod@typeClass = HBDclass
if(HBDclass != "SingleRate" & HBDclass != "Interval"){stop(paste("Unknown HBDclass type ::",HBDclass,"\n",sep=""))}
for (i in 1:length(mix_coef)){
if(mix_coef[i]<0 | mix_coef[i]>1){
print("Mixing coefficients should take values between 0 and 1 !")
print("We will use default values")
mix_coef=rep(0,K)
}
}
##### K = number of layers or number of classes
if(!step){
if(length(mix_coef) != K){print("Length of mix_coeff does not match K !")}
if(sum(mix_coef) == 0){
mymod@mix_coef <- c(rep(0.01,(K)))
} else {
mymod@mix_coef = mix_coef
}
}
if(sum(krates == 0)){
mymod@krates <- array(0,K)
for (i in 1:K){mymod@krates[i] = base_rate**i}
} else {
mymod@krates = krates
if(length(krates[krates < 0])){print("Problem: negative rates for HBD classes !!!")}
if(length(krates) != K){print("Length of krates does not match K !")}
}
##### K = number of layers, the number of mixing coefficients should be equal to ncol(XM)
if(step){
if(sum(XM)==0){print("You must provide a design matrix with the step option !!!")}
if(sum(XM)!=0){
K2=ncol(XM)
if(length(mix_coef)!=K2){
print("The number of mixing coefficients must match the number of column from the design matrix")}
if(nrow(XM)!=K){
print("The number of rows from the design matrix must match the number of layers")}
}
if(sum(mix_coef) == 0){
mymod@mix_coef <- c(rep(0.01,(K)))
} else {
mymod@mix_coef = mix_coef
}
}
mymod@err = err
mymod@seqerr = seqerr
return(mymod)
}
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.