R/distribution.R

Defines functions distribution

Documented in distribution

#' @title Plot the distribution of Colless-Like, Sackin and cophenetic normalized balance indices under the alpha-gamma model and computes the percentile of a tree from previous distributions.
#' 
#' @description Given alpha, gamma and a phylogenetic tree, plot the distribution of the Colless-Like, Sackin and cophenetic normalized balance indices under the alpha-gamma model and computes the percentile of that tree of the previous normalized balance indices under the alpha-gamma model.
#' 
#' @param tree a single phylogenetic tree. It can be entered as a string in Newick format, as a "phylo" object (\code{ape} package) or as an "igraph" object (\code{igraph} package). 
#' @param alpha parametrer of the alpha-gamma model, between 0 and 1.
#' @param gamma parametrer of the alpha-gamma model, between 0 and alpha.
#' @param set.indices If \code{NULL} (default) the values of the balance indices are taken from stored data or from a new simulated data (See "Details"). If not, it must be a 3-column data.frame with the three balance indices (Colles-like, Sackin, Cophenetic). See \code{\link{indices.simulation}}.
#' @param new.simulation if \code{FALSE}(default) the values of the balance indices are taken from a data.frame entered by the user or from our database. If \code{TRUE}, the values of the balance indices are computed from a new simulation. See \code{\link{indices.simulation}}.
#' @param repetitions if the value of the new.simulation parameter is \code{TRUE}, the number of trees to be generated.
#' @param legend.location location of the legend. See "Details".
#' @param cex expansion factor of the legend. See "Details".
#' @param percentile.plot if \code{FALSE} (default),  density plots of the normalized balance indices are shown. if \code{TRUE}, percentiles plots of the normalized balance indices are shown.
#' @param db.path the current working directory. If our database is used, the db.path parameter should be the directory where the database is located.
#'
#' @details Two plots are available: one represents the percentile plots of the normalized balance indices (\code{percentile.plot=TRUE}), and the other one represents the density plots of the normalized balance indices (\code{percentile.plot=FALSE}).
#' 
#' The trees stored in our database have between 3 and 50 leaves and the values of the parameters \code{alpha} and \code{gamma} are in \{0,0.1,...,1\} such that {\code{gamma} \eqn{\le} \code{alpha}}. If the introduced parameters are not in the list, a new computation is done with them and a new dataset of trees is generated, and their indices are also computed. The number of trees generated can be modified by the parameter \code{repetitions} (see \code{\link{indices.simulation}} for more information). This computation may take some time, therefore you can computate them separately with \code{\link{indices.simulation}}, save their values and then call this function  by setting the parameter \code{set.indices=NULL}.
#' 
#' Our database is available to download at \url{https://github.com/LuciaRotger/CollessLike/tree/master/CollessLikeDataBase}.
#' 
#' The legend is placed with the \code{graphics} function \code{legend()}, so its location can be specified by setting \code{legend.position} to a single keyword from the list \code{"bottomright"}, \code{"bottom"}, \code{"bottomleft"}, \code{"left"}, \code{"topleft"}, \code{"top"}, \code{"topright"}, \code{"right"} and \code{"center"}. 
#' The expansion factor for the legend is controlled by the parameter \code{cex}, by default \code{cex=1}. See \code{\link[graphics]{legend}}.
#'
#' @return A numeric vector with the three percentiles.
#' 
#' @references  
#' B. Chen, D. Ford, M. Winkel, A new family of Markov branching trees: the alpha-gamma model. \emph{Electr. J. Probab}. \bold{14} (2009), 400-430. 
#' 
#' A. Mir, F. Rossello, L.Rotger, Sound Colless-like balance indices for multifurcating phylogenetic trees.\emph{PloS ONE} 13 (2018), e0203401.
#' 
#' A. Mir, F. Rossello, L.Rotger, A new balance index for phylogenetic trees. \emph{Mathematical Biosciences} \bold{241} (2013), 125-136.
#' 
#' M. J. Sackin, "Good" and  "bad" phenograms. \emph{Sys. Zool}, \bold{21} (1972), 225-226.
#'  
#' 
#' @seealso \code{\link[graphics]{legend}}, \code{\link{indices.simulation}}, \code{\link{balance.indices}}
#' 
#' @examples 
#' #The parameter folder contains the location of the database
#' #If not specified folder=getwd()
#' 
#' ## Different ways to introduce the tree
#' #From a newick string
#' \donttest{distribution("(1,2,3,4,5);",0.5,0.3,db.path=folder)}
#' \donttest{distribution("(1,(2,(3,(4,5))));",0.5,0.3,db.path=folder)}
#' 
#' #From a phylo object
#' \donttest{require(ape)}
#' \donttest{random.tree = rtree(5,rooted=TRUE)}
#' \donttest{distribution(random.tree,0.5,0.3,db.path=folder)}
#' 
#' #An example of a tree generated by the alpha-gamma model (igraph object)
#' \donttest{a.g.tree = a.g.model(5,0.5,0.3)}
#' \donttest{distribution(a.g.tree,0.5,0.3,db.path=folder)}
#' 
#' ## Different indices data
#' # From our data base
#' \donttest{distribution(a.g.tree,0.5,0.3,db.path=folder)}
#' 
#' # From a data.frame generated by 'indices.simulation'
#' # ('Repetitions' set as 10 for a fast example)
#' \donttest{indices.data = indices.simulation(5,0.5,0.3,10)}
#' \donttest{distribution(a.g.tree,0.5,0.3,set.indices=indices.data) }
#' 
#' # Allow the function to do a new generation of data and compute their indices
#' \donttest{distribution(a.g.tree,0.5,0.3,new.simulation=TRUE,repetitions=10)}
#' # WARNING! it might take a long time, it depends on the parameters 
#' # 'n' (number of leaves) and 'repetition' (number of repetitions)
#' 
#' @importFrom ape read.tree
#' @importFrom igraph graph.edgelist degree
#' @importFrom graphics grid legend  lines mtext par plot points polygon text
#' @importFrom grDevices rgb 
#' @importFrom stats approxfun density  
#' @importFrom utils read.table 
#' 
#' @author Lucia Rotger
#' 
#' @export
distribution <-
  function(tree,alpha=NA,gamma=NA,set.indices=NULL,new.simulation=FALSE,repetitions=1000,
           legend.location="topright",cex=0.75,percentile.plot=FALSE,db.path=getwd() ){
    ## Class of object "tree"
    if(class(tree)=="character") 
      tree=read.tree(text = tree)
    if (class(tree)=="phylo") 
      tree=graph.edgelist(tree$edge, directed=TRUE)  
    if(class(tree)!="igraph")
      stop("Not an igraph object. Please introduce a newick string, an ape tree or an igraph tree.")
    n = sum(degree(tree,mode="out")==0)
    ## parameters alpha & gamma  
    if(new.simulation){
      print("This process might take a long time. If you want to save the indices 
            simulation, please run 'indices.simulation' directly and then call 'distribution' by setting
            the resulting table as the parameter 'set.indices'")
      print("Remember, our indices data base is available to download at: https://github.com/LuciaRotger/CollessLike/tree/master/CollessLikeDataBase")
      warning("New simulation required")
      indices.list=indices.simulation(n,alpha,gamma,repetitions)
      #txt = paste("Parameters: alpha=",alpha,", gamma=",gamma,"",sep="")
      txt = bquote(paste("Parameters: ",alpha," = ",.(alpha),", ",gamma," = ",.(gamma)))
    }
    else{
      if(is.null(set.indices)){  
        if(alpha<gamma){
          print("Remember, our indices data base is available to download at: https://github.com/LuciaRotger/CollessLike/tree/master/CollessLikeDataBase")
          stop("alpha < gamma")
        }
        else{
          if((alpha>1)||(alpha<0)||(gamma>1)||(gamma<0)){
            print("Remember, our indices data base is available to download at: https://github.com/LuciaRotger/CollessLike/tree/master/CollessLikeDataBase")
            stop("alpha and gamma must been between 0 and 1")
          }
          else{
            #txt = paste("Parameters: n=",n,", alpha=",alpha,", gamma=",gamma,"",sep="")
            txt = bquote(paste("Parameters: n = ",.(n),", ",alpha," = ",.(alpha),", ",gamma," = ",.(gamma)))
            if(paste("n",n,sep="")%in%dir(db.path)){ 
              file = paste("CollessLikeDataBase_n",n,"_a",alpha*100,"_g",gamma*100,"_r5000.txt",sep="")
              folder = paste(db.path,"n",n,"/",sep="")
              if(file %in% dir(folder)){ 
                indices.list=read.table(file=paste(folder,file,sep=""), header=TRUE)
              }
              else {
                print("Remember, our indices data base is available to download at: https://github.com/LuciaRotger/CollessLike/tree/master/CollessLikeDataBase")
                stop(paste("The file '",file,"' is not located at '",folder,"'",sep=""))
              }
            }
            else {
              print("Remember, our indices data base is available to download at: https://github.com/LuciaRotger/CollessLike/tree/master/CollessLikeDataBase")
              stop(paste("The folder 'n",n,"' is not located at '",db.path,"'",sep=""))
            }
          }
        }
      }
      else{
        print("Indices Database introduced by user")
        txt=""
        indices.list = set.indices
      }
    }
    ########################################
    if(max(indices.list)>1){
      ## maximum
      max.cl = ( log(0+exp(1)) + log(2+exp(1)) )*(n-1)*(n-2)/4
      max.s = n*(n-1)/2 + n-1
      max.c = n*(n-1)*(n-2)/6
      indices.list[,1] = round(indices.list[,1]/max.cl,4)
      indices.list[,2] = round((indices.list[,2]-n)/(max.s-n),4)
      indices.list[,3] = round(indices.list[,3]/max.c,4)
    } 
    # densities
    d.cl= density(indices.list[,1]) ##
    d.s = density(indices.list[,2])   ##
    d.c = density(indices.list[,3])   ##
    xlim = range(c(0,1))
    ylim = range(c(0,d.cl$y,d.s$y,d.c$y))
    
    #tree
    tree.indices = balance.indices(tree)
    tree.indices = round(c(tree.indices[1]/max.cl,
                           (tree.indices[2]-n)/(max.s-n),
                           tree.indices[3]/max.c),4)
    
    f.cl=approxfun(d.cl$x,d.cl$y)
    f.s=approxfun(d.s$x,d.s$y)
    f.c=approxfun(d.c$x,d.c$y)
    tree.densities = round(c(f.cl(tree.indices[1]),f.s(tree.indices[2]),
                             f.c(tree.indices[3])),4)
    tree.densities[is.na(tree.densities)]=0  
    
    a.cl = cumsum(d.cl$y)
    a.cl=a.cl/max(a.cl) 
    a.s= cumsum(d.s$y)
    a.s= a.s/max(a.s) 
    a.c= cumsum(d.c$y)
    a.c= a.c/max(a.c)
    #tree index plots percs
    percs = c(a.cl[which(d.cl$x/max(d.cl$x)>tree.indices[1])[1]],
              + a.s[which(d.s$x/max(d.s$x)>tree.indices[2])[1]],
              + a.c[which(d.c$x/max(d.c$x)>tree.indices[3])[1]])
    percs[is.na(percs)]=1
    percs = round(percs,4)
    
    print(paste("Tree with n=",n," leaves",sep=""))
    print(paste("Colles-like: ",tree.indices[1],
                " (density:", tree.densities[1] ,"), Percentile:",
                percs[1] ,sep=""))
    print(paste("Sackin: ",tree.indices[2],
                " (density:", tree.densities[2] ,"), Percentile:",
                percs[2] ,sep=""))
    print(paste("Cophenetic: ",tree.indices[3],
                " (density:", tree.densities[3] ,"), Percentile:",
                percs[3],sep="")) 
    
    #plots
    par(xpd=FALSE) 
    
    if(!percentile.plot){
      plot(-1,-1 , xlab = "", ylab="Distribution of indices",
           xlim = xlim, ylim = ylim,xaxs = 'i', yaxs='i', 
           main = 'Distribution of indices', panel.first = grid() )
      
      polygon(d.cl, density = -1, col=rgb(1,0,0,0.2),border = "red",lwd = 1)
      polygon(d.s, density = -1, col=rgb(0,1,0,0.2),border = "green",lwd = 1)
      polygon(d.c, density = -1, col=rgb(0,0,1,0.2),border = "blue",lwd = 1)
      legend(legend.location,c("Colles-Like","Sackin","Cophenetic"),
             fill = c(rgb(1,0,0,0.2),rgb(0,1,0,0.2),rgb(0,0,1,0.2),
                      bty ='n',border = NA),cex=cex)
       
      lines(rep(tree.indices[1],2),c(0,tree.densities[1]),col=rgb(1,0,0),lwd =2)
      lines(rep(tree.indices[2],2),c(0,tree.densities[2]),col=rgb(0,1,0),lwd =2)
      lines(rep(tree.indices[3],2),c(0,tree.densities[3]),col=rgb(0,0,1),lwd =2)
      
      lines(xlim,c(0,0),lwd=2)
      
      points(tree.indices[1],tree.densities[1],pch=21,bg=rgb(1,0,0))
      points(tree.indices[2],tree.densities[2],pch=21,bg=rgb(0,1,0))
      points(tree.indices[3],tree.densities[3],pch=21,bg=rgb(0,0,1))
    }
    else{  
      plot(-1,-1 , xlab = "", ylab="Percentiles",
           xlim = xlim, ylim = c(0,1),xaxs = 'i', yaxs='i', 
           main = 'Percentile Plot', panel.first = grid() )   
      
      lines(d.cl$x/max(d.cl$x),a.cl, col=rgb(1,0,0))
      lines( d.s$x/max(d.s$x), a.s, col=rgb(0,1,0))
      lines( d.c$x/max(d.c$x), a.c, col=rgb(0,0,1))
      
      legend("topleft",c("Colles-Like","Sackin","Cophenetic"),
             fill = c(rgb(1,0,0,0.2),rgb(0,1,0,0.2),rgb(0,0,1,0.2),
                      bty ='n',border = NA),cex=cex)
       
      lines(rep(tree.indices[1],2),c(0,percs[1]),col=rgb(1,0,0),lwd =2)
      lines(rep(tree.indices[2],2),c(0,percs[2]),col=rgb(0,1,0),lwd =2)
      lines(rep(tree.indices[3],2),c(0,percs[3]),col=rgb(0,0,1),lwd =2)
      
      lines(xlim,c(0,0),lwd=1)
      
      points(tree.indices[1],percs[1],pch=21,bg=rgb(1,0,0))
      points(tree.indices[2],percs[2],pch=21,bg=rgb(0,1,0))
      points(tree.indices[3],percs[3],pch=21,bg=rgb(0,0,1))
      
    }
    mtext( txt , line = 0.3) 
    mtext("Normalized indices",line = 2.5,side = 1)
    mtext(bquote(paste("Percentiles: ",P[C],"=",.(percs[1]),
                       ", ", P[S],"=",.(percs[2]),", ",P[Phi],"=",.(percs[3]) )),
          line = 4,side = 1)
    return(percs)
  }
LuciaRG/CollessLike documentation built on Oct. 12, 2019, 11:19 a.m.