R/BTdecayLasso.R

#' Bradley-Terry Model with Exponential Decayed weighted likelihood and Adaptive Lasso
#' 
#' @description 
#' Bradley-Terry model is applied for paired comparison data. Teams' ability scores are estimated by maximizing the log-likelihood function.
#' 
#' To better track current abilities, an exponential decay rate is applied to weight the log-likelihood function,
#' so that the most recent matches carry more weight than earlier ones. The parameter "decay.rate" in most functions of this package
#' sets the amount of exponential decay. It should be non-negative, and its appropriate range depends on the time scale of the original dataframe
#' (see \code{\link{BTdataframe}} and the definition of the fifth column of parameter "dataframe"). For example,
#' a "decay.rate" of 0.007 with a unit of week is equivalent to a "decay.rate" of 0.001 with a unit of day. For sports matches
#' measured in days, it usually ranges from 0 to 0.01. A higher "decay.rate" tracks teams' current abilities more closely,
#' at the cost of higher variance.
#' 
#' If "decay.rate" is too large, for example "0.1" with a unit of day, \eqn{\exp(-0.7)} = 0.50. Only half weight will be add to the likelihood for matches played
#' one week ago and \eqn{\exp(-3.1)} = 0.05 suggests that previous matches took place one month ago will have little effect. Therefore, Only a few matches are
#' accounted for ability's estimation. It will lead to a very high variance and uncertainty. Since standard Bradley-Terry model
#' can not handle the case where there is a team who wins or loses all matches, such estimation may not provide convergent results. 
#' Thus, if our estimation provides divergent result, an error will be returned and we suggest user to chose a smaller "decay.rate"
#' or adding more match results into the same modeling period.
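#' 
#' Roughly speaking, a match played \code{t} time units before the most recent match enters the likelihood with weight
#' \code{exp(-decay.rate * t)}. A quick illustration of these weights in R (not part of the model fit):
#' \preformatted{
#' exp(-0.005 * 7)   # ~0.97: with decay.rate = 0.005 per day, a week-old match keeps most of its weight
#' exp(-0.1 * 7)     # ~0.50: with decay.rate = 0.1 per day, a week-old match keeps only half
#' exp(-0.1 * 31)    # ~0.05: a month-old match is essentially ignored
#' }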
#' 
#' By default, the Adaptive Lasso is implemented for variance reduction and teams' grouping. The Adaptive Lasso is proved to have a good grouping property.
#' Apart from the Adaptive Lasso, users can define their own weights for the
#' Lasso constraint \eqn{\left|\mu_{i}-\mu_{j}\right|}, where \eqn{\mu_{i}} is team i's ability.
#' 
#' Also by default, the whole Lasso path will be run. Similar to the package "glmnet", users can provide their own choice of Lasso penalty "lambda" and decide whether the
#' whole Lasso path will be run (since such a run is time-consuming). However, if the user is not familiar with the actual relationship among
#' lambda, the amount of penalty, the amount of shrinkage and the grouping effect, we suggest running the whole Lasso path and selecting an
#' appropriate lambda by the AIC or BIC criteria using \code{\link{BTdecayLassoC}} (since this model is time related, cross-validation cannot be applied). Users can also
#' use \code{\link{BTdecayLassoF}} to run with a specific Lasso penalty ranging from 0 to 1 (a penalty of 1 means all estimators shrink to 0).
#' 
#' Two sets of estimated abilities are returned: the biased Lasso estimation and the HYBRID Lasso estimation.
#' The HYBRID Lasso estimation solves the restricted maximum likelihood optimization based on the grouping determined by the Lasso estimation (under the Lasso penalty,
#' different teams' abilities may converge to the same value, and these teams' abilities are then restricted to be equal).
#' 
#' In addition, the S3 method summary() can be applied to view the outputs.
#' 
#' @param dataframe Generated using \code{\link{BTdataframe}} given raw data.
#' @param ability A column vector of teams' abilities; the last row is the home parameter.
#' The row numbers are consistent with the teams' indices shown in the dataframe. It can be generated using \code{\link{BTdataframe}} given raw data.
#' @param lambda The amount of Lasso penalty induced. The input should be a positive scalar or a sequence.
#' @param weight Weights for the Lasso penalty on the different ability pairs.
#' @param path Whether the whole Lasso path will be run (plot.BTdecayLasso is enabled only if path = TRUE).
#' @param decay.rate A non-negative exponential decay rate, usually ranging from 0 to 0.01. A larger decay rate puts more weight
#' on the most recent matches, so the estimated parameters reflect recent behaviour more strongly.
#' @param fixed The index of the team whose ability is fixed at 0. The worst team's index
#' can be generated using \code{\link{BTdataframe}} given raw data.
#' @param thersh Convergence threshold used for the Augmented Lagrangian Method.
#' @param max Maximum weight for \eqn{w_{ij}} (the weights used for the Adaptive Lasso).
#' @param iter Number of iterations used in the L-BFGS-B algorithm.
#' @details
#' According to \code{\link{BTdecay}}, the objective likelihood function to be optimized is,
#' \deqn{\sum_{k=1}^{n}\sum_{i<j}\exp(-\alpha t_{k})\cdot(y_{ij}(\tau h_{ij}^{t_{k}}+\mu_{i}-\mu_{j})-\log(1+\exp(\tau h_{ij}^{t_{k}}+\mu_{i}-\mu_{j})))}
#' The Lasso constraint is given as,
#' \deqn{\sum_{i<j}w_{ij}\left|\mu_{i}-\mu_{j}\right|\leq s}
#' where \eqn{w_{ij}} are predefined weights. For the Adaptive Lasso, \eqn{w_{ij}=1/\left|\mu_{i}^{MLE}-\mu_{j}^{MLE}\right|}.
#' 
#' Maximizing this constrained objective function is equivalent to minimizing
#' \deqn{-l(\mu,\tau)+\lambda\sum_{i<j}w_{ij}|\mu_{i}-\mu_{j}|}
#' where \eqn{-l(\mu,\tau)} is the negative of the objective function above. Increasing "lambda" decreases "s"; their relationship is
#' monotone. Here, we define "penalty" as \eqn{1-s/\max(s)}, so "lambda" and "penalty" are positively correlated.
#' @return
#' \item{ability}{Estimated ability scores with the user-given lambda}
#' \item{likelihood}{Negative likelihood of the objective function with the user-given lambda}
#' \item{df}{Degrees of freedom with the user-given lambda (number of distinct \eqn{\mu})}
#' \item{penalty}{\eqn{s/\max(s)} with the user-given lambda}
#' \item{Lambda}{The user-given lambda}
#' \item{ability.path}{If path = TRUE, estimated ability scores on the whole Lasso path}
#' \item{likelihood.path}{If path = TRUE, negative likelihood of the objective function on the whole Lasso path}
#' \item{df.path}{If path = TRUE, degrees of freedom on the whole Lasso path (number of distinct \eqn{\mu})}
#' \item{penalty.path}{If path = TRUE, \eqn{s/\max(s)} on the whole Lasso path}
#' \item{Lambda.path}{If path = TRUE, the whole sequence of lambda values on the Lasso path}
#' \item{path}{Whether the whole Lasso path was run}
#' \item{HYBRID.ability.path}{If path = TRUE, the evolution of HYBRID abilities over the whole Lasso path}
#' \item{HYBRID.likelihood.path}{If path = TRUE, the HYBRID likelihood over the whole Lasso path}
#' @seealso \code{\link{BTdataframe}} for dataframe initialization;
#' \code{\link{plot.swlasso}} and \code{\link{plot.wlasso}} for Lasso path plots when path = TRUE.
#' @references 
#' Masarotto, G. and Varin, C. (2012) The Ranking Lasso and its Application to Sport Tournaments.
#' *The Annals of Applied Statistics* **6** 1949--1970.
#' 
#' Zou, H. (2006) The adaptive lasso and its oracle properties.
#' *J. Amer. Statist. Assoc.* **101** 1418--1429.
#' @examples
#' ##Initializing Dataframe
#' x <- BTdataframe(NFL2010)
#' 
#' ##The following code runs the main results
#' ##Usually a single lambda's run will take 1-20 s
#' ##The whole Adaptive Lasso run will take 5-20 min
#' \donttest{
#' ##BTdecayLasso run with exponential decay rate 0.005 and 
#' ##lambda 0.1, use path = TRUE if you want to run whole LASSO path
#' y1 <- BTdecayLasso(x$dataframe, x$ability, lambda = 0.1, path = FALSE,
#'                    decay.rate = 0.005, fixed = x$worstTeam)
#' summary(y1)
#' 
#' ##Defining equal weights
#' ##Note that, compared to the adaptive weights, user-defined weights may be less
#' ##efficient in grouping. Therefore, running the whole Lasso path 
#' ##(the evolution of distinct ability scores) may take a much longer time. 
#' ##We recommend applying the default setting,
#' ##where the Adaptive Lasso will be run.
#' 
#' n <- nrow(x$ability) - 1
#' w2 <- matrix(1, nrow = n, ncol = n)
#' w2[lower.tri(w2, diag = TRUE)] <- 0
#' 
#' ##BTdecayLasso run with exponential decay rate 0.005 and with a specific lambda 0.1
#' y2 <- BTdecayLasso(x$dataframe, x$ability, lambda = 0.1, weight = w2, 
#'                    path = FALSE, decay.rate = 0.005, fixed = x$worstTeam)
#' 
#' summary(y2)
#' }
#' 
#' @export

BTdecayLasso <- function(dataframe, ability, lambda = NULL, weight = NULL, path = TRUE, decay.rate = 0, fixed = 1, thersh = 1e-5, max = 100, iter = 100) {
  
  
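  ## theta holds the auxiliary pairwise-difference variables and Lagrangian the multipliers of the
  ## Augmented Lagrangian iterations below; k0 controls the geometric step size of the lambda grid
  ## (lambda is multiplied by exp(-k0)) when the whole Lasso path is run.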
  u <- decay.rate
  n <- nrow(ability) - 1
  theta <- matrix(0, nrow = n, ncol = n) 
  Lagrangian <- matrix(0, nrow = n, ncol = n) 
  ability[, 1] <- 0
  k0 <- 0.0675
  
  if(!(fixed %in% seq(1, n, 1))){
    stop("The fixed team's index must be an integer index of one of all teams")
  }
  
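  ## If no weights are supplied, use the adaptive Lasso weights (see Details).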
  if (is.null(weight)) {
    weight <- BTLasso.weight(dataframe, ability, decay.rate = decay.rate, fixed = fixed, thersh = thersh, max = max, iter = iter)
  }
  
  v <- 1 
  ability0 <- ability[, -1]
  Hability0 <- ability[, -1]
  l <- c()
  hl <- c()
  p <- c()
  slambda <- c()
  
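  ## Unpenalized decayed Bradley-Terry fit: its abilities give the lambda = 0 endpoint of the path,
  ## and its penalty amount s1 = max(s) scales the reported penalty ratio s/max(s).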
  BT <- BTdecay(dataframe, ability, decay.rate = decay.rate, fixed = fixed, iter = iter)
  ability1 <- BT$ability
  s1 <- penaltyAmount(ability1, weight)
  l1 <- BTLikelihood(dataframe, ability1, decay.rate = decay.rate)
  df <- c()
  j <- 1
  
  if (path == FALSE && is.null(lambda)) {
    stop("Please provide a sequence of lambda or enable lasso path")
  }
  
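  ## Full Lasso path: starting from lambda = exp(-0.5), decrease lambda geometrically until the
  ## abilities separate into at least n - 1 distinct values; if the number of distinct abilities
  ## jumps by more than one between consecutive lambdas, bisect back towards the previous lambda
  ## to recover the intermediate grouping.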
  if (path == TRUE) {
    lambda0 <- 1
    lambda1 <- exp(-0.5)
    degree0 <- 0
    
    while (degree0 < (n - 1)) {
      stop <- 0
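      ## Augmented Lagrangian iterations: update the abilities given theta and the multipliers,
      ## update theta, then update the multipliers, repeating until the multipliers stabilize
      ## (total change below thersh).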
      while (stop==0) {
        ##ability <- BTdecayLasso.step1(dataframe, ability, weight, Lagrangian, theta, v, lambda1, 
        ##                              decay.rate = decay.rate, fixed = fixed, thersh = thersh, iter = iter)
        ability <- BTdecay.Qua(dataframe, ability, theta, v, Lagrangian, decay.rate = decay.rate,
                               fixed = fixed, iter = iter)
        theta <- BTtheta(ability, weight, Lagrangian, v, lambda1)
        Lagrangian0 <- BTLagrangian(Lagrangian, ability, theta, v)
        k <- sum(abs(Lagrangian0 - Lagrangian))
        if (k < thersh) {
          stop <- 1
        } else {
          Lagrangian <- Lagrangian0
          v <- max(Lagrangian^2)
        }
        s0 <- penaltyAmount(ability, weight)
      }
      
      p0 <- s0/s1
      ability0 <- cbind(ability0, ability)
      l0 <- BTLikelihood(dataframe, ability, decay.rate = decay.rate)
      l <- c(l, l0)
      p <- c(p, p0)
      
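      ## Count the distinct ability values (degrees of freedom), rounding to -log10(thersh) - 1
      ## digits so abilities that agree to that precision form one group.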
      degree1 <- round(ability[1:n, 1], -log10(thersh)-1)
      degree <- length(unique(degree1))
      df <- c(df, degree)
      
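      ## map() relabels each team by the index of the ability group it belongs to, so every group
      ## of tied teams is reduced to a single parameter in the HYBRID refit below.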
      map <- function(x){
        if (x == 1){
          return(1)
        } else {
          match <- which(degree1[1:(x - 1)] == degree1[x])
          if (length(match) == 0) {
            return(length(unique(degree1[1:x])))
          } else {
            return(length(unique(degree1[1:match[1]])))
          }
        }
      }
      
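      ## HYBRID estimation: refit the unpenalized decayed Bradley-Terry model with teams merged by
      ## group, then expand the grouped abilities back to one value per team.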
      dataframe1 <- dataframe
      dataframe1[, 1] <- sapply(dataframe[, 1], map)
      dataframe1[, 2] <- sapply(dataframe[, 2], map)
      Hability1 <- matrix(0, nrow = (degree + 1), ncol = 1)
      HBT <- BTdecay(dataframe1, Hability1, decay.rate = decay.rate, fixed = map(fixed), iter = iter)
      Hability1 <- HBT$ability
      Hability <- ability
      for (i in 1:n) {
        Hability[i, 1] <- Hability1[map(i), 1]
      }
      Hability[(n + 1), 1] <- Hability1[(degree + 1), 1]
      Hability0 <- cbind(Hability0, Hability)
      hl0 <- BTLikelihood(dataframe, Hability, decay.rate = decay.rate)
      hl <- c(hl, hl0)
      
      slambda <- c(slambda, lambda1)
      
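      ## If more than one new group appeared since the previous lambda, bisect towards it to capture
      ## the intermediate grouping; otherwise accept this lambda and step down by the factor exp(-k0).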
      if (degree > (degree0 + 1) && abs(lambda0 - lambda1) > (thersh * 10)) {
        lambda1 <- (lambda0 + lambda1)/2
      } else {
        lambda0 <- lambda1
        lambda1 <- lambda1 * exp(-k0)
        degree0 <- max(degree0, degree)
      }
    }
  }
  
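  ## Fits at the user-supplied lambda values (largest first), using the same Augmented Lagrangian
  ## updates, grouping and HYBRID refit as in the path loop above.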
  if (!is.null(lambda)){
    lambda <- sort(lambda, decreasing = TRUE)
    for (i in 1:length(lambda)) {
      stop <- 0
      while (stop==0) {
        ##ability <- BTdecayLasso.step1(dataframe, ability, weight, Lagrangian, theta, v, lambda[i], 
        ##                              decay.rate = decay.rate, fixed = fixed, thersh = thersh, iter = iter)
        ability <- BTdecay.Qua(dataframe, ability, theta, v, Lagrangian, decay.rate = decay.rate,
                               fixed = fixed, iter = iter)
        theta <- BTtheta(ability, weight, Lagrangian, v, lambda[i])
        Lagrangian0 <- BTLagrangian(Lagrangian, ability, theta, v)
        k <- sum(abs(Lagrangian0 - Lagrangian))
        if (k < thersh) {
          stop <- 1
        } else {
          Lagrangian <- Lagrangian0
          v <- max(Lagrangian^2)
        }
        s0 <- penaltyAmount(ability, weight)
      }
      
      p0 <- s0/s1
      ability0 <- cbind(ability0, ability)
      l0 <- BTLikelihood(dataframe, ability, decay.rate = decay.rate)
      l <- c(l, l0)
      p <- c(p, p0)
      
      degree1 <- round(ability[1:n, 1], -log10(thersh)-1)
      degree <- length(unique(degree1))
      df <- c(df, degree)
      
      map <- function(x){
        if (x == 1){
          return(1)
        } else {
          match <- which(degree1[1:(x - 1)] == degree1[x])
          if (length(match) == 0) {
            return(length(unique(degree1[1:x])))
          } else {
            return(length(unique(degree1[1:match[1]])))
          }
        }
      }
      
      dataframe1 <- dataframe
      dataframe1[, 1] <- sapply(dataframe[, 1], map)
      dataframe1[, 2] <- sapply(dataframe[, 2], map)
      Hability1 <- matrix(0, nrow = (degree + 1), ncol = 1)
      HBT <- BTdecay(dataframe1, Hability1, decay.rate = decay.rate, fixed = map(fixed), iter = iter)
      Hability1 <- HBT$ability
      Hability <- ability
      ## Use a separate loop index here: the outer loop's i is still needed for lambda[i] below.
      for (i2 in 1:n) {
        Hability[i2, 1] <- Hability1[map(i2), 1]
      }
      Hability[(n + 1), 1] <- Hability1[(degree + 1), 1]
      Hability0 <- cbind(Hability0, Hability)
      hl0 <- BTLikelihood(dataframe, Hability, decay.rate = decay.rate)
      hl <- c(hl, hl0)
      
      slambda <- c(slambda, lambda[i])
    }
  }
  
  
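  ## Append the unpenalized MLE fit as the lambda = 0 endpoint (penalty ratio 1, n distinct abilities).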
  ability0 <- cbind(ability0, ability1)
  Hability0 <- cbind(Hability0, ability1)
  l <- c(l, l1)
  hl <- c(hl, l1)
  p <- c(p, 1)
  df <- c(df, n)
  
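  ## Assemble the output: class "wlasso" for a path-only run, "slasso" for user-supplied lambdas
  ## without the path, and "swlasso" when both are available.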
  if (is.null(lambda)) {
    output <- list(ability.path = ability0, likelihood.path = l, penalty.path = p, df.path = df, Lambda.path = c(slambda, 0), path = path,
                   HYBRID.ability.path = Hability0, HYBRID.likelihood.path = hl, decay.rate = decay.rate)
    class(output) <- "wlasso"
    
  } else { 
    n3 <- length(lambda)
    n4 <- length(slambda)
    if (path == FALSE) {
      output <- list(ability = as.matrix(ability0[, 1:n4]), likelihood = l[1:n4], penalty = p[1:n4], df = df[1:n4], Lambda = slambda, path = path,
                     HYBRID.ability = as.matrix(Hability0[, 1:n4]), HYBRID.likelihood = hl[1:n4], decay.rate = decay.rate)
      class(output) <- "slasso"
    } else {
      output <- list(ability = as.matrix(ability0[, (n4 - n3 + 1):n4]), likelihood = l[(n4 - n3 + 1):n4], penalty = p[(n4 - n3 + 1):n4], df = df[(n4 - n3 + 1):n4], Lambda = lambda,
                     ability.path = ability0, likelihood.path = l, penalty.path = p, df.path = df, Lambda.path = c(slambda, 0), path = path,
                     HYBRID.ability = as.matrix(Hability0[, (n4 - n3 + 1):n4]), HYBRID.likelihood = hl[(n4 - n3 + 1):n4], 
                     HYBRID.ability.path = Hability0, HYBRID.likelihood.path = hl,
                     decay.rate = decay.rate)
      class(output) <- "swlasso"
    }
  }
  output
}