logLoss: Logarithmic loss (logLoss)
In rfUtilities: Random Forests Model Selection and Performance Evaluation

Evaluation of estimate quality in binomial models using cross-entropy or log likelihood loss

logLoss(y, p, likelihood = FALSE, global = TRUE, eps = 0.000000000000001)

`y`	vector of observed binomial values (0 or 1)
`p`	vector of predicted probabilities (0-1)
`likelihood`	(FALSE/TRUE) return the log likelihood loss; the default (FALSE) returns the log loss
`global`	(TRUE/FALSE) return the global log loss value; if FALSE, local values are returned
`eps`	epsilon scaling factor used to clip probabilities away from 0 and 1, avoiding NaN values

If likelihood is TRUE, the log likelihood loss is returned. Otherwise, if global is FALSE, a list with the observed values (y), probabilities (p) and log loss (log.loss) is returned; if global is TRUE, a vector holding the global log loss value is returned.

The log loss metric, based on cross-entropy, measures the quality of predictions rather than their accuracy. Effectively, the log loss gauges the additional error arising from the estimates as opposed to the true values.
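The distinction between quality and accuracy can be illustrated with a small base R sketch. The data and the helper functions `accuracy` and `mean_log_loss` below are hypothetical and are not part of rfUtilities; the sketch assumes a 0.5 classification threshold and averages the per-observation loss.

```r
# Two hypothetical sets of predictions for the same four observations.
y           <- c(1, 1, 0, 0)
p_confident <- c(0.95, 0.90, 0.05, 0.10)
p_hesitant  <- c(0.60, 0.55, 0.45, 0.40)

# Illustrative helpers (assumptions, not rfUtilities functions).
accuracy      <- function(y, p) mean((p > 0.5) == (y == 1))
mean_log_loss <- function(y, p) mean(-(y * log(p) + (1 - y) * log(1 - p)))

# Both sets classify every observation correctly at the 0.5 threshold ...
acc_confident <- accuracy(y, p_confident)
acc_hesitant  <- accuracy(y, p_hesitant)

# ... but the hesitant probabilities receive the larger log loss.
ll_confident <- mean_log_loss(y, p_confident)
ll_hesitant  <- mean_log_loss(y, p_hesitant)

print(c(acc_confident = acc_confident, acc_hesitant = acc_hesitant))
print(c(ll_confident = ll_confident, ll_hesitant = ll_hesitant))
```

Accuracy cannot separate the two sets of predictions, whereas the log loss rewards the probabilities that sit closer to the observed values.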

As the estimated probability diverges from its observed value the log loss increases, with values expected in the range [0-1], where 0 would be a perfect model. Strictly, the loss has no upper bound, so confidently wrong predictions can exceed 1.

For a single sample with true value yt in 0,1 and estimated probability yp that yt = 1, the log loss is derived as:

  -log P(yt | yp) = -(yt log(yp) + (1 - yt) log(1 - yp))

eps is used where log loss is undefined for p=0 or p=1, so probabilities are clipped to:

  max(eps, min(1 - eps, p))

If likelihood is output, the eps and global arguments are ignored.
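The formula and the clipping step can be sketched in base R as follows. `manual_log_loss` is a hypothetical helper written for illustration and is not the rfUtilities implementation; taking the mean of the per-sample values as the global value is likewise an assumption.

```r
# Hypothetical helper (not part of rfUtilities): per-sample log loss with
# probabilities clipped to [eps, 1 - eps].
manual_log_loss <- function(y, p, eps = 1e-15) {
  stopifnot(length(y) == length(p), all(y %in% c(0, 1)))
  p <- pmax(eps, pmin(1 - eps, p))      # clip so log() never receives 0 or 1
  -(y * log(p) + (1 - y) * log(1 - p))  # one loss value per sample
}

y <- c(1, 0, 1, 0)
p <- c(0.9, 0.1, 0.6, 1.0)  # the last prediction is confidently wrong

local_loss  <- manual_log_loss(y, p)
global_loss <- mean(local_loss)  # assumption: global value is the mean

print(local_loss)
print(global_loss)
```

Without clipping, the last sample would produce an infinite loss, since log(1 - 1) is -Inf; with clipping the value is large but finite.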

Jeffrey S. Evans <jeffrey_evans<at>tnc.org>

C.M. Bishop (2006). Pattern Recognition and Machine Learning. Springer, p. 209.

  library(randomForest)
  library(rfUtilities)

  data(iris)
  iris$Species <- ifelse(iris$Species == "versicolor", 1, 0)

  # Add some noise by flipping two labels in each class
  set.seed(42)  # arbitrary seed, only so the example is reproducible
  idx1 <- which(iris$Species %in% 1)
  idx0 <- which(iris$Species %in% 0)
  iris$Species[sample(idx1, 2)] <- 0
  iris$Species[sample(idx0, 2)] <- 1

  ( mdl <- randomForest(x = iris[,1:4], y = as.factor(iris[,"Species"])) )

  # Predicted probability of class 1 (second column of the probability matrix)
  p <- predict(mdl, iris[,1:4], type = "prob")[,2]

  # Global log loss
  logLoss(y = iris$Species, p = p)

  # Local log loss
  ( ll <- logLoss(y = iris$Species, p = p, global = FALSE) )

  # Log likelihood loss
  logLoss(y = iris$Species, p = p, likelihood = TRUE)