R/forest_fires.R

#' Forest Fires
#'
#' This is a difficult regression task: the aim is to predict the burned area of
#' forest fires in the northeast region of Portugal from meteorological and
#' other data.
#'
#' @format A data frame with 517 observations on the following 13 variables.
#' \enumerate{
#'   \item x: x-axis spatial coordinate within the Montesinho park map: 1 to 9
#'   \item y: y-axis spatial coordinate within the Montesinho park map: 2 to 9
#'   \item month: month of the year: "jan" to "dec"
#'   \item day: day of the week: "mon" to "sun"
#'   \item ffmc: FFMC index from the FWI system: 18.7 to 96.20
#'   \item dmc: DMC index from the FWI system: 1.1 to 291.3
#'   \item dc: DC index from the FWI system: 7.9 to 860.6
#'   \item isi: ISI index from the FWI system: 0.0 to 56.10
#'   \item temp: temperature in degrees Celsius: 2.2 to 33.30
#'   \item rh: relative humidity in %: 15.0 to 100
#'   \item wind: wind speed in km/h: 0.40 to 9.40
#'   \item rain: outside rain in mm/m2: 0.0 to 6.4
#'   \item area: the burned area of the forest (in ha): 0.00 to 1090.84 (this
#'   output variable is heavily skewed towards 0.0, so it may make sense to
#'   model it with a logarithmic transform).
#' }
#'
#' @details
#' This is a very difficult regression task. It can be used to test regression
#' methods. It could also be used to test outlier detection methods, since it
#' is not clear how many outliers there are. Note, however, that the number of
#' examples of fires with a large burned area is very small.
#'
#' Note: several of the attributes may be correlated, so it makes sense to
#' apply some form of feature selection.
#'
#' Past usage:
#' P. Cortez and A. Morais. A Data Mining Approach to Predict Forest Fires using Meteorological Data.
#' In Proceedings of the 13th EPIA 2007 - Portuguese Conference on Artificial Intelligence,
#' December, 2007. (http://www.dsi.uminho.pt/~pcortez/fires.pdf)
#'
#' In the above reference, the output "area" was first transformed with a ln(x+1) function.
#' Several data mining methods were then applied. After fitting the models, the outputs were
#' post-processed with the inverse of the ln(x+1) transform. Four different input setups were
#' used. The experiments were conducted using 30 runs of 10-fold cross-validation. Two
#' regression metrics were measured: mean absolute deviation (MAD) and root mean squared error
#' (RMSE). A support vector machine (SVM) with a Gaussian kernel, fed with only the 4 direct
#' weather conditions (temp, rh, wind and rain), obtained the best MAD value: 12.71 +- 0.01
#' (mean and 95% confidence interval using a Student's t-distribution). The best RMSE was
#' attained by the naive mean predictor. An analysis of the regression error characteristic
#' (REC) curve shows that the SVM model predicts more examples within a lower admitted error.
#' In effect, the SVM model is better at predicting small fires, which are the majority.
#'
#' @references
#' P. Cortez and A. Morais. A Data Mining Approach to Predict Forest Fires using Meteorological Data.
#' In J. Neves, M. F. Santos and J. Machado Eds., New Trends in Artificial Intelligence,
#' Proceedings of the 13th EPIA 2007 - Portuguese Conference on Artificial Intelligence, December,
#' Guimaraes, Portugal, pp. 512-523, 2007. APPIA, ISBN-13 978-989-95618-0-9.
#' Available at: http://www.dsi.uminho.pt/~pcortez/fires.pdf
#'
#' https://archive.ics.uci.edu/ml/machine-learning-databases/forest-fires/
#'
#' https://archive.ics.uci.edu/ml/datasets/Forest+Fires
#'
#' @source
#' Created by: Paulo Cortez and Anibal Morais (Univ. Minho) @ 2007
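#'
#' @examples
#' # A minimal sketch of the ln(x+1) transform and the two error metrics described
#' # in Details. This is not the original authors' code. The values in `area` and
#' # the constant prediction `pred` are illustrative assumptions; with the package
#' # attached, forest_fires$area could be used in place of `area`.
#' area <- c(0, 0, 0.5, 10, 1090.84)
#'
#' # Forward transform: ln(x + 1), which is defined for zero burned area.
#' log_area <- log1p(area)
#'
#' # Inverse transform: exp(x) - 1, used to map predictions back to hectares.
#' back <- expm1(log_area)
#' all.equal(back, area)
#'
#' # Naive mean predictor (hypothetical baseline) and the two metrics.
#' pred <- rep(mean(area), length(area))
#' mad <- mean(abs(area - pred))        # mean absolute deviation
#' rmse <- sqrt(mean((area - pred)^2))  # root mean squared error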
"forest_fires"