In ModelOriented/forester: Quick and Simple Tools for Training and Testing of Tree-Based Models

check_data

R Documentation

Run a data-check pipeline to look for potential problems with the data.

check_data(
  data,
  y = NULL,
  time = NULL,
  status = NULL,
  type = "auto",
  verbose = TRUE,
  check_correlation = TRUE
)

`data`	A data source in one of the major R formats: data.table, data.frame, matrix, and so on.
`y`	A string giving the name of the target column for a regression or classification task. Supply either y or the time/status pair. Default NULL.
`time`	A string giving the name of the time column for a survival analysis task. Supply either y or the time/status pair. Default NULL.
`status`	A string giving the name of the status column for a survival analysis task. Supply either y or the time/status pair. Default NULL.
`type`	A character string, one of 'binary_clf', 'multiclass', 'regression', 'survival', or 'auto', that sets the type of the task. If 'auto' (the default), the function infers the type from the number of unique values in the y column, or from the presence of the time/status columns.
`verbose`	A logical value. If TRUE, reports all information about the checking process; if FALSE, reports nothing. Default TRUE.
`check_correlation`	A logical value. If TRUE, reports the correlations between pairs of numeric variables and between pairs of categorical variables. Takes effect only when verbose is TRUE. Default TRUE.
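The 'auto' rule for type can be sketched in base R as below. This is a minimal illustration under stated assumptions, not forester's internal code: the helper `guess_task_type` and the cut-off `max_classes = 10` are hypothetical, and the sketch assumes data is a data.frame or data.table so that `data[[y]]` selects the target column.

```r
# Hypothetical helper, NOT forester's implementation: it only illustrates the
# documented 'auto' rule. The cut-off `max_classes` is an assumed value.
guess_task_type <- function(data, y = NULL, time = NULL, status = NULL,
                            max_classes = 10) {
  # A time/status pair signals a survival task.
  if (!is.null(time) && !is.null(status)) {
    return("survival")
  }
  if (is.null(y)) {
    stop("Provide either `y` or both `time` and `status`.")
  }
  target <- data[[y]]
  n_unique <- length(unique(target))
  if (n_unique == 2) {
    "binary_clf"
  } else if (n_unique <= max_classes || !is.numeric(target)) {
    # Few distinct values, or a non-numeric target: treat as classes.
    "multiclass"
  } else {
    "regression"
  }
}

# Toy data for illustration only.
df <- data.frame(
  price  = c(120, 95.5, 210, 180, 99, 150, 300, 130, 175, 220, 260, 140),
  sold   = c(0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1),
  region = c("N", "S", "E", "W", "N", "S", "E", "W", "N", "S", "E", "W"),
  days   = 1:12,
  event  = rep(c(0, 1), 6)
)

guess_task_type(df, y = "price")                      # "regression"
guess_task_type(df, y = "sold")                       # "binary_clf"
guess_task_type(df, y = "region")                     # "multiclass"
guess_task_type(df, time = "days", status = "event")  # "survival"
```

The time/status pair is checked first because, as the argument descriptions state, survival tasks are identified by those columns rather than by y.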

A list of two vectors: `str`, containing the lines of the report, and `outliers`, containing the detected outliers.

library(forester)
# Run the checks on the bundled lisbon dataset, using Price as the target column
check_data(lisbon, 'Price')

ModelOriented/forester documentation built on June 6, 2024, 7:29 a.m.