check_data: Run a data check pipeline to search for potential problems with...

View source: R/check_data.R


Run a data check pipeline to search for potential problems in the data

Description

Runs a data check pipeline that searches a dataset for potential problems, such as outliers and correlations between variables.

Usage

check_data(
  data,
  y = NULL,
  time = NULL,
  status = NULL,
  type = "auto",
  verbose = TRUE,
  check_correlation = TRUE
)

Arguments

data

A data source in one of the major R formats, such as data.table, data.frame, or matrix.

y

A string giving the name of the target column for regression or classification. Supply either y or the pair time and status. Defaults to NULL.

time

A string giving the name of the time column for a survival analysis task. Supply either y or the pair time and status. Defaults to NULL.

status

A string giving the name of the status column for a survival analysis task. Supply either y or the pair time and status. Defaults to NULL.
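The "either y or the pair time and status" rule shared by these three arguments can be illustrated with a small base R check. The helper validate_target_args() is hypothetical, written for illustration only and not part of forester. It assumes that supplying both y and the time/status pair, supplying neither, or supplying only half of the pair should be treated as an error, and it also confirms that the named columns exist.

```r
# Hypothetical helper (not part of forester): enforces "either y, or the
# pair time/status" and checks that the named columns exist in `data`.
validate_target_args <- function(data, y = NULL, time = NULL, status = NULL) {
  has_y <- !is.null(y)
  has_surv <- !is.null(time) || !is.null(status)
  if (has_y && has_surv) {
    stop("Supply either `y` or the pair `time`/`status`, not both.")
  }
  if (!has_y && !has_surv) {
    stop("Supply either `y` or the pair `time`/`status`.")
  }
  # Assumption: half of the survival pair is not enough.
  if (has_surv && (is.null(time) || is.null(status))) {
    stop("`time` and `status` must be supplied together.")
  }
  # colnames() works for data.frame, data.table and matrix inputs.
  cols <- if (has_y) y else c(time, status)
  missing_cols <- setdiff(cols, colnames(data))
  if (length(missing_cols) > 0) {
    stop("Column(s) not found in `data`: ", paste(missing_cols, collapse = ", "))
  }
  if (has_y) "target" else "survival"
}

houses <- data.frame(Price = c(100, 250, 180), rooms = c(1, 3, 2))
validate_target_args(houses, y = "Price")  # returns "target"
```

Failing early with a message that names the offending argument is usually cheaper than letting a later step of the pipeline fail on a missing column.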

type

A character string that sets the task type: one of 'auto', 'binary_clf', 'multiclass', 'regression', or 'survival'. With 'auto' (the default), the function infers the type from the number of unique values in the y column, or from the presence of the time and status columns.
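A minimal base R sketch of how an 'auto' guess of this kind could work. The helper guess_task_type() and its rules are assumptions, not forester's actual logic. In particular, the cut-off max_classes = 10, which separates multiclass targets from regression targets, is an arbitrary illustrative value.

```r
# Hypothetical sketch of type = "auto" inference (not forester's code).
# Assumed rules: time and status given -> "survival"; exactly two distinct
# values in y -> "binary_clf"; a factor/character y, or a numeric y with at
# most `max_classes` distinct values -> "multiclass"; else "regression".
guess_task_type <- function(data, y = NULL, time = NULL, status = NULL,
                            max_classes = 10) {
  # Survival task: both time and status columns are named.
  if (!is.null(time) && !is.null(status)) {
    return("survival")
  }
  if (is.null(y)) {
    stop("`y` is required unless both `time` and `status` are given.")
  }
  # Pull the target column from a matrix or a data.frame/data.table.
  target <- if (is.matrix(data)) data[, y] else data[[y]]
  n_unique <- length(unique(target[!is.na(target)]))
  if (n_unique < 2) {
    stop("The target column has fewer than two distinct values.")
  }
  if (n_unique == 2) {
    "binary_clf"
  } else if (is.factor(target) || is.character(target) ||
             n_unique <= max_classes) {
    "multiclass"
  } else {
    "regression"
  }
}

toy <- data.frame(
  price  = c(120.5, 98.0, 150.2, 87.3, 210.0, 175.4, 99.9, 130.1, 160.0, 140.7, 111.1),
  sold   = c("yes", "no", "yes", "yes", "no", "no", "yes", "no", "yes", "yes", "no"),
  rating = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2)
)
guess_task_type(toy, "price")   # "regression": 11 distinct numeric values
guess_task_type(toy, "sold")    # "binary_clf": two distinct values
guess_task_type(toy, "rating")  # "multiclass": 3 distinct values <= 10
```

Any rule based only on counting unique values is a heuristic: an integer target such as a room count may be a regression target even when it has few distinct values, which is why an explicit type argument remains useful.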

verbose

A logical value. If TRUE (the default), all information about the process is printed; if FALSE, nothing is printed.

check_correlation

A logical value. If TRUE (the default), reports correlations between pairs of numeric variables and between pairs of categorical variables. Has an effect only when verbose is TRUE.
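The kind of pairwise check that check_correlation enables can be sketched in base R. The helper strong_pairs() is hypothetical. The choice of measures (Spearman rank correlation for numeric pairs, Cramér's V for categorical pairs) and the 0.7 threshold are assumptions, not a description of what check_data actually computes.

```r
# Cramér's V for two categorical vectors, from an uncorrected chi-squared test.
cramers_v <- function(a, b) {
  tab <- table(a, b)
  k <- min(nrow(tab), ncol(tab))
  if (k < 2) return(NA_real_)  # undefined for a single category
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  unname(sqrt(chi2 / (sum(tab) * (k - 1))))
}

# Hypothetical sketch (not forester's implementation): list variable pairs
# whose association reaches `threshold` (an assumed cut-off).
strong_pairs <- function(data, threshold = 0.7) {
  data <- as.data.frame(data)
  is_num <- vapply(data, is.numeric, logical(1))
  is_cat <- vapply(data, function(x) is.factor(x) || is.character(x), logical(1))
  rows <- list()
  add_row <- function(v1, v2, measure, value) {
    data.frame(var1 = v1, var2 = v2, measure = measure, value = value)
  }
  # Numeric pairs: Spearman rank correlation.
  num <- names(data)[is_num]
  if (length(num) >= 2) {
    pairs <- combn(num, 2)
    for (k in seq_len(ncol(pairs))) {
      r <- cor(data[[pairs[1, k]]], data[[pairs[2, k]]],
               method = "spearman", use = "complete.obs")
      if (isTRUE(abs(r) >= threshold)) {
        rows[[length(rows) + 1]] <- add_row(pairs[1, k], pairs[2, k], "spearman", r)
      }
    }
  }
  # Categorical pairs: Cramér's V.
  cats <- names(data)[is_cat]
  if (length(cats) >= 2) {
    pairs <- combn(cats, 2)
    for (k in seq_len(ncol(pairs))) {
      v <- cramers_v(data[[pairs[1, k]]], data[[pairs[2, k]]])
      if (isTRUE(v >= threshold)) {
        rows[[length(rows) + 1]] <- add_row(pairs[1, k], pairs[2, k], "cramers_v", v)
      }
    }
  }
  if (length(rows) == 0) {
    return(add_row(character(0), character(0), character(0), numeric(0)))
  }
  do.call(rbind, rows)
}

homes <- data.frame(
  area  = c(50, 60, 75, 80, 95, 110, 120, 130),
  price = c(150, 170, 210, 220, 260, 300, 320, 350),
  floor = c(3, 1, 4, 1, 5, 9, 2, 6),
  city  = c("A", "A", "B", "B", "A", "B", "A", "B"),
  zone  = c("x", "x", "y", "y", "x", "y", "x", "y")
)
strong_pairs(homes)  # flags area/price (Spearman) and city/zone (Cramér's V)
```

Rank-based and contingency-table measures are used here because they need no assumption of linearity or of an ordering between categories; mixed numeric/categorical pairs are left out of this sketch.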

Value

A list with two vectors: str, the lines of the report, and outliers, the detected outliers.

Examples

check_data(lisbon, 'Price')
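A slightly longer usage sketch, assuming the forester package and its lisbon dataset are installed. The call is wrapped in requireNamespace() so that the snippet is skipped otherwise. It prints the report without the correlation section and then inspects the structure of the returned list, whose str and outliers elements are described under Value. Their exact contents are not specified on this page, so the snippet only prints their structure.

```r
# Assumes forester is installed; otherwise nothing runs.
if (requireNamespace("forester", quietly = TRUE)) {
  library(forester)
  # Print the report (verbose = TRUE by default) but skip the correlation checks.
  report <- check_data(lisbon, y = "Price", check_correlation = FALSE)
  # The returned list holds the report lines (str) and the outliers (outliers).
  utils::str(report$str)
  utils::str(report$outliers)
}
```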

ModelOriented/forester documentation built on June 6, 2024, 7:29 a.m.