Task: Create a classification, regression, survival, cluster,...


Description

The task encapsulates the data and specifies - through its subclasses - the type of the task. It also contains a description object detailing further aspects of the data.

Useful operators are: getTaskFormula, getTaskFeatureNames, getTaskData, getTaskTargets, and subsetTask.
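The accessors listed above can be tried on a small task. The following is a minimal sketch, assuming mlr is installed; the task id "iris_demo" and the variable names are made up for illustration.

```r
library(mlr)

# Build a small classification task from the built-in iris data.
task = makeClassifTask(id = "iris_demo", data = iris, target = "Species")

getTaskFormula(task)          # model formula with the target on the left-hand side
getTaskFeatureNames(task)     # names of the feature columns (target excluded)
head(getTaskData(task))       # the stored data frame, features plus target
table(getTaskTargets(task))   # the target values

# Keep only the first 100 observations and two of the features.
small = subsetTask(task, subset = 1:100,
  features = c("Sepal.Length", "Petal.Length"))
getTaskFeatureNames(small)
```

Going through the accessors rather than reaching into task$env keeps code independent of how the data are stored internally.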

Object members:

env [environment]

Environment where data for the task are stored. Use getTaskData in order to access it.

weights [numeric]

See the weights argument. NULL if not present.

blocking [factor]

See the blocking argument. NULL if not present.

task.desc [TaskDesc]

Encapsulates further information about the task.

Usage

makeClassifTask(id = deparse(substitute(data)), data, target,
  weights = NULL, blocking = NULL, spatial = FALSE,
  positive = NA_character_, fixup.data = "warn", check.data = TRUE)

makeClusterTask(id = deparse(substitute(data)), data, weights = NULL,
  blocking = NULL, spatial = FALSE, fixup.data = "warn",
  check.data = TRUE)

makeCostSensTask(id = deparse(substitute(data)), data, costs,
  blocking = NULL, spatial = FALSE, fixup.data = "warn",
  check.data = TRUE)

makeMultilabelTask(id = deparse(substitute(data)), data, target,
  weights = NULL, blocking = NULL, spatial = FALSE, fixup.data = "warn",
  check.data = TRUE)

makeRegrTask(id = deparse(substitute(data)), data, target, weights = NULL,
  blocking = NULL, spatial = FALSE, fixup.data = "warn",
  check.data = TRUE)

makeSurvTask(id = deparse(substitute(data)), data, target, weights = NULL,
  blocking = NULL, spatial = FALSE, fixup.data = "warn",
  check.data = TRUE)

Arguments

id

[character(1)]
Id string for object. Default is the name of the R variable passed to data.

data

[data.frame]
A data frame containing the features and target variable(s).

target

[character(1) | character(2) | character(n.classes)]
Name(s) of the target variable(s). For survival analysis these are the names of the survival time and event columns, so it has length 2. For multilabel classification it contains the names of the logical columns that encode whether a label is present or not and its length corresponds to the number of classes.

weights

[numeric]
Optional, non-negative case weight vector to be used during fitting. Cannot be set for cost-sensitive learning. Default is NULL which means no (= equal) weights.

blocking

[factor]
An optional factor of the same length as the number of observations. Observations with the same blocking level “belong together”. Specifically, they are either put all in the training or the test set during a resampling iteration. Default is NULL which means no blocking.

spatial

[logical(1)]
Does the task contain a spatial reference (coordinates) which should be used for spatial partitioning of the data? See details.

positive

[character(1)]
Positive class for binary classification (otherwise ignored and set to NA). Default is the first factor level of the target attribute.

fixup.data

[character(1)]
Should some basic cleaning up of data be performed? Currently this means removing empty factor levels for the columns. Possible choices are: “no” = Don't do it. “warn” = Do it but warn about it. “quiet” = Do it but keep silent. Default is “warn”.

check.data

[logical(1)]
Should sanity of data be checked initially at task creation? You should have good reasons to turn this off (one might be speed). Default is TRUE.

costs

[data.frame]
A numeric matrix or data frame containing the costs of misclassification. We assume the general case of observation specific costs. This means we have n rows, corresponding to the observations, in the same order as data. The columns correspond to classes and their names are the class labels (if unnamed we use y1 to yk as labels). Each entry (i,j) of the matrix specifies the cost of predicting class j for observation i.
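To make the layout of costs concrete, here is a minimal sketch, assuming mlr is installed in a version whose makeCostSensTask accepts the costs argument as documented here. The cost values are random and purely illustrative, and the id "iris_costs" is made up.

```r
library(mlr)

set.seed(1)
feats = iris[, 1:4]                 # features only; the class column is not part of data
n = nrow(feats)
classes = levels(iris$Species)

# One row per observation, one named column per class.
# Entry (i, j) is the cost of predicting class j for observation i.
costs = matrix(runif(n * length(classes), min = 0, max = 10),
  nrow = n, dimnames = list(NULL, classes))

# Predicting the true class costs nothing (an assumption of this sketch).
costs[cbind(seq_len(n), as.integer(iris$Species))] = 0

cs.task = makeCostSensTask(id = "iris_costs", data = feats, costs = costs)
cs.task
```

Note that the rows of costs must be in the same order as the rows of data, since the two are matched by position.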

Details

For multilabel classification we assume that the presence of labels is encoded via logical columns in data. The name of each column is the name of the label. target is then a character vector naming these columns.
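The encoding described above might look as follows. This is a sketch on synthetic data, assuming mlr is installed; the column names (x1, x2, sports, politics, science) are hypothetical.

```r
library(mlr)

set.seed(1)
n = 20
ml.df = data.frame(
  x1 = rnorm(n),
  x2 = rnorm(n),
  # One logical column per label: TRUE means the label is present.
  sports   = sample(c(TRUE, FALSE), n, replace = TRUE),
  politics = sample(c(TRUE, FALSE), n, replace = TRUE),
  science  = sample(c(TRUE, FALSE), n, replace = TRUE)
)

labels = c("sports", "politics", "science")
ml.task = makeMultilabelTask(id = "ml_demo", data = ml.df, target = labels)

# The targets come back as the logical label columns.
head(getTaskTargets(ml.task))
```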

If spatial = TRUE and 'SpCV' or 'SpRepCV' are selected as resampling method, variables named x and y will be used for spatial partitioning of the data (kmeans clustering). They will not be used as predictors during modeling. Be aware: If coordinates are not named x and y they will be treated as normal predictors!
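The idea of partitioning by k-means on the coordinates can be illustrated in base R. This is not mlr's implementation, only a sketch of the general technique on synthetic coordinates; the number of folds k and all variable names are assumptions.

```r
set.seed(42)

# Synthetic coordinates, named x and y as the text above requires.
coords = data.frame(x = runif(100), y = runif(100))

# Cluster the observations in coordinate space; each cluster becomes one fold.
k = 5
fold = kmeans(coords, centers = k)$cluster
table(fold)

# Fold 1 as the test set, all other folds as the training set.
test.idx = which(fold == 1)
train.idx = which(fold != 1)
```

Because each fold is a spatially compact group of points, test observations are not interleaved with nearby training observations, which is the point of spatial partitioning.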

Functional data can be added to a task via matrix columns. For more information refer to makeFunctionalData.

Value

[Task].

See Also

Other costsens: makeCostSensClassifWrapper, makeCostSensRegrWrapper, makeCostSensWeightedPairsWrapper

Examples

library(mlr)

if (requireNamespace("mlbench")) {
  library(mlbench)
  data(BostonHousing)
  data(Ionosphere)

  # minimal tasks: only data and target are given
  makeClassifTask(data = iris, target = "Species")
  makeRegrTask(data = BostonHousing, target = "medv")
  # an example of a classification task with more than those standard arguments:
  # Ionosphere has 351 rows; the first 51 form block 1, the remaining 300 block 2
  blocking = factor(c(rep(1, 51), rep(2, 300)))
  makeClassifTask(id = "myIonosphere", data = Ionosphere, target = "Class",
    positive = "good", blocking = blocking)
  # a cluster task has no target, so the Species column is dropped
  makeClusterTask(data = iris[, -5L])
}
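Two arguments not exercised above are the length-2 target of a survival task and fixup.data. The following sketch uses synthetic data and assumes mlr is installed; all data frame and column names are hypothetical.

```r
library(mlr)

set.seed(1)
n = 50

# Survival task: target names the time column and the event column, in that order.
surv.df = data.frame(
  time  = rexp(n, rate = 0.1),                        # positive survival times
  event = sample(c(TRUE, FALSE), n, replace = TRUE),  # TRUE = event observed
  age   = rnorm(n, mean = 60, sd = 10)
)
surv.task = makeSurvTask(id = "surv_demo", data = surv.df,
  target = c("time", "event"))

# fixup.data: the factor below declares a level "unused" that never occurs.
regr.df = data.frame(
  y   = rnorm(6),
  grp = factor(c("a", "a", "b", "b", "a", "b"), levels = c("a", "b", "unused"))
)
# "quiet" performs the cleanup without a warning; the default "warn" would warn.
regr.task = makeRegrTask(id = "fixup_demo", data = regr.df, target = "y",
  fixup.data = "quiet")

# Inspect the factor levels stored in the task.
levels(getTaskData(regr.task)$grp)
```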

riebetob/mlr documentation built on May 20, 2019, 5:58 p.m.