knn: Classification, regression, and clustering with k nearest...

View source: R/knn.R

Description

Classification, regression, and clustering with k nearest neighbors.

Usage

knn(
  train_set,
  test_set,
  k = 3,
  categorical_target = NULL,
  continuous_target = NULL,
  comparison_measure,
  categorical_scoring_method = "majority_vote",
  continuous_scoring_method = "average",
  return_ranked_neighbors = 0,
  id = NULL
)

Arguments

train_set

Data frame containing the training instances, with features and any targets and IDs.

test_set

Data frame containing the test instances, with feature columns only.

k

Number of nearest neighbors.

categorical_target

Categorical target variable.

continuous_target

Continuous target variable.

comparison_measure

Distance or similarity measure.

categorical_scoring_method

Categorical scoring method.

continuous_scoring_method

Continuous scoring method.

return_ranked_neighbors

Number of ranked neighbors to return. A value of 0 returns no ranked neighbors. Must not exceed k.

id

Column containing unique identifiers for each row in the training set. Only used when return_ranked_neighbors > 0.

Details

The algorithm can score data with continuous or logical features.

The algorithm can predict either a continuous or categorical target, or both (but no more than one of each), as well as return the closest neighbors ranked by distance or similarity. If no continuous or categorical target is provided, return_ranked_neighbors must be non-zero, and ranked neighbors will be returned.
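
As a rough illustration of what returning ranked neighbors means, the sketch below orders training rows by Euclidean distance to one test row and returns the IDs of the closest ones. It uses base R only; rank_neighbors and the toy data are hypothetical names made up for this example and are not the package's implementation.

```r
# Hypothetical helper (not part of neighbr): IDs of the n training rows
# closest to a single test row, ordered from nearest to farthest.
rank_neighbors <- function(train, test_row, feature_cols, id_col, n) {
  # Subtract the test row's feature values from every training row.
  diffs <- sweep(as.matrix(train[, feature_cols]), 2,
                 unlist(test_row[feature_cols]))
  dists <- sqrt(rowSums(diffs^2))
  train[[id_col]][order(dists)[seq_len(n)]]
}

# Toy data: four training instances with an ID column, one test instance.
train <- data.frame(ID = 1:4, x = c(0, 1, 5, 6), y = c(0, 1, 5, 6))
test_row <- data.frame(x = 0.9, y = 1.2)

nearest_ids <- rank_neighbors(train, test_row, c("x", "y"), "ID", n = 3)
print(nearest_ids)
```

With a similarity measure the ordering would be reversed, since larger values mean closer neighbors.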

There is no predict method for knn. The data to be scored must be passed to knn() together with the training data, and the scored test set is returned as part of the neighbr object.

Supported distance measures (used with continuous features): euclidean, squared_euclidean.
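
The two distance measures differ only by a square root. The sketch below shows the plain, unweighted forms in base R; the helper names are hypothetical and chosen so they do not clash with package functions, and any field weighting allowed by PMML is left out.

```r
# Hypothetical helpers (not part of neighbr): unweighted distances
# between two numeric feature vectors of equal length.
squared_euclidean_sketch <- function(x, y) sum((x - y)^2)
euclidean_sketch <- function(x, y) sqrt(squared_euclidean_sketch(x, y))

a <- c(1, 2, 3)
b <- c(4, 6, 3)

d_sq <- squared_euclidean_sketch(a, b)  # 3^2 + 4^2 + 0^2
d <- euclidean_sketch(a, b)
print(c(squared = d_sq, euclidean = d))
```

Because the square root is monotonic, both measures rank neighbors in the same order; only the reported distance values differ.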

Supported similarity measures (used with logical features): simple_matching, jaccard, tanimoto.
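
The three similarity measures can be written in terms of four counts over a pair of logical vectors: a11 (both TRUE), a10 and a01 (exactly one TRUE), and a00 (both FALSE). The sketch below follows the formulas in the PMML comparison-measure definitions; the helper names are hypothetical and are not the package's own functions.

```r
# Hypothetical helpers (not part of neighbr).
# Count agreements and disagreements between two logical (or 0/1) vectors.
match_counts <- function(x, y) {
  x <- as.logical(x)
  y <- as.logical(y)
  c(a11 = sum(x & y), a10 = sum(x & !y),
    a01 = sum(!x & y), a00 = sum(!x & !y))
}

simple_matching_sketch <- function(x, y) {
  n <- match_counts(x, y)
  unname((n["a11"] + n["a00"]) / sum(n))
}

jaccard_sketch <- function(x, y) {
  n <- match_counts(x, y)
  unname(n["a11"] / (n["a11"] + n["a10"] + n["a01"]))
}

tanimoto_sketch <- function(x, y) {
  n <- match_counts(x, y)
  unname((n["a11"] + n["a00"]) /
           (n["a11"] + 2 * (n["a10"] + n["a01"]) + n["a00"]))
}

x <- c(1, 1, 0, 0, 1)
y <- c(1, 0, 0, 1, 1)
sims <- c(simple_matching = simple_matching_sketch(x, y),
          jaccard = jaccard_sketch(x, y),
          tanimoto = tanimoto_sketch(x, y))
print(sims)
```

Note that the Jaccard form ignores joint absences (a00), so it is undefined (0/0) when both vectors contain no TRUE values.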

Currently, only one categorical_scoring_method ("majority_vote") and one continuous_scoring_method ("average") are supported.
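
For intuition, the two scoring methods can be sketched in base R as follows. The helper names and neighbor values are hypothetical, and the tie-breaking shown here (first level in sorted order wins) is an assumption of the sketch, not documented package behavior.

```r
# Hypothetical helpers (not part of neighbr).
# Majority vote: the most frequent label among the k nearest neighbors.
majority_vote_sketch <- function(labels) {
  counts <- table(labels)
  names(counts)[which.max(counts)]
}

# Average: the arithmetic mean of the neighbors' continuous target values.
average_sketch <- function(values) mean(values)

# Target values of three nearest neighbors (made-up numbers).
neighbor_species <- c("virginica", "versicolor", "virginica")
neighbor_widths <- c(2.0, 1.5, 2.5)

predicted_class <- majority_vote_sketch(neighbor_species)
predicted_value <- average_sketch(neighbor_widths)
print(predicted_class)
print(predicted_value)
```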

Logical features must consist of 0/1 or TRUE/FALSE values.

Categorical non-logical features must be transformed before being used.
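
One common way to do this transformation is indicator (one-hot) encoding, which turns each level of a categorical feature into its own logical column. The sketch below uses base R's model.matrix; the data frame and column name are made up for illustration, and other encodings may suit a given data set better.

```r
# Toy data with one categorical, non-logical feature.
df <- data.frame(color = c("red", "green", "blue", "green"),
                 stringsAsFactors = TRUE)

# One indicator column per level; "- 1" drops the intercept so that
# every level gets its own column.
indicators <- model.matrix(~ color - 1, data = df)

# Convert the 0/1 matrix into a data frame of TRUE/FALSE columns.
logical_features <- as.data.frame(indicators == 1)
print(logical_features)
```

The resulting columns can then be used with one of the similarity measures for logical features.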

The categorical target does not have to be of class factor, but it is assumed not to be continuous.

The distance and similarity measures in this package are based on those defined in the PMML specification.

Several of the elements in the returned list are only used when converting the knn model to PMML (for example, function_name).

For more details and examples, see the vignette by running the following:

vignette("neighbr-help")

Value

An object of class neighbr, which is a list of the following:

call

The original call to knn.

k

Number of nearest neighbors.

categorical_target

Categorical target variable.

continuous_target

Continuous target variable.

comparison_measure

Distance or similarity measure.

categorical_scoring_method

Categorical scoring method.

continuous_scoring_method

Continuous scoring method.

return_ranked_neighbors

Number of ranked neighbors to return.

id

ID variable.

features

List of feature names.

function_name

Function name, used when generating PMML. One of "classification", "regression", "clustering", or "mixed".

categorical_levels

Levels of the categorical target.

num_train_rows

Number of training instances.

num_test_rows

Number of test instances.

train_set

Data frame with training instances.

test_set_scores

Data frame with scores for the test set.
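
A short sketch of inspecting the returned object, using only the element names listed above. It assumes the neighbr package is installed (the code is skipped otherwise) and makes no claims about the column names inside test_set_scores, which are simply printed with str().

```r
# Runs only when neighbr is available; iris_id is a local copy so the
# built-in iris data set is left unchanged.
if (requireNamespace("neighbr", quietly = TRUE)) {
  iris_id <- iris
  iris_id$ID <- seq_len(nrow(iris_id))

  fit <- neighbr::knn(
    train_set = iris_id[1:145, ],
    test_set = iris_id[146:150, -c(4, 5, 6)],
    k = 5,
    categorical_target = "Species",
    continuous_target = "Petal.Width",
    comparison_measure = "euclidean",
    return_ranked_neighbors = 3,
    id = "ID"
  )

  print(class(fit))          # class of the returned object
  print(fit$function_name)   # model type used when generating PMML
  print(fit$num_test_rows)   # number of scored test instances
  str(fit$test_set_scores)   # predictions and ranked neighbors
}
```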

See Also

similarity, distance, PMML KNN specification

Examples

# continuous features with continuous target, categorical target,
# and neighbor ranking

library(neighbr)

data(iris)

# add an ID column to the data for neighbor ranking
iris$ID <- seq_len(nrow(iris))

# train set contains all predicted variables, features, and the ID column
train_set <- iris[1:145, ]

# omit the predicted variables (Petal.Width, Species) and the ID column
# from the test set
test_set <- iris[146:150, -c(4, 5, 6)]

fit <- knn(train_set = train_set, test_set = test_set,
           k = 5,
           categorical_target = "Species",
           continuous_target = "Petal.Width",
           comparison_measure = "euclidean",
           return_ranked_neighbors = 3,
           id = "ID")

neighbr documentation built on April 14, 2020, 7:37 p.m.