gknn: Generalized k-Nearest Neighbors Classification or Regression
In e1071: Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien

gknn	R Documentation

Description

gknn is an implementation of the k-nearest neighbours algorithm making use of general distance measures. A formula interface is provided.

Usage

## S3 method for class 'formula'
gknn(formula, data = NULL, ..., subset, na.action = na.pass, scale = TRUE)
## Default S3 method:
gknn(x, y, k = 1, method = NULL, 
                       scale = TRUE, use_all = TRUE, 
                       FUN = mean, ...)
## S3 method for class 'gknn'
predict(object, newdata, 
                         type = c("class", "votes", "prob"), 
                         ...,
                         na.action = na.pass)

Arguments

`formula`	a symbolic description of the model to be fit.
`data`	an optional data frame containing the variables in the model. By default, the variables are taken from the environment from which `gknn` is called.
`x`	a data matrix.
`y`	a response vector with one label for each row/component of `x`. Can be either a factor (for classification tasks) or a numeric vector (for regression).
`k`	number of neighbours considered.
`scale`	a logical vector indicating the variables to be scaled. If `scale` is of length 1, the value is recycled as many times as needed. By default, numeric matrices are scaled to zero mean and unit variance. The center and scale values are returned and used for later predictions. Note that the default metric for data frames is the Gower metric which standardizes the values to the unit interval.
`method`	argument passed to `dist()` from the `proxy` package to select the distance measure used: either a function or a mnemonic string naming the measure. Defaults to `"Euclidean"` for numeric matrices, to `"Jaccard"` for logical matrices, and to `"Gower"` for data frames.
`use_all`	controls handling of ties. If true, all neighbours whose distance equals that of the kth nearest neighbour are included, so more than k neighbours may be used. If false, a random selection among the neighbours tied at the kth distance is made so that exactly k neighbours are used.
`FUN`	function used to aggregate the k nearest target values in case of regression.
`object`	object of class `gknn`.
`newdata`	matrix or data frame with new instances.
`type`	character specifying the return type in case of class predictions: for `"class"`, the class labels; for `"prob"`, the class distribution for all k neighbours considered; for `"votes"`, the raw counts.
`...`	additional parameters passed to `dist()`
`subset`	An index vector specifying the cases to be used in the training sample. (NOTE: If given, this argument must be named.)
`na.action`	A function to specify the action to be taken if `NA`s are found. The default action is `na.pass`. (NOTE: If given, this argument must be named.)
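
The tie handling described for `use_all` can be illustrated in base R. This is a minimal sketch of the idea only, not the package's implementation; `pick_neighbours` is a hypothetical helper that takes the distances from one query point to all training points.

```r
# Hypothetical helper (not part of e1071): choose neighbour indices for one
# query point, given its distances `d` to every training point.
pick_neighbours <- function(d, k, use_all = TRUE) {
  o <- order(d)
  kth <- d[o[k]]             # distance of the kth nearest neighbour
  closer <- which(d < kth)   # strictly closer than the kth: always kept
  tied <- which(d == kth)    # every point tied at the kth distance
  if (use_all) {
    c(closer, tied)          # may return more than k indices
  } else {
    need <- k - length(closer)
    # Index into `tied` instead of calling sample(tied, need), because
    # sample() on a single number n draws from 1:n.
    c(closer, tied[sample.int(length(tied), need)])
  }
}

d <- c(0.5, 1.0, 1.0, 1.0, 2.0)
pick_neighbours(d, k = 2, use_all = TRUE)            # indices 1 2 3 4
length(pick_neighbours(d, k = 2, use_all = FALSE))   # exactly 2
```

With `use_all = TRUE` the three points tied at distance 1 all vote; with `use_all = FALSE` only one of them, chosen at random, joins the closest point.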

Value

For `gknn()`, an object of class `"gknn"` containing the data and the specified parameters. For `predict.gknn()`, a vector of predictions, or a matrix with votes for all classes. In case of an overall class tie, the predicted class is chosen at random.
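
As an illustration of the regression case, the following sketch uses the default (matrix) interface with a numeric response. It assumes that `e1071` and `proxy` are installed; the choice of columns, `k = 5`, and `FUN = median` are arbitrary and only for demonstration.

```r
library(e1071)  # gknn() relies on dist() from the 'proxy' package

# Regression: predict Sepal.Length from the other numeric iris columns.
x <- as.matrix(iris[, 2:4])
y <- iris$Sepal.Length

# Use 5 neighbours and aggregate their target values with the median
# instead of the default mean.
fit <- gknn(x, y, k = 5, FUN = median)
class(fit)

# One numeric prediction per row of the new data.
pred <- predict(fit, x[1:5, ])
pred
```

Because `x` is a numeric matrix, the defaults described under Arguments apply: the columns are scaled and the Euclidean distance is used.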

Author(s)

David Meyer (David.Meyer@R-project.org)

Examples

library(e1071)
data(iris)

model <- gknn(Species ~ ., data = iris)
predict(model, iris[c(1, 51, 101),])

## hold out six rows of each species for testing
test <- c(45:50, 95:100, 145:150)

model <- gknn(Species ~ ., data = iris[-test,], k = 3, method = "Manhattan")
predict(model, iris[test,], type = "votes")

model <- gknn(Species ~ ., data = iris[-test,], k = 3, method = "Manhattan")
predict(model, iris[test,], type = "prob")
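
The held-out rows can also be used to check the class predictions against the true labels. The following is a sketch under the same assumptions as the examples above (`e1071` and `proxy` installed); the names `pred`, `actual`, and `accuracy` are introduced here for illustration only.

```r
library(e1071)

test <- c(45:50, 95:100, 145:150)
model <- gknn(Species ~ ., data = iris[-test, ], k = 3, method = "Manhattan")

# Compare as character vectors so the result does not depend on the
# factor levels of the returned predictions.
pred <- as.character(predict(model, iris[test, ], type = "class"))
actual <- as.character(iris$Species[test])

table(predicted = pred, actual = actual)  # confusion matrix
accuracy <- mean(pred == actual)          # share of correct predictions
accuracy
```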

e1071 documentation built on Sept. 17, 2024, 1:06 a.m.