cknn: Find comparable properties using clustered nearest neighbors

Description Usage Arguments Details Value Note

View source: R/cknn.R

Description

Sales comparables are recent sales which have the same (or very similar) characteristics as a target (unsold) property. They are frequently used by assessors, real estate agents, and appraisers to determine the fair market value of a home. However, finding comparable properties at scale can be difficult.

This function can be used to quickly find comparables for any number of unsold properties. It can also be used more generally to find similar properties that are nearby each other, regardless of whether or not they sold.

See the documentation site for example usage.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
cknn(
  data,
  lon,
  lat,
  m = 5,
  k = 10,
  l = 0.5,
  var_weights = NULL,
  keep_data = TRUE,
  ...
)

Arguments

data

A data frame containing the variables to cluster on. Should contain both numerics and factors. Numerics should be unscaled. Lat/lon should NOT be included.

lon

A numeric vector of longitude values, reprojected into planar coordinates specific to the target area. See here for details on reprojection using R.

lat

A numeric vector of latitude values, reprojected into planar coordinates specific to the target area. See here for details on reprojection using R.

m

The number of clusters to create using the kproto function.

k

The number of nearest neighbors to return for each row of input data.

l

Hyperparameter representing the trade-off between distance and characteristics in kNN matching. Must be >= 0 and <= 1. Value equal to 1 will match on distance only, while value equal to 0 will disregard distance and match on characteristics only. Default 0.5 (equal weight).

var_weights

Value(s) passed to lambda input of kproto. See details.

keep_data

Logical for whether original data should be included in the returned object.

...

Arguments passed on to kproto, most commonly iter.max.

Details

The cknn algorithm works in two stages:

  1. Divide the full set of sales into m clusters according to each property's characteristics. This mimics the process of market segmentation or separating properties into different classes. This clustering is done using the k-prototypes function kproto from the clustMixType library. See the clustMixType whitepaper for more information.

  2. For each property i, find the k nearest neighbors within i's cluster, minimizing the distance over planar coordinates and Euclidean distance to all cluster centers, This is accomplished with the fast kNN function from kNN.

Options for inputs to var_weights include:

Value

Object of class cknn containing:

kproto

kproto object containing clusters, centroids, etc.

knn

List of k nearest neighbors for each row in the input data.

knn_idx

Lookup for translating in-cluster index positions to row indices from the input data. Used by predict method.

lon

Unaltered input longitude vector. Used by predict method for scaling new input data.

lat

Unaltered input latitude vector. Used by predict method for scaling new input data.

var_weights

Unaltered variable weights used to construct the cknn model.

m

Number of clusters created by kproto.

k

Number of nearest neighbors returned by kNN.

l

Hyperparameter used for distance/characteristics trade-off.

data

Unaltered input data frame. Used by predict method for scaling new input data. Only returned if keep_data is TRUE.

Note

Input data should be thoroughly cleaned. Outliers in numeric vectors and factors with rare levels can both affect clustering performance. Outlier values should be removed. Rare factor levels should be collapsed into a single level or removed.


OllieS8/bearing documentation built on Dec. 31, 2020, 3:23 p.m.