nroMatch: Best-matching districts
In Numero: Statistical Framework to Define Subgroups in Complex Datasets

nroMatch

R Documentation

Best-matching districts

Description

Compare multi-dimensional data points against the district profiles of a self-organizing map (SOM).

Usage

nroMatch(centroids, data)

Arguments

`centroids`	Either a matrix, a data frame or a list that contains the element `centroids`.
`data`	A data matrix whose column names match those of the centroid matrix.

Details

The argument `centroids` can be a matrix or a data frame that contains multivariable data profiles organized row-wise (one profile per row). It can also be the list object returned by nroKmeans() or nroTrain().
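The three accepted input forms can be illustrated with a small base R sketch. The helper `get_centroid_matrix()` is hypothetical and is not part of Numero; it only assumes, as stated above, that a list input carries its profiles in the element `centroids`. The column names CHOL and TG are borrowed from the example dataset for illustration.

```r
# Hypothetical helper, not part of Numero: reduce any of the accepted
# input forms to a plain matrix with one profile per row.
get_centroid_matrix <- function(x) {
  # A plain list (not a data frame) is expected to carry 'centroids'.
  if (is.list(x) && !is.data.frame(x)) {
    stopifnot(!is.null(x$centroids))
    x <- x$centroids
  }
  as.matrix(x)
}

# The same two profiles supplied as a data frame and as a list.
profiles <- data.frame(CHOL = c(-1, 1), TG = c(0.5, -0.5))
from_df <- get_centroid_matrix(profiles)
from_list <- get_centroid_matrix(list(centroids = as.matrix(profiles)))
```

Both calls yield the same 2 x 2 matrix, so any downstream matching step would see identical profiles.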

Value

A vector of integers with elements corresponding to the rows in data. Each element contains the index of the best matching row from centroids.

The vector also has the attribute 'quality', which contains three columns. RESIDUAL is the distance between a point and its best-matching centroid in data space (shorter is better). RESIDUAL.z is a scale-independent version of RESIDUAL, available if the mean and standard deviation of the residuals are stored in the training history. COVERAGE is the proportion of data elements that were available for matching.

The names of the columns that were used for matching are stored in the attribute 'variables'.
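The shape of the return value can be sketched in base R. This is not Numero's implementation: the helper `match_nearest()` is hypothetical, and the distance (root-mean-square difference over the non-missing columns) and the handling of missing values are assumptions made for illustration only. RESIDUAL.z is omitted because it depends on a training history.

```r
# Hypothetical helper, not part of Numero: nearest-centroid matching.
match_nearest <- function(centroids, data) {
  centroids <- as.matrix(centroids)
  data <- as.matrix(data)

  # Match only on the columns that both inputs share.
  vars <- intersect(colnames(centroids), colnames(data))
  stopifnot(length(vars) > 0)
  cen <- centroids[, vars, drop = FALSE]
  pts <- data[, vars, drop = FALSE]

  n <- nrow(pts)
  best <- rep(NA_integer_, n)
  residual <- rep(NA_real_, n)
  # Proportion of usable (non-missing) values on each data row.
  coverage <- rowMeans(!is.na(pts))

  for (i in seq_len(n)) {
    ok <- !is.na(pts[i, ])
    if (!any(ok)) next
    # Assumption: root-mean-square difference over the usable columns.
    diffs <- sweep(cen[, ok, drop = FALSE], 2, pts[i, ok])
    d <- sqrt(rowMeans(diffs^2))
    best[i] <- which.min(d)
    residual[i] <- d[best[i]]
  }

  attr(best, "quality") <- data.frame(RESIDUAL = residual,
                                      COVERAGE = coverage)
  attr(best, "variables") <- vars
  best
}

# Two centroids and two data points; the second point lacks B.
cen <- rbind(c(A = 0, B = 0), c(A = 1, B = 1))
pts <- rbind(c(A = 0.1, B = -0.1), c(A = 0.9, B = NA))
m <- match_nearest(cen, pts)
```

In this toy input the first point lies next to the first centroid. The second point is matched on column A alone, so its coverage is 0.5 rather than 1.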

Examples

# Load the package.
library(Numero)

# Import data.
fname <- system.file("extdata", "finndiane.txt", package = "Numero")
dataset <- read.delim(file = fname)

# Prepare training data.
trvars <- c("CHOL", "HDL2C", "TG", "CREAT", "uALB")
trdata <- scale.default(dataset[, trvars])

# K-means clustering.
km <- nroKmeans(data = trdata, k = 10)

# Assign data points into districts.
matches <- nroMatch(centroids = km, data = trdata)
print(head(attr(matches,"quality")))
print(table(matches))

Numero documentation built on Sept. 17, 2024, 5:09 p.m.