imputeSOM: The Self-Organizing Maps with Built-in Missing Data...

View source: R/imputeSOM.R

imputeSOM    R Documentation

The Self-Organizing Maps with Built-in Missing Data Imputation.

Description

imputeSOM extends the online algorithm of the 'kohonen' package so that missing data are imputed while the map is trained. All missing values are first imputed with initial values, such as the mean of the observed values of each variable.
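The initial imputation step can be illustrated with a short base R sketch. It assumes that column means are used as the starting values; the helper name impute_col_means is hypothetical and is not part of missSOM.

```r
# Hypothetical helper (not part of missSOM): replace every missing entry
# by the mean of the observed values in the same column.
impute_col_means <- function(x) {
  x <- as.matrix(x)
  col_means <- colMeans(x, na.rm = TRUE)
  # Row/column positions of all missing cells
  missing <- which(is.na(x), arr.ind = TRUE)
  x[missing] <- col_means[missing[, "col"]]
  x
}

# Toy matrix, filled column by column: column 1 is (1, NA, 3), column 2 is (4, 6, NA)
x <- matrix(c(1, NA, 3,
              4, 6, NA), ncol = 2)
x0 <- impute_col_means(x)
print(x0)
```

In this toy matrix the missing cell of the first column receives the mean of 1 and 3, and the missing cell of the second column receives the mean of 4 and 6.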

Usage

imputeSOM(
  data,
  grid = somgrid(),
  rlen = 100,
  alpha = c(0.05, 0.01),
  radius = quantile(nhbrdist, 2/3),
  maxNA.fraction = 1,
  keep.data = TRUE,
  dist.fcts = NULL,
  init
)

Arguments

data

a matrix or data.frame with continuous variables containing the observations to be mapped on the grid by the Kohonen algorithm; the observations may be incomplete.

grid

a grid for the codebook vectors: see somgrid.

rlen

the number of times the complete data set will be presented to the network.

alpha

learning rate, a vector of two numbers indicating the amount of change. Default is to decline linearly from 0.05 to 0.01 over rlen updates.
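As a rough illustration of the default schedule, the sketch below builds a linearly declining learning rate in base R. It assumes one value per presentation of the data set; the name alpha_schedule is hypothetical, and the granularity of the package's internal updates may differ.

```r
rlen  <- 100
alpha <- c(0.05, 0.01)

# Assumption: one learning-rate value per pass over the data,
# declining linearly from alpha[1] to alpha[2].
alpha_schedule <- seq(alpha[1], alpha[2], length.out = rlen)

head(alpha_schedule, 3)
tail(alpha_schedule, 3)
```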

radius

the radius of the neighbourhood, either given as a single number or a vector (start, stop). If it is given as a single number the radius will change linearly from radius to zero; as soon as the neighbourhood gets smaller than one only the winning unit will be updated. Note that the default before version 3.0 was to run from radius to -radius. If nothing is supplied, the default is to start with a value that covers 2/3 of all unit-to-unit distances.
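The default starting radius and its linear decline can be approximated in base R. This sketch assumes a 5 x 5 rectangular grid with unit spacing and Euclidean unit-to-unit distances; the names pts, radius_start and radius_schedule are hypothetical, and a hexagonal grid would give different distances.

```r
# Assumption: 5 x 5 rectangular grid with unit spacing between neighbours
pts <- as.matrix(expand.grid(x = 1:5, y = 1:5))

# All unit-to-unit distances (stands in for `nhbrdist` in the Usage section)
nhbrdist <- as.matrix(dist(pts))

# Default start value: the 2/3 quantile of the unit-to-unit distances
radius_start <- unname(quantile(nhbrdist, 2/3))

# A single number means a linear decline from radius to zero
rlen <- 100
radius_schedule <- seq(radius_start, 0, length.out = rlen)

# Once the radius drops below one, only the winning unit is updated
only_winner <- radius_schedule < 1
```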

maxNA.fraction

the maximal fraction of values in a column that may be NA; columns with a larger fraction of missing values are removed.
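The effect of this threshold can be sketched in base R as follows. The data are synthetic, and whether the package compares with a strict or non-strict inequality is an assumption here.

```r
set.seed(1)
X <- matrix(rnorm(40), nrow = 10, ncol = 4)
X[1:8, 2] <- NA   # column 2: 80% missing
X[1:2, 3] <- NA   # column 3: 20% missing

maxNA.fraction <- 0.5

# Fraction of missing values per column
na_fraction <- colMeans(is.na(X))

# Assumption: a column is kept when its NA fraction does not exceed the threshold
keep   <- na_fraction <= maxNA.fraction
X_kept <- X[, keep, drop = FALSE]
```

With the default maxNA.fraction = 1, no column would be dropped by this rule.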

keep.data

if TRUE, return original data and mapping information. If FALSE, only return the trained map (in essence the codebook vectors).

dist.fcts

distance function to be used for the data. Admissible values currently are "sumofsquares", "euclidean" and "manhattan". Default is to use "sumofsquares".
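For reference, the three distances can be written out in base R for a single observation and a single codebook vector. The helper point_dist is hypothetical and only illustrates the standard definitions on complete vectors; it does not reproduce the package's internal code.

```r
# Hypothetical helper: distance between an observation x and a codebook vector w
point_dist <- function(x, w,
                       type = c("sumofsquares", "euclidean", "manhattan")) {
  type <- match.arg(type)
  d <- x - w
  switch(type,
         sumofsquares = sum(d^2),        # squared Euclidean distance
         euclidean    = sqrt(sum(d^2)),  # Euclidean distance
         manhattan    = sum(abs(d)))     # sum of absolute differences
}

x <- c(1, 2, 3)
w <- c(0, 0, 1)

d_ssq <- point_dist(x, w, "sumofsquares")
d_euc <- point_dist(x, w, "euclidean")
d_man <- point_dist(x, w, "manhattan")
```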

init

a matrix or data.frame giving the initial values of the codebook vectors. It should have the same number of variables (columns) as the data, and its number of rows should equal the number of units in the map.
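One possible way to build a matrix with the required shape is to draw complete observations at random, as in the base R sketch below. The data are synthetic and the choice of sampling complete rows is an assumption for illustration, not the package's own default initialisation.

```r
set.seed(1)
X <- matrix(rnorm(200), nrow = 50, ncol = 4)
X[sample(length(X), 10)] <- NA   # scatter a few missing values

n_units <- 5 * 5                 # units in a 5 x 5 grid

# Assumption: initialise the codebook from randomly chosen complete rows
complete_rows <- which(complete.cases(X))
init <- X[sample(complete_rows, n_units), , drop = FALSE]

dim(init)   # one row per unit, one column per variable
```

Such a matrix could then be supplied as init = init together with a grid of matching size, for example grid = somgrid(5, 5, "hexagonal").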

Value

An object of class "missSOM" with components

data

Data matrix, only returned if keep.data == TRUE.

ximp

Imputed data matrix.

unit.classif

Winning units for data objects, only returned if keep.data == TRUE.

distances

Distances of objects to their corresponding winning unit, only returned if keep.data == TRUE.

grid

The grid, an object of class somgrid.

codes

A list of matrices containing codebook vectors.

alpha, radius

Input arguments presented to the function.

maxNA.fraction

The maximal fraction of values in a column that may be NA before the column is removed.

dist.fcts

The distance function used for the data.

See Also

somgrid, plot.missSOM, map.missSOM

Examples

data(wines)

## Data with no missing values 
som.wines <- imputeSOM(scale(wines), grid = somgrid(5, 5, "hexagonal"))
summary(som.wines)
print(dim(som.wines$data))

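## Data with missing values
set.seed(1)  # make the random choice of incomplete rows reproducible
X <- scale(wines)
# Blank out the first two variables for 10 randomly chosen observations
missing_obs <- sample(1:nrow(wines), 10, replace = FALSE)
X[missing_obs, 1:2] <- NaN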
som.wines <- imputeSOM(X, grid = somgrid(5, 5, "hexagonal"))
summary(som.wines)
print(dim(som.wines$ximp))
print(sum(is.na(som.wines$ximp)))


missSOM documentation built on May 5, 2022, 9:06 a.m.
