EM: Expectation-Maximization Imputation with Evaluation Metrics

View source: R/EM.R

EM R Documentation

Expectation-Maximization Imputation with Evaluation Metrics

Description

This function performs Expectation-Maximization (EM) imputation on a dataset with missing values, using the 'imputeEM' function from the 'mvdalab' package to estimate them. It also calculates the evaluation metrics RMSE, MMAE, and RRE, and runs k-means and hierarchical clustering to assess the quality of the imputation.
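As a rough illustration of how error metrics of this kind compare an imputed matrix against the complete one, here is a minimal base-R sketch. The helper names (`rmse`, `mae`, `rel_err`) and the formulas are assumptions chosen for illustration; the exact definitions used inside `EM` may differ.

```r
# Hypothetical helpers; the package's own formulas may differ.
rmse <- function(truth, imputed) sqrt(mean((truth - imputed)^2))
mae <- function(truth, imputed) mean(abs(truth - imputed))
# Assumed definition: Frobenius norm of the error over that of the truth.
rel_err <- function(truth, imputed) {
  sqrt(sum((truth - imputed)^2)) / sqrt(sum(truth^2))
}

truth <- matrix(c(1, 2, 3, 4), nrow = 2)
imputed <- matrix(c(1, 2, 3, 6), nrow = 2)  # one entry is off by 2

rmse(truth, imputed)     # 1
mae(truth, imputed)      # 0.5
rel_err(truth, imputed)  # 2 / sqrt(30)
```

All three values are zero when the imputed matrix reproduces the complete data exactly, so smaller values indicate a better imputation.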

Usage

EM(data0, data.sample, data.copy, mr, km)

Arguments

data0

The original dataset containing the response variable and features.

data.sample

The dataset used for sampling, which may contain missing values.

data.copy

A copy of the original dataset, used for comparison or validation.

mr

Indices of the rows with missing values that need to be predicted.

km

The number of clusters for k-means clustering.

Value

A list containing:

Xnew

The imputed dataset.

RMSE

The Root Mean Squared Error.

MMAE

The Mean Absolute Error.

RRE

The Relative Error.

CPP1

The K-means clustering Consistency Proportion Index.

CPP2

The Hierarchical Clustering Complete Linkage Consistency Proportion Index.

CPP3

The Hierarchical Clustering Single Linkage Consistency Proportion Index.

CPP4

The Hierarchical Clustering Average Linkage Consistency Proportion Index.

CPP5

The Hierarchical Clustering Centroid Linkage Consistency Proportion Index.

CPP6

The Hierarchical Clustering Median Linkage Consistency Proportion Index.

CPP7

The Hierarchical Clustering Ward's Method Consistency Proportion Index.

timeEM

The EM algorithm execution time.
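The clustering-based values compare how observations group before and after imputation. The sketch below is only an illustration of that idea in base R: it uses pairwise agreement (the Rand index) as a stand-in for the Consistency Proportion Index, whose exact definition is not given here. The helper names `pair_agreement` and `hc_labels`, the simulated data, and the choice of `"ward.D"` for Ward's method are all assumptions.

```r
# Proportion of observation pairs that are grouped the same way
# (together or apart) in two clusterings; invariant to label permutation.
pair_agreement <- function(a, b) {
  same_a <- outer(a, a, "==")
  same_b <- outer(b, b, "==")
  upper <- upper.tri(same_a)
  mean(same_a[upper] == same_b[upper])
}

set.seed(1)
# Two well-separated groups stand in for the complete data.
complete <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
                  matrix(rnorm(40, mean = 5), ncol = 2))
# Small perturbation stands in for an imputed version of the same data.
imputed <- complete + matrix(rnorm(80, sd = 0.1), ncol = 2)

km <- 2
methods <- c("complete", "single", "average", "centroid", "median", "ward.D")
hc_labels <- function(x, method) cutree(hclust(dist(x), method = method), k = km)

agreement <- c(
  kmeans = pair_agreement(kmeans(complete, centers = km, nstart = 10)$cluster,
                          kmeans(imputed, centers = km, nstart = 10)$cluster),
  sapply(methods, function(m) pair_agreement(hc_labels(complete, m),
                                             hc_labels(imputed, m)))
)
agreement
```

Each entry lies between 0 and 1, with 1 meaning the two clusterings partition the observations identically. A pairwise measure is used here because cluster labels are arbitrary, so comparing labels directly could understate the agreement.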

Examples

# Load the package that provides EM()
library(DTSR)

# Create a sample matrix with random values and introduce missing values
set.seed(123)
n <- 100  # Number of observations
p <- 5    # Number of features
data.sample <- matrix(rnorm(n * p), nrow = n)
data.sample[sample(1:(n * p), 20)] <- NA
data.copy <- data.sample
data0 <- data.frame(data.sample, response = rnorm(n))
mr <- sample(1:n, 10)  # Sample rows for evaluation
km <- 3  # Number of clusters

# Perform EM imputation
result <- EM(data0, data.sample, data.copy, mr, km)

# Print the evaluation metrics, the k-means consistency index,
# and the imputed dataset
print(result$RMSE)
print(result$MMAE)
print(result$RRE)
print(result$CPP1)
print(result$Xnew)

DTSR documentation built on April 3, 2025, 11:35 p.m.
