DEM: Distributed EM Imputation (DEM) for Handling Missing Data

View source: R/DEM.R

DEM    R Documentation

Distributed EM Imputation (DEM) for Handling Missing Data

Description

This function performs distributed EM (DEM) imputation to handle missing data: it divides the dataset into D blocks, applies the EM imputation method within each block, and then combines the results. It also calculates evaluation metrics, including the root mean squared error (RMSE), mean absolute error (MAE), relative error (RE), and Consistency Proportion Index (CPP), using different hierarchical clustering methods.
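The divide, impute, and combine idea can be sketched in base R. This is an illustration under stated assumptions, not the DTSR implementation: `em_impute` and `dem_sketch` are hypothetical names, the EM step assumes a multivariate normal model, rows are assumed to be split into contiguous blocks, and "combining" is assumed to mean writing each imputed block back into place.

```r
# Illustrative EM imputation under a multivariate normal model
# (hypothetical helper, not part of DTSR).
em_impute <- function(X, max_iter = 100, tol = 1e-6) {
  X <- as.matrix(X)
  miss <- is.na(X)
  p <- ncol(X)
  # Initialise missing cells with the observed column means
  mu <- colMeans(X, na.rm = TRUE)
  Xf <- X
  for (j in seq_len(p)) Xf[miss[, j], j] <- mu[j]
  Sigma <- cov(Xf)
  incomplete <- which(rowSums(miss) > 0)
  for (iter in seq_len(max_iter)) {
    X_old <- Xf
    C <- matrix(0, p, p)  # summed conditional covariances of the missing parts
    # E-step: replace missing cells by their conditional means
    for (i in incomplete) {
      m <- miss[i, ]
      o <- !m
      if (!any(o)) {  # row with no observed cell
        Xf[i, ] <- mu
        C <- C + Sigma
        next
      }
      B <- Sigma[m, o, drop = FALSE] %*% solve(Sigma[o, o, drop = FALSE])
      Xf[i, m] <- mu[m] + drop(B %*% (Xf[i, o] - mu[o]))
      C[m, m] <- C[m, m] + Sigma[m, m, drop = FALSE] -
        B %*% Sigma[o, m, drop = FALSE]
    }
    # M-step: update the mean vector and covariance matrix
    mu <- colMeans(Xf)
    Xc <- sweep(Xf, 2, mu)
    Sigma <- (crossprod(Xc) + C) / nrow(Xf)
    if (max(abs(Xf - X_old)) < tol) break
  }
  Xf
}

# Split the rows into D contiguous blocks (assumed split), impute each block
# separately, and write the results back into one matrix.
dem_sketch <- function(X, D) {
  X <- as.matrix(X)
  n <- nrow(X)
  block <- ceiling(seq_len(n) * D / n)  # block labels 1..D
  out <- X
  for (d in seq_len(D)) {
    rows <- which(block == d)
    out[rows, ] <- em_impute(X[rows, , drop = FALSE])
  }
  out
}

set.seed(1)
X <- matrix(rnorm(200 * 4), nrow = 200)
X[sample(length(X), 40)] <- NA
X_imp <- dem_sketch(X, D = 2)
```

Each block is imputed using only its own rows, so the blocks could be processed independently. The trade-off is that smaller blocks give noisier mean and covariance estimates.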

Usage

DEM(data0, data.sample, data.copy, mr, km, D)

Arguments

data0

The original dataset containing the response variable and features.

data.sample

The dataset used for sampling, which may contain missing values.

data.copy

A copy of the original dataset, used for comparison or validation.

mr

Indices of the rows with missing values that need to be predicted.

km

The number of clusters for k-means clustering.

D

The number of blocks to divide the data into.
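If mr is read literally as the indices of rows that contain missing values, those indices can be derived with base R as in the sketch below. This is one interpretation of the argument description, not a confirmed requirement of DEM; the matrix `x` and the name `mr_candidates` are hypothetical.

```r
# Toy matrix in which rows 1 and 3 each contain one missing value
x <- matrix(c(1, NA, 3,
              4, 5, 6,
              7, 8, NA), nrow = 3, byrow = TRUE)

# Indices of the rows that contain at least one NA
mr_candidates <- which(!complete.cases(x))
```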

Value

A list containing:

XDEM

The imputed dataset.

RMSEDEM

The Root Mean Squared Error.

MAEDEM

The Mean Absolute Error.

REDEM

The Relative Error.

GCVDEM

The Generalized Cross-Validation (GCV) value for the DEM imputation.

timeDEM

The DEM algorithm execution time.
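The error metrics in this list can be illustrated with a short sketch. The exact formulas used inside DEM are not given here, so this assumes the textbook definitions of RMSE and MAE taken over the imputed cells, and assumes relative error is the norm of the imputation error divided by the norm of the true values. `impute_errors` and its arguments are hypothetical names.

```r
# Compare imputed values with known true values at the masked positions.
# truth, imputed: numeric matrices of equal size; miss: logical mask of the
# cells that were missing before imputation. (Assumed definitions.)
impute_errors <- function(truth, imputed, miss) {
  err <- imputed[miss] - truth[miss]
  list(
    RMSE = sqrt(mean(err^2)),                          # root mean squared error
    MAE  = mean(abs(err)),                             # mean absolute error
    RE   = sqrt(sum(err^2)) / sqrt(sum(truth[miss]^2)) # assumed relative error
  )
}

truth   <- matrix(c(1, 2, 3, 4), nrow = 2)
imputed <- matrix(c(1, 2, 3.5, 3), nrow = 2)
miss    <- matrix(c(FALSE, FALSE, TRUE, TRUE), nrow = 2)
errs <- impute_errors(truth, imputed, miss)
```

These metrics require a complete reference copy of the data, so they apply when missing values have been introduced artificially into a known dataset.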

See Also

EM for the original EM function.

Examples

# Load the package that provides DEM()
library(DTSR)

# Create a sample dataset with missing values
set.seed(123)
n <- 100  # Number of rows
p <- 5    # Number of features
data.sample <- matrix(rnorm(n * p), nrow = n)
data.sample[sample(1:(n * p), 20)] <- NA  # Set 20 randomly chosen cells to NA
data.copy <- data.sample
data0 <- data.frame(data.sample, response = rnorm(n))
mr <- sample(1:n, 10)  # Sample rows for evaluation
km <- 3  # Number of clusters
D <- 2   # Number of blocks

# Perform DEM imputation
result <- DEM(data0, data.sample, data.copy, mr, km, D)

# Print the imputed dataset
print(result$XDEM)


DTSR documentation built on April 3, 2025, 11:35 p.m.
