mean: Mean Imputation with Evaluation Metrics
In DTSR: Distributed Trimmed Scores Regression for Handling Missing Data

View source: R/mean.R

mean	R Documentation

Mean Imputation with Evaluation Metrics

Description

This function performs mean imputation on a dataset with missing values: each missing value is replaced with the mean of its column. It then computes the evaluation metrics RMSE, MMAE, and RRE, and runs k-means and hierarchical clustering (six linkage methods) to assess the quality of the imputation through the consistency proportion indices CPP1 to CPP7.
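The imputation step described above can be sketched in base R as follows. This is a minimal illustration of column-mean imputation, not the package's implementation; the helper name `impute_col_means` is hypothetical.

```r
# Hypothetical helper (not part of DTSR): replace each NA with its column mean.
impute_col_means <- function(X) {
  X <- as.matrix(X)
  # Column means computed from the observed (non-missing) entries only
  col_means <- colMeans(X, na.rm = TRUE)
  # Positions of the missing entries as (row, col) pairs
  idx <- which(is.na(X), arr.ind = TRUE)
  # Fill every missing entry with the mean of the column it sits in
  X[idx] <- col_means[idx[, "col"]]
  X
}

# Small example: column 1 is (1, NA, 3), column 2 is (4, 6, NA)
X <- matrix(c(1, NA, 3,
              4, 6, NA), nrow = 3)
X_filled <- impute_col_means(X)
print(X_filled)
```

Because the column means are computed with `na.rm = TRUE`, a column that is entirely missing would yield `NaN`; a real implementation would need to decide how to handle that case.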

Usage

mean(data0, data.sample, data.copy, mr, km)

Arguments

`data0`	The original dataset containing the response variable and features.
`data.sample`	The dataset used for sampling, which may contain missing values.
`data.copy`	A copy of the original dataset, used for comparison or validation.
`mr`	Indices of the rows with missing values that need to be predicted.
`km`	The number of clusters for k-means clustering.

Value

A list containing:

`Xnew`	The imputed dataset.
`RMSE`	The Root Mean Squared Error.
`MMAE`	The Mean Absolute Error.
`RRE`	The Relative Error.
`CPP1`	The K-means clustering Consistency Proportion Index.
`CPP2`	The Hierarchical Clustering Complete Linkage Consistency Proportion Index.
`CPP3`	The Hierarchical Clustering Single Linkage Consistency Proportion Index.
`CPP4`	The Hierarchical Clustering Average Linkage Consistency Proportion Index.
`CPP5`	The Hierarchical Clustering Centroid Linkage Consistency Proportion Index.
`CPP6`	The Hierarchical Clustering Median Linkage Consistency Proportion Index.
`CPP7`	The Hierarchical Clustering Ward's Method Consistency Proportion Index.
`timemean`	The execution time of the mean imputation algorithm.
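To give a rough idea of what such quantities measure, the sketch below computes RMSE and mean absolute error on the imputed entries and compares a clustering of the complete data with a clustering of the imputed data. The formulas are the standard textbook ones and are assumptions here; the package's exact definitions of MMAE, RRE, and the CPP indices are not given on this page. The helper names are hypothetical, and the clustering comparison uses the pair-counting Rand index as a stand-in, since raw cluster labels are arbitrary.

```r
# Assumed textbook definitions, not taken from the package.
# base::mean is spelled out because DTSR's mean() masks it once attached.
rmse <- function(truth, est) sqrt(base::mean((truth - est)^2))
mae  <- function(truth, est) base::mean(abs(truth - est))

# Proportion of observation pairs on which two clusterings agree
# (the Rand index); a stand-in for a consistency proportion.
pair_agreement <- function(a, b) {
  same_a <- outer(a, a, "==")
  same_b <- outer(b, b, "==")
  up <- upper.tri(same_a)
  base::mean(same_a[up] == same_b[up])
}

set.seed(1)
truth <- matrix(rnorm(40), nrow = 10)          # complete data
miss <- matrix(FALSE, nrow = 10, ncol = 4)
miss[c(1, 5), 1] <- TRUE                       # two entries treated as missing
est <- truth
est[miss] <- base::mean(truth[!miss[, 1], 1])  # fill with the observed column mean

# Errors measured on the imputed entries only
print(rmse(truth[miss], est[miss]))
print(mae(truth[miss], est[miss]))

# Complete-linkage hierarchical clustering on both versions, cut into 3 groups
cl_truth <- cutree(hclust(dist(truth), method = "complete"), k = 3)
cl_est   <- cutree(hclust(dist(est),   method = "complete"), k = 3)
print(pair_agreement(cl_truth, cl_est))
```

Other linkage methods can be tried by changing the `method` argument of `hclust()`, for example `"single"`, `"average"`, `"centroid"`, `"median"`, or `"ward.D2"`.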

Examples

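# Load the package; note that its mean() masks base::mean while attached
library(DTSR)
# Create a complete sample matrix with random values
set.seed(123)
n <- 100
p <- 5
data.copy <- matrix(rnorm(n * p), nrow = n)  # complete reference copy
data0 <- data.frame(data.copy, response = rnorm(n))  # original data: features plus response
# Introduce 20 missing values at random positions
data.sample <- data.copy
data.sample[sample(1:(n * p), 20)] <- NA
# Indices of the rows that contain at least one missing value
mr <- which(!complete.cases(data.sample))
km <- 3  # Number of clusters
# Perform mean imputation
result <- mean(data0, data.sample, data.copy, mr, km)
# Print the evaluation metrics
print(result$RMSE)
print(result$MMAE)
print(result$RRE)
print(result$CPP1)
# Show the first rows of the imputed dataset
print(head(result$Xnew))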

DTSR documentation built on June 8, 2025, 1:33 p.m.