DRPCA: Distributed Robust Principal Component Analysis (DRPCA) for...
In DTSR: Distributed Trimmed Scores Regression for Handling Missing Data

DRPCA

R Documentation

Distributed Robust Principal Component Analysis (DRPCA) for Handling Missing Data

Description

This function handles missing data with DRPCA: it divides the dataset into D blocks, applies the Robust Principal Component Analysis (RPCA) method to each block, and then combines the results. It also calculates evaluation metrics for the imputation, including the Root Mean Squared Error (RMSE), the Mean Absolute Error (MAE), the Relative Error (RE), and Generalized Cross-Validation (GCV), using different hierarchical clustering methods.
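The split, impute, and combine pattern described above can be illustrated with a minimal sketch. This is not the package's implementation: `impute_block` and `blockwise_impute` are hypothetical helpers, the per-block step uses column-mean imputation as a plain stand-in for RPCA, and splitting the rows into contiguous blocks is an assumption about how the D blocks are formed.

```r
# Hypothetical stand-in for the per-block RPCA step: fills each NA
# with the mean of the observed values in its column of the block.
impute_block <- function(block) {
  for (j in seq_len(ncol(block))) {
    miss <- is.na(block[, j])
    block[miss, j] <- mean(block[, j], na.rm = TRUE)
  }
  block
}

# Split the rows into D contiguous blocks (an assumption), impute each
# block independently, and stack the results back in the original order.
blockwise_impute <- function(X, D) {
  block_id <- cut(seq_len(nrow(X)), breaks = D, labels = FALSE)
  blocks <- lapply(
    split(seq_len(nrow(X)), block_id),
    function(rows) impute_block(X[rows, , drop = FALSE])
  )
  do.call(rbind, unname(blocks))
}

set.seed(1)
X <- matrix(rnorm(40), nrow = 10)
X[c(2, 7), 1] <- NA          # one missing entry in each of the two blocks
X_hat <- blockwise_impute(X, D = 2)
```

Because each block is processed on its own, the per-block step could in principle run in parallel, which is the usual motivation for a distributed scheme of this kind.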

Usage

DRPCA(data0, data.sample, data.copy, mr, km, D)

Arguments

`data0`	The original dataset containing the response variable and features.
`data.sample`	The dataset used for sampling, which may contain missing values.
`data.copy`	A copy of the original dataset, used for comparison or validation.
`mr`	Indices of the rows with missing values that need to be predicted.
`km`	The number of clusters for k-means clustering.
`D`	The number of blocks to divide the data into.
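As an illustration of the `mr` argument as described above, the rows that contain at least one missing value can be located with base R. This is only a sketch under the assumption that `mr` holds plain row indices; the matrix `X` here is made-up demonstration data, and the package's own example draws `mr` at random instead.

```r
set.seed(42)
X <- matrix(rnorm(20), nrow = 5)
X[2, 3] <- NA
X[4, 1] <- NA

# Indices of the rows with at least one NA
mr <- which(rowSums(is.na(X)) > 0)
```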

Value

A list containing:

`XDRPCA`	The imputed dataset.
`RMSEDRPCA`	The Root Mean Squared Error.
`MAEDRPCA`	The Mean Absolute Error.
`REDRPCA`	The Relative Error.
`GCVDRPCA`	The Generalized Cross-Validation (GCV) value for the DRPCA imputation.
`timeDRPCA`	The DRPCA algorithm execution time.
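For orientation, the error metrics listed above are commonly defined as in the following sketch. The exact formulas used inside `DRPCA` are not stated here, so these are assumptions: `imputation_errors` is a hypothetical helper, the errors are evaluated only over the entries flagged as missing, and the relative error is taken as the norm of the error divided by the norm of the true values. GCV is omitted because its definition depends on the fitted model.

```r
# Conventional error metrics between true and imputed values,
# evaluated on the entries marked TRUE in `missing` (assumed definitions).
imputation_errors <- function(X_true, X_hat, missing) {
  err <- X_hat[missing] - X_true[missing]
  list(
    RMSE = sqrt(mean(err^2)),                               # root mean squared error
    MAE  = mean(abs(err)),                                  # mean absolute error
    RE   = sqrt(sum(err^2)) / sqrt(sum(X_true[missing]^2))  # relative error
  )
}

X_true  <- matrix(c(1, 2, 3, 4), nrow = 2)
X_hat   <- matrix(c(1, 2, 3, 6), nrow = 2)
missing <- matrix(c(FALSE, FALSE, TRUE, TRUE), nrow = 2)
errs <- imputation_errors(X_true, X_hat, missing)
```

In this toy case the two evaluated entries have errors 0 and 2, giving an RMSE of `sqrt(2)`, an MAE of 1, and a relative error of 0.4.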

Examples

library(DTSR)

# Create a sample dataset with missing values
set.seed(123)
n <- 100  # Number of observations
p <- 10   # Number of features
D <- 2    # Number of blocks
data.sample <- matrix(rnorm(n * p), nrow = n)
# Set p - 2 = 8 entries to NA; linear indices 1..(n - 10) all fall in the first column
data.sample[sample(1:(n-10), (p-2))] <- NA
data.copy <- data.sample
data0 <- data.frame(data.sample, response = rnorm(n))
mr <- sample(1:n, 10)  # Sample rows for evaluation
km <- 3  # Number of clusters
result <- DRPCA(data0, data.sample, data.copy, mr, km, D)
# Print the imputed dataset
print(result$XDRPCA)

DTSR documentation built on June 8, 2025, 1:33 p.m.