Gmedian: Gmedian

GmedianR Documentation

Gmedian

Description

Computes recursively the Geometric median (also named spatial median or L1-median) with a fast averaged stochastic gradient algorithms that can deal rapidly with large samples of high dimensional data.

Usage

Gmedian(X, init = NULL, gamma = 2, alpha = 0.75, nstart=2, epsilon=1e-08) 

Arguments

X

Data matrix, with n (rows) observations in dimension d (columns).

init

When NULL the starting point of the algorithm is the first observation. Else the starting point of the algorithm is provided by init.

gamma

Value (positive) of the constant controling the descent steps (see details).

alpha

Rate of decrease of the descent steps (see details). Should satisfy 1/2< alpha <= 1.

nstart

Number of times the algorithm is ran over all the data set.

epsilon

Numerical tolerance. By defaut set to 1e-08.

Details

The recursive averaged algorithm is described in Cardot, Cenac, Zitt (2013), with descent steps defined as α_n = gamma/n^{alpha}.

Value

Vector of the geometric median.

References

Cardot, H., Cenac, P. and Zitt, P-A. (2013). Efficient and fast estimation of the geometric median in Hilbert spaces with an averaged stochastic gradient algorithm. Bernoulli, 19, 18-43.

See Also

See also GmedianCov, kGmedian and Weiszfeld.

Examples

## Simulated data - Brownian paths
n <- 1e2
d <- 100
x <- matrix(rnorm(n*d,sd=1/sqrt(d)), n, d)
x <- t(apply(x,1,cumsum))

## Computation speed
system.time(replicate(10, {
  median.est = Gmedian(x)})) 
system.time(replicate(10, {
  mean.est = apply(x,2,mean)}))
##

## Accuracy with contaminated data
n <- 1e03
d <- 10
n.contaminated <- 0.05*n ## 5% of contaminated observations
n.experiment <- 100
err.L2 <- matrix(NA,ncol=3,nrow=n.experiment)
colnames(err.L2) = c("mean (no contam.)", "mean (contam.)","Gmedian")

for (n.sim in 1:n.experiment){
x <- matrix(rnorm(n*d,sd=1/sqrt(d)), n, d)
x <- t(apply(x,1,cumsum))
err.L2[n.sim,1] <- sum((apply(x,2,mean))^2/d)
ind.contaminated <- sample(1:n,n.contaminated) ## contam. units
x[ind.contaminated,] <- 5 
err.L2[n.sim,2] <- sum((apply(x,2,mean))^2/d)
err.L2[n.sim,3] <- sum(Gmedian(x)^2/d)
}
boxplot(err.L2,main="L2 error")

Gmedian documentation built on June 8, 2022, 5:07 p.m.