ojaSCM: Oja Sign Covariance Matrix
In OjaNP: Multivariate Methods Based on the Oja Median and Related Concepts


The function computes the Oja sign covariance matrix of a data set X.

ojaSCM(X, center = "ojaMedian", p = NULL, silent = FALSE, na.action = na.fail, ...)

`X`	numeric data.frame or matrix containing the data points as rows.
`center`	one of the following three:

- a numeric vector giving the location of the data,
- a function that computes a multivariate location (see details below) or
- one of the following strings:
  - "colMean" (vector of means, function colMeans is called),
  - "ojaMedian" (function ojaMedian),
  - "spatialMedian" (function spatial.median from package ICSNP),
  - "compMedian" (marginal median) or
  - "HRMedian" (Hettmansperger and Randles median, function HR.Mest from package ICSNP).

The default is "ojaMedian".

`p`	`NULL` or a number between 0 and 1 which specifies the fraction of hyperplanes to be used for subsampling. If `p = 1`, no subsampling is done. If `p = NULL`, the value of `p` is determined based on the size of the data set. See function `ojaSign` for details.
`silent`	logical, if subsampling is done or the expected computation time is too long, a warning message will be printed unless `silent` is `TRUE`. The default is `FALSE`.
`na.action`	a function which indicates what should happen when the data contain 'NA's. Default is to fail.
`...`	arguments passed on to the location function.
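
As an illustration of the function form of `center`, the following base-R sketch defines a simple location function. The name `trimmedColMean` and its `trim` argument are hypothetical, and the sketch assumes that a location function receives the data matrix as its first argument and returns a numeric vector of length ncol(X); see function `ojaSign` for the actual requirements. The `ojaSCM` calls are left commented out because they need the OjaNP package.

```r
# Hypothetical location function: column-wise trimmed mean.
trimmedColMean <- function(X, trim = 0.1) {
  apply(as.matrix(X), 2, mean, trim = trim)
}

set.seed(1)
X <- matrix(rnorm(40), ncol = 2)
loc <- trimmedColMean(X, trim = 0.2)
print(loc)   # numeric vector with one entry per column of X

# Assumed usage (requires OjaNP); `trim` would be forwarded through `...`:
# ojaSCM(X, center = trimmedColMean, trim = 0.2)
# A precomputed location can be supplied as a numeric vector instead:
# ojaSCM(X, center = loc)
```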

The function computes the Oja sign covariance matrix of the data set X, that is (if the Oja signs are centered by the Oja median) the covariance matrix of the Oja signs of the data points in X, taken w.r.t. X.
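
The relation between the sign covariance matrix and the ordinary covariance of the signs can be checked with a small base-R sketch. It assumes, in line with Example 1, that the sign covariance matrix is the average outer product of the signs, (1/n) * sum of s_i s_i^T. The matrix `S` is an arbitrary stand-in for a matrix of Oja signs, not output of `ojaSign`, and `secondMoment` is a hypothetical helper.

```r
set.seed(42)
n <- 50
S <- matrix(rnorm(n * 3), ncol = 3)   # stand-in for Oja signs (rows = points)

# Hypothetical helper: (1/n) * sum of s_i s_i^T
secondMoment <- function(S) crossprod(S) / nrow(S)

# Signs that add up to zero: second moment equals (1 - 1/n) * cov
S0 <- scale(S, center = TRUE, scale = FALSE)
same <- isTRUE(all.equal(secondMoment(S0), (1 - 1/n) * cov(S0),
                         check.attributes = FALSE))

# Signs that do not add up to zero: the two quantities differ
differ <- !isTRUE(all.equal(secondMoment(S), (1 - 1/n) * cov(S),
                            check.attributes = FALSE))
print(c(same = same, differ = differ))
```

The difference in the second case is the outer product of the mean sign vector with itself, which vanishes only when the signs are correctly centered.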

For a definition of the Oja sign covariance matrix and its properties see references below. The matrix X needs to have at least two columns and at least as many rows as columns in order to give sensible results. The return value is a square, symmetric matrix having as many columns as X.

Oja signs (contrary to Oja ranks) require the computation of a centre (location) of the data cloud. The function offers various ways to specify the location. For details on location computation see function ojaSign.
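
The role of the centre can be made concrete with a two-dimensional sketch in base R. It assumes the definition of Oja signs given in the references: after centring, the sign of a point x is the average, over data points z_i, of the gradient of |det(z_i, x)|. The helper `ojaSigns2d` is hypothetical, the component-wise median serves only as a stand-in for the Oja median, and the normalization used by OjaNP may differ.

```r
# Hypothetical helper: Oja signs of the rows of a two-column matrix X,
# taken w.r.t. X itself and centred at `center`.
ojaSigns2d <- function(X, center = apply(X, 2, median)) {
  Z <- sweep(as.matrix(X), 2, center)          # centre the data
  # D[i, j] = det(z_i, z_j) = z_i1 * z_j2 - z_i2 * z_j1
  D <- Z[, 1] %o% Z[, 2] - Z[, 2] %o% Z[, 1]
  # gradient of |det(z_i, x)| w.r.t. x is sign(det) * (-z_i2, z_i1)
  G <- cbind(-Z[, 2], Z[, 1])
  t(sign(D)) %*% G / nrow(Z)                   # row j = sign of point j
}

set.seed(7)
X <- matrix(rnorm(60), ncol = 2)
S <- ojaSigns2d(X)
scm <- crossprod(S) / nrow(S)   # average outer product of the signs
print(scm)
```

Changing `center` changes every row of `S`, which is why the choice of location matters for signs but not for ranks.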

The function offers a subsampling option in order to speed up computation for large data sets. The subsampling fraction is controlled by the parameter p. If p is not specified (the default, p = NULL), it is determined automatically from the size of the problem. The function tries to strike a reasonable compromise between accuracy and computing time; in particular, for sufficiently small data matrices X the sampling fraction p is set to 1. Subsampling is applied to hyperplanes, not data points. A sample is drawn once, and all Oja signs are then computed based on this sample. For further details on subsampling see function ojaSign. Usable results can be expected even for very small p, see e.g. Example 2.
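
To give a sense of scale for the subsampling fraction, the following sketch counts hyperplanes for the setting of Example 2 (n = 300 points in dimension k = 7). It assumes that each hyperplane is spanned by the centre and k - 1 data points, so that there are choose(n, k - 1) of them; how `ojaSign` actually counts and samples hyperplanes is documented there. The variable names are illustrative only.

```r
n <- 300; k <- 7
nHyper <- choose(n, k - 1)      # number of (k - 1)-subsets of the data points
p <- 4.542038e-09               # subsampling fraction reported in Example 2
expectedUsed <- p * nHyper      # rough number of hyperplanes retained
print(format(nHyper, big.mark = ",", scientific = FALSE))
print(round(expectedUsed))
```

Under this assumption only a few thousand of roughly 10^12 hyperplanes enter the computation, which is what makes the example feasible at all.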

a symmetric matrix with ncol(X) columns and rows.

Daniel Vogel

Fischer D, Mosler K, Möttönen J, Nordhausen K, Pokotylo O and Vogel D (2020). “Computing the Oja Median in R: The Package OjaNP.” Journal of Statistical Software, 92(8), pp. 1-36. doi: 10.18637/jss.v092.i08 (URL: http://doi.org/10.18637/jss.v092.i08).

Visuri, S., Koivunen, V., Oja, H. (1999), Sign and rank covariance matrices, J. Stat. Plann. Inference, 91, 557–575.

Ollila, E., Oja, H., Croux, C. (2003), The affine equivariant sign covariance matrix: Asymptotic behavior and efficiencies, J. Multivariate Analysis, 87, 328–355.

ojaSign, ojaRCM, ojaMedian, spatial.median, HR.Mest

### ----<< Example 1 >>---- : biochem data
library(OjaNP)
data(biochem)
X <- biochem[,1:2]
ojaSCM(X)

# Oja signs are correctly centered
# (i.e. they add up to zero) when
# computed w.r.t. the Oja median.
# Hence the following two calls return the same matrix,
ojaSCM(X, center = "ojaMedian", alg = "exact")
(1 - 1/nrow(X))*cov(ojaSign(X, alg = "exact"))
# but the following two do not.
ojaSCM(X, center = "colMean")
(1 - 1/nrow(X))*cov(ojaSign(X, center = "colMean"))



### ----<< Example 2 >>---- : 300 points in R^7 
# The merit of subsampling.
# The following example might take a bit longer:
## Not run: 
A <- matrix(c(1,0.5,1,4,2,0.5,-0.5,1,4), ncol = 3)
B <- A %x% A;  Sigma  <- (B %*% t(B))[1:7, 1:7]
# Sigma is some arbitrary positive definite matrix.
library(mvtnorm)  # provides rmvnorm
set.seed(123)
X <- rmvnorm(n = 300, sigma = Sigma)

cov2cor(Sigma) # the true correlation matrix
cor(X)  # Bravais-Pearson correlation
cov2cor(solve(ojaSCM(X, center = "colMean")))
# correlation estimate based on Oja signs 
# The subsampling fraction in this example
# is p = 4.542038e-09.
# Yet it returns a sensible estimate.

## End(Not run)