ojaRank: Oja Ranks - Affine Equivariant Multivariate Ranks

Description Usage Arguments Details Value Author(s) References See Also Examples

Description

The function computes the Oja rank of a point x w.r.t. a data set X or, if no point x is given, the Oja ranks of all points in X.

Usage

1
ojaRank(X, x = NULL, p = NULL, silent = FALSE, na.action = na.fail)

Arguments

X

numeric data.frame or matrix containing the data points as rows.

x

NULL or a numeric vector, the point for which the Oja rank should be computed.

p

NULL or a number between 0 and 1 which specifies the fraction of hyperplanes to be used for subsampling. If p = 1, no subsampling is done. If p = NULL, the value of p is determined based on the size of the data set. See details.

silent

logical, if subsampling is done or the expected computation time is too long, a warning message will be printed unless silent is TRUE. The default is FALSE.

na.action

a function which indicates what should happen when the data contain 'NA's. Default is to fail.

Details

The function computes the Oja rank of the point x w.r.t.\ the data set X or, if no x is specified, the Oja ranks of all data points in X w.r.t. X. For a definition of Oja rank see reference below.

The matrix X needs to have at least as many rows as columns in order to give sensible results. The vector x has to be of length ncol(X). If x is specified, a vector of length ncol(X) is returned. Otherwise the return value is a matrix of the same dimensions as X where the i-th row contains the Oja rank of the i-th row of X.

The function will also work for matrices X with only one column and also vectors. Then centered and normalized (univariate) ranks are returned.

For n = nrow(X) data points in R^k, where k = ncol(X), the computation of the Oja rank necessitates the evaluation of N = choose(n,k) hyperplanes in R^k. Thus for large data sets the function offers a subsampling option in order to deliver (approximate) results within reasonable time. The subsampling fraction is controlled by the parameter p: If p < 1 is passed to the function, the computation is based on a random sample of only pN of all possible N hyperplanes. If p is not specified (which defaults to p = NULL), it is automatically determined based on n and k to yield a sensible trade-off between accuracy and computing time. If choose(n,k)*k^3 < 6e+06, the sample fraction p is set to 1 (no subsampling). Otherwise p is choosen such that the computation (of one rank) usually takes around 20 seconds (on a 1.66 GHz CPU and 1 GB RAM). If all Oja ranks of X are requested, a hyperplane sample is drawn once, all Oja ranks are then computed based on this sample.

Finally, subsampling is feasible. Even for very small p useable results can be expected, see e.g. the examples for the function ojaRCM.

Claudia Köllmann is acknowledged for bug-fixing this function.

Value

either a numeric vector, the Oja rank of x, or a matrix of the same dimensions as X containing the Oja ranks of X as rows.

Author(s)

Daniel Vogel, Jyrki Möttönen

References

Fischer D, Mosler K, Möttönen J, Nordhausen K, Pokotylo O and Vogel D (2020). “Computing the Oja Median in R: The Package OjaNP.” Journal of Statistical Software, 92(8), pp. 1-36. doi: 10.18637/jss.v092.i08 (URL: http://doi.org/10.18637/jss.v092.i08).

Oja, H. (1999), Affine invariant multivariate sign and rank tests and corresponding estimates: A review, Scand. J. Statist., 26, 319–343.

See Also

ojaSign, ojaRCM, hyperplane, ojaSignedRank

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
### ----<< Example 1 >>---- : 30 points in R^2
## Not run: 
set.seed(123)
X <- rmvnorm(n = 30,mean = c(0,0)) # from package 'mvtnorm'
ojaRank(X)
ojaRank(X, x = c(100,100))
ojaRank(X, x = ojaMedian(X, alg="exact")) # close to zero

# The following two return the same (only in different time)
ojaRank(X)
t(apply(X, 1, function(y){ojaRank(X,y)}))
# but the following two do not (due to different subsampling).
# 1)
set.seed(123); ojaRank(X, p = 0.9, silent = TRUE)
# 2)
set.seed(123) 
t(apply(X, 1, function(y){ojaRank(X, y, p = 0.9, silent = TRUE)}))
# In 1) one subsample for all ranks is drawn, whereas in 2)
# a different sample for each rank is drawn.

## End(Not run)

### ----<< Example 2 >>---- : three points in R^3: only one hyperplane
# The following commands return the same result.
## Not run: 
ojaRank(X = diag(rep(1, 3)), x = c(0,0,0))
ojaRank(X = diag(rep(1, 3)), x = c(-100,-110,-5550))
hyperplane(X = diag(rep(1,3)))[1:3]

## End(Not run)


### ----<< Example 3 >>---- : 300 points in R^7
# Subsampling is done.
# The following example might take a bit longer:
## Not run: 
set.seed(123)
X <- rmvnorm(n = 300, mean = rep(0, 7))
system.time(or <- ojaRank(x = 1:7, X = X))
# PLEASE NOTE: The computation of the Oja rank is based on a 
# random sub-sample of less than 1% of all possible hyperplanes.
#
#       user      system     elapsed 
#      18.47        0.00       18.47 
print(or,d=4)
# [1]  7.733  6.613  6.839  7.383 18.237 21.851 23.700

## End(Not run)


### ----<< Example 4 >>---- : univariate ranks
## Not run: 
ojaRank(1:10)
ojaRank(X = 1:10, x = 5.5)

## End(Not run)

OjaNP documentation built on Feb. 23, 2020, 5:07 p.m.