ojaRank | R Documentation |
The function computes the Oja rank of a point x
w.r.t. a data set X
or, if no
point x
is given, the Oja ranks of all points in X
.
ojaRank(X, x = NULL, p = NULL, silent = FALSE, na.action = na.fail)
X |
numeric data.frame or matrix containing the data points as rows. |
x |
|
p |
|
silent |
logical, if subsampling is done or the expected computation time is too long, a warning message will be printed unless |
na.action |
a function which indicates what should happen when the data contain 'NA's. Default is to fail. |
The function computes the Oja rank of the point x
w.r.t.\ the data set X
or, if no x
is specified, the Oja ranks of all data points in
X
w.r.t. X
. For a definition of Oja rank see reference below.
The matrix X
needs to have at least as many rows as columns in order to give sensible results.
The vector x
has to be of length ncol(X)
.
If x
is specified, a vector of length ncol(X)
is returned. Otherwise the return value is a matrix of the same dimensions as X
where the i
-th row contains the
Oja rank of the i
-th row of X
.
The function will also work for matrices X
with only one column and also vectors. Then centered and normalized (univariate) ranks are returned.
For n = nrow(X)
data points in R^k
, where k = ncol(X)
, the computation of the Oja rank necessitates the evaluation of N = choose(n,k)
hyperplanes in R^k
.
Thus for large data sets the function offers a subsampling option in order to deliver (approximate) results within reasonable time. The subsampling fraction is controlled by the parameter
p
:
If p < 1
is passed to the function, the computation is based on a random sample of only pN
of all possible N
hyperplanes. If p
is not specified
(which defaults to p = NULL
), it is automatically determined based on n
and k
to yield a sensible trade-off between accuracy and computing time. If
N k^3 < 6 \cdot 10^6
, the sample fraction p
is set to 1 (no subsampling). Otherwise p
is choosen such that the computation
(of one rank) usually takes around 20 seconds (on a 1.66 GHz CPU and 1 GB RAM). If all Oja ranks of X
are requested, a hyperplane sample is drawn once,
all Oja ranks are then computed based on this sample.
Finally, subsampling is feasible. Even for very small p
useable results can be expected, see e.g. the examples for the function ojaRCM
.
Claudia Köllmann is acknowledged for bug-fixing this function.
either a numeric vector, the Oja rank of x
, or
a matrix of the same dimensions as X
containing the Oja ranks of X
as rows.
Daniel Vogel, Jyrki Möttönen
Fischer D, Mosler K, Möttönen J, Nordhausen K, Pokotylo O and Vogel D (2020). “Computing the Oja Median in R: The Package OjaNP.” Journal of Statistical Software, 92(8), pp. 1-36. doi: 10.18637/jss.v092.i08 (URL: http://doi.org/10.18637/jss.v092.i08).
Oja, H. (1999), Affine invariant multivariate sign and rank tests and corresponding estimates: A review, Scand. J. Statist., 26, 319–343.
ojaSign
, ojaRCM
, hyperplane
, ojaSignedRank
### ----<< Example 1 >>---- : 30 points in R^2
## Not run:
set.seed(123)
X <- rmvnorm(n = 30,mean = c(0,0)) # from package 'mvtnorm'
ojaRank(X)
ojaRank(X, x = c(100,100))
ojaRank(X, x = ojaMedian(X, alg="exact")) # close to zero
# The following two return the same (only in different time)
ojaRank(X)
t(apply(X, 1, function(y){ojaRank(X,y)}))
# but the following two do not (due to different subsampling).
# 1)
set.seed(123); ojaRank(X, p = 0.9, silent = TRUE)
# 2)
set.seed(123)
t(apply(X, 1, function(y){ojaRank(X, y, p = 0.9, silent = TRUE)}))
# In 1) one subsample for all ranks is drawn, whereas in 2)
# a different sample for each rank is drawn.
## End(Not run)
### ----<< Example 2 >>---- : three points in R^3: only one hyperplane
# The following commands return the same result.
## Not run:
ojaRank(X = diag(rep(1, 3)), x = c(0,0,0))
ojaRank(X = diag(rep(1, 3)), x = c(-100,-110,-5550))
hyperplane(X = diag(rep(1,3)))[1:3]
## End(Not run)
### ----<< Example 3 >>---- : 300 points in R^7
# Subsampling is done.
# The following example might take a bit longer:
## Not run:
set.seed(123)
X <- rmvnorm(n = 300, mean = rep(0, 7))
system.time(or <- ojaRank(x = 1:7, X = X))
# PLEASE NOTE: The computation of the Oja rank is based on a
# random sub-sample of less than 1% of all possible hyperplanes.
#
# user system elapsed
# 18.47 0.00 18.47
print(or,d=4)
# [1] 7.733 6.613 6.839 7.383 18.237 21.851 23.700
## End(Not run)
### ----<< Example 4 >>---- : univariate ranks
## Not run:
ojaRank(1:10)
ojaRank(X = 1:10, x = 5.5)
## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.