lmranks  R Documentation 
Estimation and inference for regressions involving ranks, i.e. regressions in which the dependent and/or the independent variable has been transformed into ranks before running the regression.
lmranks(
  formula,
  data,
  subset,
  weights,
  na.action = stats::na.fail,
  method = "qr",
  model = TRUE,
  x = FALSE,
  qr = TRUE,
  y = FALSE,
  singular.ok = TRUE,
  contrasts = NULL,
  offset = offset,
  omega = 1,
  ...
)
## S3 method for class 'lmranks'
plot(x, which = 1, ...)
## S3 method for class 'lmranks'
predict(object, newdata, ...)
## S3 method for class 'lmranks'
summary(object, correlation = FALSE, symbolic.cor = FALSE, ...)
## S3 method for class 'lmranks'
vcov(object, complete = TRUE, ...)
formula 
An object of class " 
data 
an optional data frame, list or environment (or object
coercible by 
subset 
currently not supported. 
weights 
currently not supported. 
na.action 
currently not supported. User is expected to handle NA values prior to the use of this command. 
method 
the method to be used; for fitting, currently only

model, y, qr 
logicals. If TRUE the corresponding components of the fit (the model frame, the response, the QR decomposition) are returned. 
x 

singular.ok 
logical. If 
contrasts 
an optional list. See the 
offset 
this can be used to specify an a priori known
component to be included in the linear predictor during fitting.
This should be 
omega 
real number in the interval [0,1] defining how ties are handled (if there are any). The value of 
... 
For 
which 
As in 
object 
A 
newdata 
An optional data frame in which to look for variables with which to predict. If omitted, the fitted values are used. 
correlation 
logical; if 
symbolic.cor 
logical. If 
complete 
logical indicating if the full variance-covariance matrix
should be returned also in case of an overdetermined system where
some coefficients are undefined and 
This function performs estimation and inference for regressions involving ranks. Suppose there is a dependent variable Y_i and independent variables X_i and W_i, where X_i is a scalar and W_i a vector (possibly including a constant). Instead of running a linear regression of Y_i on X_i and W_i, we want to first transform Y_i and/or X_i into ranks. Denote by R_i^Y the rank of Y_i and by R_i^X the rank of X_i. Then, a rank-rank regression,
R_i^Y = \rho R_i^X + W_i'\beta + \varepsilon_i,
is run using the formula r(Y)~r(X)+W. Similarly, a regression of the raw dependent variable on the ranked regressor,
Y_i = \rho R_i^X + W_i'\beta + \varepsilon_i,
can be implemented by the formula Y~r(X)+W, and a regression of the ranked dependent variable on the raw regressors,
R_i^Y = W_i'\beta + \varepsilon_i,
can be implemented by the formula r(Y)~W.
The function works, in many ways, just like lm for linear regressions. Apart from some smaller details, there are two important differences: first, in lmranks, the mark r() can be used in formulas to indicate variables to be ranked before running the regression and, second, subsequent use of summary produces a summary table with the correct standard errors, t-values and p-values (while those reported by lm are not correct for regressions involving ranks). See Chetverikov and Wilhelm (2023) for more details.
Many other aspects of the function are similar to lm. For instance, . in a formula means 'all columns not otherwise in the formula', just as in lm. An intercept is included by default. In a model specified as r(Y)~r(X)+., both r(X) and X will be included in the model, just as they would be in lm with, say, log() instead of r(). One can exclude X with a -, i.e. r(Y)~r(X)+.-X. See formula for more about model specification.
The r() is a private alias for frank with the increasing argument set to TRUE. The omega argument of frank specifies how ties in variables are to be handled and can be supplied as an argument in lmranks. For more details, see frank. By default omega is set equal to 1, which means r() computes ranks by transforming a variable through its empirical cdf.
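As a rough illustration of the default described above, the following base R sketch computes fractional ranks by evaluating the empirical cdf at each observation. The helper name ecdf_rank is hypothetical and not part of the package; it assumes that omega = 1 with increasing = TRUE corresponds to the share of observations less than or equal to each value, as the text suggests. The actual frank may differ in details, so see its documentation before relying on this.

```r
# Hypothetical helper (not part of csranks): fractional ranks obtained by
# evaluating the empirical cdf of v at each of its own values.
ecdf_rank <- function(v) {
  stopifnot(is.numeric(v), !anyNA(v))
  # stats::ecdf(v) returns a step function; calling it on v gives, for each
  # value, the share of observations less than or equal to that value.
  stats::ecdf(v)(v)
}

x <- c(3, 1, 2, 2)
ranks <- ecdf_rank(x)
print(ranks)
```

Under this convention tied values share the same rank, and the largest value always receives rank 1.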
Many functions defined for lm also work correctly with lmranks. These include coef, model.frame, model.matrix, resid, update and others. On the other hand, some would return incorrect results if they treated lmranks output in the same way as lm's. The central contributions of this package are the vcov, summary and confint implementations using the correct asymptotic theory for regressions involving ranks.
See the lm documentation for more.
An object of class lmranks, inheriting (as much as possible) from class lm.
Additionally, it has an omega entry, corresponding to the omega argument, a ranked_response logical entry, and a rank_terms_indices entry: an integer vector with indices of those entries of the term.labels attribute of terms(formula) that correspond to ranked regressors.
plot(lmranks): Plot diagnostics for an lmranks object
Displays plots useful for assessing quality of model fit. Currently, only one plot is available, which plots fitted values against residuals (as a check for homoscedasticity).
predict(lmranks): Predict method for Linear Model for Ranks Fits
summary(lmranks): Summarizing fits of rank-rank regressions
vcov(lmranks): Calculate Variance-Covariance Matrix for a Fitted lmranks object
Returns the variance-covariance matrix of the regression coefficients (main parameters) of a fitted lmranks object. Its result is theoretically valid and asymptotically consistent, in contrast to naively running vcov(lm(...)).
Sometimes, the data is divided into clusters and one is interested in running rank-rank regressions separately within each cluster, where the ranks are not computed within each cluster, but using all observations pooled across all clusters. Specifically, let G_i=1,\ldots,n_G denote a variable that indicates the cluster to which the i-th observation belongs. Then, the regression model of interest is
R_i^Y = \sum_{g=1}^{n_G} 1\{G_i=g\}(\rho_g R_i^X + W_i'\beta_g) + \varepsilon_i,
where \rho_g and \beta_g are now cluster-specific coefficients, but the ranks R_i^Y and R_i^X are computed as ranks among all observations Y_i and X_i, respectively. That means the rank of an observation is not computed among the other observations in the same cluster, but rather among all available observations across all clusters.
This type of regression is implemented in the lmranks command using interaction notation: r(Y)~(r(X)+W):G. Here, the variable G must be a factor.
Since the theory for clustered regression mixing grouped and ungrouped (in)dependent variables is not yet developed, such a model will raise an error.
Also, by default the command includes a cluster-specific intercept, i.e. r(Y)~(r(X)+W):G is internally interpreted as r(Y)~(r(X)+W):G+G-1.
contrasts of G must be of contr.treatment kind, which is the default.
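The difference between pooled and within-cluster ranks can be sketched in base R as follows. This is only an illustration of the model structure on simulated data: it assumes ranks are computed through the empirical cdf, and it uses plain lm, so only the point estimates are meaningful here; the standard errors reported by lm would not be valid for this model. All variable names are made up for the example.

```r
set.seed(1)
n <- 12
G <- factor(rep(c("A", "B", "C"), each = 4))
X <- rnorm(n)
Y <- X + rnorm(n)

# Pooled ranks: every observation is ranked among all n observations.
RX <- stats::ecdf(X)(X)
RY <- stats::ecdf(Y)(Y)

# For contrast only: ranks computed separately within each cluster.
# This is NOT the ranking used in the clustered model described above.
RX_within <- ave(X, G, FUN = function(v) stats::ecdf(v)(v))

# Cluster-specific intercepts and slopes on the pooled ranks, mirroring
# the structure of r(Y) ~ r(X):G + G - 1 (point estimates only).
fit <- lm(RY ~ RX:G + G - 1)
print(coef(fit))
```

Because every coefficient is cluster-specific, the fitted values within a cluster coincide with those from a separate regression on that cluster's observations, while the ranks themselves still come from the pooled sample.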
As a consequence of the order in which model.frame applies operations, subset and na.action would be applied after evaluation of r(). That would drop some rank values from the final model frame, and the returned coefficients and standard errors could no longer be correct. The user must handle NA values and filter the data on their own prior to usage in lmranks.
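One possible way to follow this advice is to clean and filter the data frame first and only then pass it to lmranks. The sketch below uses mtcars with a few missing values injected purely for illustration; the column selection and the wt < 5 filter are arbitrary choices for the example, and the model call runs only if the csranks package is installed.

```r
# Illustrative data: a few columns of mtcars with NAs injected by hand.
df <- mtcars[, c("mpg", "hp", "wt")]
df$hp[c(3, 10)] <- NA
df$mpg[7] <- NA

# Drop incomplete rows BEFORE ranking, instead of relying on na.action.
clean <- df[stats::complete.cases(df), , drop = FALSE]

# Apply any row filter BEFORE ranking, instead of using the subset argument.
clean <- clean[clean$wt < 5, , drop = FALSE]

# Fit only if the package is available; ranks are then computed on the
# cleaned data, so no rank values are dropped after r() is evaluated.
if (requireNamespace("csranks", quietly = TRUE)) {
  library(csranks)
  fit <- lmranks(r(mpg) ~ r(hp) + wt, data = clean)
  print(coef(fit))
}
```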
Wrapping r() with other functions (like log(r(x))) will prevent the mark from being recognized correctly (because it will not be caught in terms(formula, specials = "r")). The ranks will be calculated correctly, but their transformation will be treated later in lm as a regular regressor. This means that the corresponding regression coefficient will be calculated correctly, but the standard errors, statistics etc. will not.
r, .r_predict and .r_cache are special expressions, used internally to interpret the r mark correctly. Do not use them in formula.
A number of methods defined for lm do not yield theoretically correct results when applied to lmranks objects; errors or warnings are raised in those instances. Also, the df.residual component is set to NA, since the notion of degrees of freedom for the rank models is not theoretically established (at time of 1.2 release).
Chetverikov and Wilhelm (2023), "Inference for Rank-Rank Regressions", Working Paper
lm for details about other arguments; frank.
Generic functions coef, effects, residuals, fitted, model.frame, model.matrix, update.
# load the package that provides lmranks and frank
library(csranks)

# rank-rank regression:
X <- rnorm(500)
Y <- X + rnorm(500)
rrfit <- lmranks(r(Y) ~ r(X))
summary(rrfit)

# naive version of the rank-rank regression:
RY <- frank(Y, increasing=TRUE, omega=1)
RX <- frank(X, increasing=TRUE, omega=1)
fit <- lm(RY ~ RX)
summary(fit)
# the coefficient estimates are the same as in the lmranks command, but
# the standard errors, t-values, p-values are incorrect

# support of `data` argument:
data(mtcars)
lmranks(r(mpg) ~ r(hp) + ., data = mtcars)
# Same as above, but use the `hp` variable only through its rank
lmranks(r(mpg) ~ r(hp) + . - hp, data = mtcars)

# rank-rank regression with clusters:
G <- factor(rep(LETTERS[1:4], each=nrow(mtcars) / 4))
lmr <- lmranks(r(mpg) ~ r(hp):G, data = mtcars)
summary(lmr)
model.matrix(lmr)
# Include all columns of mtcars as usual covariates:
lmranks(r(mpg) ~ (r(hp) + .):G, data = mtcars)