Computes a matrix of Hoeffding's (1948) D statistics for all possible pairs of columns of a matrix. D is a measure of the distance between F(x,y) and G(x)H(y), where F(x,y) is the joint CDF of X and Y, and G and H are the marginal CDFs. Missing values are deleted in pairs rather than by deleting all rows of x that have any missing variables.

The D statistic is robust against a wide variety of alternatives to independence, such as nonmonotonic relationships. The larger the value of D, the more dependent are X and Y (for many types of dependencies). The D used here is 30 times Hoeffding's original D, and ranges from -0.5 to 1.0 if there are no ties in the data.

print.hoeffd prints the information derived by hoeffd. hoeffd also computes the mean and maximum absolute values of the difference between the joint empirical CDF and the product of the marginal empirical CDFs.
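The definition of D can be made concrete with a small standalone sketch. It is written in plain Python, is independent of the Hmisc code, and uses the common rank formulation of the statistic with midranks for ties. The function names (`midranks`, `bivariate_ranks`, `hoeffding_d`) and the sample vectors are assumptions made for illustration only.

```python
def midranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    return ranks


def bivariate_ranks(x, y):
    """Q_i = 1 + number of points below point i in both coordinates.

    A tie in one coordinate contributes 1/2, a tie in both contributes 1/4.
    """
    def phi(a, b):
        return 1.0 if a < b else (0.5 if a == b else 0.0)

    n = len(x)
    return [
        1.0 + sum(phi(x[j], x[i]) * phi(y[j], y[i]) for j in range(n) if j != i)
        for i in range(n)
    ]


def hoeffding_d(x, y):
    """Scaled D (30 times Hoeffding's original statistic) for complete pairs."""
    n = len(x)
    if n != len(y) or n < 5:
        raise ValueError("need two equal-length samples with at least 5 pairs")
    r, s, q = midranks(x), midranks(y), bivariate_ranks(x, y)
    d1 = sum((qi - 1) * (qi - 2) for qi in q)
    d2 = sum((ri - 1) * (ri - 2) * (si - 1) * (si - 2) for ri, si in zip(r, s))
    d3 = sum((ri - 2) * (si - 2) * (qi - 1) for ri, si, qi in zip(r, s, q))
    numerator = (n - 2) * (n - 3) * d1 + d2 - 2 * (n - 2) * d3
    return 30.0 * numerator / (n * (n - 1) * (n - 2) * (n - 3) * (n - 4))


# Illustrative data (an assumption, not taken from the page).
x = [-2, -1, 0, 1, 2]
print(hoeffding_d(x, [1, 2, 3, 4, 5]))  # strictly increasing in x
print(hoeffding_d(x, [4, 1, 0, 1, 4]))  # symmetric U shape with tied y values
```

With only five points the strictly increasing pair reaches the upper limit of the scaled statistic, while the U-shaped pair does not stand out; detecting nonmonotonic dependence generally needs more data.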
x 
a numeric matrix with at least 5 rows and at least 2 columns (if y is absent)

y 
a numeric vector or matrix which will be concatenated to x
... 
ignored 
Uses midranks in case of ties, as described by Hollander and Wolfe. P-values are approximated by linear interpolation on the table in Hollander and Wolfe, which uses the asymptotically equivalent Blum-Kiefer-Rosenblatt statistic. For P<.0001 or >0.5, P-values are computed using a well-fitting linear regression function in log P vs. the test statistic. Ranks (but not bivariate ranks) are computed using efficient algorithms (see reference 3).
a list with elements D, the matrix of D statistics, n, the matrix of number of observations used in analyzing each pair of variables, and P, the asymptotic P-values. Pairs with fewer than 5 nonmissing values have the D statistic set to NA. The diagonals of n are the number of non-NAs for the single variable corresponding to that row and column.
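The pairwise handling of missing values can be illustrated with a short sketch of how such an n matrix might be tallied. It is written in plain Python and does not reproduce the Hmisc code; the helper `pairwise_counts`, the `data` dictionary, and the use of NaN in place of R's NA are all assumptions made for illustration.

```python
import math
from itertools import product


def pairwise_counts(columns):
    """Count, for every pair of columns, the rows where both values are present."""
    names = list(columns)
    counts = {}
    for a, b in product(names, repeat=2):
        counts[(a, b)] = sum(
            1
            for u, v in zip(columns[a], columns[b])
            if not (math.isnan(u) or math.isnan(v))
        )
    return counts


# Hypothetical data: only z has a missing value (NaN stands in for NA).
data = {
    "x": [-2.0, -1.0, 0.0, 1.0, 2.0],
    "y": [4.0, 1.0, 0.0, 1.0, 4.0],
    "z": [1.0, 2.0, 3.0, 4.0, math.nan],
    "q": [1.0, 2.0, 3.0, 4.0, 5.0],
}
n = pairwise_counts(data)

# Off-diagonal pairs with fewer than 5 complete rows would get D = NA.
too_few = sorted(
    pair for pair, count in n.items() if pair[0] != pair[1] and count < 5
)
print(n[("x", "y")], n[("x", "z")], n[("z", "z")])
print(too_few)
```

Only the row where z is missing is dropped, and only for pairs that involve z, so the other pairs keep all five observations.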
Frank Harrell
Department of Biostatistics
Vanderbilt University
fh@fharrell.com
Hoeffding W. (1948): A nonparametric test of independence. Ann Math Stat 19:546–57.
Hollander M. and Wolfe D.A. (1973). Nonparametric Statistical Methods, pp. 228–235, 423. New York: Wiley.
Press WH, Flannery BP, Teukolsky SA, Vetterling WT (1988): Numerical Recipes in C. Cambridge: Cambridge University Press.
D
   x  y  z  q
x  1  0 NA  1
y  0  1 NA  0
z NA NA  1 NA
q  1  0 NA  1

avg|F(x,y)-G(x)H(y)|
     x    y z    q
x 0.00 0.04 0 0.16
y 0.04 0.00 0 0.04
z 0.00 0.00 0 0.00
q 0.16 0.04 0 0.00

max|F(x,y)-G(x)H(y)|
     x   y z    q
x 0.00 0.1 0 0.24
y 0.10 0.0 0 0.10
z 0.00 0.0 0 0.00
q 0.24 0.1 0 0.00

n
  x y z q
x 5 5 4 5
y 5 5 4 5
z 4 4 4 4
q 5 5 4 5

P
  x      y      z      q
x        0.3633        0.0000
y 0.3633               0.3633
z
q 0.0000 0.3633

D
     x    y
x 1.00 0.06
y 0.06 1.00

avg|F(x,y)-G(x)H(y)|
       x      y
x 0.0000 0.0407
y 0.0407 0.0000

max|F(x,y)-G(x)H(y)|
       x      y
x 0.0000 0.0763
y 0.0763 0.0000

n= 200

P
  x y
x   0
y 0
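
The average and maximum difference matrices printed above can be explored with a rank-based sketch. It evaluates the joint empirical CDF as Q/n and the marginal empirical CDFs as midranks divided by n, which is an assumption about how the differences are evaluated rather than a description of the Hmisc internals. The function name `cdf_differences` and the input vectors are hypothetical.

```python
def phi(a, b):
    """1 if a < b, 1/2 if tied, 0 otherwise."""
    return 1.0 if a < b else (0.5 if a == b else 0.0)


def cdf_differences(x, y):
    """Mean and maximum of |Q/n - (R/n)(S/n)| over the observed points."""
    n = len(x)
    diffs = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        r = 1.0 + sum(phi(x[j], x[i]) for j in others)  # midrank of x[i]
        s = 1.0 + sum(phi(y[j], y[i]) for j in others)  # midrank of y[i]
        q = 1.0 + sum(phi(x[j], x[i]) * phi(y[j], y[i]) for j in others)
        diffs.append(abs(q / n - (r / n) * (s / n)))
    return sum(diffs) / n, max(diffs)


# Hypothetical inputs: a U-shaped pair and a strictly increasing pair.
x = [-2, -1, 0, 1, 2]
print(cdf_differences(x, [4, 1, 0, 1, 4]))
print(cdf_differences(x, [1, 2, 3, 4, 5]))
```

Under these assumptions the U-shaped pair gives a mean of about 0.04 and a maximum of about 0.1, and the increasing pair gives about 0.16 and 0.24.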