rankor: Critical values and p-values of some rank correlations

Description Usage Arguments Details Value Note Author(s) References Examples

View source: R/rankcor.R

Description

Computes rank correlations and evaluates exact and estimated p-values under the null hypothesis of no association. If there are no ties, all the tests are exact, at least within the limitations imposed by the complete enumeration of all permutations. When there are ties, the function offers two solutions as described below.

Usage

1
2
rankor(p, q, index = "spearman", approx = "exact", tiex = "woodbury", CC = FALSE, 
	type = "two-sided", print = TRUE, sizer = 100000, repgin = 1000)

Arguments

p

a numeric vector.

q

a numeric vector with compatible dimension to p.

index

a character string that specifies the rank correlation that is used in the test statistic. Acceptable values are: "spearman","kendall","gini", "r4", "fy1" (Fisher-Yates based on means), "fy2" (Fisher-Yates based on medians) and "sbz" (Symmetrical Borroni-Zenga). Only enough of the string to be unique is required.

approx

a character string that specifies the type of approximation to the null distribution. Acceptable values are: "vggfr", "exact","gaussian" and "student". Only enough of the string to be unique is required. Exact computations use frequencies precomputed and stored in specific look-up tables.

tiex

a character string that specifies the method for breaking ties. Possible options are: "woodbury", "gh","wgh", "midrank", "dubois". Only enough of the string to be unique is required. Ignored if there are no equal scores.

CC

if true, a continuity correction is applied.

type

type of alternative hypothesis. The options are "two-sided" (default), "greater" or "less". Only enough of the string to be unique is required; "greater" corresponds to positive association, "less" to negative association.

print

FALSE suppresses some of the output.

sizer

number of replications for resolving ties by randomization. Default value: 10^5. Use of this specification is recommended for applying the Woodury method to the Gini index.

repgin

number of sampled pairs of permutations needed to apply the weighted max-min method (wgh). Default value: 10^3.

Details

The presence of ties is assessed through two approaches. (i) Use the fact that for any given number of ties of each multiplicity the mean and the variance of the statistics will tend towards that of data without any ties as n increases. Consequently, the standard Gaussian distribution can be used after resolving ties by the method of mid-rank or the method of Dubois. (ii) Determine a synthesis of the values of the given statistic over all possible ways in which ranks could have been caused by truly differing observations. This solution enables the execution of rank correlation tests by using the usual procedure applied in the absence of ties. See the description of comprank.

It can be noted that each one of r_h, h=1,2,3 includes a quantity S_h, which is always an integer. It is possible to add a continuity correction for r_h, h=1,2,3 to adjust the fact that we are approximating a discrete distribution with a continuous one. Kendall and Dickinson Gibbons (1990) [p. 65] improved the approximation of r_2 by subtracting one from the observed S_2 if it is positive or adding one if it is negative. Also, Kendall and Dickinson Gibbons (1990) [p. 70] suggested subtracting one to S_1 if it is less than (n^3-n)/6 and adding one if it exceeds (n^3-n)/6. The correction to S_3 can follow the same scheme as that of S_1. The formulation of coefficients leads to the recommendation of not using the correction for continuity with r_4, r_5 and r_6. Furthermore, Iman and Conover (1978) point out that the continuity correction should not te considered with ties present.

If p<0.5 the use of the continuity correction decreases the p-value, but without the correction the calculations are slanted in favor of not rejecting the null hypothesis. If n is sufficiently large, the effect of the continuity correction is minimal. In this regard, Haber (1982) notes that, while it is agreed that continuity correction usually improves the approximation to the cumulative distribution function of a discrete random variable the use of continuity correction when performing a statistical test is still an issue of major controversy. See, for example, Kruskal & Wallis (1952).

Value

A list containing the following components:

n

number of ranks

statistic

type of coefficient of rank correlation

r

observed value of the coefficient

approx

type of approximation

tails

the type of alternative

Cpv

approximate or exact conservative p-value

Lpv

approximate or exact or liberal p-value

Note

The approximations to the exact null distribution of therank correlations statistics considered in pvrank are discussed with regard the function "ranktes".

Author(s)

Agostino Tarsitano, Ilaria Lucrezia Amerise, Marco Marozzi

References

Haber, M. (1982). "The continuity correction and statistical testing". International Statistical Review, 50, 135–144.

Iman, R. L. and Conover, W. J. (1978). "Approximations of the critical region for spearman's rho with and without ties present". Communications in Statistics-Simulation and Computation, 7, 269-282.

Kendall, M. and Dickinson Gibbons, J. (1990). Rank Correlation Methods, 5th ed., Oxford University Press, New York.

Kruskal, W.H. and Wallis, A. (1952). "Use of ranks in one-criterion variance analysis". Journal of the American Statistical Association, 47, 583–621.

Examples

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
	
data(Lambh);attach(Lambh)
rankor(DVI,Temp,"spearman","ga","gh",FALSE,"less",TRUE)
rankor(DVI,Temp,"r4","ga","gh",FALSE,"less",TRUE)
rankor(DVI,Temp,"fy1","ga","gh",FALSE,"less",TRUE)
detach(Lambh)
#####
#
data(Security);attach(Security)
rankor(AP,OV,"sp","ga","gh",FALSE,"greater",TRUE)
rankor(MP,AP,"fy2","ga","gh",FALSE,"greater",TRUE)
rankor(OV,MP,"sbz","ga","gh",FALSE,"greater",TRUE)
detach(Security)
#####
#
data(Dizytwin);attach(Dizytwin)
op<-par(mfrow=c(1,1), mgp=c(1.8,.5,0), mar=c(2.8,2.7,2,1),oma=c(0,0,0,0))
plot(Latitude,DZT_Rate,main="Latitude and dizygotic twinning rates",xlab="Latitude",
ylab="DZT_Rate", pch=19,cex=0.9, col= "tan4")
text(Latitude,DZT_Rate,labels=rownames(Dizytwin),cex=0.6,pos=c(rep(3,10),1,3,1,
rep(3,4),1.3))
abline(v=mean(Latitude),col="black",lty=2,lwd=1)
abline(h=mean(DZT_Rate),col="darkblue",lty=2,lwd=1)
par(op)
rankor(Latitude,DZT_Rate,"r4","vg","gh",FALSE,"two",TRUE)
rankor(Latitude,DZT_Rate,"sp","ex","gh",FALSE,"two",TRUE)
rankor(Latitude,DZT_Rate,"ke","ex","gh",FALSE,"two",TRUE)
detach(Dizytwin)	
#####
#
# Bland, J.M. and Mutoka, C. and Hutt, M.S.R. ``Kaposi's sarcoma in Tanzania". East African
# Journal of Medical Research, 4, 47--53 (1977).
# Data from a study of the geographical distribution of a tumor, Kaposi's sarcoma, in
# mainland Tanzania.
Region<-c("Tabora","Iringa","Coast","Mara","Ruvuma","Mbeya","Shinyanga","Kigoma",
"Singida","Dodoma","Morogoro","Westlake","Arusha","Mtwara","Mwanza","Tanga",
"Kilimanjara")
Popw10kHF<-c(1.8, 2.6, 4.0, 4.4, 6.6, 6.7, 9.0, 9.2, 10.8, 11.1, 11.7, 12.5, 13.7, 14.8,
20.7, 23.0, 57.3)
CasePMY<-c(2.37, 8.46, 1.28, 4.29, 7.21, 2.06, 1.66, 4.22, 6.17, 2.6, 6.33, 6.6, 2.46,
6.4, 8.54, 4.54, 6.65)
op<-par(mfrow=c(1,1), mgp=c(1.8,.5,0), mar=c(2.8,2.7,2,1),oma=c(0,0,0,0))
plot(Popw10kHF,CasePMY,main="",pch=19,cex=0.9,cex.lab=0.9,cex.axis=0.8,col="navy")
text(Popw10kHF,CasePMY,labels=Region,pos=3,cex=0.7)
abline(v=mean(Popw10kHF),col="black",lty=2,lwd=1)
abline(h=mean(CasePMY),col="darkblue",lty=2,lwd=1)
par(op)
rankor(CasePMY,Popw10kHF,"sp","ga","dubois",FALSE,"greater",TRUE)
#####
#

data(Berk);attach(Berk)
op<-par(mfrow=c(1,1), mgp=c(1.8,.5,0), mar=c(2.8,2.7,2,1),oma=c(0,0,0,0))
plot(Births,Deaths,main="",pch=19,cex=0.9,cex.lab=0.9,cex.axis=0.8)
abline(h=mean(Deaths),col="black",lty=2,lwd=1)
abline(v=mean(Births),col="black",lty=2,lwd=1)
text(Births[12],Deaths[12],labels="noon",pos=3,cex=0.7)
text(Births[24],Deaths[24],labels="midnight",pos=4,cex=0.7)
W<-Births[-c(12,24)];Z<-Deaths[-c(12,24)]
plot(W,Z,main="",xlab="Births",ylab="Deaths",pch=19,cex=0.99,cex.lab=0.99,
cex.axis=0.8)
text(W,Z,labels=paste("h",1:24,sep=""),cex=0.6, pos=
c(1,3,4,1,1,1,4, 2,1,1,1,1,1,1,1,1,3,1,1,1,2,1,4,2) )
abline(h=mean(Z),col="black",lty=2,lwd=1)
abline(v=mean(W),col="black",lty=2,lwd=1)
par(op)
A<-matrix(NA,10,5);Series<-c("Complete","Clean")
colnames(A)<-c("Data set","Coeff.","Value", "C. Two-tail p","L. Two-tail p")
a0<-cor.test(Births,Deaths, method = "pearson", alternative = "t")
k<-1;A[k,1]<-Series[1];A[k,2]<-"pearson";A[k,3]<-round(a0$estimate,4)
A[k,4]<-round(a0$p.value,5);A[k,5]<-round(a0$p.value,5)
for (j in c("spearman","kendall","gini","r4")){k<-k+1
		a<-rankor(Births,Deaths,j,"st","gh",FALSE,"two",FALSE)
		A[k,1]<-Series[1];A[k,2]<-j;A[k,3]<-round(a$Value,4);A[k,4]<-round(a$Cpv,5)
		A[k,5]<-round(a$Lpv,5)}
a1<-cor.test(W,Z, method = "pearson", alternative = "t")		
k<-k+1;A[k,1]<-Series[2];A[k,2]<-"pearson";A[k,3]<-round(a1$estimate,4)
A[k,4]<-round(a1$p.value,5);A[k,5]<-round(a1$p.value,5)
for (j in c("spearman","kendall","gini","r4")){k<-k+1
		a<-rankor(W,Z,j,"st","wgh",FALSE,"two",FALSE)
		A[k,1]<-Series[2];A[k,2]<-j;A[k,3]<-round(a$Value,4);A[k,4]<-round(a$Cpv,5)
		A[k,5]<-round(a$Lpv,5)}
A<-as.data.frame(A)
print(A,print.gap=4,right=FALSE)
detach(Berk)

#####
#

data(BlaAlt);attach(BlaAlt)
op<-par(mfrow=c(1,1))
plot(Fev1,Fev2,main="",xlab="First measurement of Fev",ylab="Second measurement of Fev",
pch=19,cex=0.9, col= "gold2" )
abline(v=mean(Fev1),col="black",lty=2,lwd=1)
abline(h=mean(Fev2),col="darkblue",lty=2,lwd=1)
par(op)
rankor(Fev1,Fev2,"sp","ga","woodbury",FALSE,"two",TRUE)
rankor(Fev1,Fev2,"sp","ga","midrank",FALSE,"two",TRUE)
rankor(Fev1,Fev2,"sp","ga","dubois",FALSE,"two",TRUE)
rankor(Fev1,Fev2,"sp","ga","gh",FALSE,"two",TRUE)
rankor(Fev1,Fev2,"sp","ga","wgh",FALSE,"two",TRUE)
detach(BlaAlt)

#####
#
data(Locre);attach(Locre)
op<-par(mfrow=c(1,1))
plot(Males,Females,main="Fer cryin' out loud - there is a sex difference",
xlab="Females",ylab="Males",pch=19,cex=0.8,col="steelblue")
text(Males,Females,labels=1:length(Females),cex=0.7,pos=2)
abline(h=mean(Females),col="black",lty=2,lwd=1)
abline(v=mean(Males),col="black",lty=2,lwd=1)
par(op)
out<-rankor(Males,Females,"gi","vg","gh",FALSE,"greater")
cat(out$Value,out$Cpv,"\n")
detach(Locre)
#####
#

# Relationship between the size of caudolateral curvilinear osteophyte
# of the canine femoral neck and the radiographic view
data(Femurs);attach(Femurs)
cos<-ifelse(Gender==1,"steelblue","pink4")
txs<-ifelse(Gender==1,"F","M")
op<-par(mfrow=c(1,2))
	plot(Age,CCO,main="Plot_1",
	xlab="Age",ylab="CCO",pch=19,cex=0.7,col=cos)
	text(Age,CCO,labels=txs,cex=0.6,pos=2)
	abline(h=mean(CCO),col="black",lty=2,lwd=1)
	abline(v=mean(Age),col="black",lty=2,lwd=1)
	plot(BW,CCO,main="Plot_2",
	xlab="Body weight",ylab="CCO",pch=19,cex=0.7,col=cos)
	text(BW,CCO,labels=txs,cex=0.6,pos=2)
	abline(h=mean(CCO),col="black",lty=2,lwd=1)
	abline(v=mean(BW),col="black",lty=2,lwd=1)
par(op)
out<-rankor(Age,CCO,"gi","st","gh",FALSE,"two")
cat(out$Value,out$Cpv,"\n")
out<-rankor(BW,CCO,"fy1","st","gh",FALSE,"two")
detach(Femurs)

#####
#

data(FrigateB);attach(FrigateB)
op<-par(mfrow=c(1,1))
plot(Vol,Frq,pch=19,col="darkgreen")
abline(h=mean(Frq),col="black",lty=2,lwd=1)
abline(v=mean(Vol),col="black",lty=2,lwd=1)
par(op)
Evar<-function(A){
	index<-c("sp", "ke", "gi", "r4", "fy1", "fy2", "sbz")
	approx<-c("ex","ga","st","vg")
	for (ind in index){for(app in approx){rankor(A[,1],A[,2],ind,app,"wgh",FALSE,
			                   "less",TRUE)}}}
Evar(FrigateB)
detach(FrigateB)

#####
#

All.index<-function(X,type){
# Computes the observed rank correlation statistics and their p-values.
	Index<-c("spearman", "kendall", "gini", "r4", "fy1", "fy2", "sbz")
	n<-nrow(X);A<-matrix(0,6,6)
	colnames(A)<-c(" Observed","  t-Student","  Gaussian","  VGGFR",
	"Exact Cons."," Exact Lib.")
	rownames(A)<-c("Spearman","Kendall ","Gini ","   r4", "FY means"," FY  medians", "Symm. BZ")
	for (i in 1:7){
		a<-rankor(X[,1],X[,2],Index[i],"St","woodbury",FALSE,type,FALSE)
		r<-a$Value;A[i,2]<-a$Cpv
		a<-ranktes(r,n,Index[i],"Ga",FALSE,type,FALSE);A[i,3]<-a$Cpv
		a<-ranktes(r,n,Index[i],"Vg",FALSE,type,FALSE);A[i,4]<-a$Cpv
		a<-ranktes(r,n,Index[i],"Ex",FALSE,type,FALSE);A[i,5]<-a$Cpv;A[i,6]<-a$Lpv
		A[i,1]<-r}
return(A)}
# Data for the sample run is from Sokal and Rohlf (Box 15.6, 1981; 
# or Box 15.7, 1995): Computation of rank correlation coefficient between
# the total length of 15 aphid stem mothers and the mean thorax length of 
# their parthenogenetic offspring.
X<-matrix(c(8.7,5.95,8.5,5.65,9.4,6,10,5.7,6.3,4.7,7.8,5.53,11.9,6.4,6.5,4.18,6.6,
6.15,10.6, 5.93,10.2,5.7,7.2,5.68, 8.6,6.13,11.1,6.3, 11.6,6.03), 15, 2, byrow=TRUE) 
A<-All.index(X,"two-sided") # type of alternative
print(round(A,4))

pvrank documentation built on May 17, 2018, 9:03 a.m.

Related to rankor in pvrank...