regdistbetweenone: Testing equality of one within-group and between-two groups...

View source: R/reg2cluster2dist.R

regdistbetweenone R Documentation

Testing equality of one within-group and between-two groups distances regression

Description

Jackknife-based test for equality of two regressions between distances. Given two groups of objects, this tests whether the regression involving the distances within one of the groups is compatible with the regression involving the same within-group distances together with the between-group distances.

Usage

regdistbetweenone(dmx,dmy,grouping,groups=levels(as.factor(grouping))[1:2],rgroup)

Arguments

dmx

dissimilarity matrix or object of class dist. Explanatory dissimilarities (often these will be proper distances, but more general dissimilarities that do not necessarily fulfill the triangle inequality can be used, same for dmy).

dmy

dissimilarity matrix or object of class dist. Response dissimilarities.

grouping

something that can be coerced into a factor, defining the grouping of objects represented by the dissimilarities dmx and dmy (i.e., if grouping has length n, dmx and dmy must be dissimilarities between n objects).

groups

vector of two levels. The two groups defining the regressions to be compared in the test. These can be factor levels, integer numbers, or strings, depending on the entries of grouping.

rgroup

one of the levels in groups, denoting the group of which within-group dissimilarities are considered.
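As an illustration of how grouping, groups and rgroup partition a dissimilarity matrix, the following base R sketch extracts the within-group dissimilarities of rgroup and the between-group dissimilarities from a toy matrix. The data and all variable names are made up for illustration; this is not code taken from the package.

```r
# Toy data (hypothetical): five objects on a line,
# three in group 1 and two in group 2.
pos <- c(0, 1, 3, 10, 12)
grouping <- c(1, 1, 1, 2, 2)
dm <- as.matrix(dist(pos))

rgroup <- 1   # group whose within-group dissimilarities are used
other <- 2    # the second level in groups

in_r <- grouping == rgroup
in_o <- grouping == other

# Within-group dissimilarities of rgroup: each pair counted once.
sub <- dm[in_r, in_r]
within_r <- sub[upper.tri(sub)]

# Between-group dissimilarities: every object of rgroup
# paired with every object of the other group.
between <- as.vector(dm[in_r, in_o])

length(within_r)  # 3 pairs
length(between)   # 3 * 2 = 6 pairs
```

The first regression described on this page would use only the pairs in within_r; the second would use the pairs in within_r and between together.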

Details

The null hypothesis is that the regression based on the distances within group rgroup and the regression based on these distances together with the between-groups distances are equal. It is tested using jackknife pseudovalues. The test statistic is the difference between fitted values with x (explanatory variable) fixed at the center of the between-group distances. The test is run one-sided, i.e., the null hypothesis is only rejected if the between-group distances are larger than expected under the null hypothesis, see below. For the jackknife, observations from both groups are left out one at a time. However, the roles of the two groups are different (observations from group rgroup are used in both regressions whereas observations from the other group are only used in one of them), and therefore the corresponding jackknife pseudovalues can have different variances. To take this into account, variances are pooled, and the degrees of freedom of the t-test are computed by the Welch-Satterthwaite approximation for aggregation of different variances.
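The two generic ingredients named above, jackknife pseudovalues and the Welch-Satterthwaite degrees of freedom, can be sketched in base R as follows. This is the textbook two-sample form under the assumption of two independent sets of values; the function names jack_pseudo and welch_df are hypothetical, and the package's own pooling of the pseudovalue variances may differ in detail.

```r
# Leave-one-out jackknife pseudovalues for a generic estimator:
# pseudo_i = n * theta_hat - (n - 1) * theta_hat_without_i
jack_pseudo <- function(x, estimator) {
  n <- length(x)
  full <- estimator(x)
  loo <- vapply(seq_len(n), function(i) estimator(x[-i]), numeric(1))
  n * full - (n - 1) * loo
}

# Welch-Satterthwaite degrees of freedom for two sets of values
# with possibly different variances.
welch_df <- function(v1, v2) {
  n1 <- length(v1)
  n2 <- length(v2)
  a <- var(v1) / n1
  b <- var(v2) / n2
  (a + b)^2 / (a^2 / (n1 - 1) + b^2 / (n2 - 1))
}

set.seed(123)
x1 <- rnorm(13)
x2 <- rnorm(22, sd = 3)

# For the mean, the pseudovalues reproduce the observations.
pv1 <- jack_pseudo(x1, mean)
pv2 <- jack_pseudo(x2, mean)

df_ws <- welch_df(pv1, pv2)
```

For the sample mean the result agrees with the degrees of freedom reported by t.test(x1, x2), which uses the same approximation by default.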

The test cannot be run, and many components will be NA, if the within-group regressions or the jackknifed within-group regressions are ill-conditioned.
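To get a feeling for what ill-conditioning means here, the condition number of a simple linear regression can be inspected with kappa from base R. The sketch below uses simulated data; the threshold at which the package treats a regression as ill-conditioned is not stated on this page, so none is assumed.

```r
set.seed(1)
n <- 20
x <- runif(n)
y <- 1 + 2 * x + rnorm(n, sd = 0.1)

# Explanatory values with a reasonable spread.
fit_ok <- lm(y ~ x)

# Explanatory values far from zero with very little spread:
# the design matrix is close to collinear with the intercept.
x_narrow <- 100 + 0.01 * x
fit_narrow <- lm(y ~ x_narrow)

kappa_ok <- kappa(fit_ok)
kappa_narrow <- kappa(fit_narrow)

# The narrow design has a much larger condition number.
kappa_narrow > kappa_ok
```

Centering the explanatory variable, as done with xcenter, generally reduces the condition number of such a fit.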

This was implemented having in mind an application in which the explanatory distances represent geographical distances, the response distances are genetic distances, and groups represent species or species-candidates. In this application, for testing whether the regression patterns are compatible with the two groups behaving like a single species, one would first use regeqdist to test whether a joint regression for the within-group distances of both groups makes sense. If this is not rejected, regdistbetween is run to see whether the between-group distances are compatible with the within-group distances. If a joint regression on within-group distances is rejected by regeqdist, regdistbetweenone can be used to test whether the between-group distances are at least compatible with the within-group distances of one of the groups, which can still be the case within a single species, see Hausdorf and Hennig (2019). This is only rejected if the between-group distances are larger than expected under equality of regressions, because if they are smaller, this is not an indication against the groups belonging together genetically. To this end, regdistbetweenone needs to be run twice, once with each of the two groups as rgroup. This will produce two p-values. The null hypothesis that the regressions are compatible for at least one group can be rejected if the maximum of the two p-values is smaller than the chosen significance level.
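The decision rule for the two runs can be written down in a few lines. The helper below is hypothetical (it is not part of prabclus) and the p-values passed to it are made-up numbers; in practice they would be the pval components of the two regdistbetweenone results.

```r
# Hypothetical helper: reject "compatible for at least one group"
# only if both one-sided p-values fall below alpha.
reject_compatibility <- function(p_rgroup1, p_rgroup2, alpha = 0.05) {
  max(p_rgroup1, p_rgroup2) < alpha
}

# The two p-values would be obtained along the lines of
#   regdistbetweenone(dmx, dmy, grouping, groups = c(1, 2), rgroup = 1)$pval
#   regdistbetweenone(dmx, dmy, grouping, groups = c(1, 2), rgroup = 2)$pval

res_both_small <- reject_compatibility(0.001, 0.030)  # TRUE
res_one_large <- reject_compatibility(0.001, 0.200)   # FALSE
```

Using the maximum means a single large p-value is enough to retain the null hypothesis, which matches the "at least one group" formulation.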

Value

list of class "regdistbetween" with components

pval

p-value.

coeffdiff

difference between regression fits (within-group together with between-groups distances minus within-group distances only) at xcenterbetween, see below.

condition

condition numbers of regressions, see kappa.

lmfit

list. Output objects of lm within the two groups.

jr

output object of jackknife for difference between regression fitted values at xcenterbetween.

xcenter

mean of the within-group distances of the explanatory variable for group rgroup, used for centering.

xcenterbetween

mean of between-groups distances of explanatory variable (after centering by xcenter); at this point regression fitted values are computed.

tstat

t-statistic.

tdf

degrees of freedom of t-statistic according to Welch-Satterthwaite approximation.

jackest

jackknife-estimator of difference between regression fitted values at xcenterbetween.

jackse

jackknife-standard error for jackest.

jackpseudo

vector of jackknife pseudovalues on which the test is based.

groups

see above.

species

see above.

testname

title to be printed out when using print.regdistbetween.

Author(s)

Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en

References

Hausdorf, B. and Hennig, C. (2019) Species delimitation and geography. Submitted.

See Also

regeqdist, regdistbetween

Examples

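  library(prabclus)
  options(digits=4)
  data(veronica)
  # Geographical distances between the selected individuals
  ver.geo <- coord2dist(coordmatrix=veronica.coord[173:207,],file.format="decimal2")
  # Jaccard distances computed from the corresponding rows of veronica
  vei <- prabinit(prabmatrix=veronica[173:207,],distance="jaccard")

  # The first 13 individuals form group 1, the remaining 22 form group 2
  species <- c(rep(1,13),rep(2,22))
  # Log-transformed geographical distances, shifted by their lower quartile
  loggeo <- log(ver.geo+quantile(as.vector(as.dist(ver.geo)),0.25))
  # Within-group distances of group 1 versus these plus the between-group distances
  rtest3 <- regdistbetweenone(dmx=loggeo,dmy=vei$distmat,grouping=species,
                              groups=c(1,2),rgroup=1)
  print(rtest3)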

prabclus documentation built on Sept. 24, 2024, 5:07 p.m.