test.spodt: Monte Carlo hypothesis test of the SPODT classification

Description

The test.spodt function provides a Monte Carlo hypothesis test of the final classification produced by the spodt function. It simulates data sets under the specified null hypothesis and classifies each simulated data set using the same rules as for the observed data set.
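
The procedure described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: toy_r2 is a hypothetical stand-in for spodt (a single split at the median x coordinate), and the coordinates and responses are made up.

```r
# Hypothetical stand-in for spodt(): split the points at the median x
# coordinate and return the share of variance explained by the two classes.
toy_r2 <- function(z, x) {
  cls <- x > median(x)
  ss_total <- sum((z - mean(z))^2)
  if (ss_total == 0) return(0)
  ss_within <- sum(tapply(z, cls, function(v) sum((v - mean(v))^2)))
  1 - ss_within / ss_total
}

set.seed(42)
n <- 50
x <- runif(n)                                      # made-up planar coordinate
z_obs <- rpois(n, lambda = ifelse(x > 0.5, 3, 1))  # made-up response with a spatial shift
r2_obs <- toy_r2(z_obs, x)

# Simulate the null hypothesis (homogeneous Poisson) and classify each
# simulated data set with the same rule as the observed one.
nb_sim <- 99
r2_null <- replicate(nb_sim, toy_r2(rpois(n, mean(z_obs)), x))
```

Comparing r2_obs with the values in r2_null then indicates whether the observed classification explains more variance than expected under the null hypothesis.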

Usage

test.spodt(formula, data, R2.obs, rdist, par.rdist, nb.sim,
           weight=FALSE, graft=0, level.max=5, min.parent=10,
           min.child=5, rtwo.min=0.001)

Arguments

formula

a formula with a response but no interaction terms. The left-hand side must contain the quantitative response variable. The right-hand side should contain the quantitative and qualitative variables to be split according to a non-oblique algorithm. For a purely spatial analysis (with no cofactor), the right-hand side should be ~1.

data

a SpatialPointsDataFrame containing the coordinates and the variables. spodt needs planar coordinates, so geographic coordinates have to be projected first. Euclidean coordinates can be used as they are.

R2.obs

the R2global obtained from the final spodt classification of the observed dataset, specified as a numerical value between 0 and 1.

rdist

a description of the distribution of the dependent variable under the null hypothesis. This can be a character string naming the random generation function of a specified distribution, such as "rnorm" (Gaussian distribution), "rpois" (Poisson distribution), "rbinom" (binomial distribution), "runif" (uniform distribution), etc.

par.rdist

a list of the parameters needed for the random generation, depending on the null hypothesis distribution, such as c(n, mean, sd) (Gaussian distribution), c(n, lambda) (Poisson distribution), c(n, size, prob) (binomial distribution), c(n, min, max) (uniform distribution), etc.
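
One plausible reading of how rdist and par.rdist fit together, offered as an assumption rather than a description of the package internals: the string names a random generator, and the parameter vector supplies its arguments in positional order. The sketch below uses only base R; the helper name draw_null is hypothetical.

```r
# Hypothetical helper (not part of SPODT): call the generator named by
# `rdist`, passing the entries of `par.rdist` as positional arguments.
draw_null <- function(rdist, par.rdist) {
  do.call(rdist, as.list(par.rdist))
}

set.seed(1)
z_pois <- draw_null("rpois", c(20, 1.5))       # n = 20, lambda = 1.5
z_norm <- draw_null("rnorm", c(20, 0, 1))      # n = 20, mean = 0, sd = 1
z_bin  <- draw_null("rbinom", c(20, 5, 0.3))   # n = 20, size = 5, prob = 0.3
z_unif <- draw_null("runif", c(20, 0, 10))     # n = 20, min = 0, max = 10
```

Under this reading, the first entry of par.rdist is the number of values to draw, which would normally equal the number of observations in data.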

nb.sim

the number of simulations, specified as a positive integer.

weight

logical value indicating whether the interclass variances should be weighted or not.

graft

if not equal to 0, a numerical value in ]0;1] indicating the minimal modification of R2global required to graft the final classes.

level.max

the maximal level of the regression tree above which the splitting algorithm is stopped.

min.parent

the minimal size of a node below which the splitting algorithm is stopped.

min.child

the minimal size of the children classes below which the split is refused and the algorithm is stopped.

rtwo.min

the minimal value of R2 below which the node split is refused and the algorithm is stopped. Specified as a numerical value between 0 and 1.

Value

The test.spodt function computes classification trees for the simulated datasets. It provides the empirical distribution of R2global under the null hypothesis, compared with the observed R2global, and a p-value.
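
How a p-value follows from an empirical distribution can be illustrated with a short base-R sketch. It assumes a common Monte Carlo convention: count the simulated R2global values at least as large as the observed one, and include the observed value in the reference set. The package may use a different convention. The helper mc_pvalue and the numbers are made up for illustration.

```r
# Hypothetical helper (not part of SPODT): one-sided Monte Carlo p-value.
mc_pvalue <- function(r2_obs, r2_sim) {
  (1 + sum(r2_sim >= r2_obs)) / (1 + length(r2_sim))
}

# Made-up simulated R2global values and a made-up observed value.
r2_sim <- c(0.05, 0.12, 0.08, 0.20, 0.03, 0.10, 0.15, 0.07, 0.09)
r2_obs <- 0.18

p <- mc_pvalue(r2_obs, r2_sim)   # (1 + 1) / (1 + 9) = 0.2
```

With this convention the smallest attainable p-value is 1 / (1 + number of simulations), so a small number of simulations limits the resolution of the test.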

Author(s)

Jean Gaudart, Nathalie Graffeo, Guillaume Barbet, Bernard Fichet, Roch Giorgi (Aix-Marseille University)

See Also

spodt, spodt.tree, spodtSpatialLines

Examples

library(SPODT)
data(dataMALARIA)
# Example: number of malaria episodes per child at each household,
# from November to December 2009, Bandiagara, Mali.
# Copyright: Pr Ogobara Doumbo, MRTC, Bamako, Mali. email: okd[at]icermali.org
coordinates(dataMALARIA) <- c("x", "y")
class(dataMALARIA)
# Declare the geographic coordinates, then project them to planar coordinates
proj4string(dataMALARIA) <- "+proj=longlat +datum=WGS84 +ellps=WGS84"
dataMALARIA <- spTransform(dataMALARIA, CRS("+proj=merc +datum=WGS84 +ellps=WGS84"))

gr <- 0.07    # graft
rtw <- 0.01   # rtwo.min
parm <- 25    # min.parent
childm <- 2   # min.child
lmx <- 7      # level.max

sp <- spodt(dataMALARIA@data[,2]~1, dataMALARIA, weight=TRUE, graft=gr, min.child=childm,
            min.parent=parm, level.max=lmx, rtwo.min=rtw)

# Test the previous split with a Monte Carlo approach, assuming a
# Poisson distribution of the dependent variable over the area
# (nb.sim is set to 10 here)
test.spodt(dataMALARIA@data[,2]~1, dataMALARIA, sp@R2, "rpois",
           c(length(dataMALARIA@data$loc), mean(dataMALARIA@data$z)), 10,
           weight=TRUE, graft=gr, level.max=lmx, min.parent=parm,
           min.child=childm, rtwo.min=rtw)

# The warning "root is a leaf" means that the spodt function could not
# provide any split under the given splitting parameters

SPODT documentation built on May 2, 2019, 9:43 a.m.