Measuring Habitat Selection Using the Method of Engen et al. (2008)
Description
These functions implement the method described by Engen et al. (2008) to measure the preference of animals for habitat variables in habitat selection studies.
Usage
engen2008II(us, av, id, nsim = 500, nsimra = 500)
engen2008I(us, av, nsimra=500)
## S3 method for class 'engenetalI'
print(x, ...)
## S3 method for class 'engenetalII'
print(x, ...)

Arguments
us 
a data frame containing the value of numeric habitat variables (columns) in each site (rows) used by the animals. 
av 
a data frame containing the value of numeric habitat variables (columns) in each site (rows) available to the animals. 
id 
a factor with as many elements as there are rows in us, giving the animal to which each used site belongs. 
nsim 
the number of randomizations used in the calculation of the total variance. 
nsimra 
the number of random allocations of ranks used in the calculation of the normal score (see details). 
x 
an object of class engenetalI or engenetalII 
... 
additional arguments to be passed to other functions (currently unused) 
Details
Engen et al. (2008) proposed an original approach to measure the
preference of animals for values of each particular variable of a
multivariate set of environmental variables. Their approach was
originally developed for the case where a sample of used sites is
available for each animal in a sample of identified animals (e.g. using
radiotelemetry or GPS), with several sites per animal (i.e., design II
according to the classification of Thomas and Taylor, 1990). However,
we extended this approach to also include the case where habitat use
is described by a sample of used sites, with one site per unidentified
animal (i.e., design I).
The original approach is the following. First, a normal score
transformation of each habitat variable is performed: for each
variable, the empirical cumulative distribution is computed by
dividing the rank of the value of each available site by the number of
observations. Note that ties are ranked randomly. Then, the
inverse of the standard normal integral (see ?qnorm) of the
cumulative distribution function is computed for all available sites:
this results in a perfectly normal distribution of the habitat
variables for the available sites. Next, the value of the cumulative
distribution, as estimated from the available sites, is computed for
each used site, and the inverse of the standard normal integral is
computed for each of these values.
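The transformation described above can be sketched in a few lines of base R. This is only an illustration, not the code used by the package: the helper normal_scores is hypothetical, and two choices are assumptions made here so that qnorm never returns infinite values (ranks are divided by n + 1 rather than n, and the proportions computed for the used sites are clamped away from 0 and 1).

```r
# Hypothetical helper (not part of the package): normal scores of the
# available sites, and scores of the used sites read off the ECDF of
# the available sites.
normal_scores <- function(av, us) {
  n <- length(av)
  # ranks of the available values, ties broken at random
  r <- rank(av, ties.method = "random")
  # assumption: divide by n + 1 so that qnorm() stays finite
  z_av <- qnorm(r / (n + 1))
  # proportion of available values <= each used value
  p_us <- ecdf(av)(us)
  # assumption: clamp the proportions to the open interval (0, 1)
  p_us <- pmin(pmax(p_us, 1 / (n + 1)), n / (n + 1))
  list(av = z_av, us = qnorm(p_us))
}

set.seed(1)
av <- rexp(200)            # skewed availability
us <- rexp(100, rate = 3)  # used sites concentrated on low values
z <- normal_scores(av, us)
mean(z$av)  # scores are symmetric around 0, so the mean is 0 up to rounding
mean(z$us)  # expected to be below 0 here, as low values are over-used
```

Whatever the shape of the original variable, the scores of the available sites are symmetric around zero, so a departure of the used scores from zero can be read as a preference.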
Engen et al. (2008) assume the following model to describe how habitat use results from habitat availability. Let Z_{ij} be the value of a given habitat variable (transformed to normal scores) for the jth site used by the ith animal. This value can then be described by the model:
Z_{ij} = mu + U_i + V_{ij}
where mu is the preference for the habitat variable (0 indicates no preference), and U_i and V_{ij} are normally distributed with means equal to zero and variances equal to sigma^2 and tau^2 respectively. Engen et al. give formulas for the estimation of these parameters. The estimation starts with the total variance sigma^2 + tau^2, which is estimated
by randomly sampling one observation per animal; the parameter nsim
controls the number of samples used in this computation
(see Engen et al. 2008). Note that the correlation between the values
observed for two used units sampled from all the units used by a given
animal is rho = sigma^2 / (sigma^2 + tau^2). A large value of rho
indicates a large variation in the habitat used between animals (or a
small within-animal variation). The main parameter of interest here is
the preference mu. The function engen2008II estimates these
parameters.
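To make the model more concrete, the following sketch simulates normal scores under Z_{ij} = mu + U_i + V_{ij} and estimates the total variance by repeatedly sampling one site per animal, as described above. The parameter values are arbitrary, and the estimators shown are simple stand-ins chosen for illustration; they are not claimed to be the exact formulas of Engen et al. (2008) or the code of engen2008II.

```r
set.seed(42)
n_animals <- 20
n_sites <- 15
mu <- -0.5; sigma <- 0.8; tau <- 0.6      # illustrative values only

id <- gl(n_animals, n_sites)
U <- rnorm(n_animals, 0, sigma)           # between-animal effect U_i
V <- rnorm(n_animals * n_sites, 0, tau)   # within-animal effect V_ij
Z <- mu + U[as.integer(id)] + V

# Total variance sigma^2 + tau^2: sample one site per animal, take the
# variance across animals, and average over nsim such samples.
nsim <- 500
idx <- split(seq_along(Z), id)
total_var <- mean(replicate(nsim, {
  pick <- vapply(idx, function(i) i[sample.int(length(i), 1)], integer(1))
  var(Z[pick])
}))

# Correlation between two sites used by the same animal, from the
# values used in the simulation
rho_true <- sigma^2 / (sigma^2 + tau^2)

# Assumption: the preference is summarised here by the mean of the
# per-animal means (all animals have the same number of sites)
mu_hat <- mean(tapply(Z, id, mean))
c(total_var = total_var, rho_true = rho_true, mu_hat = mu_hat)
```

Sampling a single site per animal gives observations that are independent between animals, which is why their variance estimates sigma^2 + tau^2 without being affected by the within-animal correlation.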
The function engen2008I extends this model to design I
studies (a sample of used sites and a sample of available sites,
animals not identified), by considering the following model for these
studies:
Z_{ij} = mu + V_{ij}
where mu is the preference for the habitat variable (0 indicates no preference), and the V_{ij} are normally distributed with means equal to zero and variances equal to sigma^2.
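Under this simpler model, a natural way to summarise the preference is the mean of the normal scores of the used sites, together with its standard error. The sketch below uses simulated scores and assumes these plain estimators; the package may compute them differently.

```r
set.seed(7)
# simulated normal scores of 50 used sites (illustrative values)
Z <- rnorm(50, mean = 0.4, sd = 1)

mu_hat <- mean(Z)                  # estimated preference
se_mu <- sd(Z) / sqrt(length(Z))   # standard error, assuming independent sites
c(mu = mu_hat, se = se_mu, ratio = mu_hat / se_mu)
```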
Note that the habitat variables may be correlated on the study area. In this case, the observed preference for a given variable may be an artefact of other variables preferred by the animals. Supposing that the data.frame containing the Z_{ij}'s is a realization of a multivariate normal distribution, we can compute, for each habitat variable and each used site, the *conditional* mean m_{ij} and the *conditional* standard deviation s_{ij} of this variable at this site, *given* the values of the other habitat variables at this site (this is done using the algorithm described by Ripley, 1987, p.98). We then compute the standardized values P_{ij} = (Z_{ij} - m_{ij})/s_{ij}. The preference is then computed using these
standardized values.
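The conditional standardization can be illustrated with the usual formulas for the conditional distribution of one component of a multivariate normal vector. The helper cond_standardize below is hypothetical, and it assumes that the mean vector and the covariance matrix are estimated from the scores of the available sites; the package relies on the algorithm of Ripley (1987) and may differ in its details.

```r
# Hypothetical helper: standardize column k of the used scores `zus`
# conditionally on the other columns.
# Assumption: mean vector and covariance matrix come from the
# available scores `zav`.
cond_standardize <- function(zus, zav, k) {
  zus <- as.matrix(zus)
  zav <- as.matrix(zav)
  m <- colMeans(zav)
  S <- cov(zav)
  # regression coefficients of variable k on the other variables
  b <- solve(S[-k, -k, drop = FALSE], S[-k, k, drop = FALSE])
  # conditional mean m_ij for each used site
  centred <- sweep(zus[, -k, drop = FALSE], 2, m[-k])
  m_cond <- m[k] + drop(centred %*% b)
  # conditional standard deviation (constant under multivariate normality)
  s_cond <- sqrt(S[k, k] - drop(S[k, -k, drop = FALSE] %*% b))
  (zus[, k] - m_cond) / s_cond
}

set.seed(3)
common <- rnorm(500)
zav <- cbind(v1 = common + rnorm(500, sd = 0.3),
             v2 = common + rnorm(500, sd = 0.3))
# used sites shifted along the gradient shared by the two variables
shift <- rnorm(60, mean = -1.5, sd = 0.5)
zus <- cbind(v1 = shift + rnorm(60, sd = 0.3),
             v2 = shift + rnorm(60, sd = 0.3))
P <- cond_standardize(zus, zav, 1)
mean(zus[, 1])  # marginal shift on variable 1
mean(P)         # shift that remains once variable 2 is accounted for
```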
Because there may be ties in the distribution of values of habitat
variables, the results may vary depending on the random order chosen
for ties when computing the normal scores. Engen et al. recommended
repeating the calculation a large number of times, and using the mean
values as estimates of the parameters. This is what the function
does, and the number of randomizations is controlled by the parameter
nsimra.
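The effect of ties, and the averaging over random allocations of ranks, can be sketched as follows for a single, heavily tied variable. The way a used value tied with available values receives its rank (drawn uniformly within the tied block) and the division by n + 1 are assumptions made for this illustration only.

```r
set.seed(11)
av <- rpois(300, 3)   # count variable: many ties
us <- rpois(40, 2)
n <- length(av)
nsimra <- 200

# bounds of the block of available ranks compatible with each used value
lo <- vapply(us, function(u) sum(av < u), integer(1))
hi <- vapply(us, function(u) sum(av <= u), integer(1))

one_run <- function() {
  # assumption: the rank is drawn uniformly between lo and hi
  r <- lo + floor(runif(length(us)) * (hi - lo + 1))
  r <- pmin(pmax(r, 1), n)    # keep ranks inside 1..n
  mean(qnorm(r / (n + 1)))    # preference estimate for this allocation
}

runs <- replicate(nsimra, one_run())
c(mean = mean(runs), sd_between_runs = sd(runs))
```

The standard deviation between runs shows how much a single allocation of ranks can move the estimate, which is the reason for averaging over nsimra repetitions.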
Note that all these methods rely on the following hypotheses: (i) independence between animals, (ii) independence between sites, and (iii) all animals are selecting habitat in the same way (in addition to "traditional" hypotheses in these kinds of studies: no territoriality, all animals having equal access to all available resource units, etc., see Manly et al. 2002 for further details).
Note that the examples below illustrate and discuss some interesting and (at first sight) surprising properties of this method.
Value
engen2008I returns a list of class engenetalI, and
engen2008II returns a list of class engenetalII. Both
types of list contain two elements:
raw 
this is a list containing one data frame per habitat
variable, containing the value of the correlation rho (for

results 
a data frame containing the mean values over all the randomizations, of these parameters (columns) for each habitat variable (rows). 
Note
Be patient! These functions can take a long time to run (depending on the
number of sites and on the value of nsimra).
Author(s)
Clement Calenge clement.calenge@oncfs.gouv.fr
References
Engen, S., Grotan, V., Halley, D. and Nygard, T. (2008) An efficient multivariate approach for estimating preference when individual observations are dependent. Journal of Animal Ecology, 77, 958–965.
Thomas, D. and Taylor, E. (1990) Study designs and tests for comparing resource use and availability. Journal of Wildlife Management, 54, 322–330.
Ripley, B. (1987) Stochastic Simulation. John Wiley and Sons.
See Also
niche, madifa, gnesfa for another approach to tackle the study of
habitat selection. For categorical variables, see kselect
Examples
## Not run:
#################################
#################################
#################################
## Practical use of engen2008II
data(puechabon)
kasc <- puechabon$kasc
## Removes the aspect (no factor allowed in the function)
kasc$Aspect <- NULL
## engen2008II:
avail <- kasc2df(kasc)$tab
use <- join.kasc(puechabon$locs[,c("X","Y")], kasc)
id <- puechabon$locs$Name
## This function can be very long:
engen2008II(use, avail, id, nsimra=10)
## Practical use of engen2008I
data(lynxjura)
ka <- lynxjura$map
lo <- lynxjura$locs[,1:2]
av <- kasc2df(ka)$tab
us <- join.kasc(lo, ka)
## Idem, be patient here:
engen2008I(us, av, nsimra=10)
#################################
#################################
#################################
##
## For a deeper discussion on
## this method... a simulation:
#################################
##
## First, simulation of a dataset
## copy and paste this part into R,
## but skip the reading of the
## comments if you are not interested
## in this simulation
## simulate the available points
set.seed(235)
av <- cbind(rnorm(1000, mean=0, sd=3), rnorm(1000, mean=0, sd=0.5))
## rotation matrix (angle pi/4)
tt <- cbind(c(cos(-pi/4), sin(-pi/4)), c(cos(pi/4), sin(pi/4)))
av <- as.data.frame(as.matrix(av)%*%tt)
## simulate the used points: we simulate a selection on the first
## principal component of the PCA of the data.frame describing the
## availability. In other words, we simulate the case where the
## habitat selection occurs on the "common part" of the two habitat
## variables (no preference for one particular variable).
us <- do.call("rbind", lapply(1:5, function(i) {
    ## each animal has its own mean position, centred on low values
    us1 <- cbind(rnorm(30, mean=rnorm(1, -4, 1), sd=0.5),
                 rnorm(30, mean=rnorm(1, 0, 0.5), sd=0.2))
    return(us1%*%tt)
}))
colnames(us) <- colnames(av) <- c("var1", "var2")
id <- gl(5,30)
#################################
##
## Study of the habitat selection on these data
## The data are:
## - us: a matrix containing the used sites for two
##   habitat variables
## - av: a matrix containing the available sites for two
##   habitat variables
## - id: a vector containing the id of 5 animals
## First illustrate the use and availability of the two variables:
plot(av, xlab="Habitat variable 1", ylab="Habitat variable 2",
col="grey", pch=16)
lius <- split(as.data.frame(us), id)
junk <- lapply(1:5, function(i) points(lius[[i]], pch=16, col=i))
## -> ***It is very clear that there is a selection***:
## the animals select the low values of both habitat variables
## (this is what we actually simulated).
## Now perform the method of Engen et al. (2008):
engen2008II(us, av, id)
## Surprisingly, the method seems to fail to identify the clear
## habitat selection identified graphically...
##
## In fact, it does not fail:
## this method identifies the part of habitat selection that is clearly
## attributable to a given variable. Here the animals select
## the common factor expressed in the two variables, and it is impossible
## to identify whether the selection is due only to the variable 1 or to
## the variable 2: it is caused by both variables simultaneously.
## Once the selection on the variable 2 (including the common part)
## has been removed, there is no longer any apparent selection on
## variable 1. Once the selection caused by the variable 1
## (including the common part) has been removed, there is no
## longer any selection on variable 2...
##
## For this reason, Engen et al. recommended to use this method
## concurrently with other factor analyses of the habitat selection
## such as madifa, kselect, niche (in ade4 package), etc.
##
## Note also the strong correlation between the value of two random
## points used by a given animal. This indicates a strong variability
## among animals...
## End(Not run)
