Description Usage Arguments Details Value Note References Examples
threephase
is used to calculate estimations based on triple sampling under the
model-assisted Monte Carlo approach. A zero phase of auxiliary information
(e.g. taken from remote sensing data) is used to generate model predictions based on multiple linear
regression using the method of ordinary least squares. A subsample of the zero phase comprises
further auxiliary information that produces another set of model predictions.
A further subsample produces a second phase based on terrestrial observations
(i.e. the local densities of the ground truth) and is used to correct for bias in the design-based sense.
The estimation method is available for simple and cluster sampling and includes
the special case where the first phase is based on an exhaustive sample (i.e. a census).
Small-area applications are supported for synthetic estimation as well as two varieties
of bias-corrected estimators: the traditional small-area estimator and an asymptotically
equivalent version derived under Mandallaz' extended model approach.
1 2 3 4 5 6 7 8 9 10 11 12 |
formula.s0 |
an object of class " |
formula.s1 |
an object of class " |
data |
a data frame containing all variables contained in |
phase_id |
an object of class "
|
cluster |
(Optional) Specifies the column name in |
small_area |
(Optional) a list that if containing three elements:
Note: If |
boundary_weights |
(Optional) Specifies the column name in |
exhaustive |
(Optional) For global estimation, a vector of true auxiliary means corresponding to
an exhaustive first phase.
The vector must be input in the same order that |
progressbar |
(Optional) an object a type " |
psmall |
(Optional) an object a type " |
s1.id
identifies "second phase only" plots because the terrestrial phase is
known to be part of the second phase by the construction of the subsampling.
If estimations for multiple small-area domains should be computed, the domains have to be
defined within a character
vector using c()
. Using small_area(..., unbiased=FALSE)
calculates design-based estimates with the synthetic estimator and may be design-biased if
the model is biased in that small area. The default, small_area(..., unbiased=TRUE)
, allows for a residual
correction by one of two asymptotically equivalent methods to create design-unbiased estimates:
Mandallaz' extended model approach calculates the residual correction by extending the
model formula with an indicator variable in the small area. It is the default method
psmall
=FALSE.
the traditional small area estimator calculates the residual correction by taking the
synthetic estimator and adding the mean residual observed in the small area. It is activated
when psmall
=TRUE.
Missing values (NA
) in the auxiliary variables (i.e. at least one auxiliary variable cannot be observed at
an inventory location) are automatically removed from the dataset before the estimations are computed.
Note that missingness in the auxiliary variables is only allowed if we assume that they are missing at random,
since the unbiasedness of the estimates is based on the sampling design.
The boundary weight adjustment is pertinent for auxiliary information derived from remote sensing and is equal to the percentage of forested area (e.g. as defined by a forest mask) in the interpretation area.
Exhaustive estimation refers to when the true means of certain auxiliary variables are known
at an exhaustive zero phase (i.e. a census). For global estimation, the vector must be input
in the same order that lm
processes a formula
object including the intercept term whose
true mean will always be one. For small area estimation, exhaustive
is a data.frame
containing column names for every variable appearing in
the parameter formula
including the variable "Intercept". The observations of the data.frame
must represent the true auxiliary means in the same order as was presented in areas
from the
parameter small_area
. See 'Examples'.
threephase
returns an object of class "threephase"
.
An object of class "threephase"
returns a list
of the following components:
input |
a |
estimation |
a data frame containing the following components:
|
samplesizes |
a |
coefficients |
the coefficients of the two linear models:
|
cov_alpha_s2 |
the design-based covariance matrix of the reduced model coefficients |
cov_beta_s2 |
the design-based covariance matrix of the full model coefficients |
Z_bar_1_s0 |
the estimated auxiliary means of |
Z1_bar_s1 |
the estimated auxiliary means of |
Z_bar_s1 |
the estimated auxiliary means of |
cov_Z_bar_1_s0 |
the covariance matrix for |
resid_reduced |
the reduced model residuals at either the plot level or cluster level depending on the call |
resid_full |
the full model residuals at either the plot level or cluster level depending on the call |
warn.messages |
logical indicating if warning messages were issued |
In the special case of cluster sampling, the reported sample sizes in estimation
are the number of clusters.
The samplesize
-object also provides the respective number of single plot units for cluster sampling.
The reported r.squared_reduced
and r.squared_full
describe the model fit of the applied linear regression
models (i.e. on plot-level, not on cluster level).
Hill, A., Massey, A. F. (2021). The R Package forestinventory: Design-Based Global and Small Area Estimations for Multiphase Forest Inventories. Journal of Statistical Software, 97(4), 1-40.
Mandallaz, D., Breschan, J., & Hill, A. (2013). New regression estimators in forest inventories with two-phase sampling and partially exhaustive information: a design-based monte carlo approach with applications to small-area estimation. Canadian Journal of Forest Research, 43(11), 1023-1031.
Mandallaz, D. (2014). A three-phase sampling extension of the generalized regression estimator with partially exhaustive information. Can. J. For. Res. 44: 383-388
Massey, A. and Mandallaz, D. and Lanz, A. (2014). Integrating remote sensing and past inventory data under the new annual design of the Swiss National Forest Inventory using three-phase design-based regression estimation. Can. J. For. Res. 44(10): 1177-1186
Mandallaz, D. (2013). Regression estimators in forest inventories with three-phase sampling and two multivariate components of auxiliary information. ETH Zurich, Department of Environmental Systems Science,Tech. rep. Available from doi: 10.3929/ethz-a-009990020.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 | ## load datasets:
data(grisons)
data(zberg)
## define regression models for simple and cluster sampling:
formula.s0 <- tvol ~ mean # reduced model:
formula.s1 <- tvol ~ mean + stddev + max + q75 # full model
formula.clust.s0 <- basal ~ stade
formula.clust.s1 <- basal ~ stade + couver + melange
# ------------------------------------------------#
# ----------- GLOBAL ESTIMATION ------------------#
#----
## 1) -- Design-based estimation with non-exhaustive auxiliary information
#----
# 1.1) non-cluster-sampling (see eqns. [11], [14] and [16] in Mandallaz 2014):
summary(threephase(formula.s0, formula.s1, data = grisons,
phase_id = list(phase.col = "phase_id_3p", s1.id=1, terrgrid.id = 2)))
# 1.2) cluster-sampling (see eqns. [49] and [50] in Mandallaz 2013):
summary(threephase(formula.clust.s0, formula.clust.s1, data = zberg,
phase_id = list(phase.col="phase_id_3p", s1.id = 1, terrgrid.id = 2),
cluster = "cluster"))
# 1.3) example for boundary weight adjustment (non-cluster example):
summary(threephase(formula.s0, formula.s1, data = grisons,
phase_id = list(phase.col="phase_id_3p", s1.id = 1, terrgrid.id = 2),
boundary_weights = "boundary_weights"))
#----
## 2) -- Design-based estimation with exhaustive auxiliary information
#----
# 2.1) non-cluster-sampling (see eqns. [7], [9] and [10] in Mandallaz 2014):
summary(threephase(formula.s0, formula.s1, data = grisons,
phase_id = list(phase.col = "phase_id_3p", s1.id = 1, terrgrid.id = 2),
exhaustive = c(1,11.39)))
# 2.2) cluster-sampling:
summary(threephase(formula.clust.s0, formula.clust.s1, data = zberg,
phase_id = list(phase.col = "phase_id_3p", s1.id = 1, terrgrid.id = 2),
cluster = "cluster", exhaustive = c(1, 0.10, 0.7, 0.10)))
# ----------------------------------------------------#
# ----------- SMALL AREA ESTIMATION ------------------#
#----
## 1) -- Design-based estimation with non-exhaustive auxiliary information
#----
# 1.1) Mandallaz's extended pseudo small area estimator:
summary(threephase(formula.s0,
formula.s1,
data = grisons,
phase_id = list(phase.col = "phase_id_3p", s1.id = 1, terrgrid.id = 2),
small_area=list(sa.col = "smallarea", areas = c("A", "B", "C", "D"),
unbiased = TRUE)))
summary(threephase(formula.clust.s0,
formula.clust.s1,
data = zberg,
phase_id = list(phase.col = "phase_id_3p", s1.id = 1, terrgrid.id = 2),
cluster = "cluster",
small_area = list(sa.col = "ismallold", areas = c("1"), unbiased = TRUE)))
# 1.2) pseudo small area estimator:
summary(threephase(formula.s0,
formula.s1,
data = grisons,
phase_id = list(phase.col = "phase_id_3p", s1.id = 1, terrgrid.id = 2),
small_area = list(sa.col = "smallarea", areas = c("A", "B", "C", "D"),
unbiased = TRUE),
psmall = TRUE))
summary(threephase(formula.clust.s0,
formula.clust.s1,
data=zberg,
phase_id=list(phase.col="phase_id_3p", s1.id=1, terrgrid.id=2),
cluster="cluster",
small_area=list(sa.col="ismallold", areas=c("1"), unbiased=TRUE),
psmall = TRUE))
# 1.3) pseudosynthetic small area estimator:
summary(threephase(formula.s0 = tvol ~ mean,
formula.s1 = tvol ~ mean + stddev + max + q75,
data = grisons,
phase_id = list(phase.col = "phase_id_3p", s1.id = 1, terrgrid.id = 2),
small_area = list(sa.col = "smallarea", areas = c("A", "B", "C", "D"),
unbiased = FALSE)))
summary(threephase(formula.clust.s0,
formula.clust.s1,
data = zberg,
phase_id = list(phase.col = "phase_id_3p", s1.id = 1, terrgrid.id = 2),
cluster = "cluster",
small_area = list(sa.col = "ismallold", areas = c("1"), unbiased = FALSE)))
#----
## 2) -- Design-based estimation with exhaustive auxiliary information
#----
# true auxiliary mean for variable "mean" taken from Mandallaz et al. (2013):
truemeans.G <- data.frame(Intercept = rep(1, 4),
mean = c(12.85, 12.21, 9.33, 10.45))
rownames(truemeans.G) <- c("A", "B", "C", "D")
# true auxiliary means taken from Mandallaz (1991):
truemeans.G.clust <- data.frame(Intercept = 1, stade400 = 0.175, stade500 = 0.429,
stade600 = 0.321)
rownames(truemeans.G.clust) <- c("1")
# 2.1) Mandallaz's extended small area estimator:
summary(threephase(formula.s0,
formula.s1,
data = grisons,
phase_id = list(phase.col = "phase_id_3p", s1.id = 1, terrgrid.id = 2),
small_area = list(sa.col = "smallarea", areas = c("A", "B", "C", "D"),
unbiased = TRUE),
exhaustive = truemeans.G))
summary(threephase(formula.clust.s0,
formula.clust.s1,
data = zberg,
phase_id = list(phase.col = "phase_id_3p", s1.id = 1, terrgrid.id = 2),
cluster = "cluster",
small_area = list(sa.col = "ismallold", areas = c("1"), unbiased = TRUE),
exhaustive = truemeans.G.clust))
# 2.2) small area estimator:
summary(threephase(formula.s0,
formula.s1,
data = grisons,
phase_id = list(phase.col = "phase_id_3p", s1.id = 1, terrgrid.id = 2),
small_area = list(sa.col = "smallarea", areas = c("A", "B", "C", "D"),
unbiased = TRUE),
exhaustive = truemeans.G,
psmall = TRUE))
summary(threephase(formula.clust.s0,
formula.clust.s1,
data = zberg,
phase_id = list(phase.col = "phase_id_3p", s1.id = 1, terrgrid.id = 2),
cluster = "cluster",
small_area = list(sa.col = "ismallold", areas = c("1"), unbiased = TRUE),
exhaustive = truemeans.G.clust,
psmall = TRUE))
# 2.3) synthetic small area estimator:
summary(threephase(formula.s0,
formula.s1,
data = grisons,
phase_id = list(phase.col="phase_id_3p", s1.id = 1, terrgrid.id = 2),
small_area = list(sa.col = "smallarea", areas = c("A", "B", "C", "D"),
unbiased = FALSE),
exhaustive = truemeans.G))
summary(threephase(formula.clust.s0,
formula.clust.s1,
data = zberg,
phase_id = list(phase.col = "phase_id_3p", s1.id = 1, terrgrid.id = 2),
cluster = "cluster",
small_area = list(sa.col = "ismallold", areas = c("1"), unbiased = FALSE),
exhaustive = truemeans.G.clust))
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.