threephase: threephase

Description Usage Arguments Details Value Note References Examples

View source: R/threephase.R

Description

threephase is used to calculate estimations based on triple sampling under the model-assisted Monte Carlo approach. A zero phase of auxiliary information (e.g. taken from remote sensing data) is used to generate model predictions based on multiple linear regression using the method of ordinary least squares. A subsample of the zero phase comprises further auxiliary information that produces another set of model predictions. A further subsample produces a second phase based on terrestrial observations (i.e. the local densities of the ground truth) and is used to correct for bias in the design-based sense. The estimation method is available for simple and cluster sampling and includes the special case where the first phase is based on an exhaustive sample (i.e. a census). Small-area applications are supported for synthetic estimation as well as two varieties of bias-corrected estimators: the traditional small-area estimator and an asymptotically equivalent version derived under Mandallaz' extended model approach.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
threephase(
  formula.s0,
  formula.s1,
  data,
  phase_id,
  cluster = NA,
  small_area = list(sa.col = NA, areas = NA, unbiased = TRUE),
  boundary_weights = NA,
  exhaustive = NA,
  progressbar = FALSE,
  psmall = FALSE
)

Arguments

formula.s0

an object of class "formula" as would be used in the function lm that contains a reduced set of auxiliary variables available for all zero phase plots

formula.s1

an object of class "formula" as would be used in the function lm that contains the predictors from formula.s0 as well as further ancilliary predictors available for all first phase plots (i.e. formula.s0 is nested in formula.s1)

data

a data frame containing all variables contained in formula and a column indexing phase membership. Additional columns designating small-area membership, cluster ID and boundary weights should also be contained in the data frame if they are requested in the function.

phase_id

an object of class "list" containing three elements:

  • phase.col: the column name in data that specifies the phase membership of each observation

  • s1.id: the indicator identifying the "second phase only" plots for that column (must be of type "numeric")

  • terrgrid.id: the indicator identifying the terrestrial (a.k.a. "ground truth") phase for that column (must be of type "numeric")

cluster

(Optional) Specifies the column name in data containing the cluster ID. Only used in case of cluster sampling.

small_area

(Optional) a list that if containing three elements:

  • sa.col: the column name in data containing domain identification

  • areas: vector of desired small-area domain identifiers

  • unbiased: an object of type "logical" that when FALSE designates that the estimator is allowed to be biased (i.e. the synthetic estimator) and when TRUE forces it to be design-unbiased. See 'Details'.

Note: If small_area is left unchanged then twophase defaults to global estimation.

boundary_weights

(Optional) Specifies the column name in data containing the weights for boundary adjustment. See 'Details'

exhaustive

(Optional) For global estimation, a vector of true auxiliary means corresponding to an exhaustive first phase. The vector must be input in the same order that lm processes a formula object and include the intercept term. For small area estimation, exhaustive is a data.frame containing column names (colnames) for every variable appearing in the parameter formula including the variable "Intercept".Rownames (row.names) have to be used and must correspond to the names of the small areas. See 'Details'.

progressbar

(Optional) an object a type "logical" that when TRUE prints the progress of the calculation in the console (recommended for large amount of small areas). Defaults to FALSE.

psmall

(Optional) an object a type "logical" used for small area estimations that only works when unbiased in the parameter small_area is set to TRUE. See 'Details'.

Details

s1.id identifies "second phase only" plots because the terrestrial phase is known to be part of the second phase by the construction of the subsampling.

If estimations for multiple small-area domains should be computed, the domains have to be defined within a character vector using c(). Using small_area(..., unbiased=FALSE) calculates design-based estimates with the synthetic estimator and may be design-biased if the model is biased in that small area. The default, small_area(..., unbiased=TRUE), allows for a residual correction by one of two asymptotically equivalent methods to create design-unbiased estimates:

Missing values (NA) in the auxiliary variables (i.e. at least one auxiliary variable cannot be observed at an inventory location) are automatically removed from the dataset before the estimations are computed. Note that missingness in the auxiliary variables is only allowed if we assume that they are missing at random, since the unbiasedness of the estimates is based on the sampling design.

The boundary weight adjustment is pertinent for auxiliary information derived from remote sensing and is equal to the percentage of forested area (e.g. as defined by a forest mask) in the interpretation area.

Exhaustive estimation refers to when the true means of certain auxiliary variables are known at an exhaustive zero phase (i.e. a census). For global estimation, the vector must be input in the same order that lm processes a formula object including the intercept term whose true mean will always be one. For small area estimation, exhaustive is a data.frame containing column names for every variable appearing in the parameter formula including the variable "Intercept". The observations of the data.frame must represent the true auxiliary means in the same order as was presented in areas from the parameter small_area. See 'Examples'.

Value

threephase returns an object of class "threephase".

An object of class "threephase" returns a list of the following components:

input

a list containing the function's inputs

estimation

a data frame containing the following components:

  • area: the domain (only present if argument areas has been used)

  • estimate: the point estimate

  • ext_variance: the external variance of the point estimate that doesn't account for fitting the model from the current inventory

  • g_variance: the internal (g-weight) variance that accounts for fitting the model from the current inventory

  • n0 the zero phase sample size of plots

  • n1 the first phase sample size of plots

  • n2 the second phase (i.e. terrestrial) sample size of plots

  • n0G the zero phase sample size in the small area

  • n1G the first phase sample size in the small area

  • n2G the second phase (i.e. terrestrial) sample size in the small area

  • r.squared_reduced the R-squared of the linear model based on formula.s0 (i.e. the reduced model)

  • r.squared_full the R-squared of the linear model based on formula.s1 (i.e. the full model)

samplesizes

a data.frame summarizing all samplesizes: in case of cluster sampling both, the number of individual plots and the number of clusters is reported.

coefficients

the coefficients of the two linear models:

  • alpha: the reduced model coefficients

  • beta: the full model coefficients

cov_alpha_s2

the design-based covariance matrix of the reduced model coefficients

cov_beta_s2

the design-based covariance matrix of the full model coefficients

Z_bar_1_s0

the estimated auxiliary means of formula.s0 based on the zero phase. If the zero phase is exhaustive, these are the true auxiliary means specified in the input-argument exhaustive.

Z1_bar_s1

the estimated auxiliary means of formula.s0 based on the first phase

Z_bar_s1

the estimated auxiliary means of formula.s1 based on the first phase

cov_Z_bar_1_s0

the covariance matrix for Z_bar_1_s0

resid_reduced

the reduced model residuals at either the plot level or cluster level depending on the call

resid_full

the full model residuals at either the plot level or cluster level depending on the call

warn.messages

logical indicating if warning messages were issued

Note

In the special case of cluster sampling, the reported sample sizes in estimation are the number of clusters. The samplesize-object also provides the respective number of single plot units for cluster sampling. The reported r.squared_reduced and r.squared_full describe the model fit of the applied linear regression models (i.e. on plot-level, not on cluster level).

References

Hill, A., Massey, A. F. (2021). The R Package forestinventory: Design-Based Global and Small Area Estimations for Multiphase Forest Inventories. Journal of Statistical Software, 97(4), 1-40.

Mandallaz, D., Breschan, J., & Hill, A. (2013). New regression estimators in forest inventories with two-phase sampling and partially exhaustive information: a design-based monte carlo approach with applications to small-area estimation. Canadian Journal of Forest Research, 43(11), 1023-1031.

Mandallaz, D. (2014). A three-phase sampling extension of the generalized regression estimator with partially exhaustive information. Can. J. For. Res. 44: 383-388

Massey, A. and Mandallaz, D. and Lanz, A. (2014). Integrating remote sensing and past inventory data under the new annual design of the Swiss National Forest Inventory using three-phase design-based regression estimation. Can. J. For. Res. 44(10): 1177-1186

Mandallaz, D. (2013). Regression estimators in forest inventories with three-phase sampling and two multivariate components of auxiliary information. ETH Zurich, Department of Environmental Systems Science,Tech. rep. Available from doi: 10.3929/ethz-a-009990020.

Examples

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
## load datasets:
data(grisons)
data(zberg)


## define regression models for simple and cluster sampling:
formula.s0 <- tvol ~ mean # reduced model:
formula.s1 <- tvol ~ mean + stddev + max + q75 # full model
formula.clust.s0 <- basal ~ stade
formula.clust.s1 <- basal ~ stade + couver + melange


# ------------------------------------------------#
# ----------- GLOBAL ESTIMATION ------------------#

#----
## 1) -- Design-based estimation with non-exhaustive auxiliary information
#----

# 1.1) non-cluster-sampling (see eqns. [11], [14] and [16] in Mandallaz 2014):
summary(threephase(formula.s0, formula.s1, data = grisons,
                   phase_id = list(phase.col = "phase_id_3p", s1.id=1, terrgrid.id = 2)))


# 1.2) cluster-sampling (see eqns. [49] and [50] in Mandallaz 2013):
summary(threephase(formula.clust.s0, formula.clust.s1, data = zberg,
                   phase_id = list(phase.col="phase_id_3p", s1.id = 1, terrgrid.id = 2),
                   cluster = "cluster"))


# 1.3) example for boundary weight adjustment (non-cluster example):
summary(threephase(formula.s0, formula.s1, data = grisons,
                   phase_id = list(phase.col="phase_id_3p", s1.id = 1, terrgrid.id = 2),
                   boundary_weights = "boundary_weights"))

#----
## 2) -- Design-based estimation with exhaustive auxiliary information
#----

# 2.1) non-cluster-sampling (see eqns. [7], [9] and [10] in Mandallaz 2014):
summary(threephase(formula.s0, formula.s1, data = grisons,
                   phase_id = list(phase.col = "phase_id_3p", s1.id = 1, terrgrid.id = 2),
                   exhaustive = c(1,11.39)))


# 2.2) cluster-sampling:
summary(threephase(formula.clust.s0, formula.clust.s1, data = zberg,
                   phase_id = list(phase.col = "phase_id_3p", s1.id = 1, terrgrid.id = 2),
                   cluster = "cluster", exhaustive = c(1, 0.10, 0.7, 0.10)))


# ----------------------------------------------------#
# ----------- SMALL AREA ESTIMATION ------------------#

#----
## 1) --  Design-based estimation with non-exhaustive auxiliary information
#----

# 1.1) Mandallaz's extended pseudo small area estimator:
summary(threephase(formula.s0,
                   formula.s1,
                   data = grisons,
                   phase_id = list(phase.col = "phase_id_3p", s1.id = 1, terrgrid.id = 2),
                   small_area=list(sa.col = "smallarea", areas = c("A", "B", "C", "D"),
                                   unbiased = TRUE)))

summary(threephase(formula.clust.s0,
                   formula.clust.s1,
                   data = zberg,
                   phase_id = list(phase.col = "phase_id_3p", s1.id = 1, terrgrid.id = 2),
                   cluster = "cluster",
                   small_area = list(sa.col = "ismallold", areas = c("1"), unbiased = TRUE)))


# 1.2) pseudo small area estimator:
summary(threephase(formula.s0,
                   formula.s1,
                   data = grisons,
                   phase_id = list(phase.col = "phase_id_3p", s1.id = 1, terrgrid.id = 2),
                   small_area = list(sa.col = "smallarea", areas = c("A", "B", "C", "D"),
                                   unbiased = TRUE),
                   psmall = TRUE))

summary(threephase(formula.clust.s0,
                   formula.clust.s1,
                   data=zberg,
                   phase_id=list(phase.col="phase_id_3p", s1.id=1, terrgrid.id=2),
                   cluster="cluster",
                   small_area=list(sa.col="ismallold", areas=c("1"), unbiased=TRUE),
                   psmall = TRUE))


# 1.3) pseudosynthetic small area estimator:
summary(threephase(formula.s0  = tvol ~ mean,
                   formula.s1 = tvol ~ mean + stddev + max + q75,
                   data = grisons,
                   phase_id = list(phase.col = "phase_id_3p", s1.id = 1, terrgrid.id = 2),
                   small_area = list(sa.col = "smallarea", areas = c("A", "B", "C", "D"),
                                   unbiased = FALSE)))

summary(threephase(formula.clust.s0,
                   formula.clust.s1,
                   data = zberg,
                   phase_id = list(phase.col = "phase_id_3p", s1.id = 1, terrgrid.id = 2),
                   cluster = "cluster",
                   small_area = list(sa.col = "ismallold", areas = c("1"), unbiased = FALSE)))


#----
## 2) --  Design-based estimation with exhaustive auxiliary information
#----

# true auxiliary mean for variable "mean" taken from Mandallaz et al. (2013):
truemeans.G <- data.frame(Intercept = rep(1, 4),
                         mean = c(12.85, 12.21, 9.33, 10.45))
rownames(truemeans.G) <- c("A", "B", "C", "D")

# true auxiliary means taken from Mandallaz (1991):
truemeans.G.clust <- data.frame(Intercept = 1, stade400 = 0.175, stade500 = 0.429,
                               stade600 = 0.321)
rownames(truemeans.G.clust) <- c("1")


# 2.1) Mandallaz's extended small area estimator:
summary(threephase(formula.s0,
                   formula.s1,
                   data = grisons,
                   phase_id = list(phase.col = "phase_id_3p", s1.id = 1, terrgrid.id = 2),
                   small_area = list(sa.col = "smallarea", areas = c("A", "B", "C", "D"),
                                unbiased = TRUE),
                   exhaustive = truemeans.G))

summary(threephase(formula.clust.s0,
                   formula.clust.s1,
                   data = zberg,
                   phase_id = list(phase.col = "phase_id_3p", s1.id = 1, terrgrid.id = 2),
                   cluster = "cluster",
                   small_area = list(sa.col = "ismallold", areas = c("1"), unbiased = TRUE),
                   exhaustive = truemeans.G.clust))


# 2.2) small area estimator:
summary(threephase(formula.s0,
                   formula.s1,
                   data = grisons,
                   phase_id = list(phase.col = "phase_id_3p", s1.id = 1, terrgrid.id = 2),
                   small_area = list(sa.col = "smallarea", areas = c("A", "B", "C", "D"),
                                   unbiased = TRUE),
                   exhaustive = truemeans.G,
                   psmall = TRUE))

summary(threephase(formula.clust.s0,
                   formula.clust.s1,
                   data = zberg,
                   phase_id = list(phase.col = "phase_id_3p", s1.id = 1, terrgrid.id = 2),
                   cluster = "cluster",
                   small_area = list(sa.col = "ismallold", areas = c("1"), unbiased = TRUE),
                   exhaustive = truemeans.G.clust,
                   psmall = TRUE))


# 2.3) synthetic small area estimator:
summary(threephase(formula.s0,
                   formula.s1,
                   data = grisons,
                   phase_id = list(phase.col="phase_id_3p", s1.id = 1, terrgrid.id = 2),
                   small_area = list(sa.col = "smallarea", areas = c("A", "B", "C", "D"),
                                   unbiased = FALSE),
                   exhaustive = truemeans.G))

summary(threephase(formula.clust.s0,
                   formula.clust.s1,
                   data = zberg,
                   phase_id = list(phase.col = "phase_id_3p", s1.id = 1, terrgrid.id = 2),
                   cluster = "cluster",
                   small_area = list(sa.col = "ismallold", areas = c("1"), unbiased = FALSE),
                   exhaustive = truemeans.G.clust))

AndreasChristianHill/forestinventory documentation built on Aug. 16, 2021, 2:13 p.m.