trim: Estimate TRIM model parameters.


View source: R/trim.R

Description

Given some count observations, estimate a TRIM model and use it to impute the data set if necessary.

Usage

trim(object, ...)

## S3 method for class 'data.frame'
trim(
  object,
  count_col = "count",
  site_col = "site",
  year_col = "year",
  month_col = NULL,
  weights_col = NULL,
  covar_cols = NULL,
  model = 2,
  changepoints = ifelse(model == 2, 1L, integer(0)),
  overdisp = FALSE,
  serialcor = FALSE,
  autodelete = TRUE,
  stepwise = FALSE,
  covin = list(),
  ...
)

## S3 method for class 'formula'
trim(object, data = NULL, weights = NULL, ...)

## S3 method for class 'trimcommand'
trim(object, ...)

Arguments

object

Either a data.frame, a formula or a trimcommand. If object is a formula, the dependent variable (left-hand side) is treated as the 'counts' variable. The first and second independent variables are treated as the 'site' and 'time' variables, in that specific order. All other variables are treated as covariates.

...

More parameters; see ‘Details’ below.

count_col

[character] name of the column holding species counts

site_col

[character] name of the column holding the site id

year_col

[character] name of the column holding the time of counting

month_col

[character] optional name of the column holding the season of counting

weights_col

[numeric] Optional vector of site weights. The length of

covar_cols

[character] name(s) of column(s) holding covariates

model

[numeric] TRIM model type 1, 2, or 3.

changepoints

[numeric] Indices for changepoints (See ‘Models’).

overdisp

[logical] Take overdispersion into account (See ‘Estimation options’).

serialcor

[logical] Take serial correlation into account (See ‘Estimation options’).

autodelete

[logical] Auto-delete changepoints when number of observations is too small. (See ‘Demands on data’).

stepwise

[logical] Perform stepwise refinement of changepoints.

covin

a list of variance-covariance matrices; one per pseudo-site.

data

[data.frame] Data frame containing at least counts, sites, and times

weights

[character] name of the column in data which represents weights (optional)
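As a minimal sketch of how the formula and data.frame interfaces relate, the two calls below are expected to describe the same Model 2 fit. It assumes that rtrim is installed and that the skylark data set has the columns count, site and time used in the Examples section.

```r
library(rtrim)
data(skylark)

# Formula interface: left-hand side is the count, then site, then time
m_formula <- trim(count ~ site + time, data = skylark, model = 2)

# data.frame interface: the same columns, named explicitly
m_df <- trim(skylark,
             count_col = "count",
             site_col  = "site",
             year_col  = "time",
             model     = 2)

# Both fits should report the same slope coefficients
coefficients(m_formula)
coefficients(m_df)
```

The formula interface is shorter; the data.frame interface avoids relying on the order of terms.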

Details

All versions of trim support additional 'experts only' arguments:

verbose

Logical switch to temporarily enable verbose output (use options(trim_verbose=TRUE) for permanent verbosity).

constrain_overdisp

Numerical value to control overdispersion.

  • A value in the range 0..1 uses a Chi-squared outlier detection method.

  • A value >1 uses Tukey's Fence.

  • A value of 1.0 (which is the default) results in unconstrained overdispersion.

See vignette ‘Taming overdispersion’ for more information.

conv_crit

Convergence criterion. Used within the iterative model estimation algorithm. The default value is 1e-5. May be set to higher values in case models don't converge.

max_iter

Maximum number of iterations. Default value is 200. May be set to higher values in case models don't converge.

premove

Probability of removal of changepoints (default value: 0.2). Parameter used in stepwise refinement of models. See the vignette 'Models and statistical methods in rtrim'.

penter

Probability of re-entering of changepoints (default value: 0.15). Similar use as premove.
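The following sketch shows how such 'experts only' arguments might be passed alongside the regular ones. The specific values are arbitrary choices for illustration, not recommendations, and the call assumes rtrim and its skylark data set are available.

```r
library(rtrim)
data(skylark)

# Allow more iterations and use a looser convergence criterion
# (both values are illustrative only)
m <- trim(count ~ site + time, data = skylark, model = 3,
          max_iter = 500, conv_crit = 1e-4)
summary(m)
```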

Models

The purpose of trim is to estimate population totals over time, based on a set of counts f_{ij} at sites i=1,2,…,I and times j=1,2,…,J. If no count data is available at site and time (i,j), a value μ_{ij} will be imputed.

In Model 1, the imputed values are modeled as

\lnμ_{ij} = α_i,

where α_i is the site effect. This model implies that the counts vary across sites, not over time. The model-based time totals are equal at each time point and the model-based indices are all equal to one.

In Model 2, the imputed values are modeled as

\lnμ_{ij} = α_i + β\times(j-1).

Here, α_i is the log-count of site i averaged over time and β is the mean growth factor that is shared by all sites over all of time. The assumption of a constant growth rate may be relaxed by passing a number of changepoints that indicate at what times the growth rate is allowed to change. Using a Wald test (see wald) one can investigate whether the changes in slope at the changepoints are significant. Setting stepwise=TRUE makes trim automatically remove changepoints where the slope does not change significantly.
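To make the Model 2 equation concrete, here is a small sketch that simulates complete counts from it and recovers β with a Poisson log-linear fit. It uses base R glm as a stand-in on simulated data and does not show how trim itself is implemented; all names and values are made up for illustration.

```r
set.seed(1)
n_sites <- 30
n_times <- 10
alpha <- log(runif(n_sites, 5, 50))  # site effects on the log scale
beta  <- 0.05                        # slope shared by all sites

# Expected and simulated counts: ln mu_ij = alpha_i + beta * (j - 1)
d <- expand.grid(site = 1:n_sites, time = 1:n_times)
d$mu    <- exp(alpha[d$site] + beta * (d$time - 1))
d$count <- rpois(nrow(d), d$mu)

# One intercept per site plus a common slope in (time - 1)
fit <- glm(count ~ 0 + factor(site) + I(time - 1),
           family = poisson, data = d)
beta_hat <- unname(coef(fit)["I(time - 1)"])

beta_hat       # additive slope, should be close to 0.05
exp(beta_hat)  # the same slope as a multiplicative growth factor

# The 'add' and 'mul' columns in the example output are related the same way
exp(0.05482546)
```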

In Model 3, the imputed values are modeled as

\lnμ_{ij}=α_i + β_j,

where β_j is the deviation of log-counts at time j, averaged over all sites. To make this model identifiable, β_1=0 by definition. Model 3 can be shown to be equivalent to Model 2 with a changepoint at every time point. Using a Wald test (see wald), one can estimate whether the collection of deviations β_j makes the model differ significantly from an overall linear trend (Model 2 without changepoints).
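The role of the constraint β_1=0 can be sketched with simulated complete data. As before, base R glm is only a stand-in for illustration, and the chosen effects are arbitrary.

```r
set.seed(2)
n_sites <- 30
n_times <- 6
alpha  <- log(runif(n_sites, 5, 50))
beta_j <- c(0, 0.10, 0.05, 0.20, 0.15, 0.30)  # beta_1 = 0 by definition

# Simulate counts: ln mu_ij = alpha_i + beta_j
d <- expand.grid(site = 1:n_sites, time = 1:n_times)
d$count <- rpois(nrow(d), exp(alpha[d$site] + beta_j[d$time]))

# Site factor enters in full; the time factor drops its first level,
# which corresponds to fixing beta_1 at zero
fit <- glm(count ~ 0 + factor(site) + factor(time),
           family = poisson, data = d)
time_terms <- grep("^factor\\(time\\)", names(coef(fit)))
beta_hat <- c(0, unname(coef(fit)[time_terms]))

round(beta_hat, 2)
exp(beta_hat)  # model-based totals relative to time 1
```

Because every site shares the same β_j, the ratio of the model-based total at time j to that at time 1 is exp(β_j).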

The parameters α_i and β_j are referred to as the additive representation of the coefficients. Once computed, they can be extracted in several representations using the coefficients function. (See also the examples below.)

Other model parameters can be extracted using functions such as gof (for goodness of fit), summary or totals. Refer to the ‘See also’ section for an overview.
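A brief sketch of these extractor functions follows. The representation argument and its values "trend" and "deviations" are given here from memory of coef.trim and should be checked against its help page; the rest uses functions listed under ‘See Also’.

```r
library(rtrim)
data(skylark)
m3 <- trim(count ~ site + time, data = skylark, model = 3)

# Coefficients in the default representation
coefficients(m3)

# Assumed alternative representations (verify with ?coef.trim)
coefficients(m3, representation = "trend")
coefficients(m3, representation = "deviations")

gof(m3)     # goodness of fit
totals(m3)  # time totals
```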

Using yearly and monthly counts

Many data sets contain only yearly count data, in which case the time j reflects the year number. An extension of trim is to use monthly (or any other sub-yearly) count data, in combination with index computations on the yearly time scale.

In this case, counts are given as f_{i,j,m} with m=1,2,…,M the month number. As before, μ_{i,j,m} will be imputed in case of missing counts.

The contribution of month factors to the model is always similar to the way year factors are used in Model 3, that is,

\lnμ_{i,j,m} = α_i + β\times(j-1) + γ_m for Model 2, and \lnμ_{i,j,m} = α_i + β_j + γ_m for Model 3.

For the same reason why β_1=0 for Model 3, γ_1=0 in case of monthly parameters.
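The Model 2 variant with month effects can be written out for a toy case. The parameter values below are invented purely to show how the three terms combine.

```r
alpha <- log(c(10, 20))     # two sites
beta  <- 0.05               # yearly slope
gamma <- c(0, 0.3, -0.2)    # three months, gamma_1 = 0
n_years <- 4

# ln mu_ijm = alpha_i + beta * (j - 1) + gamma_m
grid <- expand.grid(site = 1:2, year = 1:n_years, month = 1:3)
grid$mu <- exp(alpha[grid$site] +
               beta * (grid$year - 1) +
               gamma[grid$month])

# In year 1, month 1 only the site effect remains
subset(grid, year == 1 & month == 1)
```

With both β×(j-1) and γ_m equal to zero in the first year and month, the expected count there is exp(α_i), which is why one month effect has to be fixed.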

Using covariates

In the basic case of Models 2 and 3, the growth parameter β does not vary across sites. If auxiliary information is available (for instance a classification of the type of soil or vegetation), the effect of these variables on the per-site growth rate can be taken into account.

For Model 2 with covariates the growth factor β is replaced with a factor

β_0 + ∑_{k=1}^K z_{ijk}β_k.

Here, β_0 is referred to as the baseline and z_{ijk} is a dummy variable that combines dummy variables for all covariates. Since a covariate with L classes is modeled by L-1 dummy variables, the value of K is equal to the sum of the numbers of categories for all covariates minus the number of covariates. Observe that this model allows for a covariate to change over time at a given site. It is therefore possible to include situations where, for example, a site turns from farmland to rural area. The coefficients function will report every individual value of β. With a Wald test (see wald), the significance of contributions of covariates can be tested.
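The count of dummy variables can be checked with a small base R sketch. The covariates soil and veg and all coefficient values are hypothetical, and for simplicity the covariates here do not change over time.

```r
sites <- data.frame(
  soil = factor(c("clay", "sand", "peat", "clay", "sand", "peat")),
  veg  = factor(c("grass", "grass", "heath", "heath", "grass", "heath"))
)

# Dummy coding: L - 1 columns per covariate (intercept column dropped)
Z <- model.matrix(~ soil + veg, data = sites)[, -1, drop = FALSE]
K <- ncol(Z)
n_classes <- c(nlevels(sites$soil), nlevels(sites$veg))
K == sum(n_classes) - length(n_classes)

# Per-site slope: beta_0 + sum_k z_ik * beta_k
beta0  <- 0.02
beta_k <- c(0.01, -0.03, 0.04)  # one value per column of Z
slope  <- beta0 + drop(Z %*% beta_k)
data.frame(sites, slope)
```

A site in the reference class of every covariate (here clay and grass) gets the baseline slope β_0.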

For Model 3 with covariates the parameter β_j is replaced by

β_{j0} + ∑_{k=1}^Kz_{ijk}β_{jk}.

Again, the β_{j0} are referred to as baseline parameters and the β_{jk} record mean differences in log-counts within a set of sites with equal values for the covariates. All coefficients can be extracted with coefficients and the significance of covariates can be investigated with a Wald test (see wald).

Estimation options

In the simplest case, the counts at different times and sites are considered independently Poisson distributed. The (often too strict) assumption that counts are independent over time may be dropped, so correlation between time points at a certain site can be taken into account. The assumption of being Poisson distributed can be relaxed as well. In general, the variance-covariance structure of counts f_{ij} at site i for time j is modeled as

var(f_{ij}) = σ^2 μ_{ij}, cor(f_{ij}, f_{i,j+1}) = ρ,

where σ is called the overdispersion, μ_{ij} is the estimated count for site i, time j and ρ is called the serial correlation.

If σ=1, a pure Poisson distribution is assumed to model the counts. Setting overdisp=TRUE makes trim relax this condition. Setting serialcor=TRUE allows trim to assume a non-zero correlation between adjacent time points, thus relaxing the assumption of independence over time.
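As a sketch, both options can be switched on in one call and the resulting estimates inspected with the overdispersion and serial_correlation functions listed under ‘See Also’. The example assumes rtrim and its skylark data set are available; the choice of Model 3 is arbitrary.

```r
library(rtrim)
data(skylark)

# Relax both the Poisson and the independence assumptions
m <- trim(count ~ site + time, data = skylark, model = 3,
          overdisp = TRUE, serialcor = TRUE)

overdispersion(m)      # estimated overdispersion
serial_correlation(m)  # estimated serial correlation
```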

Demands on data

The data set must contain sufficient counts to be able to estimate the model. In particular

The function check_observations identifies cases where too few observations are present to compute a model. Setting the option autodelete=TRUE (Model 2 only) makes trim remove changepoints such that in each time segment sufficient counts are available to estimate the model.
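The idea behind such a check can be illustrated with a rough base R tabulation. This is not the check_observations implementation; it assumes, for illustration only, that a time point without any positive count cannot support its own parameter. The toy data are invented.

```r
d <- data.frame(
  site  = rep(1:3, each = 4),
  time  = rep(1:4, times = 3),
  count = c(5, NA, 0, 7,
            3, NA, 0, 2,
            NA, NA, 0, 4)
)

# Number of positive (non-missing) counts per time point
n_pos <- tapply(d$count, d$time, function(x) sum(x > 0, na.rm = TRUE))
n_pos

# Time points with no positive counts at all
problem_times <- names(n_pos)[n_pos == 0]
problem_times
```

Here time 2 has only missing values and time 3 has only zeros, so both are flagged.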

See Also

rtrim by example for a gentle introduction, rtrim for TRIM users for users of the classic Delphi-based TRIM implementation, and rtrim 2 extensions for the major changes from rtrim v.1 to rtrim v.2.

Other analyses: coef.trim(), confint.trim(), gof(), index(), now_what(), overall(), overdispersion(), plot.trim.index(), plot.trim.overall(), results(), serial_correlation(), summary.trim(), totals(), vcov.trim(), wald()

Other modelspec: check_observations(), read_tcf(), read_tdf(), set_trim_verbose(), trimcommand()

Examples

data(skylark)
m <- trim(count ~ site + time, data=skylark, model=2)
summary(m)
coefficients(m)

# An example using weights
# set up some random weights (one for each site)
w <- runif(55, 0.1, 0.9)
# match weights to sites
skylark$weights <- w[skylark$site]
# run model
m <- trim(count ~ site + time, data=skylark, weights="weights", model=3)

# An example using change points, a covariate, and overdispersion
# 1 is added as cp automatically
cp <- c(2,6)
m <- trim(count ~ site + time + Habitat, data=skylark, model=2, changepoints=cp, overdisp=TRUE)
coefficients(m)
# check significance of changes in slope
wald(m)
plot(overall(m))

Example output

Welcome to rtrim 2.0.6 Type ?`rtrim-package` to get started.

Attaching package: 'rtrim'

The following object is masked from 'package:stats':

    heatmap

Call:
trim(count ~ site + time, data = skylark, model = 2)

Model  : 2
Method : ML (Convergence reached after 3 iterations)

Coefficients:
  from upto        add     se_add      mul     se_mul
1    1    8 0.05482546 0.01043636 1.056356 0.01102452


Goodness of fit:
              Chi-square = 210.53, df=146, p=0.0004
        Likelihood Ratio = 204.63, df=146, p=0.0010
  AIC (up to a constant) = -87.37
  from upto        add     se_add      mul     se_mul
1    1    8 0.05482546 0.01043636 1.056356 0.01102452
     covar cat from upto         add     se_add       mul     se_mul
1 baseline   0    2    6 -0.12198719 0.05032174 0.8851597 0.04454278
2 baseline   0    6    8 -0.02634589 0.11521258 0.9739981 0.11221684
3  Habitat   2    2    6  0.19776372 0.05416683 1.2186744 0.06601173
4  Habitat   2    6    8  0.13706944 0.12202082 1.1469078 0.13994663
Wald test for significance of covariates
 Covariate        W df            p
   Habitat 24.51087  2 4.759189e-06

Wald test for significance of changes in slope
 Changepoint  Wald_test df            p
           2 20.1685032  2 4.173161e-05
           6  0.8530229  2 6.527824e-01

rtrim documentation built on April 21, 2020, 5:06 p.m.