README.md

gbmt

Group-based multivariate trajectory modeling

gbmt is an R package implementing functionalities for group-based multivariate trajectory modeling, including maximum likelihood estimation through the Expectation-Maximization algorithm, imputation, prediction and graphical tools. The reference papers are:

A. Magrini (2021). Assessment of agricultural sustainability in European Union countries: a group-based multivariate trajectory approach. To be appeared on Advances in Statistical Analysis.

D. S. Nagin, B. L. Jones, V. L. Passos and R. E. Tremblay (2018). Group-based multi-trajectory modeling. Statistical Methods in Medical Research, 27(7): 2015-2023. DOI: 10.1177/0962280216673085

R (The R Project for Statistical Computing) needs to be installed on your system in order to use the gbmt package. R can be downloaded from https://www.r-project.org/.

To install the gbmt package, open the console of R and type:

install.packages("devtools")  ## do not run if package 'devtools' is already installed
library(devtools)
install_github("alessandromagrini/gbmt")

The current version is 0.1.2 (updated: 24 november 2021). The package is still in beta test, thus ensure that your installation is up to date.

For any request or feedback, please write to alessandro.magrini@unifi.it (Alessandro Magrini)

Below, you find some examples of use of the package.

Load the gbmt package

library(gbmt)

Load data on indicators of agricultural sustainability in EU countries, 2004-2018:

data(agrisus)
?agrisus          ## data description
summary(agrisus)  ## data summaries

These data contain missing values. The same data after imputation (see Magrini, 2021) can be loaded as:

data(agrisus2)
?agrisus2          ## data description
summary(agrisus2)  ## data summaries

Select the indicators (just a subset for illustration):

varNames <- c("TFP_2005", "NetCapital_GVA", "Income_rur", "Unempl_rur", "GHG_UAA", "GNB_N_UAA")

Fit a group-based multivariate trajectory model with 2 polynomial degrees and 3 groups using the imputed dataset:

# by default, the normalisation method is standardization (`scaling=2`)
#   `scaling=3` applies the ratio to the sample mean (only for strictly positive measurements)
#   `scaling=4` applies the log ratio to the sample mean (only for strictly positive measurements)
m3_2 <- gbmt(x.names=varNames, unit="Country", time="Year", d=2, ng=3, data=agrisus2, scaling=4)

See the resulting groups:

m3_2$assign.list

See the estimated group trajectories:

m3_2$fitted

Graphics of the estimated group trajectories:

plot(m3_2)           ## overlapped groups
plot(m3_2, group=1)  ## group 1
plot(m3_2, group=2)  ## group 2
plot(m3_2, group=3)  ## group 3

Prediction of group trajectories:

predict(m3_2, n.ahead=3)                  ## 3 steps ahead prediction
predict(m3_2, n.ahead=3, in.sample=TRUE)  ## also include in-sample prediction

Prediction for unit 'Italy'

predict(m3_2, unit="Italy", n.ahead=3)                  ## 3 steps ahead prediction
predict(m3_2, unit="Italy", n.ahead=3, in.sample=TRUE)  ## also include in-sample prediction

Graphics of estimated group trajectories including 3 steps ahead prediction:

mar0 <- c(3.1,2.55,3.1,1.2)
plot(m3_2, n.ahead=3, mar=mar0)           ## overlapped groups
plot(m3_2, group=1, n.ahead=3, mar=mar0)  ## group 1
plot(m3_2, group=2, n.ahead=3, mar=mar0)  ## group 2
plot(m3_2, group=3, n.ahead=3, mar=mar0)  ## group 3

Graphics of estimated trajectories for unit 'Italy' including 3 steps ahead prediction:

plot(m3_2, unit="Italy", n.ahead=3, mar=mar0)

Fit the same model with 10 random restarts:

set.seed(123)  ## seed for reproducibility
m3_2r <- gbmt(x.names=varNames, unit="Country", time="Year", d=2, ng=3, data=agrisus2, nstart=10, scaling=4)

Fit models with up to 6 groups and select the best one based on information criteria:

ng <- 1:6
modList <- vector("list", length=length(ng))
names(modList) <- ng
for(i in 1:length(ng)) {
  set.seed(123)  ## seed for reproducibility
  modList[[i]] <- gbmt(x.names=varNames, unit="Country", time="Year", d=2, ng=ng[i], data=agrisus2, nstart=10, scaling=4)
  }
do.call(rbind, lapply(modList, function(x){x$ic}))

Group selection using all indicators and the incomplete dataset. [Note: this reproduces the results of Magrini (2021) but takes very much time. Reduce the number of random restarts to speed up the code]

varNames <- c("TFP_2005","NetCapital_GVA","Manager_ratio",
  "FactorIncome_paid_2010","EntrIncome_unpaid_2010",
  "Income_rur","Unempl_rur","Poverty_rur",
  "RenewProd","Organic_p","GHG_UAA","GNB_N_UAA")
ng <- 1:6
modListFull <- vector("list", length=length(ng))
names(modListFull) <- ng
for(i in 1:length(ng)) {
  set.seed(123)  ## seed for reproducibility
  modListFull[[i]] <- gbmt(x.names=varNames, unit="Country", time="Year", d=3, ng=ng[i],
                           data=agrisus, scaling=4, nstart=100, maxit=5000)
  }
do.call(rbind, lapply(modListFull, function(x){x$ic}))


alessandromagrini/gbmt documentation built on Feb. 6, 2022, 10:55 p.m.