Group-based multivariate trajectory modeling
gbmt is an R package implementing functionalities for group-based multivariate trajectory modeling, including maximum likelihood estimation through the Expectation-Maximization algorithm, imputation, prediction and graphical tools.
The reference papers are:
A. Magrini (2021). Assessment of agricultural sustainability in European Union countries: a group-based multivariate trajectory approach. To be appeared on Advances in Statistical Analysis.
D. S. Nagin, B. L. Jones, V. L. Passos and R. E. Tremblay (2018). Group-based multi-trajectory modeling. Statistical Methods in Medical Research, 27(7): 2015-2023. DOI: 10.1177/0962280216673085
R (The R Project for Statistical Computing) needs to be installed on your system in order
to use the gbmt package. R can be downloaded from https://www.r-project.org/.
To install the gbmt package, open the console of R and type:
install.packages("devtools") ## do not run if package 'devtools' is already installed
library(devtools)
install_github("alessandromagrini/gbmt")
The current version is 0.1.2 (updated: 24 november 2021). The package is still in beta test, thus ensure that your installation is up to date.
For any request or feedback, please write to alessandro.magrini@unifi.it (Alessandro Magrini)
Below, you find some examples of use of the package.
Load the gbmt package
library(gbmt)
Load data on indicators of agricultural sustainability in EU countries, 2004-2018:
data(agrisus)
?agrisus ## data description
summary(agrisus) ## data summaries
These data contain missing values. The same data after imputation (see Magrini, 2021) can be loaded as:
data(agrisus2)
?agrisus2 ## data description
summary(agrisus2) ## data summaries
Select the indicators (just a subset for illustration):
varNames <- c("TFP_2005", "NetCapital_GVA", "Income_rur", "Unempl_rur", "GHG_UAA", "GNB_N_UAA")
Fit a group-based multivariate trajectory model with 2 polynomial degrees and 3 groups using the imputed dataset:
# by default, the normalisation method is standardization (`scaling=2`)
# `scaling=3` applies the ratio to the sample mean (only for strictly positive measurements)
# `scaling=4` applies the log ratio to the sample mean (only for strictly positive measurements)
m3_2 <- gbmt(x.names=varNames, unit="Country", time="Year", d=2, ng=3, data=agrisus2, scaling=4)
See the resulting groups:
m3_2$assign.list
See the estimated group trajectories:
m3_2$fitted
Graphics of the estimated group trajectories:
plot(m3_2) ## overlapped groups
plot(m3_2, group=1) ## group 1
plot(m3_2, group=2) ## group 2
plot(m3_2, group=3) ## group 3
Prediction of group trajectories:
predict(m3_2, n.ahead=3) ## 3 steps ahead prediction
predict(m3_2, n.ahead=3, in.sample=TRUE) ## also include in-sample prediction
Prediction for unit 'Italy'
predict(m3_2, unit="Italy", n.ahead=3) ## 3 steps ahead prediction
predict(m3_2, unit="Italy", n.ahead=3, in.sample=TRUE) ## also include in-sample prediction
Graphics of estimated group trajectories including 3 steps ahead prediction:
mar0 <- c(3.1,2.55,3.1,1.2)
plot(m3_2, n.ahead=3, mar=mar0) ## overlapped groups
plot(m3_2, group=1, n.ahead=3, mar=mar0) ## group 1
plot(m3_2, group=2, n.ahead=3, mar=mar0) ## group 2
plot(m3_2, group=3, n.ahead=3, mar=mar0) ## group 3
Graphics of estimated trajectories for unit 'Italy' including 3 steps ahead prediction:
plot(m3_2, unit="Italy", n.ahead=3, mar=mar0)
Fit the same model with 10 random restarts:
set.seed(123) ## seed for reproducibility
m3_2r <- gbmt(x.names=varNames, unit="Country", time="Year", d=2, ng=3, data=agrisus2, nstart=10, scaling=4)
Fit models with up to 6 groups and select the best one based on information criteria:
ng <- 1:6
modList <- vector("list", length=length(ng))
names(modList) <- ng
for(i in 1:length(ng)) {
set.seed(123) ## seed for reproducibility
modList[[i]] <- gbmt(x.names=varNames, unit="Country", time="Year", d=2, ng=ng[i], data=agrisus2, nstart=10, scaling=4)
}
do.call(rbind, lapply(modList, function(x){x$ic}))
Group selection using all indicators and the incomplete dataset. [Note: this reproduces the results of Magrini (2021) but takes very much time. Reduce the number of random restarts to speed up the code]
varNames <- c("TFP_2005","NetCapital_GVA","Manager_ratio",
"FactorIncome_paid_2010","EntrIncome_unpaid_2010",
"Income_rur","Unempl_rur","Poverty_rur",
"RenewProd","Organic_p","GHG_UAA","GNB_N_UAA")
ng <- 1:6
modListFull <- vector("list", length=length(ng))
names(modListFull) <- ng
for(i in 1:length(ng)) {
set.seed(123) ## seed for reproducibility
modListFull[[i]] <- gbmt(x.names=varNames, unit="Country", time="Year", d=3, ng=ng[i],
data=agrisus, scaling=4, nstart=100, maxit=5000)
}
do.call(rbind, lapply(modListFull, function(x){x$ic}))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.