Description Usage Arguments Details Value References Examples
View source: R/data_analysis.R
fit_regression
fits either a Generalized Linear Model or a betareg model
to the data and search for the best model(s) given a list of predictors using
the R package glmulti.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 | fit_regression(
data,
lookup = NULL,
census = TRUE,
yr,
project,
scenario,
popn,
param = "N",
vs = c("GS1"),
count_data = c("Nextant", "Nall", "Nalleles", "N", "AM", "AF", "Subadults", "Juv",
"nDams", "nBroods", "nProgeny", "nImmigrants", "nEmigrants", "nHarvested",
"nSupplemented", "YrExt", "Alleles"),
ic = "aic",
l = 1,
ncand = 30,
set_size = NA,
links = c("logit", "probit", "cloglog", "cauchit", "loglog"),
save2disk = TRUE,
dir_out = "DataAnalysis/FitRegression"
)
|
data |
The long format of census (from |
lookup |
(Optional) A look-up table where the scenario names are listed together with the (missing) variables needed to fit the regression models |
census |
Whether the input is census data |
yr |
The year that has to be used in the analysis if census=TRUE |
project |
The Vortex project name |
scenario |
Vortex scenario name |
popn |
The sequential number of the population (in integer) |
param |
The dependent variable |
vs |
Character vector with independent variable(s) |
count_data |
Character vector with param(s) that are counts and would use a Poisson error distribution |
ic |
Information criterion |
l |
Level for glmulti search: 1 main effects, 2 main effects + interactions |
ncand |
The threshold of candidate models after which switch to the genetic search method, default: 30 |
set_size |
Value to be used in confsetsize (from |
links |
Link functions to use in the Beta regression. |
save2disk |
Whether to save the output to disk, default: TRUE |
dir_out |
The local path to store the output. Default: DataAnalysis/FitRegression |
fit_regression
fits a different type of regression model depending on the
dependent variable. When this is a count (e.g. N or the number of alleles),
the function will fit a Generalized Linear Model. The first fit is attempted
with a Poisson error distribution and if the dispersion parameter (calculated
as residual deviance / df is larger than (the somewhat
arbitrary cut off of) 1.5, the model will be refitted with a quasipoisson
error distribution (a message is displayed if this happens).
fit_regression
establishes whether the dependent variable is a count
by searching for it in count_data
. If the users generated their own
dependent variable (e.g. through a PS), this has to be included in count_data
to indicate fit_regression
that it is analysing count data.
When the number of alleles is the dependent variable (from .run files), this is rounded to integer (to meet R requirement that count data are integers) before a GLM is fitted to the data.
If param
is a proportion (e.g. Gene Diversity and Inbreeding), then
the function uses a Beta regression from the R package betareg (Cribari-Neto
& Zeileis 2010). Different link functions are tested and the one with the
lowest AIC value is selected. The selected link function is displayed on the
R console and the difference in the AIC scores relative to the best link
function is also displayed.
In the initial fit of the model the main and interactions effects are included.
Successively, a search for the best model is carried out. This is performed
with the R package glmulti
(Calcagno & de Mazancourt 2010).
fit_regression
will conduct an exhaustive search if ncand is less or
equal to the number of candidate models, otherwise it will use a genetic
search method (see glmulti documentations for more details about the search
methods). When glmulti
uses the genetic search method,
two small files (with extension .modgen.back
and .mods.back.
are
written in the working directory even if save2disk=FALSE
.
fit_regression
explicitly ignores NA.
Depending on the data, fitting several Beta regression models to complete the search may be a long (and memory hungry) process. Also, the package betareg has the limitation (at least at the moment of writing) that cannot handle analysis of data when the dependent variable takes value of either exactly 0 or 1.
See vignette for a more detailed explanation of fit_regression
.
A glmulti
object with the best models found.
Calcagno, V., and C. de Mazancourt. 2010. glmulti: an R package for easy automated model selection with (generalized) linear models. Journal of Statistical Software 34:1-29.
Cribari-Neto, F., and Zeileis, A. (2010) Beta regression in R. Journal of Statistical Software 34(2).
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 | # Using Pacioni et al. example data. See ?pac.run.lhs and ?pac.lhs for more
# details.
data(pac.run.lhs, pac.lhs)
# Remove base scenario from .stdat data
pac.lhs.no.base <- pac.lhs[!pac.lhs$scen.name == 'ST_LHS(Base)', ]
# Use function lookup_table to obtain correct parameter values at year 0
lkup.ST_LHS <- lookup_table(data=pac.lhs.no.base, project='Pacioni_et_al',
scenario='ST_LHS',
pop='Population 1',
SVs=c('SV1', 'SV2', 'SV3', 'SV4', 'SV5', 'SV6', 'SV7'),
save2disk=FALSE)
# Remove base scenario from .run output in long format
lrun.ST_LHS.no.base <- pac.run.lhs[[2]][!pac.run.lhs[[2]]$Scenario == 'ST_LHS(Base)', ]
reg <- fit_regression(data=lrun.ST_LHS.no.base, lookup=lkup.ST_LHS,
census=FALSE,
project='Pacioni_et_al', scenario='ST_LHS', popn=1,
param='N', vs=c('SV1', 'SV2', 'SV3'), l=2, ncand=30,
save2disk=FALSE)
# Clean up of residual files written by glmulti
# Note, in some OS (W) these files may be locked because in use by R and have
# to be manually after the R session has been either terminated or restarted
file.remove(c('Pacioni_et_al_ST_LHS_N.modgen.back',
'Pacioni_et_al_ST_LHS_N.mods.back'))
# Example of information you can obtained once you have run fit_regression
# The formula for the best model
bestmodel <- reg@formulas[1]
# The formulae for the best 30 model
bestmodels <- reg@formulas
# List of IC values
qaicvalues <- reg@crits
# QAIC differences between the first 5 best models (stored in 'delta')
delta <- as.vector(NULL)
for (i in 1:5) {
del <- qaicvalues[i+1] - qaicvalues[i]
delta <- c(delta, del)
}
# The best model's coefficients
coef.best <- coef(reg@objects[[1]])
# The model averaged coefficients
coef.all <- glmulti::coef.glmulti(reg)
coefs <- data.frame(Estimate=coef.all[,1],
Lower=coef.all[,1] - coef.all[,5],
Upper=coef.all[,1] + coef.all[,5])
# Plot IC profile
plot(reg, type='p')
# Plot of model averaged importance of terms
plot(reg, type='s')
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.