Description

Fits a partial linear bivariate penalized spline model.
Usage

plbpsm(formula, family = gaussian(), data = list(), ind_c = NULL,
  group = NULL, weights = NULL, na.action, offset = NULL,
  criterion = "GCV", method = "SCAD", control = list(), scale = 1,
  VS = FALSE, fit = TRUE, G = NULL, drop.unused.levels = TRUE,
  drop.intercept = NULL, ecdf = FALSE, backfitting = TRUE, ...)
Arguments

formula: a PLBPSM formula. These are exactly like the formula for a GLM, except that univariate (u()) and bivariate (b()) smooth terms can be added to the right hand side.
family: a family object specifying the distribution and link to use in fitting etc. (see family).
data: a data frame or list containing the model response variable and covariates required by the formula. By default the variables are taken from environment(formula), typically the environment from which plbpsm is called.
ind_c: default set to 'NULL'; otherwise a vector of indices identifying the parametric (linear) covariates (see the Examples, where ind_c = c(1, 2, 4) is supplied).
group: a vector describing the grouping of the coefficients. For greatest efficiency and least ambiguity (see Details), it is best if group is a factor or a vector of consecutive integers, although unordered groups and character vectors are also allowed. If there are coefficients to be included in the model without being penalized, assign them to group 0 (or "0").
weights: weights on the contribution of the data to the log likelihood. Note that a weight of 2, for example, is equivalent to having made exactly the same observation twice. If you want to reweight the contributions of each datum without changing the overall magnitude of the log likelihood, then you should normalize the weights (e.g. weights <- weights/mean(weights); see the sketch following this argument list).
na.action: a function which indicates what should happen when the data contain 'NA's. The default is set by the 'na.action' setting of 'options', and is 'na.fail' if that is unset. The 'factory-fresh' default is 'na.omit'.
offset: can be used to supply a model offset for use in fitting. Note that this offset will always be completely ignored when predicting, unlike an offset included in the formula (this used to conform to the behaviour of lm and glm).
criterion: the criterion used to choose the penalty parameter lambda, "GCV" (the default) or "CV" (see Details).
method: the variable selection method for the linear covariates; the default is "SCAD" (see Details).
control: a list of fit control parameters to replace the defaults; values not specified keep their default values.
scale: the scale parameter of the exponential family.
VS: set to 'TRUE' to apply variable selection to the parametric (linear) covariates; the default is 'FALSE'.
fit: if this argument is 'TRUE' then plbpsm sets up the model and fits it; if it is 'FALSE' the model is set up and an object G containing what would be required to fit it is returned (see Value).
G: usually 'NULL', but may contain the object returned by a previous call to plbpsm with fit = FALSE (see the fit argument and Value).
drop.unused.levels: by default unused levels are dropped from factors before fitting. For some smooths involving factor variables you might want to turn this off. Only do so if you know what you are doing.
drop.intercept: set to TRUE to force the model to really not have a constant in the parametric model part, even with factor variables present. Can be a vector when formula is a list.
ecdf: whether the empirical conditional density function is used.
backfitting: whether spline-backfitted local polynomial estimation is applied.
...: further arguments for passing on to the underlying fitting routine.
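To illustrate how several of these arguments work together, here is a minimal sketch (not part of the original documentation, and not necessarily meant to be run as-is); it reuses the horseshoe data and triangulation from the Examples below, and the argument values are purely illustrative.

library(Triangulation)
data("eg1pop_dat"); data("horseshoe")
dat <- as.data.frame(eg1pop_dat[['rho03']][1:100, ])  # a small subsample, as in the Examples
tm  <- TriMesh(horseshoe, 4)                          # triangulation vertices (V) and triangles (Tr)
w   <- rep(1, nrow(dat))
w   <- w / mean(w)                                    # normalized weights leave the log-likelihood scale unchanged
fit <- plbpsm(Y ~ z1 + z2 + b(x1, x2, V = tm$V, Tr = tm$Tr, d = 2, r = 1),
              data = dat, weights = w,
              criterion = "GCV",   # penalty parameter chosen by GCV (the default)
              method    = "SCAD",  # SCAD penalty for selecting linear covariates
              VS        = TRUE)    # switch variable selection on for the parametric part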
Details

A generalized geoadditive model (GgAM) is a generalized linear model (GLM) in which the linear predictor is given by user-specified univariate additive functions plus a bivariate function of the location covariates. A simple example is:

log(E(y_i)) = u(z_{1i}) + u(z_{2i}) + f(x_{1i}, x_{2i}),

where the (independent) response variables y_i ~ Poi, z_{1i} and z_{2i} are explanatory variables, and f is the bivariate smooth function of the location covariates x_{1i} and x_{2i}. The log is an example of a link function.
More generally, the linear predictor of a GgAM may also contain a conventional parametric component alongside the univariate smooth terms and the bivariate smooth function of the location covariates. A simple example is:

log(E(y_i)) = β_1 z_{1i} + β_2 z_{2i} + u(x_{1i}) + u(x_{2i}) + b(s_{1i}, s_{2i}),

where the (independent) response variables y_i ~ Poi, z_{1i} and z_{2i} are explanatory variables entering linearly, u denotes a univariate smooth function of x_{1i} or x_{2i}, and b is the bivariate smooth function of the location covariates s_{1i} and s_{2i}.
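As a sketch of how such a linear predictor is written with the formula syntax used in the Examples below (V and Tr denote a triangulation of the spatial domain, e.g. from Triangulation::TriMesh(); dat is a hypothetical data frame holding y, z1, z2, x1, x2, s1, s2):

# parametric terms enter as usual, univariate smooths via u(), and the bivariate
# smooth of the location coordinates via b() with triangulation vertices V and triangles Tr
formula <- y ~ z1 + z2 + u(x1) + u(x2) + b(s1, s2, V = V, Tr = Tr, d = 2, r = 1)
# fit <- plbpsm(formula, data = dat, family = poisson())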
A model structure identification step is built into plbpsm: before model fitting, the linear and nonlinear continuous covariates are identified using the group adaptive LASSO.
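For example (a sketch mirroring the 'Nonlinear detection' block in the Examples, with V and Tr constructed there from TriMesh(horseshoe, 4)), every continuous covariate can be entered as a u() term and the structure identification step decides which of them are in fact linear:

data('dat_poi_ggams')
fit_smile <- plbpsm(y ~ u(x1) + u(x2) + u(x3) + b(s1, s2, V = V, Tr = Tr, d = 2, r = 1),
                    data = as.data.frame(dat_poi_ggams), family = "poisson")
summary(fit_smile)  # compare with the known-structure fit in the Examples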
We incorporate a variable selection mechanism into the Partial Linear Spatial Regression Model (PLSM) and propose a double penalized least squares approach based on bivariate spline approximation over the spatial domain when the link is Gaussian.
plbpsm in GgAM chooses the smoothing penalty parameter using either the Generalized Cross Validation (GCV) criterion or the Cross Validation (CV) criterion. The formula for GCV is

n D / (n - DoF)^2,

where D is the deviance, n the number of data points, and DoF the effective degrees of freedom of the model. The formula for CV is

\frac{1}{n} \sum_{i=1}^{n} ( y_i - \hat{y}_i^{(-i)} )^2,

where \hat{y}_i^{(-i)} is the prediction of y_i computed with the data points in the i-th fold left out, and n is the number of folds.
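As a purely numerical illustration of the GCV formula (the deviance and degrees of freedom below are made-up values, not output from plbpsm):

n   <- 100     # number of data points
D   <- 250.7   # hypothetical deviance of a fitted model
DoF <- 12.3    # hypothetical effective degrees of freedom
n * D / (n - DoF)^2   # GCV score; the penalty parameter minimizing this is selected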
In terms of shrinkage penalties, the adaptive LASSO and SCAD penalties can be used for variable selection under the PLSM and GPLSM.
Broadly, plbpsm works by first constructing basis functions and penalty coefficient matrices for the bivariate smooth term in the model formula, and obtaining a model matrix for the parametric part of the formula. The algorithm chooses the penalty parameter for the bivariate smoothing part based on the GCV/CV criterion, and selects significant linear covariates with the adaptive LASSO or SCAD method using the BIC criterion. The chosen model is then refitted to obtain the final fit. Details of the default underlying fitting methods are given in Yu et al. (2019), Wang et al. (2018) and Wang et al. (2019).
Value

If fit = FALSE the function returns a list G of items needed to fit a PLBPSM, but does not actually fit it. Otherwise the function returns an object of class "plbpsm" as described in plbpsmObject.
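A sketch of this two-step use, assuming the reuse pattern described for the G argument above (formula and dat are placeholders as in the earlier sketches):

G   <- plbpsm(formula, data = dat, fit = FALSE)   # set up bases, penalties and model matrix only
fit <- plbpsm(formula, data = dat, G = G)         # fit using the pre-built object G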
Examples

library(MASS)
library(grpreg)
library(mgcv)
library(Triangulation)
library(BPST)
#################### Horseshoe Domain Example ######################
###### VS in PLSM ######
data("eg1pop_dat")
eg1_V2 <- eg1pop_dat[['V2']]
eg1_T2 <- eg1pop_dat[['T2']]
eg1pop_rho03 <- eg1pop_dat[['rho03']]
n <- 100
Npop <- nrow(eg1pop_rho03)
# ind.pop <- (1:Npop)[!is.na(eg1pop_rho03[,1])]
ind.pop <- (1:Npop)
set.seed(1234)
sam.ind <- sort(sample(ind.pop,n,replace=FALSE))
sam <- eg1pop_rho03[sam.ind,]
lambda <- 10^(seq(-2,5,by=1))
data <- sam
data("horseshoe")
# triangulate the horseshoe domain
V <- TriMesh(horseshoe,4)$V
Tr <- TriMesh(horseshoe,4)$Tr
formula <- Y~z1+z2+z3+z4+z5+z6+z7+z8+b(x1,x2,V=V,Tr=Tr,d=2,r=1)
# example 0:default lambda
res0 <- plbpsm(formula=formula,data=as.data.frame(data),VS=TRUE)
predict(res0,as.data.frame(data))
res0$fitted.values
summary(res0)
### Estimation ###
formula <- Y~z1+z2+z3+z4+z5+z6+z7+z8+b(x1,x2,V=eg1_V2,Tr=eg1_T2,lambda=lambda,d=2,r=1)
res <- plbpsm(formula=formula,data=as.data.frame(data),VS=TRUE)
# ind_c is provided
res4 <- plbpsm(formula=formula,data=as.data.frame(sam),ind_c=c(1,2,4),VS=TRUE)
###### GPLSM ######
data(eg1pop_bin_rho00)
formula <- Y~z1+z2+z3+b(x1,x2,V=eg1_V2,Tr=eg1_T2,lambda=lambda,d=2,r=1)
n <- 100
Npop <- nrow(eg1pop_bin_rho00)
# ind.pop <- (1:Npop)[!is.na(eg1pop_bin_rho00[,1])]
ind.pop <- (1:Npop)
sam.ind <- sort(sample(ind.pop,n,replace=FALSE))
sam <- eg1pop_bin_rho00[sam.ind,]
data <- sam
res_eg1_bin <- plbpsm(formula=formula,data=as.data.frame(data),family=binomial())
data(eg1pop_nb_rho00)
formula <- Y~z1+z2+z3+b(x1,x2,V=eg1_V2,Tr=eg1_T2,d=2,r=1)
n <- 100
Npop <- nrow(eg1pop_nb_rho00)
# ind.pop <- (1:Npop)[!is.na(eg1pop_nb_rho00[,1])]
ind.pop <- (1:Npop)
sam.ind <- sort(sample(ind.pop,n,replace=FALSE))
sam <- eg1pop_nb_rho00[sam.ind,]
res_nb <- plbpsm(formula=formula,data=as.data.frame(sam),family=negbin(c(2:8)))
### GgAM ####
# Poisson #
data(eg1pop_poi2)
n <- 1000
set.seed(2008)
Npop <- nrow(eg1pop_poi2)
# ind.pop <- (1:Npop)[!is.na(eg1pop_poi2[,1])]
ind.pop <- (1:Npop)
sam.ind <- sort(sample(ind.pop,n,replace=FALSE))
sam <- eg1pop_poi2[sam.ind,]
formula <- Y~u(z1)+u(z2)+u(z3)
res_eg1_poi_add0 <- plbpsm(formula=formula,data=as.data.frame(sam),family=quasipoisson(),
group=c(0,0,0))
summary(res_eg1_poi_add0)
formula <- Y~u(z1)+u(z2)+u(z3)+b(x1,x2,V=eg1_V2,Tr=eg1_T2,d=2,r=1)
### Estimation ###
res_eg1_poi_add <- plbpsm(formula=formula,data=as.data.frame(sam),family='poisson',group=c(0,0,0))
res_intercptonly <- plbpsm(formula=Y~1,data=as.data.frame(sam),family='poisson')
### Inference & Plotting ###
summary(res_eg1_poi_add)
plot(res_eg1_poi_add)
### GgAM Prediction ###
pred <- predict(res_eg1_poi_add,as.data.frame(eg1pop_poi2[1:200,]))
origy <- eg1pop_poi2[1:200,'Y']
origy[which(is.na(pred))] <- NA
mean((pred-origy)^2,na.rm=TRUE)
# 2.8527
### GGAM-SMILE: Nonlinear detection ###
data('dat_poi_ggams')
formula <- y~u(x1)+u(x2)+u(x3)+b(s1,s2,V=V,Tr=Tr,d=2,r=1)
res_poi <- plbpsm(formula=formula,data=as.data.frame(dat_poi_ggams),family = "poisson")
# compare to the following Known GGAM model #
formula <- y~x1+u(x2)+u(x3)+b(s1,s2,V=V,Tr=Tr,d=2,r=1)
res_poi <- plbpsm(formula=formula,data=as.data.frame(dat_poi_ggams),family = "poisson",group=c(0,0))
# end of horseshoe domain example
################### Rectangular Domain Example #######################
### VS in PLSM ###
data("eg2pop_dat")
eg2_V20=eg2pop_dat[['V20']]
eg2_T20=eg2pop_dat[['T20']]
eg2pop=eg2pop_dat[['pop']]
n=100
Npop=nrow(eg2pop)
# ind.pop=(1:Npop)[!is.na(eg2pop[,1])]
ind.pop <- (1:Npop)
sam.ind <- sort(sample(ind.pop,n,replace=FALSE))
sam=eg2pop[sam.ind,]
formula <- Y~z1+z2+z3+z4+z5+z6+z7+z8+b(x1,x2,V=eg2_V20,Tr=eg2_T20)
res0 <- plbpsm(formula=formula,data=as.data.frame(sam),VS=TRUE)
### More complicated
formula <- Y~z2+u(z1)+b(x1,x2,V=eg2_V20,Tr=eg2_T20)
res_ex <- plbpsm(formula=formula,data=as.data.frame(sam),group=c(0,0))