hdpGLM | R Documentation |

The function estimates a semi-parametric mixture of Generalized Linear Models. It uses a (hierarchical) Dependent Dirichlet Process Prior for the mixture probabilities.

```
hdpGLM(
formula1,
formula2 = NULL,
data,
mcmc,
family = "gaussian",
K = 100,
context.id = NULL,
constants = NULL,
weights = NULL,
n.display = 1000,
na.action = "exclude",
imp.bin = "R"
)
```

`formula1` |
a single symbolic description of the linear model of the
mixture GLM components to be fitted. The syntax is the same
as used in the |

`formula2` |
eihter NULL (default) or a single symbolic description of the
linear model of the hierarchical component of the model.
It specifies how the average parameter of the base measure
of the Dirichlet Process Prior varies linearly as a function
of group level covariates. If |

`data` |
a data.frame with all the variables specified in |

`mcmc` |
a named list with the following elements - - - - - |

`family` |
a character with either 'gaussian', 'binomial', or 'multinomial'. It indicates the family of the GLM components of the mixture model. |

`K` |
an integer indicating the maximum number of clusters to truncate the Dirichlet Process Prior in order to use the blocked Gibbs sampler. |

`context.id` |
string with the name of the column in the data that uniquely identifies the contexts. If |

`constants` |
either NULL or a list with the constants of the model. If not NULL,
it must contain a vector named |

`weights` |
numeric vector with the same size as the number of rows of the data. It must contain the weights of the observations in the data set. NOTE: FEATURE NOT IMPLEMENTED YET |

`n.display` |
an integer indicating the number of iterations to wait before printing information about the estimation process. If zero, it does not display any information. Note: displaying informaiton at every iteration (n.display=1) may increase the time to estimate the model slightly. |

`na.action` |
string with action to be taken for the |

`imp.bin` |
string, either "R" or "Cpp" indicating the language of the implementation of the binomial model. |

This function estimates a Hierarchical Dirichlet Process generalized
linear model, which is a semi-parametric Bayesian approach to regression
estimation with clustering. The estimation is conducted using Blocked Gibbs Sampler if the output
variable is gaussian distributed. It uses Metropolis-Hastings inside Gibbs if
the output variable is binomial or multinomial distributed.
This is specified using the parameter `family`

. See:

Ferrari, D. (2020). Modeling Context-Dependent Latent Effect Heterogeneity, Political Analysis, 28(1), 20–46. doi:10.1017/pan.2019.13.

Ferrari, D. (2023). "hdpGLM: An R Package to Estimate Heterogeneous Effects in Generalized Linear Models Using Hierarchical Dirichlet Process." Journal of Statistical Software, 107(10), 1-37. doi:10.18637/jss.v107.i10.

Ishwaran, H., & James, L. F., Gibbs sampling methods for stick-breaking priors, Journal of the American Statistical Association, 96(453), 161–173 (2001).

Neal, R. M., Markov chain sampling methods for dirichlet process mixture models, Journal of computational and graphical statistics, 9(2), 249–265 (2000).

The function returns a list with elements `samples`

, `pik`

, `max_active`

,
`n.iter`

, `burn.in`

, and `time.elapsed`

. The `samples`

element
contains a MCMC object (from coda package) with the samples from the posterior
distribution. The `pik`

is a `n x K`

matrix with the estimated
probabilities that the observation $i$ belongs to the cluster $k$

```
## Note: this example is for illustration. You can run the example
## manually with increased number of iterations to see the actual
## results, as well as the data size (n)
set.seed(10)
n = 300
data = tibble::tibble(x1 = rnorm(n, -3),
x2 = rnorm(n, 3),
z = sample(1:3, n, replace=TRUE),
y =I(z==1) * (3 + 4*x1 - x2 + rnorm(n)) +
I(z==2) * (3 + 2*x1 + x2 + rnorm(n)) +
I(z==3) * (3 - 4*x1 - x2 + rnorm(n))
)
mcmc = list(burn.in = 0, n.iter = 20)
samples = hdpGLM(y~ x1 + x2, data=data, mcmc=mcmc, family='gaussian',
n.display=30, K=50)
summary(samples)
plot(samples)
plot(samples, separate=TRUE)
## compare with GLM
## lm(y~ x1 + x2, data=data, family='gaussian')
```

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.