fit_model: Fit Model

View source: R/cluster_model.R

fit_model    R Documentation

Fit Model

Description

fit_model() fits a Bayesian hierarchical model to the model training data in model_docs and draws samples from the model using Markov chain Monte Carlo (MCMC).

Usage

fit_model(
  main_dir,
  model_docs,
  num_iters,
  num_chains = 1,
  num_cores,
  writer_indices,
  doc_indices,
  a = 2,
  b = 0.25,
  c = 2,
  d = 2,
  e = 0.5
)

Arguments

main_dir

A directory that contains a cluster template created by make_clustering_template().

model_docs

A directory containing the model training documents.

num_iters

An integer number of MCMC iterations.

num_chains

An integer number of chains to use.

num_cores

An integer number of cores to use for processing cluster assignments in parallel. The model fitting itself is not done in parallel.

writer_indices

A vector giving the start and stop character positions of the writer ID in the model training file names. E.g., if the file names are writer0195_doc1, writer0210_doc1, and writer0033_doc1, then writer_indices is c(7, 10).

doc_indices

A vector giving the start and stop character positions of the "document name" in the model training file names. This is used to distinguish between two documents written by the same writer. E.g., if the file names are writer0195_doc1, writer0195_doc2, writer0033_doc1, and writer0033_doc2, then doc_indices is c(12, 15).

a

The shape parameter for the Gamma distribution in the hierarchical model.

b

The rate parameter for the Gamma distribution in the hierarchical model.

c

The first shape parameter for the Beta distribution in the hierarchical model.

d

The second shape parameter for the Beta distribution in the hierarchical model.

e

The scale parameter for the hyperprior for mu in the hierarchical model.
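
The index and hyperparameter arguments above can be sanity-checked in base R before fitting. The sketch below is illustrative only and is not part of fit_model(). It assumes that writer_indices and doc_indices are 1-based, inclusive character positions, which is how the file-name examples in the argument descriptions read, and it uses substr() to show what those positions select. The variable names (files, writers, docs, gamma_shape, and so on) are hypothetical. The last lines compute the means of a Gamma(shape = a, rate = b) and a Beta(c, d) distribution at the default values; which model parameters receive these priors is not stated on this page.

```r
# File names taken from the argument descriptions above.
files <- c("writer0195_doc1", "writer0195_doc2",
           "writer0033_doc1", "writer0033_doc2")

# Index values from the argument descriptions.
writer_indices <- c(7, 10)
doc_indices <- c(12, 15)

# substr() uses 1-based, inclusive start and stop positions.
writers <- substr(files, writer_indices[1], writer_indices[2])
docs <- substr(files, doc_indices[1], doc_indices[2])

# Show which part of each file name the indices pick out.
print(data.frame(file = files, writer = writers, doc = docs))

# Default hyperparameters from the Usage section (a, b, c, d).
gamma_shape <- 2     # a
gamma_rate <- 0.25   # b
beta_shape1 <- 2     # c
beta_shape2 <- 2     # d

# Standard closed-form means of the two distributions.
gamma_mean <- gamma_shape / gamma_rate                 # 2 / 0.25 = 8
beta_mean <- beta_shape1 / (beta_shape1 + beta_shape2) # 2 / 4 = 0.5
print(c(gamma_mean = gamma_mean, beta_mean = beta_mean))
```

If the extracted writer or document strings are not what you expect, adjust the index vectors until they are before passing them to fit_model().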

Value

A list containing the training data used to fit the model and the fitted model.

Examples

## Not run: 
main_dir <- "/path/to/main_dir"
model_docs <- "/path/to/model_training_docs"
questioned_docs <- "/path/to/questioned_docs"

model <- fit_model(
  main_dir = main_dir,
  model_docs = model_docs,
  num_iters = 100,
  num_chains = 1,
  num_cores = 2,
  writer_indices = c(2, 5),
  doc_indices = c(7, 18)
)

model <- drop_burnin(model = model, burn_in = 25)

analysis <- analyze_questioned_documents(
  main_dir = main_dir,
  questioned_docs = questioned_docs,
  model = model,
  num_cores = 2
)
analysis$posterior_probabilities

## End(Not run)


handwriter documentation built on Oct. 25, 2024, 1:06 a.m.