Run_Full_Model: A function to run the ContentStructure model to convergence...


Description

A function to run the ContentStructure model to convergence for one dataset.

Usage

Run_Full_Model(Auth_Attr, Doc_Edge_Matrix, Doc_Word_Matrix, Vocab,
  main_iterations = 4000, sample_step_burnin = 2e+06,
  sample_step_iterations = 8e+06, sample_step_sample_every = 2000,
  topics = 10, clusters = 2, latent_space_dimensions = 2,
  run_MH_only = F, mixing_variable = NULL, Seed = 123456,
  save_results_to_file = FALSE, output_directory = NULL,
  output_filename = NULL, Main_Estimation_Results = NULL)

Arguments

Auth_Attr

A data frame with one row for each unique sender/receiver. It must contain at least one column with the ID of each sender/receiver and may contain any number of additional variables. Additional variables are ignored unless one of them is a binary attribute named in mixing_variable, in which case mixing parameter estimates are calculated for it.
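As a minimal sketch of the layout described above, the following builds a toy attribute table. The column names ID and Gender and the character coding of the binary attribute are illustrative assumptions; the coding the package expects is not stated here.

```r
# Hypothetical toy data: four actors, one ID column, one binary attribute.
Auth_Attr <- data.frame(
  ID = 1:4,
  Gender = c("F", "M", "F", "M"),  # assumed coding, for illustration only
  stringsAsFactors = FALSE
)

# Name of the binary column we would pass as mixing_variable.
mixing_variable <- "Gender"

# Basic sanity checks: the column exists and takes exactly two values.
stopifnot(mixing_variable %in% names(Auth_Attr))
stopifnot(length(unique(Auth_Attr[[mixing_variable]])) == 2)
```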

Doc_Edge_Matrix

A matrix with one row for each email and one column which records the index of the sender of the email (indexed from 1) followed by one column for each unique sender/receiver in the dataset.
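The layout above can be sketched with a toy example. This assumes that the columns after the first hold 1 when the corresponding actor received the email and 0 otherwise; that coding is an assumption, as is the emails list used to hold the raw data.

```r
# Hypothetical raw data: three emails among four actors (indexed from 1).
n_actors <- 4
emails <- list(
  list(sender = 1, recipients = c(2, 3)),
  list(sender = 3, recipients = 4),
  list(sender = 2, recipients = c(1, 3, 4))
)

# One row per email; column 1 is the sender index, columns 2..(n_actors + 1)
# are recipient indicators (assumed 0/1 coding).
Doc_Edge_Matrix <- matrix(0, nrow = length(emails), ncol = 1 + n_actors)
for (i in seq_along(emails)) {
  Doc_Edge_Matrix[i, 1] <- emails[[i]]$sender
  Doc_Edge_Matrix[i, 1 + emails[[i]]$recipients] <- 1
}
```

With this layout the matrix has one more column than there are actors, which gives a quick dimension check before a long run.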

Doc_Word_Matrix

A matrix with one row for each email and one column for each unique word in the vocabulary that records the number of times each word was used in each document.

Vocab

A vector containing every unique term in the vocabulary, equal in length to the number of columns in Doc_Word_Matrix.
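One way to build a matching Doc_Word_Matrix and Vocab pair from tokenized documents is sketched below. The docs list is hypothetical toy data, and sorting the vocabulary is just a convenient choice; the only requirement taken from the descriptions above is that column j of the matrix counts occurrences of Vocab[j].

```r
# Hypothetical tokenized documents (one character vector per email).
docs <- list(
  c("budget", "meeting", "budget"),
  c("meeting", "schedule"),
  c("budget", "schedule", "schedule")
)

# Vocabulary: every unique term, in a fixed order.
Vocab <- sort(unique(unlist(docs)))

# Count each vocabulary term in each document; rows are documents,
# columns follow the order of Vocab.
Doc_Word_Matrix <- t(vapply(
  docs,
  function(tokens) as.numeric(table(factor(tokens, levels = Vocab))),
  numeric(length(Vocab))
))

# The vocabulary length must match the number of columns.
stopifnot(length(Vocab) == ncol(Doc_Word_Matrix))
```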

main_iterations

The number of iterations of Gibbs sampling for the LDA part of the model. We have found that 4,000 works well.

sample_step_burnin

The number of burn-in iterations to complete before sampling the latent space parameters when running Metropolis-Hastings (MH) for the latent space model (LSM) to convergence.

sample_step_iterations

The total number of MH iterations to run for the LSM (before thinning).

sample_step_sample_every

The number of iterations to skip between retained samples when thinning the MH chain for the LSM in the sample step.
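To get a feel for how the three sample-step settings interact, the arithmetic below uses the default values from the usage signature. Whether burn-in iterations are counted within sample_step_iterations is not stated here, so both readings are shown; treat the results as rough sizing estimates rather than the package's exact behavior.

```r
# Default values taken from the usage signature.
sample_step_burnin <- 2e6
sample_step_iterations <- 8e6
sample_step_sample_every <- 2000

# Reading 1 (assumption): burn-in is run separately, so every thinned
# draw from the main run is retained.
draws_if_burnin_separate <- sample_step_iterations / sample_step_sample_every

# Reading 2 (assumption): burn-in is part of the total, so only the
# iterations after burn-in contribute retained draws.
draws_if_burnin_included <-
  (sample_step_iterations - sample_step_burnin) / sample_step_sample_every

print(c(separate = draws_if_burnin_separate,
        included = draws_if_burnin_included))
```

Under either reading the defaults retain a few thousand draws, so raising sample_step_sample_every trades stored samples for lower autocorrelation between them.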

topics

The number of topics to use.

clusters

The number of topic clusters to use.

latent_space_dimensions

The number of dimensions to be included in the latent space model. Note that plotting is currently supported only for two dimensions.

run_MH_only

If TRUE, only the MH step for the LSM is rerun to convergence.

mixing_variable

If not NULL, specifies the name of the binary variable in the Auth_Attr data frame that will be used to estimate mixing parameter effects.

Seed

Sets the seed in R and C++ for replicability.

save_results_to_file

A logical value indicating whether intermediate results should be saved to file or returned to the R session.

output_directory

The directory where all output will be saved. Defaults to NULL if save_results_to_file == FALSE.

output_filename

The name of the .Rdata file you would like to save model output in. Defaults to NULL if save_results_to_file == FALSE.

Main_Estimation_Results

A list object returned by a previous model estimation, to be supplied if the user wishes to set run_MH_only = TRUE. Useful if the user would like to specify a greater number of iterations for the final step of LSM estimation.
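Because a full run can take a long time, it may help to check that the four data arguments agree with each other before calling Run_Full_Model. The helper below is a hypothetical sketch, not part of the package: check_inputs and its messages are invented for illustration, and it assumes that sender indices refer to rows of Auth_Attr.

```r
# Hypothetical pre-flight check based on the argument descriptions above.
# Returns a character vector of problems (empty if none were found).
check_inputs <- function(Auth_Attr, Doc_Edge_Matrix, Doc_Word_Matrix, Vocab) {
  n_actors <- nrow(Auth_Attr)
  problems <- character(0)
  # Sender column plus one column per unique sender/receiver.
  if (ncol(Doc_Edge_Matrix) != n_actors + 1) {
    problems <- c(problems, "Doc_Edge_Matrix should have nrow(Auth_Attr) + 1 columns")
  }
  # Both matrices have one row per email.
  if (nrow(Doc_Edge_Matrix) != nrow(Doc_Word_Matrix)) {
    problems <- c(problems, "Doc_Edge_Matrix and Doc_Word_Matrix differ in row count")
  }
  # One vocabulary entry per word column.
  if (length(Vocab) != ncol(Doc_Word_Matrix)) {
    problems <- c(problems, "length(Vocab) should equal ncol(Doc_Word_Matrix)")
  }
  # Sender indices are 1-based (assumed to index rows of Auth_Attr).
  senders <- Doc_Edge_Matrix[, 1]
  if (any(senders < 1 | senders > n_actors)) {
    problems <- c(problems, "sender index out of range")
  }
  problems
}

# Toy data: three actors, two emails, three vocabulary terms.
Auth_Attr <- data.frame(ID = 1:3)
Doc_Edge_Matrix <- rbind(c(1, 0, 1, 1), c(2, 1, 0, 0))
Doc_Word_Matrix <- rbind(c(2, 1, 0), c(0, 1, 1))
Vocab <- c("budget", "meeting", "schedule")

problems <- check_inputs(Auth_Attr, Doc_Edge_Matrix, Doc_Word_Matrix, Vocab)
```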

Value

Does not return anything; all output is saved to the data_directory folder.


matthewjdenny/ContentStructure documentation built on May 21, 2019, 1:01 p.m.