Create_Output: A Function to generate diagnostic plots and interpretable...

Description

A function to generate diagnostic plots and interpretable output for a list of 1 to N model output objects produced by running the ContentStructure model.

Usage

Create_Output(output_names = NULL, Estimation_Results = NULL,
  only_generate_summaries = T, print_agg_stats = T,
  Topic_Model_Burnin = 50, Skip = 0, Thin = 1,
  load_results_from_file = F, estimation_output_list = NULL,
  estimation_output_directory = NULL, using_county_email_data = F,
  save_results = F, output_directory = NULL)
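
As a minimal sketch of the save_results requirement documented under Arguments (output_names and output_directory must both be supplied when save_results is TRUE), the arguments can be checked before the call. The helper build_save_args is hypothetical and not part of ContentStructure; it only illustrates the documented rule.

```r
# Hypothetical helper (not part of ContentStructure): enforces the documented
# rule that save_results = TRUE needs output_names and output_directory.
build_save_args <- function(save_results = FALSE, output_names = NULL,
                            output_directory = NULL) {
  if (isTRUE(save_results)) {
    if (is.null(output_names) || is.null(output_directory)) {
      stop("save_results = TRUE requires output_names and output_directory")
    }
  }
  list(save_results = save_results, output_names = output_names,
       output_directory = output_directory)
}

# Valid combination: a name prefix and a directory are both given.
args_ok <- build_save_args(TRUE, output_names = "Org",
                           output_directory = tempdir())

# Invalid combination: capture the error message instead of stopping.
args_bad <- tryCatch(build_save_args(TRUE),
                     error = function(e) conditionMessage(e))
```

Assuming `results` holds a list returned by Run_Full_Model, the checked arguments could then be passed on with something like `do.call(Create_Output, c(list(Estimation_Results = results), args_ok))`.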

Arguments

output_names

An optional vector of file names used to name the output .pdf files generated by this function (if save_results == TRUE). Useful if you would like your output files to look like Org_Plot_X.pdf instead of Org_Results_of_9-30-15_Plot_X.pdf, for example.

Estimation_Results

A list object returned by Run_Full_Model that will be used to generate output. Can be NULL if load_results_from_file == TRUE, in which case intermediate results saved to disk will be used instead.

only_generate_summaries

If TRUE, only generate a one-page-per-cluster pdf summary of model output for each county. If FALSE, generate the full (and much larger) set of output.

print_agg_stats

If TRUE, generates a plot comparing topic frequency across all clusters and a trace plot of the topic model log likelihood. Very useful.

Topic_Model_Burnin

The number of Gibbs sampling iterations to discard before calculating the Geweke statistic used to assess model convergence. In practice, set it fairly low at first, then inspect the trace plot to decide where to set it so that the retained iterations provide evidence of convergence.

Skip

The number of MH for LSM iterations to skip when generating output (if your burnin was not long enough).

Thin

The number of iterations to skip in the MH for LSM chain when generating output. The default of 1 does not thin; set it higher to make plotting easier if you took a lot of samples.

load_results_from_file

A logical which defaults to FALSE. If TRUE, the function loads the data named by data_name that Run_Full_Model saved in the data_directory (note that save_results_to_file must have been set to TRUE in Run_Full_Model) and uses it to generate output.

estimation_output_list

A vector containing the names of the organization data files, as supplied to the function that runs the model. Should only contain the names of organizations for which output has already been created. Only for use with county government datasets (not for public use).

estimation_output_directory

The directory where all .Rdata files generated by Run_Full_Model() are stored. Defaults to NULL if we are not reading in any intermediate data from disk.

using_county_email_data

A logical indicating whether you are using North Carolina County Government email data that are properly formatted to produce aggregate-level output.

save_results

Defaults to FALSE. If TRUE, output_names and output_directory must be supplied, and .pdfs will be created for all plots.

output_directory

A directory where we wish to save the output of this function.
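
The roles of Topic_Model_Burnin, Skip and Thin can be illustrated with a short base R sketch. Everything here is an assumption made for illustration: the trace is simulated, naive_geweke_z is a simplified stand-in that ignores autocorrelation (the actual Geweke statistic uses spectral density estimates of the variance), and the indexing in select_iterations is a guess at how Skip and Thin are applied, not the package's confirmed behavior.

```r
# Simulated stand-in for a log-likelihood trace: 50 iterations of drift,
# then 450 stationary draws. Not real ContentStructure output.
set.seed(1)
trace <- c(seq(-500, -100, length.out = 50), rnorm(450, mean = -100, sd = 5))

# Naive Geweke-style z-score: compares the mean of the first 10% of the
# post-burn-in chain with the mean of the last 50%. It uses plain sample
# variances, so it ignores autocorrelation (the real statistic does not).
naive_geweke_z <- function(x, burnin, frac1 = 0.1, frac2 = 0.5) {
  kept <- x[(burnin + 1):length(x)]
  n <- length(kept)
  first <- kept[seq_len(floor(frac1 * n))]
  last <- kept[(n - floor(frac2 * n) + 1):n]
  (mean(first) - mean(last)) /
    sqrt(var(first) / length(first) + var(last) / length(last))
}

z_no_burnin <- naive_geweke_z(trace, burnin = 0)   # drift still included
z_burnin_50 <- naive_geweke_z(trace, burnin = 50)  # drift discarded

# Assumed indexing for Skip and Thin: drop the first Skip iterations,
# then keep every Thin-th one after that.
select_iterations <- function(n_iter, Skip = 0, Thin = 1) {
  stopifnot(Skip >= 0, Skip < n_iter, Thin >= 1)
  seq(from = Skip + 1, to = n_iter, by = Thin)
}

kept_default <- select_iterations(1000)                         # keeps all 1000
kept_thinned <- select_iterations(1000, Skip = 200, Thin = 10)  # keeps 80
```

With the drift included, the z-score is far from zero. Discarding the first 50 iterations brings it much closer, which is the pattern to look for in the trace plot when choosing a burn-in.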

Value

A list object with 4 entries and the following structure:

Cluster_Data

Cluster-level data, including top words and mixing parameters (with standard errors) if applicable.

Actor_Data

Actor-level data (all of the Auth_Attr dataframe) plus the average latent position of each actor in each dimension (currently 2), for each cluster.

Token_Data

The counts of each token for each topic, along with the edge counts for that topic and its cluster assignment.

Vocabulary

The vocabulary as a vector, for easy handling.
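
A rough sketch of how the returned list might be inspected follows. The object `result` is a made-up stand-in with the four documented entry names. The column names and values inside it are hypothetical, since the real contents come from Create_Output.

```r
# Stand-in for the returned list. Only the four entry names come from the
# documentation; the columns and values are invented for illustration.
result <- list(
  Cluster_Data = data.frame(cluster = 1:2),
  Actor_Data = data.frame(actor = c("a", "b")),
  Token_Data = data.frame(token = c("budget", "meeting"), cluster = c(1, 2)),
  Vocabulary = c("budget", "meeting")
)

# Confirm that the four documented entries are present.
expected <- c("Cluster_Data", "Actor_Data", "Token_Data", "Vocabulary")
has_all <- all(expected %in% names(result))

# Entries are reached with the usual list accessor.
vocab_size <- length(result$Vocabulary)
```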


matthewjdenny/ContentStructure documentation built on May 21, 2019, 1:01 p.m.