data_gen: data_gen
In cvraut/viLDA: variational inference Latent Dirichlet Allocation

Generates a simulated dataset that follows the Latent Dirichlet Allocation (LDA) model.

`n_doc`	The number of documents to generate
`n_vocab`	The number of distinct words in the corpus vocabulary
`n_top`	The number of topics/clusters (K)
`doc_length_scale`	A number proportional to the average number of words in a document (default = 8)
`doc_length_scale_var`	A number proportional to the variance of the number of words in a document (default = 2)
`voc_p_scale`	A number proportional to the initial probability of each word in a cluster (default = 4). The higher the value, the less uniformly weight is spread across all topics.
`spike_overlap`	A number proportional to the amount of vocabulary shared across documents from different clusters (default = 0.05). With the default value of 0.05, documents from different clusters share roughly 5% of their word distributions with each other.
`alphaWords`	Hyperparameter for document-cluster distribution (default = 0.2)
`alphaTopics`	Hyperparameter for topic-cluster distribution (default = 0.2)
`seed`	The random seed for the data generation (set once at the beginning of the function, default = 19890418)
`topic_mix`	Boolean flag; if TRUE, a single document can be generated from several topic clusters rather than just one (default = FALSE)
`DEBUG`	Boolean flag, if TRUE then debug print statements are shown to the user (default = FALSE)
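
To make the arguments above more concrete, the following base-R sketch walks through a textbook LDA generative process. It is not the implementation used by `data_gen`: the helpers `rdirichlet()` and `sim_lda()`, the argument names `doc_len`, `alpha_doc` and `alpha_word`, the Poisson document-length model, and the symmetric Dirichlet priors are all assumptions made for illustration. It does not attempt to reproduce `doc_length_scale_var`, `voc_p_scale` or `spike_overlap`.

```r
# Illustrative sketch only; none of these helpers are part of viLDA.

# Draw n samples from a symmetric Dirichlet(alpha) of dimension k
# by normalising independent Gamma(alpha, 1) draws.
rdirichlet <- function(n, k, alpha) {
  g <- matrix(rgamma(n * k, shape = alpha, rate = 1), nrow = n, ncol = k)
  g / rowSums(g)
}

# Hypothetical simulator mirroring a few of the documented arguments.
sim_lda <- function(n_doc, n_vocab, n_top, doc_len = 8,
                    alpha_doc = 0.2, alpha_word = 0.2,
                    topic_mix = FALSE, seed = 19890418) {
  set.seed(seed)  # seeded once, at the start of the function

  # One word distribution per topic: n_top x n_vocab, each row sums to 1.
  beta <- rdirichlet(n_top, n_vocab, alpha_word)

  # Document-term count matrix: n_doc x n_vocab.
  counts <- matrix(0L, nrow = n_doc, ncol = n_vocab)
  topics <- vector("list", n_doc)

  for (d in seq_len(n_doc)) {
    # Assumed length model: Poisson around doc_len, with at least one word.
    len <- max(1L, rpois(1, doc_len))

    if (topic_mix) {
      # Mixed membership: per-document topic proportions, one topic per word.
      theta <- rdirichlet(1, n_top, alpha_doc)[1, ]
      z <- sample.int(n_top, len, replace = TRUE, prob = theta)
    } else {
      # Single membership: every word in the document shares one topic.
      z <- rep(sample.int(n_top, 1), len)
    }

    # Draw each word from the distribution of its assigned topic.
    w <- vapply(z, function(k) sample.int(n_vocab, 1, prob = beta[k, ]),
                integer(1))
    counts[d, ] <- tabulate(w, nbins = n_vocab)
    topics[[d]] <- z
  }

  list(counts = counts, beta = beta, topics = topics)
}

sim <- sim_lda(n_doc = 20, n_vocab = 50, n_top = 3)
dim(sim$counts)  # documents x vocabulary
```

With `topic_mix = FALSE` every document in this sketch is tied to exactly one topic, which matches the description of the flag's default; setting it to `TRUE` switches to per-word topic assignments. The structure of the value actually returned by `data_gen` is not described on this page and may differ from the list returned here.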