library(topicflow) year <- "2008" #fd.path <- paste0("~/Desktop/full_disclosure_corpus/",year,".parsed") fd.path <- paste0("~/Desktop/bugtraq_corpus/",year,".parsed") folder <- loadFiles(fd.path)
set.seed(1234) models <- list() models[["Jan"]] <- rawToLDA(folder,10,"Jan") models[["Feb"]] <- rawToLDA(folder,10,"Feb") models[["Mar"]] <- rawToLDA(folder,10,"Mar") models[["Apr"]] <- rawToLDA(folder,10,"Apr") models[["May"]] <- rawToLDA(folder,10,"May") models[["Jun"]] <- rawToLDA(folder,10,"Jun") models[["Jul"]] <- rawToLDA(folder,10,"Jul") models[["Aug"]] <- rawToLDA(folder,10,"Aug") models[["Sep"]] <- rawToLDA(folder,10,"Sep") models[["Oct"]] <- rawToLDA(folder,10,"Oct") models[["Nov"]] <- rawToLDA(folder,10,"Nov") models[["Dec"]] <- rawToLDA(folder,10,"Dec")
exportDocumentTermMatrix(models,"~/Desktop/bugtraq_topicflowviz/dtm") exportTopicTermMatrix(models,"~/Desktop/bugtraq_topicflowviz/ttm")
topic.flow <- CalculateTopicFlow(models) write.csv(topic.flow,"~/Desktop/bugtraq_topicflowviz/topic_flow.csv")
Use the generated files, the original corpus (e.g. 2008.parsed), the metadata of the corpus (e.g. 2008.csv) also generated by the crawler, and the setup extension of interest (.reply.body.txt
) as input to topicflowviz like so:
python3 run.py -a bugtraq_2008_reply_body ~/Desktop/bugtraq_corpus/2008.parsed ~/Desktop/bugtraq_corpus/2008.csv .reply.body.txt ~/Desktop/bugtraq_topicflowviz/dtm ~/Desktop/bugtraq_topicflowviz/ttm ~/Desktop/bugtraq_topicflowviz/topic_flow.csv
Note underlines will always turned into spaces, so when adding or deleting you can just use underlines and no double quotes on any parameter.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.