library(topicflow)
year <- "2008"

#fd.path <- paste0("~/Desktop/full_disclosure_corpus/",year,".parsed")
fd.path <- paste0("~/Desktop/bugtraq_corpus/",year,".parsed")

folder <- loadFiles(fd.path)
set.seed(1234)
models <- list()
# Fit one LDA model per month; month.abb is R's built-in vector "Jan", ..., "Dec"
for (month in month.abb) {
  models[[month]] <- rawToLDA(folder, 10, month)
}
exportDocumentTermMatrix(models,"~/Desktop/bugtraq_topicflowviz/dtm")
exportTopicTermMatrix(models,"~/Desktop/bugtraq_topicflowviz/ttm")
topic.flow <- CalculateTopicFlow(models)
write.csv(topic.flow,"~/Desktop/bugtraq_topicflowviz/topic_flow.csv")

Pass the generated files, the original corpus folder (e.g. 2008.parsed), the corpus metadata file (e.g. 2008.csv, also generated by the crawler), and the file extension of interest (e.g. .reply.body.txt) to topicflowviz as follows:

python3 run.py -a bugtraq_2008_reply_body ~/Desktop/bugtraq_corpus/2008.parsed ~/Desktop/bugtraq_corpus/2008.csv .reply.body.txt ~/Desktop/bugtraq_topicflowviz/dtm ~/Desktop/bugtraq_topicflowviz/ttm ~/Desktop/bugtraq_topicflowviz/topic_flow.csv
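
When the same command has to be repeated for several years or mailing lists, it may help to assemble the arguments programmatically. The following is a minimal sketch, not part of topicflowviz: the helper name `build_topicflowviz_command` and its parameters are hypothetical, and the argument order is simply copied from the command above.

```python
import os
import shlex


def build_topicflowviz_command(label, year, corpus_dir, viz_dir, extension):
    """Return the run.py argument list in the order used in the command above.

    Hypothetical helper for illustration only; it is not part of topicflowviz.
    """
    return [
        "python3", "run.py", "-a", label,
        os.path.join(corpus_dir, f"{year}.parsed"),  # original corpus
        os.path.join(corpus_dir, f"{year}.csv"),     # corpus metadata
        extension,                                   # extension of interest
        os.path.join(viz_dir, "dtm"),                # exported document-term matrix
        os.path.join(viz_dir, "ttm"),                # exported topic-term matrix
        os.path.join(viz_dir, "topic_flow.csv"),     # topic flow table
    ]


# "~" is expanded here because it is only expanded automatically by a shell.
corpus_dir = os.path.expanduser("~/Desktop/bugtraq_corpus")
viz_dir = os.path.expanduser("~/Desktop/bugtraq_topicflowviz")

cmd = build_topicflowviz_command(
    "bugtraq_2008_reply_body", "2008", corpus_dir, viz_dir, ".reply.body.txt"
)

# Print the command for inspection instead of running it.
print(shlex.join(cmd))
```

The printed line can be pasted into a terminal from the topicflowviz directory, or the list can be handed to `subprocess.run(cmd)` once the paths have been checked.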

Note that underscores are always turned into spaces, so when adding or deleting you can write underscores in place of spaces and skip the double quotes on any parameter.
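
As a small illustration of the note above, the effect on the name passed after `-a` can be pictured as a plain character substitution. This is only a sketch of the described behaviour; how topicflowviz performs the conversion internally is an assumption, and `display_name` is a hypothetical function.

```python
def display_name(parameter):
    """Mimic the described behaviour: every underscore becomes a space.

    Hypothetical illustration; not taken from the topicflowviz source.
    """
    return parameter.replace("_", " ")


print(display_name("bugtraq_2008_reply_body"))  # prints: bugtraq 2008 reply body
```

Because the spaces only appear after the conversion, the parameter reaches the shell as a single word and needs no quoting.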



sailuh/topicflowr documentation built on May 27, 2019, 8:46 a.m.