In sailuh/topicflowr: Topic Flow

library(topicflow)
year <- "2008"

#fd.path <- paste0("~/Desktop/full_disclosure_corpus/",year,".parsed")
fd.path <- paste0("~/Desktop/bugtraq_corpus/",year,".parsed")

folder <- loadFiles(fd.path)

set.seed(1234)
models <- list()
models[["Jan"]] <- rawToLDA(folder,10,"Jan")
models[["Feb"]] <- rawToLDA(folder,10,"Feb")
models[["Mar"]] <- rawToLDA(folder,10,"Mar")
models[["Apr"]] <- rawToLDA(folder,10,"Apr")
models[["May"]] <- rawToLDA(folder,10,"May")
models[["Jun"]] <- rawToLDA(folder,10,"Jun")
models[["Jul"]] <- rawToLDA(folder,10,"Jul")
models[["Aug"]] <- rawToLDA(folder,10,"Aug")
models[["Sep"]] <- rawToLDA(folder,10,"Sep")
models[["Oct"]] <- rawToLDA(folder,10,"Oct")
models[["Nov"]] <- rawToLDA(folder,10,"Nov")
models[["Dec"]] <- rawToLDA(folder,10,"Dec")

exportDocumentTermMatrix(models,"~/Desktop/bugtraq_topicflowviz/dtm")
exportTopicTermMatrix(models,"~/Desktop/bugtraq_topicflowviz/ttm")

topic.flow <- CalculateTopicFlow(models)
write.csv(topic.flow,"~/Desktop/bugtraq_topicflowviz/topic_flow.csv")

Use the generated files, the original corpus (e.g. 2008.parsed), the metadata of the corpus (e.g. 2008.csv) also generated by the crawler, and the setup extension of interest (.reply.body.txt) as input to topicflowviz like so:

python3 run.py -a bugtraq_2008_reply_body ~/Desktop/bugtraq_corpus/2008.parsed ~/Desktop/bugtraq_corpus/2008.csv .reply.body.txt ~/Desktop/bugtraq_topicflowviz/dtm ~/Desktop/bugtraq_topicflowviz/ttm ~/Desktop/bugtraq_topicflowviz/topic_flow.csv

Note underlines will always turned into spaces, so when adding or deleting you can just use underlines and no double quotes on any parameter.

sailuh/topicflowr documentation built on May 27, 2019, 8:46 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com