Source: R/firststage_functions.R
run_firststage_fcm (R Documentation)
First-stage function of the article selection process: takes a set of documents and returns a set of potential keywords.
run_firststage_fcm(
  docs,
  docidvar = "fakeid",
  classvar = "classified",
  typevar = "EV_article",
  textvar = "description",
  stem = TRUE,
  min_termfreq = 20,
  min_docfreq = 20,
  max_termfreq = NULL,
  max_docfreq = NULL,
  remove_punct = TRUE,
  remove_numbers = TRUE,
  remove_hyphens = TRUE,
  termfreq_type = "count",
  docfreq_type = "count",
  dfm_tfidf = FALSE,
  initialkw = c("elect", "riot", "disturb", "incid"),
  cpoint2 = 0.9
)
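A minimal sketch of an input data frame that uses the default column names from the usage signature. The texts, the 1/0 coding of `classified`, and the use of `NA` for the unknown labels of unclassified documents are hypothetical assumptions; this page does not document the return value, so the call is only attempted if the function has been loaded.

```r
# Hypothetical toy input matching the default column names
# (docidvar = "fakeid", classvar = "classified",
#  typevar = "EV_article", textvar = "description").
docs <- data.frame(
  fakeid      = 1:4,
  classified  = c(1, 1, 0, 0),   # assumed coding: 1 = classified (R-set), 0 = unclassified (S-set)
  EV_article  = c(1, 0, NA, NA), # assumed: election violence label known only for classified docs
  description = c(
    "Riot erupts after disputed election results",
    "Local council approves new school budget",
    "Police report incident near polling station",
    "Weather disturbs weekend sports fixtures"
  ),
  stringsAsFactors = FALSE
)

# Check that the columns named by the defaults are present.
required <- c("fakeid", "classified", "EV_article", "description")
stopifnot(all(required %in% names(docs)))

# Call the function only if it is available in the session. Low thresholds
# are used because the toy corpus is tiny; real corpora will be far larger.
if (exists("run_firststage_fcm")) {
  res <- run_firststage_fcm(docs, min_termfreq = 1, min_docfreq = 1)
}
```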
docs: Data frame of documents containing both classified cases (R-set) and unclassified cases (S-set)
docidvar: Name of the unique document ID variable; default = "fakeid"
classvar: Name of the indicator variable identifying classified documents; default = "classified"
typevar: Name of the indicator variable identifying election violence articles; default = "EV_article"
textvar: Name of the text field to classify on; default = "description"
stem: Logical; whether to stem tokens; default = TRUE
min_termfreq: Minimum term frequency a feature needs in order to be retained; default = 20
min_docfreq: Minimum document frequency a feature needs in order to be retained; default = 20
max_termfreq: Maximum term frequency a retained feature may have; default = NULL
max_docfreq: Maximum document frequency a retained feature may have; default = NULL
remove_punct: Logical; whether to remove punctuation; default = TRUE
remove_numbers: Logical; whether to remove numbers; default = TRUE
remove_hyphens: Logical; whether to remove hyphens; default = TRUE
termfreq_type: How min_termfreq and max_termfreq are interpreted; default = "count"
docfreq_type: How min_docfreq and max_docfreq are interpreted; default = "count"
dfm_tfidf: Logical; whether to apply tf-idf weighting to the document-feature matrix; default = FALSE
initialkw: Initial keywords used to retrieve classified documents; default = c("elect", "riot", "disturb", "incid")
cpoint2: Cutpoint on the predictability of a keyword in step 2; default = 0.9
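This page does not spell out how `initialkw` and `cpoint2` are applied. The base R sketch below illustrates one plausible reading, purely as an assumption: a classified document counts as a keyword hit if its lower-cased text contains any initial keyword stem, and a candidate term is kept if the share of documents containing it that are election violence articles reaches `cpoint2`. The helper names `keyword_hits()` and `predictive_terms()`, the toy data, and this definition of "predictability" are all hypothetical, not the package's implementation.

```r
# Hypothetical helpers illustrating one reading of initialkw and cpoint2.
# This is NOT the package's implementation.

# TRUE for each document whose lower-cased text contains at least one of the
# (stemmed) keywords as a substring, e.g. "elect" matches "election".
keyword_hits <- function(text, keywords) {
  text <- tolower(text)
  matches <- lapply(keywords, function(k) grepl(k, text, fixed = TRUE))
  # OR the per-keyword matches together, starting from all FALSE
  Reduce(`|`, matches, rep(FALSE, length(text)))
}

# For each candidate term, the share of documents containing it that are
# election violence articles; terms with share >= cpoint2 are returned.
predictive_terms <- function(text, is_ev, terms, cpoint2 = 0.9) {
  text <- tolower(text)
  share <- vapply(terms, function(term) {
    has_term <- grepl(term, text, fixed = TRUE)
    if (!any(has_term)) return(NA_real_)  # term never occurs
    mean(is_ev[has_term])
  }, numeric(1))
  names(share)[!is.na(share) & share >= cpoint2]
}

# Toy classified documents with known labels (hypothetical data).
classified_text <- c(
  "Riot erupts after disputed election results",
  "Election observers report disturbance at polls",
  "Local council approves new school budget",
  "Council incident delays budget vote"
)
is_ev <- c(TRUE, TRUE, FALSE, FALSE)

hits <- keyword_hits(classified_text, c("elect", "riot", "disturb", "incid"))
print(hits)  # TRUE TRUE FALSE TRUE: "incident" matches "incid"

kept <- predictive_terms(classified_text, is_ev,
                         terms = c("elect", "riot", "council", "incid"),
                         cpoint2 = 0.9)
print(kept)  # "elect" "riot"
```

In this toy data the fourth document is retrieved by the initial keyword "incid" even though it is not an election violence article; a predictability cutpoint of this kind drops "incid" while keeping "elect" and "riot".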