View source: R/firststage_functions.R
| run_firststage_fcm | R Documentation |
First-stage function of the article selection process: takes a set of documents and returns a set of potential keywords.
run_firststage_fcm(
docs,
docidvar = "fakeid",
classvar = "classified",
typevar = "EV_article",
textvar = "description",
stem = TRUE,
min_termfreq = 20,
min_docfreq = 20,
max_termfreq = NULL,
max_docfreq = NULL,
remove_punct = TRUE,
remove_numbers = TRUE,
remove_hyphens = TRUE,
termfreq_type = "count",
docfreq_type = "count",
dfm_tfidf = FALSE,
initialkw = c("elect", "riot", "disturb", "incid"),
cpoint2 = 0.9
)
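A minimal sketch of a call, assuming an input data frame whose column names match the defaults in the signature above. The toy rows, the 0/1 coding of `classified` and `EV_article`, and the lowered frequency thresholds are illustrative assumptions, not taken from the package. The call is guarded because this page does not name the package that exports the function.

```r
# Toy input using the default column names (all values are made up).
docs <- data.frame(
  fakeid      = 1:4,
  classified  = c(1, 1, 0, 0),    # assumed coding: 1 = classified (R-set), 0 = unclassified (S-set)
  EV_article  = c(1, 0, NA, NA),  # assumed coding: 1 = election violence article
  description = c(
    "Election riot reported at the polling station",
    "Town council met to discuss the harvest",
    "Disturbance near the hustings after the election",
    "Weather was fair across the county"
  ),
  stringsAsFactors = FALSE
)

# Only run the call if the function is available in the session.
if (exists("run_firststage_fcm")) {
  kw <- run_firststage_fcm(
    docs,
    min_termfreq = 1,  # lowered from the default of 20 because the toy corpus is tiny
    min_docfreq  = 1
  )
}
```

With real data, the default thresholds of 20 would normally be left in place; they are lowered here only so that a four-row example is not trimmed to nothing.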
| docs | Data frame of documents containing classified cases (R-set) and unclassified cases (S-set) |
| docidvar | Unique document id variable; default = "fakeid" |
| classvar | Indicator identifying classified documents; default = "classified" |
| typevar | Indicator identifying election violence articles; default = "EV_article" |
| textvar | Indicator identifying text field to classify on; default = "description" |
| stem | default TRUE |
| min_termfreq | default 20 |
| min_docfreq | default 20 |
| max_termfreq | default NULL |
| max_docfreq | default NULL |
| remove_punct | default TRUE |
| remove_numbers | default TRUE |
| remove_hyphens | default TRUE |
| termfreq_type | default "count" |
| docfreq_type | default "count" |
| dfm_tfidf | default FALSE |
| initialkw | Initial keywords used to retrieve classified documents; default = c("elect", "riot", "disturb", "incid") |
| cpoint2 | Cutpoint on predictability of keyword in step 2; default = 0.9 |
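The page lists only defaults for `min_termfreq` and `min_docfreq`. Their names match arguments of quanteda's `dfm_trim()`, so the sketch below assumes they carry the same meaning under the "count" type: a term is kept only if it occurs at least `min_termfreq` times overall and appears in at least `min_docfreq` documents. This is a base R illustration of that assumed rule on made-up texts, not the function's actual implementation.

```r
# Three made-up documents.
texts <- c(d1 = "riot riot election", d2 = "election poll", d3 = "poll poll poll")
tokens <- strsplit(tolower(texts), "\\s+")

# Term frequency: total occurrences across all documents.
term_freq <- table(unlist(tokens))
# Document frequency: number of documents that contain the term at least once.
doc_freq <- table(unlist(lapply(tokens, unique)))

# Hypothetical thresholds for this toy corpus (the function's defaults are 20).
min_termfreq <- 2
min_docfreq  <- 2

terms <- names(term_freq)
keep <- terms[as.vector(term_freq) >= min_termfreq &
              as.vector(doc_freq[terms]) >= min_docfreq]
print(keep)
```

Here "riot" passes the term-frequency threshold but appears in only one document, so it is dropped by the document-frequency threshold.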