get_classified_docs: Function to get documents from database which are classified...

get_classified_docs R Documentation

Function to get documents from database which are classified for text analysis.

Description

get_classified_docs retrieves the currently classified document set from the database. The documents have been classified to enable machine learning.

Usage

get_classified_docs()

Value

Returns a data frame containing three kinds of article: general articles, election articles that are not about violence, and election violence articles. The data frame can be split into separate corpora if desired.

The article types are distinguished using dummy variables. EV_article is 1 for election violence articles and 0 for all other articles. election_article is 1 for election articles (including election violence articles) and 0 for all other articles. A redundant just_election indicator is included for convenience; it is 1 for election articles that are not about violence and 0 for election violence and general articles.

The full OCR text is in the field ocr and the short two-line description is in the field description. The unique identifier column fakeid does not correspond to any id in the database, because the data is aggregated from two different tables in the database (documents and candidate_documents).
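
Examples

The following is a minimal sketch of how the returned data frame might be split by article type using the dummy variables. Because get_classified_docs() needs a database connection, the sketch builds a small toy data frame in its place: the column names come from the Value section, but every row is invented for illustration, and the names classified_docs, general_docs, election_only_docs and ev_docs are hypothetical.

```r
# Toy stand-in for the result of get_classified_docs(). The rows are invented;
# only the column names are taken from the documentation.
classified_docs <- data.frame(
  fakeid = 1:4,
  description = c("general news", "election meeting",
                  "election riot", "polling day riot"),
  ocr = c("full text a", "full text b", "full text c", "full text d"),
  EV_article = c(0, 0, 1, 1),
  election_article = c(0, 1, 1, 1),
  stringsAsFactors = FALSE
)

# just_election is redundant: it can be derived from the other two dummies
# (election article, but not an election violence article).
classified_docs$just_election <- as.numeric(
  classified_docs$election_article == 1 & classified_docs$EV_article == 0
)

# Split the data frame into the three article types.
general_docs       <- classified_docs[classified_docs$election_article == 0, ]
election_only_docs <- classified_docs[classified_docs$just_election == 1, ]
ev_docs            <- classified_docs[classified_docs$EV_article == 1, ]

# Every row should fall into exactly one of the three subsets.
stopifnot(
  nrow(general_docs) + nrow(election_only_docs) + nrow(ev_docs) ==
    nrow(classified_docs)
)
```

With real data, the toy data frame would be replaced by the result of get_classified_docs(), and the same subsetting should apply as long as the columns are coded as described in the Value section.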


gidonc/durhamevp documentation built on April 8, 2022, 10:31 a.m.