detect_sdg: Detect SDGs in text


View source: R/detect.R

Description

detect_sdg identifies SDGs in text using SDG query systems developed by the Aurora Universities Network, SIRIS Academic, and Elsevier.

Usage

detect_sdg(
  text,
  systems = c("aurora", "siris", "elsevier"),
  sdgs = 1:17,
  output = c("features", "documents"),
  verbose = TRUE
)

Arguments

text

character vector or object of class tCorpus containing text in which SDGs shall be detected.

systems

character vector specifying the query systems to be used. Can be one or more of "aurora", "siris", "elsevier", "sdsn", and "ontology". By default all but "sdsn" and "ontology" are used.

sdgs

numeric vector of integers between 1 and 17 specifying the SDGs to identify in text. Defaults to 1:17.

output

character specifying the level of detail in the output. The default "features" returns a tibble with one row per matched query, including a variable containing the features of the query that were matched in the text. By contrast, "documents" returns an aggregated tibble with one row per matched SDG, without information on the features.

verbose

logical specifying whether messages on the function's progress should be printed.

Details

detect_sdg implements three SDG query systems developed by the Aurora Universities Network (see aurora_queries), SIRIS Academic (see siris_queries), and Elsevier (see elsevier_queries), and one keyword-based system by Bautista-Puig and Mauleón labeled Ontology (see ontology_queries). detect_sdg calls a dedicated detect_* function for each of the four systems. The search for the Lucene-style Boolean queries and the keywords is implemented using the search_features function from the corpustools package.

By default, detect_sdg runs only the three query systems, as they are considerably less liberal than the keyword-based Ontology and are therefore likely to produce more valid SDG classifications. Users should be aware that systematic validations of and comparisons between the systems are still largely lacking. Consequently, any results should be interpreted with a high level of caution.

Value

The function returns a tibble containing the SDG hits found in the vector of documents. Depending on the value of output the tibble will contain all or some of the following columns:

document

Index of the element in text where the match was found. Formatted as a factor with the number of levels matching the original number of documents.

sdg

Label of the SDG found in document.

systems

The name of the query system that produced the match.

query_id

Index of the query within the query system that produced the match.

features

Concatenated list of words that caused the query to match.

hit

Index of hit for a given system.
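
To make the column layout above more concrete, here is a minimal base R sketch of how such a table might be summarised. The data frame is a hand-written, hypothetical stand-in rather than real detect_sdg output: the SDG label format, query ids, and feature strings are all assumptions made for illustration, and the final step only approximates what output = "documents" might look like.

```r
# Hypothetical stand-in for a detect_sdg() result with output = "features".
# All values are invented for illustration; they are not real package output.
hits <- data.frame(
  document = factor(c(1, 1, 2, 3, 3), levels = 1:3),
  sdg      = c("SDG-03", "SDG-03", "SDG-07", "SDG-03", "SDG-13"),
  systems  = c("aurora", "siris", "aurora", "elsevier", "siris"),
  query_id = c(12L, 4L, 30L, 7L, 51L),
  features = c("health, disease", "health", "renewable, energy",
               "vaccine", "climate, change"),
  hit      = c(1L, 1L, 2L, 1L, 2L),
  stringsAsFactors = FALSE
)

# Cross-tabulate the number of matches per SDG and query system
counts <- table(hits$sdg, hits$systems)

# Collapse to one row per document and SDG, dropping query-level detail
per_document <- unique(hits[, c("document", "sdg")])
```

Because document is stored as a factor with one level per input text, documents without any match keep their level, so table(hits$document) would report a count of zero for them instead of leaving them out.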

Examples

# run sdg detection
hits <- detect_sdg(projects)

# run sdg detection with aurora only
hits <- detect_sdg(projects, systems = "aurora")

# run sdg detection for sdg 3 only
hits <- detect_sdg(projects, sdgs = 3)

psychobas/text2sdg_joss documentation built on Dec. 22, 2021, 9:58 a.m.