R/gen_find.R

Defines functions gen_find

Documented in gen_find

#' General Search for Objects Through A Database
#'
#' @description Function to search through the databases GENESIS, Zensus 2022 and regionalstatistik.de. It is similar in usage as the search function on the GENESIS main page (https://www-genesis.destatis.de/genesis/online).
#' In the search query, "UND" (German word for 'and', also written "und" or "&") as well as "ODER" (German word for 'or', also written "oder" or "|") can be included and logically combined. Furthermore, wildcards are possible by including "*". If more then one word is included in the term string, 'and' is used automatically to combine the different words.
#' Important note: Time-series are treated as cubes in GENESIS and regionalstatistik.de, they are not longer distinguished. If you want to find a specific object with a clear code with this find function, you need to specify the object type or search for all object types.
#'
#' @param term A character string with no maximum character length, but a word limit of five words.
#' @param database Character string. Indicator if the GENESIS ('genesis'), Zensus 2022 ('zensus') or regionalstatistik.de ('regio') database is called. Default option is 'all'.
#' @param category Character string. Specify specific GENESIS/regionalstatistik.de object types ('tables', 'statistics' and 'cubes') and specific Zensus 2022 object types ('tables' and 'statistics'). All types that are specific for one database can be used together. Default option is to use all types that are possible for the specific database.
#' @param detailed Boolean. Indicator if the function should return the detailed output of the iteration including all object-related information or only a shortened output including only code and object title. Default option is 'FALSE'.
#' @param ordering A logical. Indicator if the function should return the output of the iteration ordered first based on the fact if the searched term is appearing in the title of the object and secondly on an estimator of the number of variables in this object. Default option is 'TRUE'.
#' @param pagelength Integer. Maximum length of results or objects (e.g., number of tables). Defaults to 500. Maximum of the databases is 25,000 objects.
#' @param error.ignore Boolean. Indicator if the function should stop if an error occurs or no object for the request is found or if it should produce a token as response. Default option is 'TRUE'. Important note: This does not apply to errors generated by the API call itself (especially server or client errors with status codes 5xx or 4xx). In this case, the function call will error even if error.ignore is set to TRUE.
#' @param verbose Boolean. Indicator if the output of the function should include detailed messages and warnings. Default option is 'TRUE'. Set the parameter to 'FALSE' to suppress additional messages and warnings.
#' @param ... Additional parameters for the API call. These parameters are only affecting the call itself, no further processing. For more details see `vignette("additional_parameter")`.
#'
#' @return A list with all recalled elements from the API. Based on the 'detailed' parameter it contains more or less information, but always includes the code of the object, the title, and the type of the object. This is done to facilitate further processing with the data. Attributes are added to the data.frame describing the search configuration for the returned output.
#' @export
#'
#' @examples
#' \dontrun{
#' # Find objects related to "bus" in GENESIS
#' object <- gen_find(term = "bus")
#'
#' # Find tables related to "bus" in GENESIS and return a unordered detailed output
#' object <- gen_find(term = "bus", detailed = TRUE, ordering = FALSE)
#'
#' # Find tables related to "Autos" or "Corona" in GENESIS and return a unordered detailed output
#' object <- gen_find(term = "autos ODER corona", detailed = TRUE, ordering = FALSE)
#'
#' #' # Find tables related to "Autos" and "Corona" in GENESIS and return a unordered detailed output
#' object <- gen_find(term = "autos UND corona", detailed = TRUE, ordering = FALSE)
#' }
#'
gen_find <- function(term = NULL,
                     database = c("all", "genesis", "zensus", "regio"),
                     category = c("all", "tables", "statistics", "variables", "cubes"),
                     detailed = FALSE,
                     ordering = TRUE,
                     pagelength = 500,
                     error.ignore = TRUE,
                     verbose = TRUE,
                     ...) {

  caller <- as.character(match.call()[1])

  check_function_input(term = term,
                       category = category,
                       detailed = detailed,
                       ordering = ordering,
                       pagelength = pagelength,
                       error.ignore = error.ignore,
                       database = database,
                       caller = caller,
                       verbose = verbose)

  # database_vector will hold a vector of the specified databases to query
  database_vector <- test_database_function(database,
                                            error.input = error.ignore,
                                            text = verbose)

  category <- match.arg(category)

  #-----------------------------------------------------------------------------

  res <- lapply(database_vector, function(db){

    if (verbose) {

      info <- paste("Started the processing of", db, "database.")

      message(info)

    }

    #---------------------------------------------------------------------------

    if (db == "zensus" && category == "cubes") {

      empty_object <- "FAIL"

    } else {

      results_raw <- gen_api(endpoint = "find/find",
                             database = db,
                             username = gen_auth_get(database = db)$username,
                             password = gen_auth_get(database = db)$password,
                             term = term,
                             category = category,
                             pagelength = pagelength,
                             ...)

      results_json <- test_if_json(results_raw)

      empty_object <- test_if_error_find(results_json, para = error.ignore, verbose = verbose)

      empty_object <- test_if_process_further(results_json, para = error.ignore, verbose = verbose)

    }

    #---------------------------------------------------------------------------

    if (isTRUE(empty_object)) {

      list_resp <- list("Output" = "No object found for your request.")

      attr(list_resp, "Term") <- results_json$Parameter$term
      attr(list_resp, "Database") <- db
      attr(list_resp, "Language") <- results_json$Parameter$language
      attr(list_resp, "Pagelength") <- results_json$Parameter$pagelength
      attr(list_resp, "Copyright") <- results_json$Copyright

      return(list_resp)

    } else if (empty_object == "FAIL" & db == "zensus" ){

      list_resp <- list("Output" = "There are generally no 'cubes' objects available for the 'zensus' database.")

      attr(list_resp, "Term") <- term
      attr(list_resp, "Database") <- db
      attr(list_resp, "Category") <- category

      return(list_resp)

    } else if (isFALSE(empty_object)){

      list_resp <- list("Output" = results_json$Status$Content)

      attr(list_resp, "Term") <- results_json$Parameter$term
      attr(list_resp, "Database") <- db
      attr(list_resp, "Language") <- results_json$Parameter$language
      attr(list_resp, "Pagelength") <- results_json$Parameter$pagelength
      attr(list_resp, "Copyright") <- results_json$Copyright

      return(list_resp)

    } else if (empty_object == "DONE") {

    #---------------------------------------------------------------------------

      if (category == "all") {

        category <- c("tables", "statistics", "variables", "cubes")

      }

    #---------------------------------------------------------------------------

      if("tables" %in% category) {

        if(!is.null(results_json$Tables)) {

          if(isTRUE(detailed)){

            df_table <- binding_lapply(results_json$Tables,
                                       characteristics = c("Code",
                                                           "Content",
                                                           "Time"))

            df_table$Spezifisch <- ggsub(df_table)

            df_table$Variablen <- spezifisch_create(df_table)

            df_table$Object_Type <- "table"


            if (nrow(df_table) != 0) {

              df_table$Titel <- titel_search(df_table, term, text = verbose)

            }


            if (isTRUE(ordering)) {

              df_table <- df_table[with(df_table, order(-Titel, -Variablen)), c("Code",
                                                                                "Content",
                                                                                "Titel",
                                                                                "Time",
                                                                                "Variablen",
                                                                                "Spezifisch",
                                                                                "Object_Type")]

            } else {

              df_table <- df_table[, c("Code",
                                       "Content",
                                       "Titel",
                                       "Time",
                                       "Variablen",
                                       "Spezifisch",
                                       "Object_Type")]

            }


          } else if (isFALSE(detailed)) {

            df_table <- binding_lapply(results_json$Tables,
                                       characteristics = c("Code",
                                                           "Content"))

            df_table$Spezifisch <- ggsub(df_table)

            df_table$Variablen <- spezifisch_create(df_table)

            df_table$Object_Type <- "table"


            if (nrow(df_table) != 0) {

              df_table$Titel <- titel_search(df_table, term, text = verbose)

            }


            if (isTRUE(ordering)) {

              df_table <- df_table[with(df_table, order(-Titel, -Variablen)), c("Code",
                                                                                "Content",
                                                                                "Object_Type")]
            } else {

              df_table <- df_table[, c("Code",
                                       "Content",
                                       "Object_Type")]

            }

          }

        } else {

          df_table <- find_token(results_json$Tables,
                                 error.input = error.ignore,
                                 text = verbose,
                                 sub_category = "Tables")

        }

      }

    #---------------------------------------------------------------------------

      if("statistics" %in% category) {

        if(!is.null(results_json$Statistics)) {

          if(isTRUE(detailed)){

            df_stats <- binding_lapply(results_json$Statistics,
                                       characteristics = c("Code",
                                                           "Content",
                                                           "Information",
                                                           "Cubes"))

            df_stats$Spezifisch <- ggsub(df_stats)

            df_stats$Variablen <- spezifisch_create(df_stats)

            df_stats$Object_Type <- "statistic"


            if (nrow(df_stats) != 0) {

              df_stats$Titel <- titel_search(df_stats, term, text = verbose)

            }


            if (isTRUE(ordering)) {

              df_stats <- df_stats[with(df_stats, order(-Titel, -Variablen)), c( "Code",
                                                                                 "Content",
                                                                                 "Titel",
                                                                                 "Information",
                                                                                 "Cubes",
                                                                                 "Variablen",
                                                                                 "Spezifisch",
                                                                                 "Object_Type")]

            } else {

              df_stats <- df_stats[, c("Code",
                                       "Content",
                                       "Titel",
                                       "Information",
                                       "Cubes",
                                       "Variablen",
                                       "Spezifisch",
                                       "Object_Type")]

            }


          } else if (isFALSE(detailed)) {

            df_stats <- binding_lapply(results_json$Statistics,
                                       characteristics = c("Code",
                                                           "Content"))

            df_stats$Spezifisch <- ggsub(df_stats)

            df_stats$Variablen <- spezifisch_create(df_stats)

            df_stats$Object_Type <- "statistic"


            if (nrow(df_stats) != 0) {

              df_stats$Titel <- titel_search(df_stats, term, text = verbose)

            }


            if (isTRUE(ordering)) {

              df_stats <- df_stats[with(df_stats, order(-Titel, -Variablen)), c( "Code",
                                                                                 "Content",
                                                                                 "Object_Type")]
            } else {

              df_stats <- df_stats[, c("Code",
                                       "Content",
                                       "Object_Type")]

            }

          }

        } else {

          df_stats <- find_token(results_json$Statistics,
                                 error.input = error.ignore,
                                 text = verbose,
                                 sub_category = "Statistics")

        }

      }

    #---------------------------------------------------------------------------

      if("variables" %in% category) {

        if(!is.null(results_json$Variables)) {

          if(isTRUE(detailed)){

            df_variables <- binding_lapply(results_json$Variables,
                                           characteristics = c("Code",
                                                               "Content",
                                                               "Type",
                                                               "Values",
                                                               "Information"))

            df_variables$Spezifisch <- ggsub(df_variables)

            df_variables$Variablen <- spezifisch_create(df_variables)

            df_variables$Object_Type <- "variable"


            if (nrow(df_variables) != 0) {

              df_variables$Titel <- titel_search(df_variables, term, text = verbose)

            }


            if (isTRUE(ordering)) {

              df_variables <- df_variables[with(df_variables, order(-Titel, -Variablen)), c( "Code",
                                                                                             "Content",
                                                                                             "Titel",
                                                                                             "Values",
                                                                                             "Information",
                                                                                             "Variablen",
                                                                                             "Spezifisch",
                                                                                             "Object_Type")]

            } else {

              df_variables <- df_variables[, c("Code",
                                               "Content",
                                               "Titel",
                                               "Values",
                                               "Information",
                                               "Variablen",
                                               "Spezifisch",
                                               "Object_Type")]

            }


          } else if (isFALSE(detailed)) {

            df_variables <- binding_lapply(results_json$Variables,
                                           characteristics = c("Code",
                                                               "Content"))

            df_variables$Spezifisch <- ggsub(df_variables)

            df_variables$Variablen <- spezifisch_create(df_variables)

            df_variables$Object_Type <- "variable"


            if (nrow(df_variables) != 0) {

              df_variables$Titel <- titel_search(df_variables, term, text = verbose)

            }


            if (isTRUE(ordering)) {

              df_variables <- df_variables[with(df_variables, order(-Titel, -Variablen)), c( "Code",
                                                                                             "Content",
                                                                                             "Object_Type")]

            } else {

              df_variables <- df_variables[, c("Code",
                                               "Content",
                                               "Object_Type")]

            }

          }

        } else {

          df_variables <- find_token(results_json$Variables,
                                     error.input = error.ignore,
                                     text = verbose,
                                     sub_category = "Variables")

        }

      }

    #---------------------------------------------------------------------------

      if("cubes" %in% category) {

        if (db == "genesis" | db == "regio") {

          if(!is.null(results_json$Cubes)) {

            if(isTRUE(detailed)){

              df_cubes <- binding_lapply(results_json$Cubes,
                                         characteristics = c("Code",
                                                             "Content",
                                                             "Time",
                                                             "LatestUpdate",
                                                             "State",
                                                             "Information"))

              df_cubes$Spezifisch <- ggsub(df_cubes)

              df_cubes$Variablen <- spezifisch_create(df_cubes)

              df_cubes$Object_Type <- "cube"


              if (nrow(df_cubes) != 0) {

                df_cubes$Titel <- titel_search(df_cubes, term, text = verbose)

              }


              if (isTRUE(ordering)) {

                df_cubes <- df_cubes[with(df_cubes, order(-Titel, -Variablen)), c( "Code",
                                                                                   "Content",
                                                                                   "Titel",
                                                                                   "Time",
                                                                                   "LatestUpdate",
                                                                                   "State",
                                                                                   "Information",
                                                                                   "Variablen",
                                                                                   "Spezifisch",
                                                                                   "Object_Type")]

              } else {

                df_cubes <- df_cubes[, c("Code",
                                         "Content",
                                         "Titel",
                                         "Time",
                                         "LatestUpdate",
                                         "State",
                                         "Information",
                                         "Variablen",
                                         "Spezifisch",
                                         "Object_Type")]

              }


            } else if (isFALSE(detailed)) {

              df_cubes <- binding_lapply(results_json$Cubes,
                                         characteristics = c("Code",
                                                             "Content"))

              df_cubes$Spezifisch <- ggsub(df_cubes)

              df_cubes$Variablen <- spezifisch_create(df_cubes)

              df_cubes$Object_Type <- "cube"


              if (nrow(df_cubes) != 0) {

                df_cubes$Titel <- titel_search(df_cubes, term, text = verbose)

              }


              if (isTRUE(ordering)) {

                df_cubes <- df_cubes[with(df_cubes, order(-Titel, -Variablen)), c( "Code",
                                                                                   "Content",
                                                                                   "Object_Type")]

              } else {

                df_cubes <- df_cubes[, c("Code",
                                         "Content",
                                         "Object_Type")]

              }

            }

          } else {

            df_cubes <- find_token(results_json$Cubes,
                                   error.input = error.ignore,
                                   text = verbose,
                                   sub_category = "Cubes")

          }


        } else if (db == "zensus") {

          df_cubes <- "There are generally no 'cubes' objects available for the 'zensus' database."

        }
      }

    #---------------------------------------------------------------------------

        list_resp <- list()

        if("tables" %in% category) {list_resp$Tables <- tibble::as_tibble(df_table) }
        if("statistics" %in% category) {list_resp$Statistics <- tibble::as_tibble(df_stats) }
        if("variables" %in% category) {list_resp$Variables <- tibble::as_tibble(df_variables) }
        if("cubes" %in% category) {list_resp$Cubes <- tibble::as_tibble(df_cubes) }

        attr(list_resp, "Term") <- results_json$Parameter$term
        attr(list_resp, "Database") <- db
        attr(list_resp, "Language") <- results_json$Parameter$language
        attr(list_resp, "Pagelength") <- results_json$Parameter$pagelength
        attr(list_resp, "Copyright") <- results_json$Copyright

        return(list_resp)

    }

  })

  res <- check_results(res)

  return(res)

}

Try the restatis package in your browser

Any scripts or data that you put into this service are public.

restatis documentation built on April 12, 2025, 1:28 a.m.