R/accessdb.R
In samadb: South Africa Macroeconomic Database API

Documented in sm_as_date sm_data sm_datasets sm_datasources sm_expand_date sm_pivot_longer sm_pivot_wider sm_series sm_transpose sm_write_excel

#' Global Identifier Macros
#'
#' @description
#' The macro \code{.SAMADB_ID} contains the string \code{c("dsid", "series")} denoting
#' the names of ID variables that identify the cross-sectional dimension in the database. All series codes are unique across datasets.
#'
#' The macro \code{.SAMADB_T} contains the string
#' \code{c("date", "year", "quarter", "month", "day")} denoting temporal identifiers generated by \code{\link{sm_expand_date}}.
#' The \code{"date"} variable is sufficient to uniquely identify a point in time in the database.
#'
#' Each value in the database is uniquely identified by dsid, series and date.
#'
#' @usage
#' .SAMADB_ID
#' .SAMADB_T
#'
#' @examples
#' .SAMADB_ID
#' .SAMADB_T
#'
#'
#' @seealso \code{\link{samadb}}
#'
#' @format NULL
#' @aliases .SAMADB_ID .SAMADB_T
#' @export .SAMADB_ID
#' @export .SAMADB_T

.SAMADB_ID <- c("dsid", "series")
.SAMADB_T <- c("date", "year", "quarter", "month", "day")


# Reconnect to Database
#
# This function terminates an existing connection to the database server and attempts to reconnect to it.
sm_reconnect <- function() {
  if(length(.sm_con)) tryCatch(dbDisconnect(.sm_con), error = function(e) cat(""))
  assignInMyNamespace(".sm_con", .connect())
}

.sm_con <- NULL # Needed for initialization

safe_query <- function(query) {
  if(is.null(.sm_con)) sm_reconnect() # Initial connection
  tryCatch(dbGetQuery(.sm_con, query), error = function(e) {
    sm_reconnect()
    tryCatch(dbGetQuery(.sm_con, query), error = function(e) NULL)
  })
}

#' Retrieve Data Sources Table
#'
#' This function pulls and returns a table called 'DATASOURCE' from the database.
#'
#' The 'DATASOURCE' table gives information about the sources of data in this database, including the source website, and the number of datasets available from the source.
#' @param ordered logical. \code{TRUE} orders the result in the order data was entered into the database, while \code{FALSE} returns the result in a random order, to the benefit of faster query execution.
#' @return A \code{\link[data.table]{data.table}} with information about the sources of data in the database.
#' @examples \donttest{
#' sm_datasources()
#' }
#' @seealso \code{\link{sm_datasets}}, \code{\link{sm_series}}, \code{\link{samadb}}
#'
#' @export sm_datasources

sm_datasources <- function(ordered = TRUE) {
  query <- "SELECT * FROM DATASOURCE"
  if(ordered) query <- paste(query, "ORDER BY src_order")
  res <- safe_query(query)
  if(is.null(res) || !fnrow(res)) {
    message("Query resulted in empty dataset. This means something is wrong with your internet connection, the connection to the database or with the database itself.")
    return(res)
  }
  res[["src_order"]] <- NULL
  return(qDT(res))
}


#' Retrieve Datasets Table
#'
#' This function pulls and return a table called 'DATASET' from the database.
#'
#' The 'DATASET' table gives information about the different datasets fetched from different providers at regular intervals.
#' It provides a unique id for each dataset, the frequency of data, the number of records (datapoints) in each dataset,
#' the minimum and maximum time coverage, when the dataset was last updated, and information about the data source, provider,
#' and method of data access.
#' @param ordered logical. \code{TRUE} orders the result in the order data was entered into the database, while \code{FALSE} returns the result in a random order, to the benefit of faster query execution.
#' @return A \code{\link[data.table]{data.table}} with information about the available datasets in the database.
#' @examples \donttest{
#' sm_datasets()
#' }
#' @seealso \code{\link{sm_datasources}}, \code{\link{sm_series}}, \code{\link{samadb}}
#'
#' @export sm_datasets

sm_datasets <- function(ordered = TRUE) {
  query <- "SELECT * FROM DATASET"
  if(ordered) query <- paste(query, "ORDER BY ds_order")
  res <- safe_query(query)
  if(is.null(res) || !fnrow(res)) {
    message("Query resulted in empty dataset. This means something is wrong with your internet connection, the connection to the database or with the database itself.")
    return(res)
  }
  res <- ftransformv(res, c("data_from", "data_to", "updated"), as.Date)
  res[["ds_order"]] <- NULL
  return(qDT(res))
}



#' Retrieve Series Table
#'
#' This function pulls the 'SERIES' table from the database, providing information about the time series in the database. Each series is given a code which unique across datasets.
#'
#' @param dsid character. (Optional) id's of datasources matching the 'dsid' column of the 'DATASET' table (retrieved using \code{\link[=sm_datasets]{sm_datasets()}}) for which series information is to be returned.
#' @param series character. (Optional) codes of series for which information in to be returned. If 'dsid' is also specificed, the two are combined using SQL 'OR' i.e. these series are retrieved in addition to all series matched through 'dsid'.
#' @param dataset.info logical. \code{TRUE} returns additional information from the 'DATASET' table (the dataset name, when data was last updated, the source id and the data provider and access mode).
#' @param ordered logical. \code{TRUE} returns the series in a fixed order, while \code{FALSE} returns the result in a random order, to the benefit of faster query execution.
#' @param return.query logical. \code{TRUE} will not query the database but instead return the constructed SQL query as a character string (for debugging purposes).
#'
#' @details Each series is given a code which is unique across datasets.
#' Each series also has a label describing the series. Further information recorded are the series frequency, unit, whether it was seasonally adjusted, number of observations,
#' minimum and maximum date, and (optionally) topic, alternative code provided by the data source (data retrieved from EconData uses EconData codes as series codes, so the 'src_code'
#' field gives the codes used by the SARB or STATSSA), or further comments on the series.
#'
#' @return A \code{\link[data.table]{data.table}} with information about the available series in the database.
#' @examples \donttest{
#' # By default returns all series
#' sm_series()
#'
#' # Adding information about the dataset and provider
#' sm_series(dataset.info = TRUE)
#'
#' # Only series in the QB
#' sm_series("QB")
#' }
#' @seealso \code{\link{sm_datasources}}, \code{\link{sm_datasets}}, \code{\link{sm_data}}, \code{\link{samadb}}
#'
#' @export sm_series

sm_series <- function(dsid = NULL, series = NULL, dataset.info = FALSE, ordered = TRUE, return.query = FALSE) {
  if(dataset.info) { # TODO: Works ??
    query <- "SELECT * FROM SERIES NATURAL JOIN (SELECT dsid, dataset, updated, srcid, src_dsid, provider, access FROM DATASET) AS DS"
  } else {
    query <- "SELECT * FROM SERIES"
  }
  if(length(dsid)) {
    query <- if(length(dsid) == 1L) paste0(query, " WHERE dsid = '", dsid, "'") else
            paste0(query, " WHERE dsid IN ('", paste(dsid, collapse = "', '"), "')")
  }
  if(length(series)) {
    add <- if(length(dsid)) " OR " else " WHERE "
    query <- if(length(series) == 1L) paste0(query, add, "series = '", series, "'") else
      paste0(query, add, "series IN ('", paste(series, collapse = "', '"), "')")
  }
  if(ordered) query <- paste(query, "ORDER BY s_order")

  if(return.query) return(query)
  res <- safe_query(query)
  if(is.null(res) || !fnrow(res)) {
    message("Query resulted in empty dataset. Please make sure the dsid exists by checking against the sm_datasets() table. Alternatively check your connection to the database.")
    return(res)
  }
  res$seas_adj <- as.logical(res$seas_adj)
  res <- ftransformv(res, c("from_date", "to_date", if(dataset.info) "updated"), as.Date)
  res[["s_order"]] <- NULL
  return(qDT(res))
}


#' Generate Temporal Identifiers from a Date Column
#'
#' This function expands a date column and generates additional temporal identifiers from it (year, month, quarter, day).
#'
#' @param x either a vector of class 'Date', or coercible to date using \code{\link[base]{as.Date}}, or a data frame / list containing with a date-column called \code{name}.
#' @param gen character. A vector of identifiers to generate from \code{x}. The possible identifiers are found in \code{\link{.SAMADB_T}}.
#' @param origin character / Date. Passed to \code{\link[base]{as.Date}}: for converting numeric \code{x} to date. The default reflects converting date-numbers from Excel for Windows.
#' @param keep.date logical. \code{TRUE} will keep the date variable in the resulting dataset, \code{FALSE} will remove the date variable in favor of the generated identifiers.
#' @param remove.missing.date logical. \code{TRUE} will remove missing values in \code{x}. If \code{x} is a dataset, rows missing the date variable will be removed.
#' @param sort logical. \code{TRUE} will sort the data by the date column.
#' @param as.factor \code{TRUE} will generate quarters and months as factor variables. It is also possible to use \code{as.factor = "ordered"} to generate ordered factors. \code{FALSE} will generate quarters and months as integer variables.
#' @param name character. The name of the date variable to expand.
#' @param \dots not used.
#'
#' @return A \code{\link[data.table]{data.table}} containing the computed identifiers as columns. See Examples.
#'
#' @examples
#' # First a basic example
#' x <- seq.Date(as.Date("1999-01-01"), as.Date("2000-01-01"), by = "month")
#' sm_expand_date(x)
#' sm_expand_date(x, gen = .SAMADB_T[-1L], keep.date = FALSE)
#' \dontrun{
#' # Now using the API
#' sm_expand_date(sm_data("BUSINESS_CYCLES"))
#'
#' # Same thing
#' sm_data("BUSINESS_CYCLES", expand.date = TRUE)
#' }
#'
#' @seealso \code{\link{sm_as_date}}, \code{\link{samadb}}
#' @export sm_expand_date

sm_expand_date <- function(x, gen = c("year", "quarter", "month"), origin = "1899-12-30",
                        keep.date = TRUE, remove.missing.date = TRUE, sort = TRUE,
                        as.factor = TRUE, name = "date", ...) {
  lxl <- FALSE
  genopts <- c('year', 'quarter', 'month', 'day')
  genlab <- c('Year', 'Quarter', 'Month', 'Day')
  ordered <- is.character(as.factor) && as.factor == "ordered"
  as.factor <- ordered || isTRUE(as.factor)
  if(is.numeric(gen)) {
    lab <- genlab[gen]
    gen <- genopts[gen]
  } else if(is.character(gen)) {
    lab <- genlab[ckmatch(gen, genopts, "Unknown gen option:")]
  } else stop("gen must be integer or character")

  if(is.list(x)) {
    lxl <- TRUE
    ind <- ckmatch(name, names(x), "No column named:")
    l <- .subset(x, -ind)
    x <- .subset2(x, ind)
  }
  getQ <- function(m) {
    Q <- ceiling(m / 3L)
    if(as.factor) structure(Q, levels = paste0("Q", 1:4), class = c(if(ordered) "ordered", "factor")) else Q # , "na.included"
  }
  if(!inherits(x, "Date"))
    x <- if(is.numeric(x)) as.Date(x, origin) else as.Date(x)
  if(remove.missing.date && anyNA(x)) {
    nna <- whichNA(x, invert = TRUE)
    x <- x[nna]
    if(lxl) l <- ss(l, nna)
  }
  if(!lxl && sort && is.unsorted(x)) x <- sort(x)
  xp <- as.POSIXlt(x)
  y <- xp$year + 1900L
  m <- xp$mon + 1L
  res <- lapply(setNames(as.vector(gen, "list"), gen), switch,
                year = y,
                quarter = getQ(m),
                month = if(as.factor) structure(m, levels = month.name, class = c(if(ordered) "ordered", "factor")) else m, # , "na.included"
                day = xp$mday,
                stop("Unknown gen option"))
  vlabels(res) <- lab
  if(keep.date) res <- c(list(date = `attr<-`(x, "label", "Date")), res)
  if(lxl) {
    res <- c(res, l)
    if(any(wc <- names(res) == "ISO3")) res <- c(res["ISO3"], res[!wc])
    if(sort && is.unsorted(x))
      res <- ss(res, order(x, method = "radix"))
  }
  return(qDT(res)) # need qDT here, res is a plain list...
}


#' Coerce Vectors to Dates
#'
#' This function coerces date strings i.e. \code{"YYYY-MM-DD"} or \code{"YYYY-MM"}, years e.g. \code{2015} (numeric or character),
#' year-quarters e.g. \code{"2015Q1"} or \code{"2015-Q1"}, year-months e.g. \code{"2015M01"} or \code{"2015-M01"} or numeric values representing dates (e.g. previously imported Excel date) to a regular R date.
#'
#' @param x a character date string \code{"YYYY-MM-DD"} or \code{"YYYY-MM"}, year-quarter \code{"YYYYQN"} or \code{"YYYY-QN"}, year-month \code{"YYYYMNN"} or \code{"YYYY-MNN"} or calendar year \code{YYYY} (numeric or character), or a numeric value corresponding to a date that can be passed to \code{\link[base]{as.Date.numeric}}.
#' @param end logical. \code{TRUE} replaces missing time information with a period end-date which is the last day of the period. \code{FALSE} replaces missing month and day information with \code{"-01"},
#' so the year date is the 1st of January, and for months / quarters the 1st day of the month / quarter.
#' @param origin a date or date-string that can be used as reference for converting numeric values to dates. The default corresponds to dates generated in Excel for Windows. See \code{\link[base]{as.Date.numeric}}.
#'
#' @return A \code{\link[base]{Date}} vector.
#' @examples
#' sm_as_date("2011-05")
#' sm_as_date(2011)
#' sm_as_date("2011Q1")
#' sm_as_date("2011Q1", end = TRUE)
#' sm_as_date("2011M2")
#' sm_as_date("2011M2", end = TRUE)
#'
#' @seealso \code{\link{sm_expand_date}}, \code{\link{samadb}}
#' @export sm_as_date

sm_as_date <- function(x, end = FALSE, origin = "1899-12-30") {
  x1 <- x[1L]
  ncx <- nchar(x1)
  if(ncx == 4L) return(as.Date(paste0(x, if(end) "-12-31" else "-01-01")))
  if(is.numeric(x)) return(as.Date.numeric(x, origin))
  if(ncx >= 6L && ncx <= 8L) { # could be "1999/1"
    s5 <- substr(x1, 5L, 5L)
    # if(s5 == "/") {
    #   x <- if(end) paste0(as.integer(substr(x, 1L, 4L)) + 1L, "-06-30") else
    #                paste0(substr(x, 1L, 4L), "-07-01")
    # } else
    if(s5 == "Q" || substr(x1, 6L, 6L) == "Q") {
      Q <- as.integer(substr(x, ncx, ncx))
      x <- if(end) paste0(substr(x, 1L, 4L), "-", Q * 3L, fifelse(as.logical(Q %% 2L), "-31", "-30")) else # fifelse requires logical argument
                   paste0(substr(x, 1L, 4L), "-", Q * 3L - 2L, "-01")
    } else {
      if(s5 == "M" || substr(x1, 6L, 6L) == "M") {
        st <- if(s5 == "M") 6L else 7L
        M <- substr(x, st, st + 1L)
        x <- paste0(substr(x, 1L, 4L), "-", M, "-01")
      } else { # Assuming now Year-Month type string
        if(ncx != 8L) x <- paste0(x, "-01")
      }
      if(end) return(as.Date(format(as.Date(x) + 31L, "%Y-%m-01")) - 1L)
    }
  }
  return(as.Date(x))
}


#' Retrieve Data from the Database
#'
#' This is the main function of the package to retrieve data from the database.
#'
#' @param dsid character. (Optional) id's of datasets matching the 'dsid' column of the 'DATASET' table (retrieved using \code{\link[=sm_datasets]{sm_datasets()}}). If used, all series from the dataset are returned, in addition to any other series selected with \code{series}.
#' @param series character. (Optional) codes of series matching the 'series' column of the 'SERIES' table (retrieved using \code{\link[=sm_series]{sm_series()}}). If 'dsid' is also specificed, the two are combined using SQL 'OR' i.e. these series are retrieved in addition to all series matched through 'dsid'.
#' @param from set the start time of the data retrieved by either supplying a start date, a date-string of the form \code{"YYYY-MM-DD"} or \code{"YYYY-MM"},
#' year-quarters of the form \code{"YYYYQN"} or \code{"YYYY-QN"}, or a numeric year \code{YYYY} (numeric or character). These expressions are converted to a regular date by \code{\link{sm_as_date}}.
#' @param to same as \code{from}: to set the time period until which data is retrieved. For expressions that are not full "YYYY-MM-DD" dates, the last day of the period is chosen.
#' @param freq character. Return only series at a certain frequency. Allowed are values \code{"D"} (Daily), \code{"W"} (Weekly), \code{"M"} (Monthly), \code{"Q"} (Quarterly), \code{"A"} (Annual), \code{"AF"} (Fiscal Years), matching the 'freq' column in the 'SERIES' table (retrieved using \code{\link[=sm_series]{sm_series()}}).
#' @param labels logical. \code{TRUE} will also return labels (series descriptions) along with the series codes.
#' @param wide logical. \code{TRUE} calls \code{\link{sm_pivot_wider}} on the result. \code{FALSE} returns the data in a long format without missing values (suitable for \code{ggplot2}).
#' @param expand.date logical. \code{TRUE} will call \code{\link{sm_expand_date}} on the result.
#' @param ordered logical. \code{TRUE} orders the result by 'date' and, if \code{!is.null(dsid)}, \code{labels = TRUE} or \code{!is.null(freq)}, by series, maintaining a fixed order of series.
#' \code{FALSE} does not explicitly order the result, to the benefit of faster query execution.
#' @param return.query logical. \code{TRUE} will not query the database but instead return the constructed SQL query as a character string (for debugging purposes).
#' @param \dots further arguments passed to \code{\link{sm_pivot_wider}} (if \code{wide = TRUE}) or \code{\link{sm_expand_date}} (if \code{expand.date = TRUE}), no conflicts between these two.
#'
#' @details Series from datasets at different frequencies can be queried, but, if \code{wide = TRUE}, this will result in missing values in the lower frequency series.
#'
#' @return A \code{\link[data.table]{data.table}} with the result of the query.
#'
#' @examples \donttest{
#' # Return all electricity indicators from 2000
#' sm_data("ELECTRICITY", from = 2000)
#'
#' }
#' @seealso \code{\link{sm_pivot_wider}}, \code{\link{sm_expand_date}}, \code{\link{samadb}}
#' @export sm_data
#'

sm_data <- function(dsid = NULL,
                    series = NULL, from = NULL, to = NULL, freq = NULL,
                    labels = TRUE, wide = TRUE, expand.date = FALSE,
                    ordered = TRUE, return.query = FALSE, ...) {

  d0 <- is.null(dsid)
  s0 <- is.null(series)
  if(!d0 && anyDuplicated(dsid)) stop("duplicated dataset id: ", paste(dsid[duplicated(dsid)], collapse = ", "))
  if(!s0 && anyDuplicated(series)) stop("duplicated series: ", paste(series[duplicated(series)], collapse = ", "))
  # Join series whenever necessary or for returning the full dataset in long format
  series_joined <- !d0 || (d0 && s0 && !wide) || labels || length(freq)
  if(series_joined) {
     data <- "DATA NATURAL JOIN SERIES"
     lab <- if(labels) ", label, unit, seas_adj" else ""
  } else {
    data <- "DATA"
    lab <- ""
  }
  # Cases:
  # 0 dsid and series: return full -> checked
  # 1 dsid and 0 series: return only series codes -> checked
  # 0 dsid and some series: return series codes and dsid if series_joined -> checked
  # 1 dsid and some series: return dsid and series codes -> checked
  # multiple dsid: return dsid and series codes -> checked
  # Also: only need dsid if wide = FALSE
  cond <- if(d0) "" else if(length(dsid) == 1L) paste0("dsid = '", dsid, "'") else paste0("dsid IN ('", paste(dsid, collapse = "', '"), "')")
  vars <- paste0(if(!wide && (length(dsid) > 1L || !(d0 || s0) || (d0 && series_joined))) "date, dsid, series" else "date, series", lab, ", value")
  if(ordered) ord <- if(series_joined) "s_order, date" else "series, date"  # Assumes s_order includes dsid!!

  if(!s0) {
    if(d0) {
      cond <- if(length(series) == 1L) paste0("series = '", series, "'") else
        paste0("series IN ('", paste(series, collapse = "', '"), "')")
    } else {
      cond <- if(length(series) == 1L) paste0("(", cond, " OR series = '", series, "')") else
        paste0("(", cond, " OR series IN ('", paste(series, collapse = "', '"), "'))")
    }
  }

  if(length(from)) cond <- paste0(cond, " AND date >= '", sm_as_date(from) , "'")
  if(length(to))   cond <- paste0(cond, " AND date <= '", sm_as_date(to, end = TRUE) , "'")
  if(length(freq)) cond <- paste0(cond, " AND freq = '", toupper(freq), "'")

  where <- if(cond == "") "" else "WHERE"
  if(d0 && s0 && cond != "") cond <- substr(cond, 6L, 100000L)

  query <- if(ordered) paste("SELECT", vars, "FROM", data, where, cond, "ORDER BY", ord) else
                       paste("SELECT", vars, "FROM", data, where, cond)
  if(return.query) return(query)
  res <- safe_query(query)
  if(is.null(res) || !fnrow(res)) {
    message("Query resulted in empty dataset. Please make sure that the dsid, series-codes or the date-range supplied in your query are consistent with the available data. See sm_datasets() and sm_series(). Alternatively check your connection to the database.")
    return(res)
  }

  res$date <- as.Date(res$date)
  if(labels) {
    res$label <- with(res, paste0(label, " (", unit, fifelse(seas_adj & !label %ilike% "seasonally", ", Seasonally Adjusted)", ")")))
    get_vars(res, c("unit", "seas_adj")) <- NULL
  }

  setDT(res)
  if(wide) {
    res <- sm_pivot_wider(res, ...) # Could optimize to omit ORDER BY clause if wide and length(series) > 1L
    if(d0 && length(series) > 1L) setcolorder(res, c("date", series))
  }
  if(expand.date) return(sm_expand_date(res, ...)) else return(res)
}


#' Reshape Long API Data to Column-Based Format
#'
#' This function automatically reshapes long (stacked) raw data from the API (\code{\link[=sm_data]{sm_data(..., wide = FALSE)}}) to a wide format where each variable has its own column.
#' Internally it uses \code{\link[collapse]{pivot}} from \emph{collapse}.
#'
#' @param data raw data from the API: A long format data frame where all values are stacked in a value column.
#' @param id_cols character. Temporal identifiers of the data. By default all variables in \code{\link{.SAMADB_T}} are selected.
#' @param names_from character. The column containing the series codes. These will become the names of the new columns in the wider data format.
#' @param values_from character. The column containing the data values.
#' @param labels_from character. The column containing the labels describing the series.
#' @param expand.date logical. \code{TRUE} will call \code{\link{sm_expand_date}} on the data after reshaping.
#' @param \dots further arguments passed to \code{\link[collapse]{pivot}} or \code{\link{sm_expand_date}}.
#'
#' @return A \code{\link[data.table]{data.table}} with the reshaped data.
#'
#' @examples \dontrun{
#' # Return all electricity indicators from the year 2000 onwards
#' sm_pivot_wider(sm_data("ELECTRICITY", from = 2000, wide = FALSE))
#' }
#' @seealso \code{\link{sm_pivot_longer}}, \code{\link{samadb}}
#'
#' @export sm_pivot_wider
#'

sm_pivot_wider <- function(data, id_cols = intersect(.SAMADB_T, names(data)),
                           names_from = "series", values_from = "value",
                           labels_from = if(any(names(data) == "label")) "label" else NULL,
                           expand.date = FALSE, ...) {
  res <- pivot_ellipsis(data, id_cols, values_from, names_from, labels_from, how = "wider", sort = "ids", ...) # collapse::pivot
  if(expand.date) return(sm_expand_date(res, ...)) else return(res)
}

# This is to allow ellipsis (...) arguments to be passed to both pivot and am_expand_data
pivot_ellipsis <- function(data, ids = NULL, values = NULL, names = NULL, labels = NULL,
                           how = "longer", na.rm = FALSE, factor = c("names", "labels"),
                           check.dups = FALSE, nthreads = 1L, fill = NULL,
                           drop = TRUE, sort = FALSE, transpose = FALSE, ...) {
  pivot(data, ids = ids, values = values, names = names,
        labels = labels, how = how, na.rm = na.rm, factor = factor,
        check.dups = check.dups, nthreads = nthreads, fill = fill,
        drop = drop, sort = sort, transpose = transpose)
}



#' Reshape Column-Based Data to Long Format
#'
#' This function automatically reshapes wide (column-based) data into a long format akin to the format of the raw data coming from the database (\code{\link[=sm_data]{sm_data(..., wide = FALSE)}}).
#' Internally it uses \code{\link[data.table]{melt}} from \emph{data.table}.
#'
#' @param data a wide format data frame where all series have their own column.
#' @param id_cols character. Temporal identifiers of the data. By default all variables in \code{\link{.SAMADB_T}} are selected.
#' @param to_value character. The names of all series to be stacked into the long format data frame.
#' @param variable_name character. The name of the variable to store the names of the series.
#' @param value_name character. The name of the variable to store the data values.
#' @param label_name character. The name of the variable to store the series labels.
#' @param na.rm logical. \code{TRUE} will remove all missing values from the long data frame.
#' @param variable.factor,label.factor logical. \code{TRUE} will code the "series" and "label" columns as factors, which is more memory efficient.
#' @param \dots further arguments passed to \code{\link[data.table]{melt}}.
#'
#' @return A \code{\link[data.table]{data.table}} with the reshaped data.
#'
#' @examples \dontrun{
#' # Return all electricity indicators from the year 2000 onwards
#' data <- sm_data("ELECTRICITY", from = 2000)
#' sm_pivot_longer(data)
#' }
#' @seealso \code{\link{sm_pivot_wider}}, \code{\link{samadb}}
#'
#' @export sm_pivot_longer
#'

sm_pivot_longer <- function(data, id_cols = intersect(.SAMADB_T, names(data)),
                      to_value = setdiff(names(data), id_cols),
                      variable_name = "series", value_name = "value",
                      label_name = "label", na.rm = TRUE, variable.factor = TRUE,
                      label.factor = TRUE, ...) {
  if(length(label_name)) labs <- vlabels(.subset(data, to_value), use.names = FALSE)
  res <- melt(qDT(data), id_cols, to_value, variable_name, value_name,
              na.rm = na.rm, variable.factor = variable.factor, ...)
  vlabels(res) <- NULL # needed
  if(length(label_name) && !allNA(labs)) {
    if(label.factor) {
      al <- list(levels = labs, class = "factor")
      labs <- if(na.rm) rep(seq_along(labs), fnobs(.subset(data, to_value))) else
                        rep(seq_along(labs), each = fnrow(data))
      attributes(labs) <- al
    } else {
      # melt always preserves column order (otherwise could rep with names and match series column)
      labs <- if(na.rm) rep(labs, fnobs(.subset(data, to_value))) else
                        rep(labs, each = fnrow(data))
    }
    add_vars(res, fncol(res)) <- setNames(list(labs), label_name)
  }
  return(res)
}


#' Transpose a Wide Dataset to a Row-Based Format
#'
#' This function is called by \code{\link{sm_write_excel}} with option \code{transpose = TRUE} to generate a row-based tabular data format from a wide data frame in R that is suitable for exporting to Excel.
#'
#' @param data a wide format data frame where each column is a variable and the first variable uniquely identifies the data.
#' @param date.format a format for date columns which is passed to \code{\link[base]{format.Date}}. When transposing wide, dates are converted to character. The default R YYYY-MM-DD format for dates is often not recognized by Excel. By default dates are transformed to DD/MM/YYYY format which Excel (UK English) recognizes. Putting \code{FALSE} here does not transform dates into another format.
#'
#' @return A transposed data frame or \code{\link[data.table]{data.table}} (the class of the input is preserved).
#'
#' @examples \dontrun{
#' sm_transpose(sm_data("ELECTRICITY"))
#' }
#' @seealso \code{\link[data.table]{transpose}}, \code{\link{sm_pivot_wider}}, \code{\link{sm_write_excel}}, \code{\link{samadb}}
#' @export sm_transpose

sm_transpose <- function(data, date.format = "%d/%m/%Y") {
  nnum <- cat_vars(data, "logical")[-1L]
  lab <- vlabels(data)[-1L]

  if(is.character(date.format) && length(dv <- date_vars(data, "indices"))) {
    get_vars(data, dv) <-  if(length(dv) == 1L) format(.subset2(data, dv), date.format) else
                           lapply(.subset(data, dv), format, date.format)
  }

  if(any(nnum) && !all(nnum)) {
    res <- list(`numeric data` = add_vars(transpose(get_vars(data, c(TRUE, !nnum)),
                                                    make.names = 1L, keep.names = "variable"),
                                          list(label = lab[!nnum]), pos = 2L),
                `non-numeric data` =  add_vars(transpose(get_vars(data, c(TRUE, nnum)),
                                                         make.names = 1L, keep.names = "variable"),
                                               list(label = lab[nnum]), pos = 2L))
  } else {
    res <- transpose(data, make.names = 1L, keep.names = "variable")
    add_vars(res, 2L) <- list(label = lab)
  }

  attr(res, "transposed") <- TRUE
  if(inherits(data, "data.table")) {
    if(inherits(res, "data.table")) return(qDT(res, keep.attr = TRUE))
    for(i in 1:2) setDT(res[[i]])
  }
  return(res)
}

#' Export Wide Data to Excel
#'
#' This function exports a wide format dataset to a column- (default) or row-oriented Excel format.
#'
#' @param data a wide dataset from \code{\link{sm_data}} or reshaped to a wide format with \code{\link{sm_pivot_wider}}.
#' @param \dots further arguments to \code{\link[writexl]{write_xlsx}}. As a minimum a path needs to be supplied that ends with the name of the Excel file. See Examples.
#' @param transpose logical. If \code{TRUE}, the result is returned in a row-oriented Excel format. The default is column oriented (same as the dataset in R).
#' @param transpose.date.format argument passed to \code{\link{sm_transpose}}, setting the format of date columns when data is transposed.
#'
#' @return Writes an Excel file to the specified path (no return value).
#'
#' @examples
#' \dontrun{
#' # Getting electricity indicators from 2000
#' data <- sm_data("ELECTRICITY", from = 2000)
#'
#' # Saving to different Excel formats
#' sm_write_excel(data, "ELECTRICITY.xlsx")
#' sm_write_excel(data, "ELECTRICITY.xlsx", transpose = TRUE)
#'
#' # Saving to alternative path
#' sm_write_excel(data, "C:/Users/.../ELECTRICITY.xlsx")
#' }
#'
#' @seealso \code{\link{sm_transpose}}, \code{\link[writexl]{write_xlsx}}, \code{\link{samadb}}
#'
#' @export sm_write_excel
#'

sm_write_excel <- function(data, ..., transpose = FALSE, transpose.date.format = "%d/%m/%Y") {
  ntpl <- !isTRUE(attr(data, "transposed"))
  if(transpose && ntpl) {
    res <- sm_transpose(data, transpose.date.format)
  } else if(ntpl) {
    res <- list(data = data, labels = namlab(data))
  } else {
    res <- data
  }
  if(missing(...)) stop("Need to at least provide path to file. See help(write_xlsx).")
  return(write_xlsx(res, ...))
}
Any scripts or data that you put into this service are public.
samadb documentation built on Aug. 21, 2025, 5:54 p.m.
rdrr.io home R language documentation Run R code online
CRAN packages Bioconductor packages R-Forge packages GitHub packages
Note that we can't provide technical support on individual packages. You should contact the package authors for that.
samadb
South Africa Macroeconomic Database API

R/accessdb.R
In samadb: South Africa Macroeconomic Database API

Defines functions sm_write_excel sm_transpose sm_pivot_longer pivot_ellipsis sm_pivot_wider sm_data sm_as_date sm_expand_date sm_series sm_datasets sm_datasources safe_query sm_reconnect

Documented in sm_as_date sm_data sm_datasets sm_datasources sm_expand_date sm_pivot_longer sm_pivot_wider sm_series sm_transpose sm_write_excel

Try the samadb package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

samadb South Africa Macroeconomic Database API

R/accessdb.R In samadb: South Africa Macroeconomic Database API

Defines functions sm_write_excel sm_transpose sm_pivot_longer pivot_ellipsis sm_pivot_wider sm_data sm_as_date sm_expand_date sm_series sm_datasets sm_datasources safe_query sm_reconnect

Documented in sm_as_date sm_data sm_datasets sm_datasources sm_expand_date sm_pivot_longer sm_pivot_wider sm_series sm_transpose sm_write_excel

Try the samadb package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

samadb
South Africa Macroeconomic Database API

R/accessdb.R
In samadb: South Africa Macroeconomic Database API