R/fetch_genotypes.R

Defines functions fetch_genotypes

Documented in fetch_genotypes

#' Download openSNP genotype data for a user
#'
#' @export
#' @family opensnp-fxns
#' @param url (character) URL for the download. See example below of
#' function use.
#' @param rows (integer) Number of rows to read in. Useful for getting a
#' glimpse of  the data. Negative and other invalid values are ignored,
#' giving back all data.  Default: 100
#' @param filepath (character) If none is given the file is saved to a
#' temporary file,  which will be lost after your session is closed. Save
#' to a file if you want to  access it later.
#' @param quiet (logical) Should download progress be suppressed. Default:
#' `TRUE`
#' @param ... Further args passed on to [download.file()]
#' @return data.frame for a single user, with four columns:
#'
#' - rsid (character)
#' - chromosome (integer)
#' - position (integer)
#' - genotype (character)
#'
#' @details Beware, not setting the rows parameter means that you download
#' the entire file, which can be large (e.g., 15MB), and so take a while
#' to download depending on your connection speed. Therefore, rows is set to
#' 10 by default to sort of protect the user.
#'
#' Internally, we use [download.file()] to download each file, then
#' [read.table()] to read the file to a data.frame.
#'
#' @examples \dontrun{
#' # get a data.frame of the users data
#' data <- users(df = TRUE)
#' head(data[[1]]) # users with links to genome data
#' mydata <- fetch_genotypes(
#'   url = data[[1]][1, "genotypes.download_url"],
#'   file = "~/myfile.txt"
#' )
#'
#' # see some data right away
#' mydata
#'
#' # Or read in data later separately
#' read.table("~/myfile.txt", nrows = 10)
#' }
fetch_genotypes <- function(url, rows = 100, filepath = NULL, quiet = TRUE, ...) {
  if (is.null(filepath)) filepath <- tempfile(fileext = ".txt")
  
  tryCatch(
    {
      utils::download.file(url, destfile = filepath, quiet = quiet, ...)
    },
    error = function(e) {
      message("Failed to retrieve data. Please check the URL or try again later.")
      stop("No URL available")
    }
    ,
    warning = function(w) {
      message("Warning: Data retrieval resulted in a warning.")
      # Handle warnings if necessary
      return(NULL)
    }
  )
  df <- utils::read.table(filepath,
                          sep = "\t", nrows = rows, header = FALSE,
                          comment.char = "#", stringsAsFactors = FALSE
  )
  stats::setNames(df, c("rsid", "chromosome", "position", "genotype"))
}
ropensci/rsnps documentation built on July 31, 2023, 11:11 a.m.