#' Download SDDF data by round for countries from the European Social Survey
#'
#' @param country a character of length 1 with the full name of the country.
#' Use \code{\link{show_countries}} for a list of available countries.
#'
#' @param rounds a numeric vector with the rounds to download. See \code{\link{show_sddf_cntrounds}}
#' for all available rounds for any given country.
#'
#' @param ess_email a character vector with your email, such as "your_email@email.com".
#' If you haven't registered on the ESS website, create an account at
#' \url{http://www.europeansocialsurvey.org/user/new}. The preferred method is to log in
#' through \code{\link{set_email}}.
#'
#' @param output_dir a character vector with the output directory in case you want to
#' only download the files using \code{download_sddf_country}. Defaults to your working
#' directory. This will be interpreted as a \strong{directory} and not a path with
#' a file name.
#'
#' @param format the format from which to download the data. By default it is \code{NULL}
#' for the \code{import_*} functions, which try 'stata', 'spss' and 'sas' in that order.
#' This can be useful if some countries don't have a particular format available.
#' Alternatively, the user can specify the format explicitly as either 'stata' or 'spss';
#' requesting 'sas' explicitly raises an error (see details).
#' For the \code{download_*} functions it is set to 'stata' because the format should be
#' specified before downloading. Setting it to \code{NULL} will iterate over 'stata',
#' 'spss' and 'sas' and download the first that is available. When using
#' \code{import_sddf_country}, the data will be downloaded and read in the \code{format}
#' specified. For \code{download_sddf_country}, the data is downloaded in the specified
#' \code{format} (only 'spss' and 'stata' are supported, see details).
#'
#' @details
#'
#' SDDF data (Sample Design Data Files) are data sets that contain additional columns with the
#' sample design and weights for a given country in a given round. These additional columns are
#' required to perform any complex weighted analysis of the ESS data. Users interested in using this data
#' should read the description of SDDF files \href{http://www.europeansocialsurvey.org/methodology/ess_methodology/sampling.html}{here}
#' and should read \href{http://www.europeansocialsurvey.org/data/download_sample_data.html}{here} for the
#' sampling design of the country of analysis for that specific round.
#'
#' Use \code{import_sddf_country} to download the SDDF data by country into R.
#' \code{import_all_sddf_cntrounds} will download all available SDDF data for a given country by
#' default and \code{download_sddf_country} will download SDDF data and save them in a specified
#' \code{format} in the supplied directory.
#'
#' The \code{format} argument of \code{import_sddf_country} should not matter to the user
#' because the data is read into R either way. However, different formats might have
#' different handling of the encoding of some questions. This option was preserved
#' so that the user can switch between formats if any encoding errors are found in the data. For more
#' details see the discussion \href{https://github.com/ropensci/essurvey/issues/11}{here}.
#'
#' Additionally, given that the SDDF data is not very complete, some countries do not have SDDF data
#' in Stata or SPSS formats. For that reason, the \code{format} argument is not used in \code{import_sddf_country}.
#' Internally, \code{Stata} is chosen over \code{SPSS} and \code{SPSS} over \code{SAS} in that
#' order of preference.
#'
#' For this particular argument, 'sas' is not supported because the data formats have
#' changed between ESS waves and separate formats require different functions to be
#' read. To preserve parsimony and avoid format errors between waves, the user should use
#' 'stata' or 'spss'.
#'
#' Starting from round 7 (inclusive), the ESS switched the layout of SDDF data.
#' Before round 7, SDDF data was published separately for each wave-country
#' combination. From round 7 onwards, all SDDF data is released as a single
#' integrated file with all countries combined for that given round. \code{import_sddf_country}
#' takes care of this nuance by reading the data and filtering the chosen
#' country automatically. \code{download_sddf_country} downloads the raw file but also
#' reads the data into memory to subset the specific country requested. This
#' process should be transparent to the user, but beware that reading and rewriting the data
#' might drop some of its properties, such as the labels or the label attribute.
#'
#' @return for \code{import_sddf_country}, if \code{length(rounds)} is 1, it returns a tibble with
#' the latest version of that round. Otherwise it returns a list of \code{length(rounds)}
#' containing the latest version of each round. For \code{download_sddf_country}, if
#' \code{output_dir} is a valid directory, it saves all the rounds in the chosen
#' \code{format} in \code{output_dir} and returns the saved directories invisibly.
#'
#' @export
#' @examples
#' \dontrun{
#'
#' set_email("your_email@email.com")
#'
#' sp_three <- import_sddf_country("Spain", 5:6)
#'
#' show_sddf_cntrounds("Spain")
#'
#' # Only download the files; the download directories are returned invisibly
#'
#' temp_dir <- tempdir()
#'
#' download_sddf_country(
#'   "Spain",
#'   rounds = 5:6,
#'   output_dir = temp_dir
#' )
#'
#' # By default, download_sddf_country downloads 'stata' files but
#' # you can also download 'spss' files.
#'
#' download_sddf_country(
#'   "Spain",
#'   rounds = 1:8,
#'   output_dir = temp_dir,
#'   format = 'spss'
#' )
#'
#' }
#'
import_sddf_country <- function(country,
                                rounds,
                                ess_email = NULL,
                                format = NULL) {
  stopifnot(is.character(country), length(country) > 0)
  stopifnot(is.numeric(rounds), length(rounds) > 0)
  if (!is.null(format) && format == "sas") {
    stop(
      "You cannot read SAS but only 'spss' and 'stata' files with this function. See ?import_sddf_country for more details") # nolint
  }
  rounds <- sort(rounds)
  # Rounds 7 and later are published as one integrated file per round
  late_rounds <- rounds > 6
  if (all(late_rounds)) {
    urls <- country_url_sddf_late_rounds(country, rounds[late_rounds], format)
  } else if (all(!late_rounds)) {
    urls <- country_url_sddf(country, rounds, format)
  } else {
    urls <- c(
      country_url_sddf(country, rounds[!late_rounds], format),
      country_url_sddf_late_rounds(country, rounds[late_rounds], format)
    )
  }
  # Iff SDDF already downloaded, exclude from URLs and don't download again
  sddf_cached <- all(dir.exists(.global_vars$sddf_laterounds_dir))
  sddf_ready <- any(late_rounds) && sddf_cached
  if (sddf_ready) urls <- urls[!late_rounds]
  dir_download_tmp <- download_format(country = country,
                                      urls = urls,
                                      ess_email = ess_email)
  if (sddf_ready) {
    message("SDDF Round(s) ",
            paste0(rounds[late_rounds], collapse = ", "),
            " reused from cache.")
    rnds_dl <- string_extract(basename(.global_vars$sddf_laterounds_dir), "\\d")
    pos_match <- match(rounds[late_rounds], rnds_dl)
    dir_download <- c(dir_download_tmp,
                      .global_vars$sddf_laterounds_dir[pos_match])
  } else {
    dir_download <- dir_download_tmp
  }
  all_data <- read_sddf_data(dir_download, country)
  # Remove only the non SDDF directories
  unlink(dir_download_tmp, recursive = TRUE, force = TRUE)
  # A single round is returned as a tibble rather than a list of length 1
  if (length(dir_download) == 1) all_data <- all_data[[1]]
  all_data
}
#' @rdname import_sddf_country
#' @export
import_all_sddf_cntrounds <- function(country,
                                      ess_email = NULL,
                                      format = NULL) {
  import_sddf_country(
    country,
    show_sddf_cntrounds(country),
    ess_email,
    format = format
  )
}
#' @rdname import_sddf_country
#' @export
download_sddf_country <- function(country,
                                  rounds,
                                  ess_email = NULL,
                                  output_dir = getwd(),
                                  format = "stata") {
  stopifnot(is.character(country), length(country) > 0)
  stopifnot(is.numeric(rounds), length(rounds) > 0)
  if (!is.null(format) && format == "sas") {
    stop(
      "You cannot read SAS but only 'spss' and 'stata' files with this function. See ?download_sddf_country for more details") # nolint
  }
  rounds <- sort(rounds)
  late_rounds <- rounds > 6
  if (all(late_rounds)) {
    urls <- country_url_sddf_late_rounds(country, rounds[late_rounds], format)
  } else if (all(!late_rounds)) {
    urls <- country_url_sddf(country, rounds, format)
  } else {
    urls <- c(
      country_url_sddf(country, rounds[!late_rounds], format),
      country_url_sddf_late_rounds(country, rounds[late_rounds], format)
    )
  }
  dir_download <-
    download_format(urls = urls,
                    country = country,
                    ess_email = ess_email,
                    only_download = TRUE,
                    output_dir = output_dir)
  # If there are any late rounds, we need to read them back into memory,
  # subset the specific country requested and save them back
  # to the same directory they were read from
  if (any(late_rounds)) {
    all_data <-
      suppress_all(
        read_sddf_data(dir_download[late_rounds], country)
      )
    format_ext <- c(".dta", ".sav", ".por")
    # Get all paths from the format
    # This **should be** the same paths used to read the data because
    # these exact same lines of code are used in read_format_data.
    format_dirs <- list.files(dir_download[late_rounds],
                              pattern = paste0(format_ext, "$", collapse = "|"),
                              full.names = TRUE)
    if (length(format_dirs) != length(all_data)) {
      # If this bug happens it means that there are more files with
      # stata/spss/sas extensions than data already read. This means
      # that we could save the data into an otherwise invalid file,
      # messing up the logic of which file hosts the subsetted data.
      stop(
        "This is an internal check for a possible bug with the ESS data.\n",
        "Please file an issue at https://github.com/ropensci/essurvey/issues\n",
        "with the exact line of code that generated the error. For example:\n",
        "download_sddf_country('Italy', 1:3)"
      )
    }
    # Save only the .dta/.sav/.por files
    for (.x in seq_along(format_dirs)) {
      # Pick the function that writes the format of this file
      format_save <-
        switch(file_ext(format_dirs[[.x]]),
               "dta" = haven::write_dta,
               "por" = haven::write_sav,
               "sav" = haven::write_sav
        )
      # .por files give a lot of trouble. Here we always read them with
      # read/write_sav so we change the file extension to .sav to save them
      # as .sav. The pattern is anchored and the dot escaped so that only a
      # trailing ".por" extension is replaced, not "por" elsewhere in the path.
      format_save(all_data[[.x]], sub("\\.por$", ".sav", format_dirs[.x]))
    }
  }
  invisible(dir_download)
}