The survey Class
In dataset: Create Data Frames that are Easier to Exchange and Reuse

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

library(dataset)
library(declared)

You need the latest development version of declared.

remotes::install_github('dusadrian/declared')

The survey class will be derived from the dataset class.

obs_id <- c("Saschia Iemand", "Jane Doe", "Jack Doe", "Pim Iemand", "Matti Virtanen" )
sex <- declared ( c(1,1,0,-1,1), 
                  labels = c(Male = 0, Female = 1, DK = -1), 
                  na_values = -1)
geo <- c("NL-ZH", "IE-05", "GB-NIR", "NL-ZH", "FI1C")

difficulty_bills <- declared (
  c(0,1,2,-1,0), 
  labels = c(Never = 0, Time_to_time = 1, Always = 2, DK = -1)
  )
age_exact <- declared (
  c( 34,45,21,55,-1), 
  labels = c( A = 34,A = 45,A  = 21, A= 55, DK = -1)
)
listen_spotify <- declared (
  c(0,1,9,0,1),
  labels = c( No = 0, Yes = 1,Inap = 9), 
  na_values = 9
)

raw_survey <- data.frame ( 
  obs_id = obs_id, 
  geo = geo, 
  listen_spotify = listen_spotify,
  sex = sex,
  age_exact = age_exact, 
  difficulty_bills = difficulty_bills
)

survey_dataset  <- dataset( x= raw_survey,
                            Dimensions = "geo", 
                            Measures = c("listen_spotify", "sex",
                                         "age_exact", "difficulty_bills"), 
                            Attributes = NULL, 
                            sdmx_attributes = "geo", 
                            Title = "Tiny Survey", 
                            Creator = person("Jane", "Doe"))

dublincore(survey_dataset)

It is a good practice to define valid, but not present labels in declared, because in the retrospective harmonization workflow they may be concatenated (binded) together with further observations that do have the currently not used label.

In this example, the DK or declined label is not in use.

# This is not valied in declared
listen_spotify <- declared(
  c(0,1,9,0,1),
  labels = c( No = 0, Yes = 1,Inap = 9, DK =-1), 
  na_values = c(9, -1)
  )

print(listen_spotify)

c(listen_spotify, declared(
  c(-1,-1,-1),
  labels = c( No = 0, Yes = 1,Inap = 9, DK =-1)
  ))

summary(listen_spotify)

survey_dataset <- dublincore_add(survey_dataset, 
                               Title = "Tiny Survey", 
                               Creator = person("Daniel", "Antal"), 
                               Identifier = "https://doi.org/xxxx.yyyyy",
                               Publisher = "Reprex", 
                               Date = 2022, 
                               Subject = "Surveys", 
                               Language = "en")

The survey class inherits elements of the dataset class, but it will be more strictly defined. I am considering to make declared every single column except for the obs_id. Even numeric types with Inap and DK would map nicely to CL_OBS_STATUS SDMX codes that make missing observation explicit, and try to categorize them.

dublincore(survey_dataset)

Is the summary method implemented for declared? Both dataset and survey will need new print and summary methods.

summary(survey_dataset)

The survey (should) contain the entire processing history from creation, and optionally the DataCite schema for publication created with datacite_add(). A similar dublincore_add function uses the Dublin Core metadata definitions.

Eventually, a connection to the packages zen4R will make sure that the correctly described dataset can get a Zenodo record, receive a DOI, the DOI recorded in the object, and upload to Zenodo.