R/data.R
In ds4psy: Data Science for Psychologists

## data.R | ds4psy
## hn | uni.kn | 2025 09 24
## Documentation of data sets included in /data. 


# (01) Positive Psychology data: ---------- 

# (01a) posPsy_p_info: ------ 

#' Positive Psychology: Participant data 
#'
#' \code{posPsy_p_info} is a dataset containing details of 295 participants. 
#' 
#' \describe{
#'   
#'   \item{id}{Participant ID.}
#'   
#'   \item{intervention}{Type of intervention: 
#'   3 positive psychology interventions (PPIs), plus 1 control condition: 
#'     1: "Using signature strengths", 
#'     2: "Three good things", 
#'     3: "Gratitude visit",  
#'     4: "Recording early memories" (control condition).}
#'     
#'   \item{sex}{Sex: 1 = female, 2 = male.}
#'   
#'   \item{age}{Age (in years).}
#'   
#'   \item{educ}{Education level: Scale from 1: less than 12 years, to 5: postgraduate degree.}
#'   
#'   \item{income}{Income: Scale from 1: below average, to 3: above average.} 
#'   
#' }
#' 
#' See codebook and references at \url{https://bookdown.org/hneth/ds4psy/B.1-datasets-pos.html}.
#'
#' @format A table with 295 cases (rows) and 6 variables (columns).
#' 
#' @family datasets
#' 
#' @source 
#' \strong{Articles}
#' 
#' \itemize{
#' 
#' \item Woodworth, R. J., O’Brien-Malone, A., Diamond, M. R., & Schüz, B. (2017). 
#' Web-based positive psychology interventions: A reexamination of effectiveness. 
#' \emph{Journal of Clinical Psychology}, \emph{73}(3), 218--232. 
#' doi: \code{10.1002/jclp.22328} 
#' 
#' \item Woodworth, R. J., O’Brien-Malone, A., Diamond, M. R. and Schüz, B. (2018). 
#' Data from, ‘Web-based positive psychology interventions: A reexamination of effectiveness’. 
#' \emph{Journal of Open Psychology Data}, \emph{6}(1). 
#' doi: \code{10.5334/jopd.35}
#' }
#' 
#' See \url{https://openpsychologydata.metajnl.com/articles/10.5334/jopd.35/} for details 
#' and \doi{10.6084/m9.figshare.1577563.v1} for original dataset. 
#' 
#' Additional references at \url{https://bookdown.org/hneth/ds4psy/B.1-datasets-pos.html}. 

"posPsy_p_info"



# (01b) posPsy_AHI_CESD: ------ 

#' Positive Psychology: AHI CESD data 
#'
#' \code{posPsy_AHI_CESD} is a dataset containing answers to the 24 items of the 
#' Authentic Happiness Inventory (AHI) and answers to the 
#' 20 items of the Center for Epidemiological Studies Depression (CES-D) scale 
#' (Radloff, 1977) for multiple (1 to 6) measurement occasions. 
#' 
#' \strong{Codebook} 
#' 
#' \itemize{
#' 
#' \item 1. \strong{id}: Participant ID. 
#' 
#' \item 2. \strong{occasion}: Measurement occasion: 
#'   0: Pretest (i.e., at enrolment),   
#'   1: Posttest (i.e., 7 days after pretest),   
#'   2: 1-week follow-up, (i.e., 14 days after pretest, 7 days after posttest),   
#'   3: 1-month follow-up, (i.e., 38 days after pretest, 31 days after posttest),   
#'   4: 3-month follow-up, (i.e., 98 days after pretest, 91 days after posttest),   
#'   5: 6-month follow-up, (i.e., 189 days after pretest, 182 days after posttest).  
#' 
#' \item 3. \strong{elapsed.days}: Time since enrolment measured in fractional days.
#'  
#' \item 4. \strong{intervention}: Type of intervention: 
#'   3 positive psychology interventions (PPIs), plus 1 control condition: 
#'     1: "Using signature strengths", 
#'     2: "Three good things", 
#'     3: "Gratitude visit", 
#'     4: "Recording early memories" (control condition). 
#' 
#' \item 5.-28. (from \strong{ahi01} to \strong{ahi24}): Responses on 24 AHI items. 
#' 
#' \item 29.-48. (from \strong{cesd01} to \strong{cesd20}): Responses on 20 CES-D items. 
#' 
#' \item 49. \strong{ahiTotal}: Total AHI score. 
#' 
#' \item 50. \strong{cesdTotal}: Total CES-D score.   
#' 
#' }
#' 
#' See codebook and references at \url{https://bookdown.org/hneth/ds4psy/B.1-datasets-pos.html}.
#' 
#' @format A table with 992 cases (rows) and 50 variables (columns).
#'  
#' @family datasets
#' 
#' @seealso 
#' \code{\link{posPsy_long}} for a corrected version of this file (in long format). 
#' 
#' @source 
#' \strong{Articles}
#' 
#' \itemize{
#' 
#' \item Woodworth, R. J., O’Brien-Malone, A., Diamond, M. R., & Schüz, B. (2017). 
#' Web-based positive psychology interventions: A reexamination of effectiveness. 
#' \emph{Journal of Clinical Psychology}, \emph{73}(3), 218--232. 
#' doi: \code{10.1002/jclp.22328} 
#' 
#' \item Woodworth, R. J., O’Brien-Malone, A., Diamond, M. R. and Schüz, B. (2018). 
#' Data from, ‘Web-based positive psychology interventions: A reexamination of effectiveness’. 
#' \emph{Journal of Open Psychology Data}, \emph{6}(1). 
#' doi: \code{10.5334/jopd.35}
#' }
#' 
#' See \url{https://openpsychologydata.metajnl.com/articles/10.5334/jopd.35/} for details 
#' and \doi{10.6084/m9.figshare.1577563.v1} for original dataset. 
#' 
#' Additional references at \url{https://bookdown.org/hneth/ds4psy/B.1-datasets-pos.html}. 

"posPsy_AHI_CESD"



# (01c) posPsy_long: ------ 

#' Positive Psychology: AHI CESD corrected data (in long format) 
#'
#' \code{posPsy_long} is a dataset containing answers to the 24 items of the 
#' Authentic Happiness Inventory (AHI) and answers to the 
#' 20 items of the Center for Epidemiological Studies Depression (CES-D) scale 
#' (see Radloff, 1977) for multiple (1 to 6) measurement occasions.
#' 
#' This dataset is a corrected version of \code{\link{posPsy_AHI_CESD}} 
#' and in long-format. 
#' 
#' @format A table with 990 cases (rows) and 50 variables (columns).
#'  
#' @family datasets
#' 
#' @seealso 
#' \code{\link{posPsy_AHI_CESD}} for source of this file and codebook information;  
#' \code{\link{posPsy_wide}} for a version of this file (in wide format). 
#' 
#' @source 
#' \strong{Articles}
#' 
#' \itemize{
#' 
#' \item Woodworth, R. J., O’Brien-Malone, A., Diamond, M. R., & Schüz, B. (2017). 
#' Web-based positive psychology interventions: A reexamination of effectiveness. 
#' \emph{Journal of Clinical Psychology}, \emph{73}(3), 218--232. 
#' doi: \code{10.1002/jclp.22328} 
#' 
#' \item Woodworth, R. J., O’Brien-Malone, A., Diamond, M. R. and Schüz, B. (2018). 
#' Data from, ‘Web-based positive psychology interventions: A reexamination of effectiveness’. 
#' \emph{Journal of Open Psychology Data}, \emph{6}(1). 
#' doi: \code{10.5334/jopd.35}
#' }
#' 
#' See \url{https://openpsychologydata.metajnl.com/articles/10.5334/jopd.35/} for details 
#' and \doi{10.6084/m9.figshare.1577563.v1} for original dataset. 
#' 
#' Additional references at \url{https://bookdown.org/hneth/ds4psy/B.1-datasets-pos.html}. 

"posPsy_long"



# (01d) posPsy_wide: ------ 

#' Positive Psychology: All corrected data (in wide format) 
#' 
#' \code{posPsy_wide} is a dataset containing answers to the 24 items of the 
#' Authentic Happiness Inventory (AHI) and answers to the 
#' 20 items of the Center for Epidemiological Studies Depression (CES-D) scale 
#' (see Radloff, 1977) for multiple (1 to 6) measurement occasions.
#' 
#' This dataset is based on \code{\link{posPsy_AHI_CESD}} and 
#' \code{\link{posPsy_long}}, but is in wide format. 
#' 
#' @family datasets
#' 
#' @seealso 
#' \code{\link{posPsy_AHI_CESD}} for the source of this file, 
#' \code{\link{posPsy_long}} for a version of this file (in long format). 
#' 
#' @source 
#' \strong{Articles}
#' 
#' \itemize{
#' 
#' \item Woodworth, R. J., O’Brien-Malone, A., Diamond, M. R., & Schüz, B. (2017). 
#' Web-based positive psychology interventions: A reexamination of effectiveness. 
#' \emph{Journal of Clinical Psychology}, \emph{73}(3), 218--232. 
#' doi: \code{10.1002/jclp.22328} 
#' 
#' \item Woodworth, R. J., O’Brien-Malone, A., Diamond, M. R. and Schüz, B. (2018). 
#' Data from, ‘Web-based positive psychology interventions: A reexamination of effectiveness’. 
#' \emph{Journal of Open Psychology Data}, \emph{6}(1). 
#' doi: \code{10.5334/jopd.35}
#' }
#' 
#' See \url{https://openpsychologydata.metajnl.com/articles/10.5334/jopd.35/} for details 
#' and \doi{10.6084/m9.figshare.1577563.v1} for original dataset. 
#' 
#' Additional references at \url{https://bookdown.org/hneth/ds4psy/B.1-datasets-pos.html}. 

"posPsy_wide"




# (02) False Positive Psychology data: ---------- 

# https://bookdown.org/hneth/ds4psy/B.2-datasets-false.html

#' Data: False Positive Psychology
#'
#' \code{falsePosPsy_all} is a dataset containing the data from 2 studies designed to 
#' highlight problematic research practices within psychology. 
#' 
#' Simmons, Nelson and Simonsohn (2011) published a controversial article 
#' with a necessarily false finding. By conducting simulations and 2 simple behavioral experiments, 
#' the authors show that flexibility in data collection, analysis, and reporting 
#' dramatically increases the rate of false-positive findings. 
#' 
#' \describe{
#'   \item{study}{Study ID.}
#'   \item{id}{Participant ID.}
#'   \item{aged}{Days since participant was born (based on their self-reported birthday).}
#'   \item{aged365}{Age in years.}
#'   \item{female}{Is participant a woman? 1: yes, 2: no.}
#'   \item{dad}{Father's age (in years).}
#'   \item{mom}{Mother's age (in years).}
#'   \item{potato}{Did the participant hear the song 'Hot Potato' by The Wiggles? 1: yes, 2: no.}
#'   \item{when64}{Did the participant hear the song 'When I am 64' by The Beatles? 1: yes, 2: no.}      
#'   \item{kalimba}{Did the participant hear the song 'Kalimba' by Mr. Scrub? 1: yes, 2: no.}
#'   \item{cond}{In which condition was the participant? 
#'   control: Subject heard the song 'Kalimba' by Mr. Scrub; 
#'   potato: Subject heard the song 'Hot Potato' by The Wiggles; 
#'   64: Subject heard the song 'When I am 64' by The Beatles.}
#'   \item{root}{Could participant report the square root of 100? 1: yes, 2: no.}      
#'   \item{bird}{Imagine a restaurant you really like offered a 30 percent discount for dining between 4pm and 6pm. 
#'   How likely would you be to take advantage of that offer? 
#'   Scale from 1: very unlikely, 7: very likely.}
#'   \item{political}{In the political spectrum, where would you place yourself? 
#'   Scale: 1: very liberal, 2: liberal, 3: centrist, 4: conservative, 5: very conservative.}
#'   \item{quarterback}{If you had to guess who was chosen the quarterback of the year in Canada last year, 
#'   which of the following four options would you choose? 
#'   1: Dalton Bell, 2: Daryll Clark, 3: Jarious Jackson, 4: Frank Wilczynski.}      
#'   \item{olddays}{How often have you referred to some past part of your life as “the good old days”? 
#'   Scale: 11: never, 12: almost never, 13: sometimes, 14: often, 15: very often.}
#'   \item{feelold}{How old do you feel? 
#'   Scale: 1: very young, 2: young, 3: neither young nor old, 4: old, 5: very old.}
#'   \item{computer}{Computers are complicated machines. 
#'   Scale from 1: strongly disagree, to 5: strongly agree.}      
#'   \item{diner}{Imagine you were going to a diner for dinner tonight, how much do you think you would like the food? 
#'   Scale from 1: dislike extremely, to 9: like extremely.}
#'   }
#' 
#' See \url{https://bookdown.org/hneth/ds4psy/B.2-datasets-false.html} for codebook and more information. 
#'
#'
#' @format A table with 78 cases (rows) and 19 variables (columns):
#' 
#' @family datasets
#' 
#' @source 
#' \strong{Articles}
#' 
#' \itemize{
#' 
#' \item Simmons, J.P., Nelson, L.D., & Simonsohn, U. (2011). 
#' False-positive psychology: Undisclosed flexibility in data collection and analysis 
#' allows presenting anything as significant. 
#' \emph{Psychological Science}, \emph{22}(11), 1359--1366. 
#' doi: \code{10.1177/0956797611417632} 
#' 
#' \item Simmons, J.P., Nelson, L.D., & Simonsohn, U. (2014). 
#' Data from paper "False-Positive Psychology: 
#' Undisclosed Flexibility in Data Collection and Analysis 
#' Allows Presenting Anything as Significant". 
#' \emph{Journal of Open Psychology Data}, \emph{2}(1), e1. 
#' doi: \code{10.5334/jopd.aa} 
#' }
#' 
#' See files at \url{https://openpsychologydata.metajnl.com/articles/10.5334/jopd.aa/} and 
#' the archive at \url{https://zenodo.org/record/7664} for original dataset. 

"falsePosPsy_all"




# (03) Transforming data / dplyr (Chapter 3): outliers ---------- 

# https://bookdown.org/hneth/ds4psy/3-6-transform-ex.html 

#' Outlier data.
#'
#' \code{outliers} is a fictitious dataset containing the id, sex, and height 
#' of 1000 non-existing, but otherwise normal people.  
#' 
#' \strong{Codebook}
#' 
#' \describe{
#'   \item{id}{Participant ID (as character code)}
#'   \item{sex}{Gender (female vs. male)}
#'   \item{height}{Height (in cm)}
#' }
#' 
#' @format A table with 100 cases (rows) and 3 variables (columns). 
#' 
#' @family datasets
#' 
#' @source 
#' See CSV data at \url{http://rpository.com/ds4psy/data/out.csv}. 

"outliers"




# (03.14) pi data: --------  

# https://bookdown.org/hneth/ds4psy/10-3-iter-essentials.html 
# Orig. data source <http://www.geom.uiuc.edu/~huberty/math5337/groupe/digits.html>

# # pi_all <- readLines("./data/pi_100k.txt")                # from local data file
# pi_data <- "http://rpository.com/ds4psy/data/pi_100k.txt"  # URL of online data file
# pi_100k <- readLines(pi_data)                              # read from online source
# 
# # Check:
# dim(pi_100k)  #  NULL !
# 
# # Check number of missing values: 
# sum(is.na(pi_100k))  #  0 missing values
# 
# # Save to /data:
# usethis::use_data(pi_100k, overwrite = TRUE)



#' Data: 100k digits of pi.
#'
#' \code{pi_100k} is a dataset containing the first 100k digits of pi. 
#' 
#' @format A character of \code{nchar(pi_100k) = 100001}. 
#' 
#' @family datasets 
#' 
#' @source 
#' See TXT data at \url{http://rpository.com/ds4psy/data/pi_100k.txt}. 
#' 
#' Original data at \url{http://www.geom.uiuc.edu/~huberty/math5337/groupe/digits.html}. 

"pi_100k"



# (06) Importing data / readr (Chapter 6): ---------- 

# https://bookdown.org/hneth/ds4psy/6-3-import-essentials.html 

# (06a) data_t1.csv: ---- 
# Note: Same as (6a) below. 

# data_t1 <- readr::read_csv("http://rpository.com/ds4psy/data/data_t1.csv")
# 
# # Check: 
# dim(data_t1)  #  20 observations (rows) x 4 variables (columns)
# 
# # Check number of missing values: 
# sum(is.na(data_t1))  #  3 missing values
# 
# # Save to /data:
# usethis::use_data(data_t1, overwrite = TRUE)



#' Data table data_t1.
#'
#' \code{data_t1} is a fictitious dataset to practice importing and joining data 
#' (from a CSV file).  
#' 
#' @format A table with 20 cases (rows) and 4 variables (columns). 
#' 
#' @family datasets
#' 
#' @source 
#' See CSV data at \url{http://rpository.com/ds4psy/data/data_t1.csv}. 

"data_t1"



# (06b) data_t1_de.csv: ---- 

# data_t1_de <- readr::read_csv2("http://rpository.com/ds4psy/data/data_t1_de.csv")
# 
# # Check: 
# dim(data_t1_de)  #  20 observations (rows) x 4 variables (columns)
# 
# # Check number of missing values: 
# sum(is.na(data_t1_de))  #  3 missing values
# 
# # Save to /data:
# usethis::use_data(data_t1_de, overwrite = TRUE)


#' Data import data_t1_de.
#'
#' \code{data_t1_de} is a fictitious dataset to practice importing data  
#' (from a CSV file, de/European style).  
#' 
#' @format A table with 20 cases (rows) and 4 variables (columns). 
#' 
#' @family datasets
#' 
#' @source 
#' See CSV data at \url{http://rpository.com/ds4psy/data/data_t1_de.csv}. 

"data_t1_de"



# (06c) data_t1_tab.csv: ---- 

# data_t1_tab <- read_tsv("http://rpository.com/ds4psy/data/data_t1_tab.csv")
# 
# # Check: 
# dim(data_t1_tab)  #  20 observations (rows) x 4 variables (columns)
# 
# # Check number of missing values: 
# sum(is.na(data_t1_tab))  #  3 missing values
# 
# # Save to /data:
# usethis::use_data(data_t1_tab, overwrite = TRUE)


#' Data import data_t1_tab.
#'
#' \code{data_t1_tab} is a fictitious dataset to practice importing data  
#' (from a TAB file).  
#' 
#' @format A table with 20 cases (rows) and 4 variables (columns). 
#' 
#' @family datasets
#' 
#' @source 
#' See TAB-delimited data at \url{http://rpository.com/ds4psy/data/data_t1_tab.csv}. 

"data_t1_tab"



# (06d) data_1.dat: ---- 

# my_file <- "http://rpository.com/ds4psy/data/data_1.dat"
# 
# data_1 <- readr::read_delim(my_file, delim = ".", 
#                             col_names = c("initials", "age", "tel", "pwd"), 
#                             na = c("-77", "-99"))
# 
# # Check: 
# dim(data_1)  #  100 observations (rows) x 4 variables (columns)
# 
# # Check number of missing values: 
# sum(is.na(data_1))  #  15 missing values
# 
# # Save to /data:
# usethis::use_data(data_1, overwrite = TRUE)



#' Data import data_1.
#'
#' \code{data_1} is a fictitious dataset to practice importing data
#' (from a DELIMITED file).  
#' 
#' @format A table with 100 cases (rows) and 4 variables (columns). 
#' 
#' @family datasets
#' 
#' @source 
#' See DELIMITED data at \url{http://rpository.com/ds4psy/data/data_1.dat}. 

"data_1"


# (06e) data_2.dat: ---- 

# my_file_path <- "http://rpository.com/ds4psy/data/data_2.dat"  # from online source
# 
# # read_fwf: 
# data_2 <- readr::read_fwf(my_file_path, 
#                           fwf_cols(initials = c(1, 2), 
#                                    age = c(4, 5), 
#                                    tel = c(7, 10), 
#                                    pwd = c(12, 17)))
# 
# # Check: 
# dim(data_2)  #  100 observations (rows) x 4 variables (columns)
# 
# # Check number of missing values: 
# sum(is.na(data_2))  #  0 missing values
# 
# # Save to /data:
# usethis::use_data(data_2, overwrite = TRUE)



#' Data import data_2.
#'
#' \code{data_2} is a fictitious dataset to practice importing data  
#' (from a FWF file).  
#' 
#' @format A table with 100 cases (rows) and 4 variables (columns). 
#' 
#' @family datasets
#' 
#' @source 
#' See FWF data at \url{http://rpository.com/ds4psy/data/data_2.dat}. 

"data_2"




# (07) Tidying data / tidyr (Chapter 7): ---------- 

# https://bookdown.org/hneth/ds4psy/7-3-tidy-essentials.html

# (07a) table6.csv: ------ 

# ## Load data (as comma-separated file): 
# table6 <- readr::read_csv("http://rpository.com/ds4psy/data/table6.csv")  # from online source
# 
# # Check: 
# dim(table6)  #  6 observations (rows) x 2 variables (columns)
# 
# # Check number of missing values: 
# sum(is.na(table6))  #  0 missing values
# 
# # Save to /data:
# usethis::use_data(table6, overwrite = TRUE)


#' Data: table6
#'
#' \code{table6} is a fictitious dataset to practice reshaping and tidying data.
#' 
#' This dataset is a further variant of the \code{table1} to \code{table5} datasets 
#' of the \bold{tidyr} package.   
#' 
#' @format A table with 6 cases (rows) and 2 variables (columns). 
#' 
#' @family datasets
#' 
#' @source 
#' See CSV data at \url{http://rpository.com/ds4psy/data/table6.csv}. 

"table6"



# (07b) table7.csv: ------ 

# # Load data (as comma-separated file): 
# table7 <- readr::read_csv("http://rpository.com/ds4psy/data/table7.csv")  # from online source
# 
# # Check: 
# dim(table7)  #  6 observations (rows) x 1 (horrendous) variable (column)
# 
# # Check number of missing values: 
# sum(is.na(table7))  #  0 missing values
# 
# # Save to /data:
# usethis::use_data(table7, overwrite = TRUE)


#' Data: table7
#'
#' \code{table7} is a fictitious dataset to practice reshaping and tidying data.
#' 
#' This dataset is a further variant of the \code{table1} to \code{table5} datasets 
#' of the \bold{tidyr} package.    
#' 
#' @format A table with 6 cases (rows) and 1 (horrendous) variable (column). 
#' 
#' @family datasets
#' 
#' @source 
#' See CSV data at \url{http://rpository.com/ds4psy/data/table7.csv}. 

"table7"



# (07c) table8.csv: ------ 

# # Load data (as comma-separated file): 
# table8 <- readr::read_csv("http://rpository.com/ds4psy/data/table8.csv")  # from online source
# 
# # Check: 
# dim(table8)  #  3 observations (rows) x 5 variables (columns)
# 
# # Check number of missing values: 
# sum(is.na(table8))  #  0 missing values
# 
# # Save to /data:
# usethis::use_data(table8, overwrite = TRUE)


#' Data: table8 
#'
#' \code{table9} is a fictitious dataset to practice reshaping and tidying data.
#' 
#' This dataset is a further variant of the \code{table1} to \code{table5} datasets 
#' of the \bold{tidyr} package.     
#' 
#' @format A table with 3 cases (rows) and 5 variables (columns). 
#' 
#' @family datasets
#' 
#' @source 
#' See CSV data at \url{http://rpository.com/ds4psy/data/table8.csv}. 

"table8"



# (07c2) table9: The contingency table tidyr::table2 as a 3-dimensional array (xtabs) ------ 

# # Data from tidyr::table1 as a contingency table (with a dedicated "count" variable): 
# ct <- tidyr::table2  
# 
# # Create 3-dimensional array (xtabs < table):
# table9 <- stats::xtabs(formula = count ~., data = ct)
# dim(table9)  #  3 2 2
# str(table9)
# sum(table9)  # 2940985206


#' Data table9.
#'
#' \code{table9} is a fictitious dataset to practice reshaping and tidying data.
#' 
#' This dataset is a further variant of the \code{table1} to \code{table5} datasets 
#' of the \bold{tidyr} package.     
#' 
#' @format A 3 x 2 x 2 array (of type "xtabs") with 2940985206 elements (frequency counts). 
#' 
#' @family datasets
#' 
#' @source 
#' Generated by using \code{stats::xtabs(formula = count ~., data = tidyr::table2)}. 

"table9"



# (07d) exp_wide.csv: ------ 

# https://bookdown.org/hneth/ds4psy/7-5-tidy-ex.html

# exp_wide <- readr::read_csv("http://rpository.com/ds4psy/data/exp_wide.csv")  # from online source 
# 
# # Check: 
# dim(exp_wide)  #  10 observations (rows) x 7 variables (columns)
# 
# # Check number of missing values: 
# sum(is.na(exp_wide))  #  0 missing values
# 
# # Save to /data:
# usethis::use_data(exp_wide, overwrite = TRUE)


#' Data exp_wide.
#'
#' \code{exp_wide} is a fictitious dataset to practice tidying data 
#' (here: converting from wide to long format).
#' 
#' @format A table with 10 cases (rows) and 7 variables (columns). 
#' 
#' @family datasets
#' 
#' @source 
#' See CSV data at \url{http://rpository.com/ds4psy/data/exp_wide.csv}. 

"exp_wide"



# (07e) Chapter 7: Exercise 1: 'Four messes and one tidy table': ------ 

# https://bookdown.org/hneth/ds4psy/7-4-tidy-ex.html#tidy:ex01


# (07e1): t_1.csv: ----- 

#' Data: t_1
#'
#' \code{t_1} is a fictitious dataset to practice tidying data.
#' 
#' @format A table with 8 cases (rows) and 9 variables (columns). 
#' 
#' @family datasets
#' 
#' @source 
#' See CSV data at \url{http://rpository.com/ds4psy/data/t_1.csv}. 

"t_1"



# (07e2): t_2.csv: ----- 

#' Data: t_2
#'
#' \code{t_2} is a fictitious dataset to practice tidying data.
#' 
#' @format A table with 8 cases (rows) and 5 variables (columns). 
#' 
#' @family datasets
#' 
#' @source 
#' See CSV data at \url{http://rpository.com/ds4psy/data/t_2.csv}. 

"t_2"



# (07e3): t_3.csv: ----- 

#' Data: t_3
#'
#' \code{t_3} is a fictitious dataset to practice tidying data.
#' 
#' @format A table with 16 cases (rows) and 6 variables (columns). 
#' 
#' @family datasets
#' 
#' @source 
#' See CSV data at \url{http://rpository.com/ds4psy/data/t_3.csv}. 

"t_3"



# (07e4): t_4.csv: ----- 

#' Data: t_4
#'
#' \code{t_4} is a fictitious dataset to practice tidying data.
#' 
#' @format A table with 16 cases (rows) and 8 variables (columns). 
#' 
#' @family datasets
#' 
#' @source 
#' See CSV data at \url{http://rpository.com/ds4psy/data/t_4.csv}. 

"t_4"



# (08) Joining data / dplyr (Chapter 8): ---------- 

# https://bookdown.org/hneth/ds4psy/8-3-join-essentials.html

# (08a) data_t1.csv: ---- 
# Note: Same as (4a) above. 

# data_t1 <- readr::read_csv("http://rpository.com/ds4psy/data/data_t1.csv")
# 
# # Check: 
# dim(data_t1)  #  20 observations (rows) x 4 variables (columns)
# 
# # Check number of missing values: 
# sum(is.na(data_t1))  #  3 missing values
# 
# # Save to /data:
# usethis::use_data(data_t1, overwrite = TRUE)

# See (4a) above.


# (08b) data_t2.csv: ---- 

# data_t2 <- readr::read_csv(file = "http://rpository.com/ds4psy/data/data_t2.csv")
# 
# # Check: 
# dim(data_t2)  #  20 observations (rows) x 4 variables (columns)
# 
# # Check number of missing values: 
# sum(is.na(data_t2))  #  3 missing values
# 
# # Save to /data:
# usethis::use_data(data_t2, overwrite = TRUE)



#' Data table data_t2.
#'
#' \code{data_t2} is a fictitious dataset to practice importing and joining data 
#' (from a CSV file).  
#' 
#' @format A table with 20 cases (rows) and 4 variables (columns). 
#' 
#' @family datasets
#' 
#' @source 
#' See CSV data at \url{http://rpository.com/ds4psy/data/data_t2.csv}. 

"data_t2"



# Exercise 1:

# (08c) t3.csv: ---- 

# t3 <- readr::read_csv(file = "http://rpository.com/ds4psy/data/t3.csv")
# 
# # Check: 
# dim(t3)  #  10 observations (rows) x 4 variables (columns)
# 
# # Check number of missing values: 
# sum(is.na(t3))  #  3 missing values
# 
# # Save to /data:
# usethis::use_data(t3, overwrite = TRUE)


#' Data: t3
#'
#' \code{t3} is a fictitious dataset to practice importing and joining data 
#' (from a CSV file).  
#' 
#' @format A table with 10 cases (rows) and 4 variables (columns). 
#' 
#' @family datasets
#' 
#' @source 
#' See CSV data at \url{http://rpository.com/ds4psy/data/t3.csv}. 

"t3"



# (08d) t4.csv: ---- 

# t4 <- readr::read_csv(file = "http://rpository.com/ds4psy/data/t4.csv")
# 
# # Check: 
# dim(t4)  #  10 observations (rows) x 4 variables (columns)
# 
# # Check number of missing values: 
# sum(is.na(t4))  #  2 missing values
# 
# # Save to /data:
# usethis::use_data(t4, overwrite = TRUE)


#' Data: t4
#'
#' \code{t4} is a fictitious dataset to practice importing and joining data 
#' (from a CSV file).  
#' 
#' @format A table with 10 cases (rows) and 4 variables (columns). 
#' 
#' @family datasets
#' 
#' @source 
#' See CSV data at \url{http://rpository.com/ds4psy/data/t4.csv}. 

"t4"


# Exercise 3: 

# (08e) data_t3.csv: ---- 

# data_t3 <- readr::read_csv(file = "http://rpository.com/ds4psy/data/data_t3.csv")
# 
# # Check: 
# dim(data_t3)  #  20 observations (rows) x 4 variables (columns)
# 
# # Check number of missing values: 
# sum(is.na(data_t3))  #  3 missing values
# 
# # Save to /data:
# usethis::use_data(data_t3, overwrite = TRUE)



#' Data table data_t3.
#'
#' \code{data_t3} is a fictitious dataset to practice importing and joining data 
#' (from a CSV file).  
#' 
#' @format A table with 20 cases (rows) and 4 variables (columns). 
#' 
#' @family datasets
#' 
#' @source 
#' See CSV data at \url{http://rpository.com/ds4psy/data/data_t3.csv}. 

"data_t3"



# (08f) data_t4.csv: ---- 

# data_t4 <- readr::read_csv(file = "http://rpository.com/ds4psy/data/data_t4.csv")
# 
# # Check: 
# dim(data_t4)  #  20 observations (rows) x 4 variables (columns)
# 
# # Check number of missing values: 
# sum(is.na(data_t4))  #  3 missing values
# 
# # Save to /data:
# usethis::use_data(data_t4, overwrite = TRUE)


#' Data table data_t4.
#'
#' \code{data_t4} is a fictitious dataset to practice importing and joining data 
#' (from a CSV file).  
#' 
#' @format A table with 20 cases (rows) and 4 variables (columns). 
#' 
#' @family datasets
#' 
#' @source 
#' See CSV data at \url{http://rpository.com/ds4psy/data/data_t4.csv}. 

"data_t4"




# (09) Text data (Chapter 9): -------- 

# (09a) countries: ---- 

# # Source: <https://www.gapminder.org/data/documentation/gd004/>
# file <- "GM_lifeExpectancy_by_country_v11.csv"
# path <- "./data-raw/raw_data_sources/_gapminder/"
# datapath <- paste0(path, file)
# datapath
# 
# GM_life_expectancy <- readr::read_csv2(file = datapath)
# GM_life_expectancy
# 
# countries <- GM_life_expectancy$country
# countries

#' Data: Names of countries 
#'
#' \code{countries} is a dataset containing the names of 
#' 197 countries (as a vector of text strings). 
#' 
#' @format A vector of type \code{character}  
#' with \code{length(countries) = 197}. 
#' 
#' @family datasets 
#' 
#' @source 
#' Data from \url{https://www.gapminder.org}: 
#' Original data at \url{https://www.gapminder.org/data/documentation/gd004/}.

"countries"



# (09b) fruits: ---- 

# Source: <https://simple.wikipedia.org/wiki/List_of_fruits>
# fruits
# length(fruits)  # 122

#' Data: Names of fruits 
#'
#' \code{fruits} is a dataset containing the names of 
#' 122 fruits (as a vector of text strings). 
#' 
#' Botanically, "fruits" are the seed-bearing structures 
#' of flowering plants (angiosperms) formed from the ovary 
#' after flowering. 
#' 
#' In common usage, "fruits" refer to the fleshy 
#' seed-associated structures of a plant 
#' that taste sweet or sour, 
#' and are edible in their raw state.
#' 
#' @format A vector of type \code{character}  
#' with \code{length(fruits) = 122}. 
#' 
#' @family datasets 
#' 
#' @source 
#' Data based on \url{https://simple.wikipedia.org/wiki/List_of_fruits}.

"fruits"



# (09c) flowery phrases: ---- 

#' Data: Flowery phrases 
#'
#' \code{flowery} contains versions and variations 
#' of Gertrude Stein's popular phrase 
#' "A rose is a rose is a rose".  
#' 
#' The phrase stems from Gertrude Stein's poem "Sacred Emily" 
#' (written in 1913 and published in 1922, in "Geography and Plays").  
#' The verbatim line in the poem actually reads 
#' "Rose is a rose is a rose is a rose". 
#' 
#' See \url{https://en.wikipedia.org/wiki/Rose_is_a_rose_is_a_rose_is_a_rose} 
#' for additional variations and sources. 
#' 
#' @format A vector of type \code{character}  
#' with \code{length(flowery) = 60}. 
#' 
#' @family datasets 
#' 
#' @source 
#' Data based on \url{https://en.wikipedia.org/wiki/Rose_is_a_rose_is_a_rose_is_a_rose}.

"flowery"



# (09e) Bushisms: ---- 

#' Data: Bushisms 
#'
#' \code{Bushisms} contains some phrases uttered by 
#' or attributed to U.S. president George W. Bush 
#' (the 43rd president of the United States of America, 
#' in office from January 2001 to January 2009).
#' 
#' @format A vector of type \code{character}  
#' with \code{length(Bushisms) = 22}. 
#' 
#' @family datasets 
#' 
#' @source 
#' Data based on \url{https://en.wikipedia.org/wiki/Bushism}. 

"Bushisms"


# (09e) Trumpisms: ---- 

#' Data: Trumpisms 
#'
#' \code{Trumpisms} contains characteristic words and phrases used 
#' by U.S. president Donald J. Trump (the 45th and 47th president of the United States of America)  
#' during his first presidency (ranging from January 20, 2017, to January 20, 2021). 
#' 
#' See \url{https://en.wikiquote.org/wiki/Donald_Trump} for a more recent collection of 
#' attributed and disputed quotes.
#' 
#' @format A vector of type \code{character}  
#' with \code{length(Trumpisms) = 168} 
#' (on 2021-01-28).
#' 
#' @family datasets 
#' 
#' @source 
#' Data originally based on a collection of \emph{Donald Trump's 20 most frequently used words} on \url{https://www.yourdictionary.com}  
#' and expanded by interviews, public speeches, and Twitter tweets from \code{https://twitter.com/realDonaldTrump}. 

"Trumpisms"



# (10) Time data (Chapter 10): --------

# (10a) fame: ---- 

# Fame data (DOB and DOD of famous people):
# Chapter 10 (Time data), Exercise 3
# See Exercise 3 at https://bookdown.org/hneth/ds4psy/10-4-time-ex.html#time:ex03 
# See file all_DATASETs.R for raw data (as tables).

#' Data: fame 
#'
#' \code{fame} is a dataset to practice working with dates.
#'  
#' \code{fame} contains the names, areas, dates of birth (DOB), and 
#' --- if applicable --- the dates of death (DOD) of famous people.
#' 
#' @format A table with 67 cases (rows) and 4 variables (columns). 
#' 
#' @family datasets
#' 
#' @source 
#' Student solutions to exercises, dates mostly from \url{https://www.wikipedia.org/}. 

"fame"



# (10b) exp_num_dt data: ---- 

# Experimental numeracy and date-time (dt) data:

# File is a combination from 2 sources:
# A. numeracy data:
# See generating code chunk "data-create-numeracy-data" in ds4psy_book file "55_datasets.Rmd".
# numeracy <- readr::read_csv("../ds4psy/data-raw/numeracy.csv")  # local csv file
# numeracy <- readr::read_csv("http://rpository.com/ds4psy/data/numeracy.csv")  # online
# numeracy  # 1000 x 12

# B. dt data: 
# See generating code chunk "data-create-time-bday-data" in ds4psy_book file "55_datasets.Rmd".
# dt <- readr::read_csv("../ds4psy/data-raw/dt.csv")  # from local file 
# dt <- readr::read_csv("http://rpository.com/ds4psy/data/dt.csv")  # online file
# dt  # 1000 x 9

## Check: 
# dim(exp_num_dt)  # 1000 observations (rows) x 15 variables (columns)
# sum(is.na(exp_num_dt))  # 130 missing values
#
## 250202: Recode the gender variable into true binary variable:
# table(exp_num_dt$gender)
# exp_num_dt$gender[exp_num_dt$gender == "male"] <- "not female"
# table(exp_num_dt$gender)
#
## Store data:
# usethis::use_data(exp_num_dt, overwrite = TRUE)

#' Data from an experiment with numeracy and date-time variables 
#'
#' \code{exp_num_dt} is a fictitious set of data describing 
#' 1000 non-existing, but surprisingly friendly people. 
#' 
#' \strong{Codebook} 
#' The data characterize 1000 individuals (rows) in 15 variables (columns):
#' 
#' \itemize{
#' 
#' \item 1. \strong{name}: Participant initials.
#' 
#' \item 2. \strong{gender}: Self-identified gender (as a binary variable).
#' 
#' \item 3. \strong{bday}: Day (within month) of DOB.
#' 
#' \item 4. \strong{bmonth}: Month (within year) of DOB.
#' 
#' \item 5. \strong{byear}: Year of DOB.
#' 
#' \item 6. \strong{height}: Height (in cm).
#' 
#' \item 7. \strong{blood_type}: Blood type. 
#'  
#' \item 8. \strong{bnt_1} to 11. \strong{bnt_4}: 
#' Correct response to corresponding BNT question? 
#' (1: correct, 0: incorrect).
#' 
#' \item 12. \strong{g_iq} and 13. \strong{s_iq}: 
#' Scores from two IQ tests (general vs. social).
#' 
#' \item 14. \strong{t_1} and 15. \strong{t_2}: 
#' Study start and end time.
#' 
#' } 
#' 
#' \code{exp_num_dt} was generated for practice purposes. 
#' It allows 
#' (1) converting data tables from wider into longer format, 
#' (2) dealing with date- and time-related variables, and 
#' (3) computing, analyzing, and visualizing test scores (e.g., numeracy, IQ). 
#' 
#' The \code{gender} variable was converted into a binary variable 
#' (i.e., using 2 categories "female" and "not female"). 
#' 
#' @format A table with 1000 cases (rows) and 15 variables (columns). 
#' 
#' @family datasets
#' 
#' @source 
#' See CSV data files at 
#' \url{http://rpository.com/ds4psy/data/numeracy.csv} and 
#' \url{http://rpository.com/ds4psy/data/dt.csv}. 

"exp_num_dt"



# (10c) dt_10 data: 10 Danish bdays ---- 

## Sources:
# dt_10   <- readr::read_csv("./data-raw/dt_10.csv") # local file
# dt_10_o <- readr::read_csv("http://rpository.com/ds4psy/data/dt_10.csv")  # online
# all.equal(dt_10, dt_10_o)

## Check: 
# dim(dt_10)  # 10 x 7

#' Data from 10 Danish people 
#'
#' \code{dt_10} contains precise DOB information of 
#' 10 non-existent, but definitely Danish people. 
#' 
#' @format A table with 10 cases (rows) and 7 variables (columns). 
#' 
#' @family datasets
#' 
#' @source 
#' See CSV data file at 
#' \url{http://rpository.com/ds4psy/data/dt_10.csv}. 

"dt_10"



# (11) Function data (Chapter 11): -------- 

# none yet.



# (12) Iteration / loops (Chapter 12): -------- 

# https://bookdown.org/hneth/ds4psy/10-3-iter-essentials.html


# (12a) tb data: ------ 

# tb <- readr::read_csv2("http://rpository.com/ds4psy/data/tb.csv") 
# 
# # Check:
# dim(tb)  #  100 cases x 5 variables
# 
# # Check number of missing values: 
# sum(is.na(tb))  #  0 missing values
# 
# # Save to /data:
# usethis::use_data(tb, overwrite = TRUE)


#' Data table tb.
#'
#' \code{tb} is a fictitious set of data describing 
#' 100 non-existing, but otherwise ordinary people.
#' 
#' \strong{Codebook} 
#' 
#' The table contains 5 columns/variables:
#' 
#' \itemize{
#' 
#' \item 1. \strong{id}: Participant ID.
#' 
#' \item 2. \strong{age}: Age (in years).
#' 
#' \item 3. \strong{height}: Height (in cm).
#' 
#' \item 4. \strong{shoesize}: Shoe size (EU standard).
#' 
#' \item 5. \strong{IQ}: IQ score (according Raven's Regressive Tables).
#' 
#' } 
#' 
#' \code{tb} was originally created to practice loops and iterations 
#' (as a CSV file). 
#' 
#' @format A table with 100 cases (rows) and 5 variables (columns). 
#' 
#' @family datasets
#' 
#' @source 
#' See CSV data file at \url{http://rpository.com/ds4psy/data/tb.csv}. 

"tb"



# (13) i2ds survey data: ------ 

# Data from i2ds online survey
# URL: https://ww3.unipark.de/uc/i2ds_survey/ 
# 2025-11-02


#' Data from the i2ds online survey 
#'
#' \code{i2ds_survey} contains pre-processed data 
#' from the i2ds online survey.
#' 
#' @format 
#' On 2025-11-02, this data contains 60 participants (rows) and 116 variables (columns).
#' 
#' @details 
#' 
#' \strong{Prefix codes}
#' 
#' Many variable names have prefixes that indicate a particular type of variable:
#' 
#' \itemize{
#' 
#' \item \strong{rv}: A random variable
#' 
#' \item \strong{c(#)}: A choice variable (with # alternatives)
#' 
#' \item \strong{t}: A text variable (with any input)
#' 
#' \item \strong{tn}: A text variable (with numeric input)
#' 
#' \item \strong{crs}: A course-related variable
#' 
#' \item \strong{combined}: A composite variable created by averaging either 4 or 5 individual Likert-scale items.
#' Depending on the item set, the resulting score was normalized (i.e., divided by 4 or 5), and stored as a new variable.
#' 
#' }
#' 
#' 
#' \strong{List of variables}
#' 
#' After pre-processing the raw data and re-arranging its variables (columns), 
#' the variable names and their contents in the \code{i2ds_survey} tibble are as follows:
#' 
#' 
#' \enumerate{
#' 
#' \item Key person-related variables:  
#' \code{c4_gender} A categorical (character) variable indicating the participant’s gender identity, 
#' with possible values including "female", "male", "non-binary" or "do not wish to respond". 
#' This variable is used for demographic analysis.
#'
#' \item \code{tn_year} A numeric (double) variable indicating the year of birth (e.g., \code{1999, 2000, 2001}, etc.). 
#'
#' \item \code{tn_month} A numeric (double) variable indicating the participant’s birth month (\code{1–12}). 
#' This variable also supports demographic profiling. 
#' 
#' \item \code{tn_day} A numeric (double) variable indicating the day of birth provided by the participant (\code{1–31}). 
#' Used for demographic purposes and potential exploratory analyses. 
#' DOB-related variables can be used to calculate age and analyze age-related trends. 
#' 
#' \item \code{t_height} A character variable indicating a participant's self-described height, using various formats and units 
#' (e.g., "1.80", "180 cm", "1,80m", or "5'11"). This variable requires pre-processing for analysis.
#' 
#' \item \code{t_pid} An optional character variable capturing a participant ID, pseudonym, or other identifying entry. 
#' This variable allows participants to recognize their own data without disclosing their identity.
#' 
#' 
#' \item Variables indicating informed consent and willingness to share data:   
#' \code{c2_informed_consent} A logical variable indicating whether the participant provided informed consent before starting the study 
#' (\code{TRUE} = consent provided, \code{FALSE} = no consent provided). 
#' This variable is a pre-requisite for ethical compliance (i.e., should be \code{TRUE} for all participants).
#' 
#' \item \code{c2_use_data_2} A logical variable indicating whether a participant still agrees to allow their data to be shared after having finished the survey 
#' (\code{TRUE} = consent provided, \code{FALSE} = no consent provided). 
#' This variable is a pre-requisite for data re-usability in research (and should be \code{TRUE} for all cases included here). 
#' 
#' 
#' \item Variables indicating course membership:  
#' \code{crs_i2ds_1} A logical variable indicating whether a participant is currently enrolled in the course \emph{Introduction to Data Science 1: Basics} (i2ds 1: \code{TRUE} = enrolled). 
#'
#' \item \code{crs_i2ds_2} A logical variable indicating whether a participant is enrolled in the course \emph{Introduction to Data Science 2: Applications} (i2ds 2: \code{TRUE} = enrolled). 
#' 
#' \item \code{crs_ds4psy} A logical variable indicating whether a participant is enrolled in the course \emph{Data Science for Psychology} (ds4psy: \code{TRUE} = enrolled).
#' 
#' \item \code{crs_diff_kn} A logical variable indicating whether a participant is enrolled in a different course at the \emph{University of Konstanz} (\code{TRUE} = yes). 
#' 
#' \item \code{crs_diff_else} A logical variable indicating whether a participant is enrolled in a course \emph{not} at the \emph{University of Konstanz} (\code{TRUE} = yes). 
#' This variable helps identifying external learners.
#' 
#' \item \code{crs_self_study} A logical variable indicating whether a participant is engaging with course materials without formal enrollment (\code{TRUE} = yes). 
#' This variable reflects informal learning engagement.
#' 
#' \item \code{crs_only_study} A logical variable indicating whether a participant is taking the survey only, without engaging with course materials (\code{TRUE} = yes). 
#' This variable identifies participants not studying R or data science.
#'
#' \item \code{t_crs_other} A character variable capturing free-text input describing any other course a participant is taking. 
#' 
#' \item \code{v_crs_other_dept} A character variable indicating the department of the other course(s) mentioned in \code{t_crs_other}. 
#' This variable may facilitate grouping participants by academic discipline. 
#' 
#' 
#' \item Variables indicating (randomized) survey conditions:   
#' \code{rv_anchor_high_low} A randomized (character) variable that indicates whether a person is to keep a relatively large or small number in memory (i.e., assignment to either \code{242} or \code{42}, respectively). This manipulation is used to examine anchoring effects on later responses.
#' 
#' \item \code{rv_scale_randomization} A randomized (character) variable that indicates whether a person was asked to rate their personality (from "serious" to "humorous") on a 4-point or on a 5-point Likert scale. 
#' The variable controls for the influence of scale granularity on ratings. 
#' 
#' \item \code{rv_barnum_pos_neg} A randomized (character) variable that indicates whether the participant is to receive a positive or negative Barnum statement ("positive" vs. "negative"). This is used to measure sensitivity to vague or generic personality feedback.
#' 
#' \item \code{rv_sc_false_dicho_3} A randomized (character) variable indicating which version of the scale is to be shown: a dichotomous comparison between admiration vs. respect, fear vs. love, admiration vs. love and fear, or a single undivided scale (values: "admir_resp"  "fear_love", "admir_love" fear_resp", "single_scale"). Used to examine how scale format affects evaluative judgments.
#' 
#' \item \code{rv_wait_time} A randomized (character) variable that indicates whether the participant waited 10 seconds ("short") or 30 seconds ("long") before continuing. This manipulation aims to examine whether a longer waiting period increases the perceived credibility or value of a following personality feedback, in line with mechanisms underlying the Barnum effect.
#' 
#' \item \code{rv_political_orientation} A randomized (character) variable indicating the order in which the two political orientation scales ("left–right" and "liberal–conservative") were presented. Possible values include "left_right, lib_cons", "left_cons, lib_right", etc. This variable is used to control for potential order effects in political self-placement tasks.
#' 
#' \item \code{rv_thinkingstyle} A randomized (character) variable that indicates the order in which pairs of thinking styles are to be presented ("deliberative vs. intuitive"; "reflective vs. spontaneous";" deliberative vs. spontaneous";"reflective vs. Intuitive"). The order is counterbalanced to reduce presentation bias in self-assessment tasks.
#' 
#' 
#' \item Binary choices on art preference:   
#' \code{c2_img_sel_1} A numeric (double) variable that represents the participant's preferred choice between 2 images in choice Set 1. 
#' The binary variable indicates the participant's image preference: 
#' \itemize{
#' \item \code{1} corresponds to the \emph{cubist} painting \emph{Les Baigneurs (the bathers), by Roger de La Fresnaye, 1912} 
#' \item \code{2} corresponds to the \emph{expressionist} painting \emph{Badende Mädchen (bathing girls), by August Macke, 1913}
#' }
#' 
#' \item \code{c2_img_sel_2} A numeric (double) variable that represents the participant's preferred choice between 2 images in choice Set 2. 
#' The binary variable indicates the participant's image preference: 
#' \itemize{
#' \item \code{1} corresponds to the \emph{cubist} painting \emph{Le Gouter (the taster, aka. tea time), by Jean Metzinger, 1911} 
#' \item \code{2} corresponds to the \emph{expressionist} painting \emph{La petite Jeanne, by Amedeo Modigliani, 1909}
#' }
#' 
#' \item \code{c2_img_sel_3} A numeric (double) variable that represents the participant's preferred choice between 2 images in choice Set 3. 
#' The binary variable indicates the participant's image preference: 
#' \itemize{
#' \item \code{1} corresponds to the \emph{cubist} painting \emph{Edtaonisl Ecclesiastic (the 1st word being an acronym made by alternating the French words for 'star' and 'dance'), by Francis Picabia, 1913} 
#' \item \code{2} corresponds to the \emph{impressionist} painting \emph{Femme avec parasol dans un jardin (woman with parasol in a garden), by Pierre-Auguste Renoir, 1875}
#' }
#' 
#' \item \code{c2_img_sel_4} A numeric (double) variable that represents the participant's preferred choice between 2 images in choice Set 4. 
#' The binary variable indicates the participant's image preference: 
#' \itemize{
#' \item \code{1} corresponds to the \emph{expressionist} painting \emph{Solitude, by Alexej von Jawlensky, 1912} 
#' \item \code{2} corresponds to the \emph{impressionist} painting \emph{Pont dans le Jardin de Monet (bridge in Monet’s garden), by Claude Monet, 1895–96}
#' }
#' 
#' 
#' \item Variables describing habits and preferences:  
#' \code{c7_eating_habits} A categorical (character) variable that indicates which dietary lifestyle an individual assigns to itself
#' (\code{1} = "vegetarian"; \code{2} = "omnivore"; \code{3} = "vegan"; \code{4} = "pescetarian"; \code{5} = "flexitarian"; \code{6} = "carnivore"; \code{7} = "other"). 
#' 
#' \item \code{t_eating_habits_other} A character variable intended to capture free-text input for other dietary descriptions; 
#' usually \code{NA} unless "other" was selected. May appear as logical if no responses were entered.
#' 
#' 
#' \item \code{c7_apple} A numeric (double) variable indicating how much a participant likes apples 
#' on a \code{1-7} ranking scale (\code{1} = highest preference, \code{7} = lowest preference, \code{0} if not ranked).
#' 
#' \item \code{c7_cherry} A numeric (double) variable indicating how much a participant likes cherries 
#' on a \code{1-7} ranking scale (\code{1} = highest preference, \code{7} = lowest preference, \code{0} if not ranked).
#' 
#' \item \code{c7_broccoli} A numeric (double) variable indicating how much a participant likes broccoli 
#' on a \code{1-7} ranking scale (\code{1} = highest preference, \code{7} = lowest preference, \code{0} if not ranked).
#' 
#' \item \code{c7_asparagus} A numeric (double) variable indicating how much a participant likes asparagus 
#' on a \code{1-7} ranking scale (\code{1} = highest preference, \code{7} = lowest preference, \code{0} if not ranked).
#' 
#' \item \code{c7_spinach} A numeric (double) variable indicating how much a participant likes spinach 
#' on a \code{1-7} ranking scale (\code{1} = highest preference, \code{7} = lowest preference, \code{0} if not ranked). 
#'
#' \item \code{c7_mud} A numeric (double) variable indicating how much a participant likes mud 
#' on a \code{1-7} ranking scale (\code{1} = highest preference, \code{7} = lowest preference, \code{0} if not ranked).
#'
#' \item \code{c7_banana} A numeric (double) variable indicating how much a participant likes bananas 
#' on a \code{1-7} ranking scale (\code{1} = highest preference, \code{7} = lowest preference, \code{0} if not ranked). 
#'
#' \strong{Note}: Variables \code{c7_apple} to \code{c7_banana} were derived from a sorting/ranking task in which each participant sorted/ranked food items by preference. 
#' Each item was subsequently coded as a numeric value between \code{1} and \code{7} (\code{0} if not ranked).
#' 
#' 
#' \item Responses to binary choice items:   
#' \code{c2_decsleep_instant} A categorical (character) variable indicating whether a participant prefers to sleep 
#' before making important decisions ("sleep") or to make them instantly ("instant"). 
#'
#' \item \code{c2_shopperson_online} A categorical (character) variable indicating whether a participant prefers shopping in person ("person") or online ("online").
#' 
#' \item \code{c2_town_city} A categorical (character) variable indicating whether a participant prefers living in a town ("town") or in a city ("city"). 
#' 
#' \item \code{c2_club_house} A categorical (character) variable indicating whether a participant prefers to party in a club ("club") or to attend an house party ("house").
#' 
#' \item \code{c2_hotel_camping} A categorical (character) variable capturing a participant's preference for staying in a hotel ("hotel") versus going camping ("camping"). 
#' 
#' \item \code{c2_photo_being} A categorical (character) variable indicating whether a participant prefers photographing ("photo") or being in a moment ("being"). 
#' 
#' \item \code{c2_spring_fall} A categorical (character) variable indicating whether a participant prefers the spring season ("spring") or the fall/autumn season ("fall").
#' 
#' \item \code{c2_beach_mount} A categorical (character) variable reflecting whether a participant prefers the beach ("beach") or the mountains ("mount"). 
#' 
#' \item \code{c2_cats_dogs} A categorical (character) variable indicating preference for cats ("cats") versus dogs ("dogs").
#' 
#' \item \code{c2_indiv_team} A categorical (character) variable indicating whether a participant prefers individual ("indiv") or team sports ("team").
#' 
#' \item \code{c2_movies_books} A categorical (character) variable indicating a participant's preference for movies ("movies") or books ("books").
#' 
#' \item \code{c2_board_video} A categorical (character) variable indicating whether a participant prefers board games ("board") or video games ("video").
#' 
#' \item \code{c2_ios_android} A categorical (character) variable indicating whether a participant prefers iOS ("ios") or Android ("android") as a mobile operating system.
#' 
#' \item \code{c2_text_voice} A categorical (character) variable indicating whether a participant prefers texting ("text") or sending voice messages ("voice").
#' 
#' \item \code{c2_cook_bake} A categorical (character) variable indicating whether a participant prefers cooking ("cook") or baking ("bake").
#' 
#' \item \code{c2_pinapple_no} A categorical (character) variable that records whether a participant likes pineapple on pizza ("yes") or not ("no").
#' 
#' \item \code{c2_ketchup_mayo} A categorical (character) variable indicating whether a participant prefers ketchup ("ketchup") or mayonnaise ("mayo").
#'
#' \item \code{c2_coffee_tea} A categorical (character) variable indicating whether a participant prefers coffee ("coffee") or tea ("tea").
#'
#' \item \code{c2_math_lang} A categorical (character) variable indicating whether a participant prefers mathematics ("math") or language-related subjects ("lang"). 
#'
#' \item \code{c2_odd_even} A categorical (character) variable indicating whether a participant prefers odd numbers ("odd") or even numbers ("even"). 
#'
#'
#' \item \code{c3_diff_bin} A categorical (character) variable indicating how difficult it was for a participant to make their previous preference decisions (items 22--41) . 
#' Response options include "yes", "a little", and "no". 
#' This item captures perceived decisional difficulty and may serve as an indicator of response certainty, thinking style, or task engagement.
#' 
#' 
#' \item Variables on political opinions:  
#' \code{politics_left} A numeric (double) variable representing the participant’s self-placement on a left–right political spectrum. 
#' Values range from \code{1} (left) to \code{6} (right).
#' 
#' \item \code{politics_liberal} A numeric (double) variable representing self-placement on a liberal to conservative scale, ranging from \code{1} (liberal) to \code{6} (conservative). 
#' 
#' 
#' \item Miscellaneous estimates, choices, opinions, and preferences:   
#' \code{tn_estimate_sun} A numeric (double) variable capturing the participant’s estimate of how many times larger the sun’s diameter is compared to that of the earth. 
#' This item serves as a manipulation check for the anchoring effect, based on previously presented numeric anchors (e.g., \code{42} or \code{242}). 
#' 
#' \item \code{t_att_check_1}  A character variable containing the participant’s open-text response to an attention check prompt ("Please type: 'I read the instructions'"). 
#' This attention check allows detecting inattentive or automated responses.
#' 
#' \item \code{c2_fly_invisible} A categorical (character) variable indicating whether the participant would prefer the superpower of flying ("fly") or becoming invisible ("invisible"). 
#' 
#' \item \code{t_fly_invisible_explain} A character variable where participants explain their choice between flying and invisibility. 
#' This free text answer allows for qualitative analysis of a participant's justifications and motivations.
#' 
#' 
#' \item \code{combined_c_ser_hum_self} A numeric (double) variable reflecting a participant’s self-assessment on a "serious vs. humorous" scale. 
#' The score is based on a 4-point or 5-point Likert scale, depending on random assignment. 
#' This variable is used to test how perspective (self vs. others) and scale format (presence vs. absence of a middle option) influences self-ratings.
#' 
#' \item \code{combined_c_ser_hum_others} A combined numeric (double) variable reflecting how humorous or serious participants believe others to perceive them. 
#' This score is derived from either a 4-point or 5-point scale and is used to examine the effect of perspective and scale design on perceived external ratings. 
#'
#' 
#' \item \code{c4_chronotype}  A categorical (character) variable indicating whether the participant identifies as a 
#' morning person ("morning"), evening person ("evening") mid-day person ("mid-day") or a never person ("never"). 
#' 
#' \item \code{tn_sleep} A numeric (double) variable indicating the typical number of hours the participant typically sleeps per night. 
#' 
#' \item \code{tn_bedtime} A character variable representing the participant’s usual bedtime, 
#' to be entered in 24-hour format (e.g., "22:30", "00:00").
#' 
#' \item \code{tn_anchor_recall_1} A numeric (double) variable recording the number (either \code{42} or \code{242}) that the participant was previously asked to memorize and later recall. 
#' It is used to test memory for the anchor manipulation.
#' 
#' 
#' \item \code{combined_admired} A combined numeric (double) variable reflecting how much a participant wants to be admired by others, 
#' rated on a \code{1–6} Likert scale (\code{1} = not at all, \code{6} = very much).
#' 
#' \item \code{combined_feared} A combined numeric (double) variable reflecting how much a participant wants to be feared by others, 
#' rated on a \code{1–6} Likert scale (\code{1} = not at all, \code{6} = very much).
#' 
#' \item \code{combined_loved} A combined numeric (double) variable reflecting how much a participant wants to be loved by others, 
#' rated on a \code{1–6} Likert scale (\code{1} = not at all, \code{6} = very much).
#'
#' \item \code{combined_respected}  A combined numeric (double) variable reflecting how much a participant wants to be respected by others, 
#' rated on a \code{1–6} Likert scale (\code{1} = not at all, \code{6} = very much).
#'
#'
#' \item \code{c7_pess_opti} A numeric (double) variable capturing a participant’s self-rated tendency toward pessimism versus optimism, 
#' on a 7-point scale (\code{1} = very pessimistic, \code{7} = very optimistic). 
#'
#' \item \code{c7_story_list} A numeric (double) variable indicating how much a participant enjoys listening to or reading stories, 
#' rated from \code{1} (not at all) to \code{7} (very much). 
#'
#' \item \code{c7_stab_adv} A numeric (double) variable indicating a participant’s self-assessed position on a stability versus adventurousness spectrum, 
#' rated on a scale from \code{1} (very stable) to \code{7} (very adventurous). 
#' This variable may indicate personality traits related to risk-taking.
#' 
#' 
#' \item \code{think_reflect} A numeric (double) variable representing a participant’s placement on a bipolar scale ranging from \code{1} ("reflective") to \code{6} (either "spontaneous" or " intuitive"). 
#' The specific version of the 2nd scale anchor is randomly assigned. 
#' 
#' \item \code{think_delib} A numeric (double) variable representing a participant’s placement on a bipolar scale ranging from \code{1} ("deliberative") to \code{6} (either "intuitive" or " spontaneous". 
#' The specific version of the 2nd scale anchor is randomly assigned. 
#' 
#' 
#' \item \code{c4_intro_extrovert} A categorical (character) variable indicating a participant's self-rated social orientation: 
#' "introverted", "extroverted", or mixed variants such as "extro-intro" or "intro-extro". 
#' 
#' \item \code{tn_favorit_number} A numeric (double) variable capturing a participant’s favorite number, 
#' in free answer format. 
#' 
#' 
#' \item \code{c3_cutlery} A categorical (character) variable indicating which piece of cutlery a participant most identifies with. 
#' The 3 possible values include "knife", "fork", and "spoon". 
#' 
#' \item \code{c3_rock_paper_scissors} A categorical (character) variable capturing a participant's selection in a rock–paper–scissors scenario. 
#' The 3 possible values are "rock", "paper", or "scissors".
#' 
#' 
#' \item \code{c5_att_check_2} A numeric (double) variable used as an attention check. 
#' Participants were asked to select the number that most resembles the shape of a circle. 
#' The correct response is \code{0}, which corresponds to scale option 5. 
#' Responses deviating from this may indicate inattentiveness.
#' 
#' 
#' \item \code{c6_barnum_accuracy} A numeric (double) variable indicating how accurately a participant rated a generic personality description (i.e., a Barnum statement), 
#' on a scale from \code{1} (poor) to \code{6} (perfect). 
#' This variable is used to assess susceptibility to the so-called \emph{Barnum effect} (i.e., the tendency to perceive vague and general statements as highly accurate). 
#' 
#' \item \code{t_anchor_recall_2} A numeric (double) variable recording whether a participant correctly remembered a previously presented number (either \code{42} or \code{242}). 
#' This assesses memory and anchoring manipulation success (for a 2nd time).
#' 
#' 
#' 
#' \item Other person-related variables:   
#' \code{c9_occupation} A categorical (character) variable indicating a participant’s current occupational status 
#' (e.g., "student", "employed", "other"). This variable may be used for demographic segmentation. 
#' 
#' \item \code{t_occupation_other} A logical variable for free-text input if a participant selected "other" for occupation. 
#' This variable captures detailed occupational descriptions not covered by the pre-defined options.
#' 
#' \item \code{c7_education} A categorical (character) variable indicating a participant’s highest completed education level 
#' (e.g., "high school", "bachelor", "master"). This variable may be used for demographic segmentation.
#' 
#' \item \code{t_education_other} A logical variable to allow participants to enter their education level in free text (if "other" was selected). 
#' This variable enables open-format responses for less common education paths.
#' 
#' \item \code{c3_current_degree} A categorical (character) variable indicating the type of academic degree a participant is currently pursuing (e.g, "bachelor", "master"). 
#' This variable provides educational context for other academic measures.
#' 
#' \item \code{tn_semester} A numeric (double) variable indicating the current semester of study reported by a participant (e.g., 1, 6, 10). 
#' This variable helps contextualize course experience and academic progress.
#'
#' \item \code{c14_studyfield} A categorical (character) variable indicating the participant’s field of study (e.g., "psychology", "data science"). 
#' This variable is used to examine field-specific attitudes and skills.
#'
#' \item \code{t_studyfield_other} A character variable capturing free-text responses if the participant selected "other" as their study field. 
#' This variable allows classification of less common disciplines. 
#' 
#' 
#' \item Preferences for course contents:  
#' \code{c5_pref_stats} A numeric (double) variable indicating a participant’s interest in preparing data for statistical analysis, 
#' rated on a scale from \code{1} (no interest) to \code{5} (absolutely essential).
#'
#' \item \code{c5_pref_visualize} A numeric (double) variable indicating a participant's interest in data visualization in R, 
#' rated on a scale from \code{1} (no interest) to \code{5} (absolutely essential).
#'
#' \item \code{c5_pref_sims} A numeric (double) variable indicating a participant’s interest in using R for simulations and modeling, 
#' rated on a scale from \code{1} (no interest) to \code{5} (absolutely essential).
#'
#' \item \code{c5_pref_shiny} A numeric (double) variable capturing how essential a participant considers learning to build interactive web applications using R Shiny. 
#' Responses range from \code{1} (no interest) to \code{5} (absolutely essential).
#'
#' \item \code{c5_pref_scrape} A numeric (double) variable capturing how essential a participant considers learning web scraping with R. 
#' Responses range from \code{1} (no interest) to \code{5} (absolutely essential).
#' 
#' \item \code{c5_pref_arts} A numeric (double) variable capturing how essential a participant considers exploring artistic or creative aspects of data science (e.g., generative art in R). 
#' Responses range from \code{1} (no interest) to \code{5} (absolutely essential).
#' 
#' 
#' \item Course-related expectations and worries:   
#' \code{t_crs_expect_i2ds_1} A character variable containing free-text input describing a participant’s expectations and hopes for the course \emph{Introduction to Data Science 1: Basics} (i2ds 1). 
#'
#' \item \code{t_crs_worry_i2ds_1} A character variable capturing free-text responses describing a participant’s worries and reservations related to the course \emph{Introduction to Data Science 1: Basics} (i2ds 1). 
#' 
#' \item \code{t_crs_expect_i2ds_2} A character variable containing free-text input describing a participant’s expectations and hopes for the course \emph{Introduction to Data Science 2: Applications} (i2ds 2). 
#' 
#' \item \code{t_crs_worry_i2ds_2} A character variable capturing free-text input describing a participant’s worries and reservations concerns related to the course \emph{Introduction to Data Science 2: Applications} (i2ds 2). 
#' 
#' \item \code{t_crs_expect_ds4psy} A logical variable containing free-text input describing a participant’s expectations and hopes for the course \emph{Data Science for Psychology} (ds4psy). 
#' 
#' \item \code{t_crs_worry_ds4psy} A logical variable describing a participant’s worries and reservations regarding the course \emph{Data Science for Psychology} (ds4psy), in free text format. 
#' 
#' 
#' \item Variables on prior experience:   
#' \code{c6_exp_math} A numeric (double) variable indicating a participant’s self-assessed experience with mathematics, 
#' rated on a scale from \code{1} (no experience) to \code{6} (extremely experienced). 
#' 
#' \item \code{c6_exp_statistics} A numeric (double) variable measuring a participant’s self-assessed experience with statistics, 
#' rated on a scale from \code{1} (no experience) to \code{6} (extremely experienced). 
#' 
#' \item \code{c6_exp_program} A numeric (double) variable indicating a participant’s experience with programming (any programming language), 
#' rated on a scale from \code{1} (no experience) to \code{6} (extremely experienced).
#' 
#' \item \code{c6_exp_r} A numeric (double) variable indicating a participant’s experience with R programming, 
#' rated on a scale from \code{1} (no experience) to \code{6} (extremely experienced).
#' 
#' \item \code{c6_exp_datavisual} A numeric (double) variable capturing a participant’s prior experience with data visualization, 
#' rated on a scale from \code{1} (no experience) to \code{6} (extremely experienced).
#' 
#' 
#' \item Survey feedback:   
#' \code{t_feedback} An optional character variable containing general feedback provided by the participant regarding the survey or course. 
#' This is an open-ended text field for final comments, impressions, or suggestions.
#' 
#' 
#' \item Session info:  
#' \code{referer} URL of referring page. 
#' 
#' \item \code{datetime} Date and time of initial survey access.
#' 
#' \item \code{duration} Session duration (in seconds). 
#' 
#' \item \code{date_of_last_access} Date and time of final survey access.
#' 
#' }
#' 
#' See the \strong{codebook} and \strong{print version} for additional coding details.
#' 
#' \strong{Missing values} are represented as \code{NA} values in the data. 
#' These can be due to a participant not providing a response to an item or to an item not being applicable to this participant. 
#' 
#' @family datasets
#' 
#' @source 
#' See online survey at \url{https://ww3.unipark.de/uc/i2ds_survey/}. 


"i2ds_survey"


# +++ here now +++


## Check data: ------ 

## Check for "marked UTF-8 strings":

# tools:::.check_package_datasets(".")



## ToDo: ----------

# - Add date/time data (Chapter 10: Time, e.g., DOB, time of test, task start/end, etc.)
# - Combine 2 datasets (currently online):
#   a. numeracy.csv (1000 x 12, see book chapter 55_datasets.Rmd), 
#   b. dt.csv (1000 x 9): date and time variables (see book chapter 10_times.Rmd)
# - Consider combining with dataset `outliers` (1000 x 3), BUT: different genders and height values and regularities
# - Collect ds4psy survey data
# - Find some book/text to analyze (Chapter 9: Text data).
# - Add text data (Chapter 9: Text; e.g., dinos, fruit, veggies, attention check response on "i read instructions", some eBook for sentinent analysis, ...) 
# - Add more info to codebooks (see data_190807.R in archive)

## eof. ----------------------