R/esAssign.R

#' esAssign
#
#' @description esAssign assigns ESM questionnaires to the (selected) persons who generated them.
#
#' @param esDf a data.frame. A single ESM dataset. It must contain the 2 columns that hold the date-time object for when an ESM questionnaire was started and finished, respectively.
#
#' @param refDf a data.frame. The reference dataset.
#
#' @param singlePerson a character string. If you want to select a single person (must be contained in the reference dataset) enter its identification code (ID); else all participants in the reference dataset are selected (default).
#
#' @param prompted logical. If there was no prompt per ESM day at all, enter FALSE; else ignore this argument, meaning that per default at least one prompt per ESM day is assumed.
#
#' @param promptTimeframe an integer value. The default value is 30, i.e. within "genDateTime" \strong{minutes} around each prompt a participant is expected to have answered the ESM questionnaire. This argument must be set to an integer value larger than 0, even if there was no prompt at all. If NA, NULL, 0 is passed to this argument, the function returns an error. If less than "genDateTime" minutes is passed to this argument, the function returns a warning message.
#
#' @param dstDates a vector of character strings. If a check shall be made concerning the daylight saving time (last weekend in March and October, respectively), enter the respective date(s) in the form yyyy-mm-dd as vector, e.g. c("2007-10-28", "2008-03-30).
#
#' @param midnightPrompt logical. Default FALSE, i.e. no participant was able to start a questionnaire around the midnight hour. However, if this was possible, set this argument to TRUE, in which case it takes the function \code{esAssign} a little longer to do its job, compared to midnightPrompt = FALSE.
#
#' @param RELEVANTVN_ES a list. This list is generated by function \code{\link{setES}} and it is extended once either by function \code{\link{genDateTime}} or by function \code{\link{splitDateTime}}.
#
#' @param RELEVANTINFO_ES a list. This list is generated by function \code{\link{setES}}.
#
#' @param RELEVANTVN_REF a list. This list is generated by function \code{\link{setREF}} and it is extended once either by function \code{\link{genDateTime}} or by function \code{\link{splitDateTime}}.
#
#' @details Data can only be assigned to those individuals who are contained in the reference dataset.\cr \code{esAssign} is of utmost importance in preparing the ESM dataset because the results of any statistical analysis depends on the correct assignment of data to the persons who generated it.
#' New columns in output list of function \code{esAssign} are
#' \enumerate{
#' \item ID. Unique identification code of each participant.
#' \item CV_ES. CV is short for count variable. It counts all the questionnaires that have been filled out by the participant during the ESM period. In incrementing order it starts at 1 and skips a number, whenever a questionnaire is missing.
#' \item CV_ESDAY. This variable counts the single ESM days. In incrementing order it starts at 1. It only skips a number when all questionnaires of that day are missing.
#' \item CV_ESWEEKDAY. This variable counts the weekday, with Monday represented by the value 1, ..., Sunday = 7.
#' \item PROMPT. Correspondance of the actual start time of the questionnaire to its prompt (in our exemplary dataset this ranges between 1 and 4).
#' \item PROMPTEND. Correspondance of the actual end time of the questionnaire to its prompt (in our exemplary dataset this ranges between 1 and 4).
#' \item LAG_MINS. Time difference in minutes (rounded) between the scheduled time of the prompt and the actual start time of a questionnaire.
#' \item ES_MULT. Dichotomous variable. The value 1 represents a questionnaire that has been filled out repeatedly at one specific prompt.
#' \item ES_MULT2. Alternative representation of ES_MULT. The very first questionnaire at a prompt is represented by the value 1, the second questionnaire (i.e. the first repeatedly filled out q.) is represented by the value 2, etc.
#' \item ST. Assigns the prompt/scheduled time (ST) to the actual start time of a questionnaire, by choosing the miminal time difference between all possible prompts (per participant) and the actual start time of the single ESM questionnaire.
#' \item STDATE. Variable is returned only if argument midnightPrompt is set to TRUE. Possible values and meaning: -1 = scheduled start date is prior to actual start date; 0 = scheduled start date and actual start date are equal to one another; 1 = scheduled date is subsequent to actual start date.
#' \item TFRAME. Dichotomous variable. The value 1 represents a questionnaire that is within the time frame, as specified by the user.
#' \item DST. Dichotomous variable. The value 1 represents a questionnaire's date to be equal or later than the daylight saving date, as specified by the user.
#' \item QWST. Dichotomous variable. The value 1 represents a questionnaire to be fully within the scheduled time, i.e. the time differences of both the actual start time and the actual end time are minimal relative to the same scheduled time.
#' }
#' Additionally, if the reference dataset contains any duplicates in the column representing the participant IDs, then \code{esAssign} stops and an error message is printed in the R console.
#
#' @return The user receives a list containing 4 datasets:
#' \enumerate{
#' \item ES, i.e. ESM data assigned to selected participants with new columns added, see \strong{Details}.
#' \item ESopt, i.e. the optimal ESM data sequence for all selected participants.
#' \item ESout, i.e. ESM data that couldn't be assigned.
#' \item ESrate, i.e. the average completion rates per participant, both per prompt and overall.
#' }
#' The effective ESM completion rates per selected individual and per prompt are also printed to the console. However, these are not the final completion rates, since some of the current questionnaires later might either be removed (see function \code{\link{intolerable}}) or be shifted to a neighboring prompt index (see functions \code{\link{suggestShift}} and \code{\link{makeShift}}).
#
#' @importFrom lubridate interval ymd_hms as.duration ymd minutes
#' @importFrom stats setNames var
#
#' @examples
#' # o=o=o=o=o=o=o=o=o=o=o=o=o=o=o=o=o=o=o=o=o=o=o=o=o=o=o=o=o=o=o=o=o=o
#' # Prerequisites in order to execute esAssign. Start -----------------
#' # esMerged1 is the raw ESM dataset, also delivered with the package
#' # referenceDfNew is the modified reference dataset, delivered with
#' # the package.
#' # RELEVANTINFO_ES is delivered with the package
#' # Use example list delivered with the package
#' RELEVANTVN_ES <- RELEVANTVN_ESext
#' # Use example list delivered with the package
#' RELEVANTVN_REF <- RELEVANTVN_REFext
#' # Prerequisites in order to execute esAssign. End -------------------
#' # -------------------------------------------------------
#' # Run function 17 of 29; see esmprep functions' hierarchy.
#' # -------------------------------------------------------
#' # Assign questionnaires contained in the raw ESM dataset to all participants listed
#' # in the reference dataset. esMerged1 is the result of function 'convertChars',
#' # referenceDfNew is the result of function 'genDateTime' or of function
#' # 'splitDateTime'.
#' # Run only the first 2 out of 8 participants (saves time). The warning message the
#' # user receives in this case (2 out of 8 participants) is correct.
#' esAssigned <- esAssign(esDf = esMerged1, refDf = referenceDfNew[1:2,], RELEVANTINFO_ES,
#' RELEVANTVN_ES, RELEVANTVN_REF)
#' # # Assign questionnaires contained in the raw ESM dataset to participant P001 listed
#' # # in the reference dataset.
#' # esAssigned <- esAssign(esDf = esMerged1, refDf = referenceDfNew, RELEVANTINFO_ES,
#' # RELEVANTVN_ES, RELEVANTVN_REF, singlePerson="P001")
#' # More options can be passed to 'esAssign', see parameter description. Note that when
#' # setting the argument midnightPrompt to TRUE, esAssign takes a bit longer to do its job.
#' # Output: List with 4 data.frames.
#' names(esAssigned)
#' # o=o=o=o=o=o=o=o=o=o=o=o=o=o=o=o=o=o=o=o=o=o=o=o=o=o=o=o=o=o=o=o=o=o
#
#' @seealso Exemplary code (fully executable) in the documentation of \code{\link{esmprep}} (function 17 of 29).
#
#' @export
#
esAssign <- function(esDf, refDf, RELEVANTINFO_ES = NULL, RELEVANTVN_ES = NULL, RELEVANTVN_REF = NULL, singlePerson = NULL, prompted = NULL, promptTimeframe = 30, midnightPrompt = FALSE, dstDates = NULL) {

    # Possible errors when passing arguments to the function -----------------------------
    if(!is.data.frame(esDf) | !is.data.frame(refDf)) {
        stop("Arguments 'esDf' and 'refDf' both must be of type data.frame.")
    }
	
	
	# Error handling function for all set-up lists generated by setES and setREF.
    # Both lists RELEVANTVN_ES and RELEVANTVN_REF get extended either by function
    # genDateTime or by function splitDateTime!
    SETUPLISTCheck(RELEVANTINFO_ES=RELEVANTINFO_ES,
    			   RELEVANTVN_ES=RELEVANTVN_ES,
    			   RELEVANTVN_REF=RELEVANTVN_REF)
	
	refDfPlausible <- try(refPlausible(refDf, RELEVANTVN_REF=RELEVANTVN_REF))
	if(inherits(refDfPlausible, "try-error")) {
		# No need to do anything. Let the function throw the error which
		# automatically stops the function 'esAssign' from continuing.
	} else {
		# Print the range of all ESM periods (in days) in the R console.
		cat("- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -\n")
		message("Range of ESM periods (in days) across all participants in the current reference dataset.")
		print(summary(refDfPlausible[,"ESM_PERIODDAYS"]))
		cat("- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -\n\n\n")
	}
	
    # Set optional parameters to their default value
    # ----------------------------------------------

    # If argument 'singlePerson' is not used set 'assignAll' to TRUE
    # and 'singlePerson' to NA.
    if(is.null(singlePerson)) {

        assignAll <- TRUE

        # Generate indices for all persons that make up 'refDf'
        selectedPerson_s <- 1:nrow(refDf)

        # Check whether all prompteduled start dates and end dates in 'refDf'
        # are contained in the raw ES dataset. If they aren't the raw ES
        # might not have been updated.
        compDates <- compareDates(esDf = esDf, refDf = refDf, assignAll = assignAll,
                         singlePerson = singlePerson, RELEVANTVN_ES = RELEVANTVN_ES, RELEVANTVN_REF = RELEVANTVN_REF)
        if(compDates == "DatesEqual") {
        		# All is well, maximum dates are equal to one another.
        } else if(compDates == "refDateGreater") {
            warning("Maximum start date in the reference dataset exceeds the maximum date in the current raw ESM dataset. Is the current ESM dataset most up to date?")
        } else if(compDates == "refDateLess") {
        		warning("Maximum start date in the ESM dataset exceeds the maximum date in the current reference dataset. Is the current reference dataset most up to date?")
        }

        # If (1) a single person ID is entered, (2) NA must not be entered,
        # (3) the entered ID must be of type character, and (4) the entered ID
        # must be part of the dataset 'refDf'.
    } else if(!is.null(singlePerson) & is.character(singlePerson) & all(refDf[,RELEVANTVN_REF[["REF_ID"]]] == singlePerson)) {
        stop("Entered person ID is not found in person dataset (argument 'refDf'). Please check.")

        # Else if conditions (1)-(4) are met, there is still some checking necessary.
    } else if(!is.null(singlePerson) &
              is.character(singlePerson) &
              any(refDf[,RELEVANTVN_REF[["REF_ID"]]] == singlePerson)) {
		
		assignAll <- FALSE
		
        # Check 1: The person ID must occur only once in 'refDf'!
        isSingle <- length(which(refDf[,RELEVANTVN_REF[["REF_ID"]]] == singlePerson)) == 1
        if(!isSingle) {
            multLines <- which(refDf[,RELEVANTVN_REF[["REF_ID"]]] == singlePerson)
            stop(paste0("Person ID was registered ", length(multLines), " times."))
            cat("See lines [", multLines, "] of person dataset.\n")

            # Check 2: Is the person's prompteduled start date and end date
            # contained in the raw ES dataset?  If they aren't the raw ES
            # might not have been updated.
        } else {

            # Check 1: Are the person's prompteduled dates contained in the
            # raw ES dataset?
            selectedPerson_s <- which(refDf[,RELEVANTVN_REF[["REF_ID"]]] == singlePerson)
            
            compDates <- compareDates(esDf = esDf, refDf = refDf, assignAll = assignAll,
	                         singlePerson = singlePerson, RELEVANTVN_ES = RELEVANTVN_ES, RELEVANTVN_REF = RELEVANTVN_REF)
	        if(compDates == "DatesEqual") {
	        		# All is well, maximum dates are equal to one another.
	        } else if(compDates == "refDateGreater") {
	            warning("Maximum start date in the reference dataset exceeds the maximum date in the current raw ESM dataset. Is the current ESM dataset most up to date?")
	        } else if(compDates == "refDateLess") {
	        		warning("Maximum start date in the ESM dataset exceeds the maximum date in the current reference dataset. Is the current reference dataset most up to date?")

            }
        }

    }
    
    # If argument 'prompted' is not passed set it to TRUE (same as asking whether it was not passed, i.e. asking whether it is null.)
    if(is.null(prompted)) {prompted <- is.null(prompted)}
	
	if(is.null(promptTimeframe) | is.na(promptTimeframe) | !(is.integer(promptTimeframe) | is.numeric(promptTimeframe))) {
		stop("The argument promptTimeframe must be present. It must be numeric, without decimals. It denotes the time in minutes, within which an ESM questionnaire must have been started in order to be included in the analyses. Per default it is set to 30 minutes.")
	}
	
	if(promptTimeframe %% 1 != 0) {
		promptTimeframe <- as.integer(promptTimeframe)
        warning(paste("The argument promptTimeframe was not of type integer. It has been truncated to", promptTimeframe))
    }
    
    if(promptTimeframe == 0) {
    	stop("The argument promptTimeframe must be present. It must be numeric, without decimals. It denotes the time in minutes, within which an ESM questionnaire must have been started in order to be included in the analyses. It must not be 0 minutes. Per default it is set to 30 minutes.")
    }
    
    if(promptTimeframe < 30) {
        warning("The argument promptTimeframe has been set to less than 30 minutes. This might lead to the unintended exclusion of the last ESM questionnaire of a participant.")
    }
	
	if(is.null(dstDates)) {
		# Don't do anything.
	} else if(any(!is.vector(dstDates), !is.character(dstDates))) {
        stop("Argument 'dstDates' (Daylight saving date(s)) must either be NULL or it must consist of at least one date of the form ymd, i.e. 'yyyy-mm-dd' (e.g. '2007-10-28').")
    } else {
        # # dstDates <- c("20.1503.04", "2015.10.20")		# 1st value causes an error
        ds_ymd <- tryCatch({lubridate::ymd(dstDates)}, warning = function(w) {"warning_DSD"})
        dsDate <- tryCatch({as.Date(ds_ymd)}, error = function(e) {"error_DSD"})
        if(any(any(as.character(ds_ymd)=="warning_DSD")|
               any(as.character(dsDate)=="error_DSD"))) {
            stop("Argument 'dstDates' (Daylight saving date(s)) must either be NULL or it must consist of at least one date of the form ymd, i.e. 'yyyy-mm-dd' (e.g. '2007-10-28').")
        }
    }
    # -------------------------------------

    # Start assignment process
    # ------------------------

    # Order rows
    # ----------
    
    registerInfo <- as.character(unlist(RELEVANTVN_ES))
    esDfOrd <- orderByTimeAndPhone(esDf, RELEVANTVN_ES = RELEVANTVN_ES)

    esDfOrdReg <- esDfOrd[,registerInfo]

    # for-loop with variables to append to one another
    # --------
    ID <- Lines <- c()
    CV_ES <- CV_ESDAY <- CV_ESWEEKDAY <- c()
    ES_MULT <- PROMPT <- PROMPTEND <- ST <- STDATE <- LAG_MINS <- TFRAME <- DST <- QWST <- c()

    esOptDf_colNames <- c(RELEVANTVN_REF[["REF_ID"]], "CV_ES", RELEVANTVN_REF[["REF_START_DATE"]], RELEVANTVN_REF[["REF_START_TIME"]], "PROMPT")
    esOptDf <- setNames(data.frame(matrix(ncol=5, nrow=0)), esOptDf_colNames)
	
	avrgCompletionRate <- c()
	
	# =/*=/=*=/=*=/*=/=/*=/=*=/=*=/*=/=/*=/=*=/=*=/*=/=/*=/=*=/=*=/*=/

    for(i in selectedPerson_s ) {

        #
        person_i <- as.character(refDf[,RELEVANTVN_REF[["REF_ID"]]] [i])
        cat(paste(person_i, "\nNo.", i, "of", length(selectedPerson_s)), "\n")

        # 1.
        span_s <- lubridate::interval(lubridate::ymd_hms(paste(refDf[i,RELEVANTVN_REF[["REF_START_DATE"]]], refDf[i,RELEVANTVN_REF[["REF_START_TIME"]]])),
                                      lubridate::ymd_hms(esDfOrdReg[,RELEVANTVN_ES[["ES_START_DATETIME"]]]))
        dur_s <- as.numeric(lubridate::as.duration(span_s))

        # 2.
        span_e <- lubridate::interval(lubridate::ymd_hms(esDfOrdReg[,RELEVANTVN_ES[["ES_START_DATETIME"]]]),
                                      lubridate::ymd_hms(paste(refDf[i,RELEVANTVN_REF[["REF_END_DATE"]]], refDf[i,RELEVANTVN_REF[["REF_END_TIME"]]])) + lubridate::minutes(x=promptTimeframe))
        dur_e <- as.numeric(lubridate::as.duration(span_e))

        # ------------------------------------------------------------
        # Valid if per person there is exactly one IMEI number.
        # 3.
        lines <- which(dur_s >= 0 & dur_e >= 0 & esDfOrdReg[,RELEVANTVN_ES[["ES_IMEI"]]] == refDf[i,RELEVANTVN_REF[["REF_IMEI"]]])
        # ------------------------------------------------------------

        LinesValid <- lines
        if(length(LinesValid) == 0) {
            warning(paste("No valid lines in raw ESM dataset found for >>", refDf[i,RELEVANTVN_REF[["REF_ID"]]], "<<. Possible reasons: Information in reference dataset is wrong (especially IMEI and/or start date/end date) or raw ESM dataset has not been updated yet."))
            next
        }
		
		# =/*=/=*=/=*=/*=/=/*=/=*=/=*=/*=/=/*=/=*=/=*=/*=/=/*=/=*=/=*=/*=/
		# =/*=/=*=/=*=/*=/=/*=/=*=/=*=/*=/=/*=/=*=/=*=/*=/=/*=/=*=/=*=/*=/
		
        # If there is at least one prompteduled time the following computations are
        # possible and make sense.
        if(prompted == TRUE) {
            # ---------------------------------------------------------------------------
            # ADAPT to number of daily ES questionnaires
            # -------------------------------------------

            # Generate st_vec and stVec_idx
            # -----------------------------
            df_timeTemp <- data.frame(row.names = 1:length(LinesValid))

            df_timeTemp[,RELEVANTVN_ES[["ES_SVY_NAME"]]] <-
                esDfOrdReg[LinesValid,RELEVANTVN_ES[["ES_SVY_NAME"]]]

            df_timeTemp[,RELEVANTVN_ES[["ES_START_DATETIME"]]] <-
                esDfOrdReg[LinesValid,RELEVANTVN_ES[["ES_START_DATETIME"]]]

            # Adapt automatically to the number of prompteduled questionnaires per ES day.
            # stTimes contains the column names of the temporally built data frame that
            # will be used for further computations.
            stTimes <- paste0("stTime", 1:RELEVANTINFO_ES[["MAXPROMPT"]])
            # REF_ST: Columns of the variables in the dataset 'refDf' that denote the
            # exact prompteduled ES times (prompts) for person i.
            
            # Specified sequence of ESM prompts (times of day)
            esmUnitSecs <- as.numeric(lubridate::hms(refDf [i,RELEVANTVN_REF[["REF_ST"]]]))
			# All times of the day must be larger than the first time of the day.
			esmUnitCheck <- c(TRUE, (esmUnitSecs[1] < esmUnitSecs)[2:length(esmUnitSecs)])
			
			for(st_i in 1:length(stTimes)) {
	                df_timeTemp[, stTimes[st_i]] <-
	                    paste(esDfOrdReg[LinesValid,RELEVANTVN_ES[["ES_START_DATE"]]],
	                          refDf [i,RELEVANTVN_REF[["REF_ST"]][st_i]])
	            }
	
	            df_timeTemp[,RELEVANTVN_ES[["ES_END_DATETIME"]]] <-
	                esDfOrdReg[LinesValid,RELEVANTVN_ES[["ES_END_DATETIME"]]]

            # stStart0: minimum time diff when compared to the administered questionnaire version.
            # Explanation: If a day questionnaire is asked at the prompteduled morning time the time
            # difference refers to the day questionnaires not to the morning questionnaire.
            # PROMPT: minimum time diff when compared to all possible prompteduled times.
            # Explanation: Irrespective of which questionnaire version was administered the time
            # difference refers to all possible prompteduled time of the respective person.
            
            # MidnightPrompt = TRUE = 2 dates within one ESM unit is possible.
            if(midnightPrompt) {
            	time_temp <- findMin2(df_timeTemp, RELEVANTVN_ES=RELEVANTVN_ES, RELEVANTINFO_ES=RELEVANTINFO_ES)
            	# Else: One ESM unit belongs to one date, i.e. ESM unit doesn't cross dates.
            } else {
            	time_temp <- findMin1(df_timeTemp, RELEVANTVN_ES=RELEVANTVN_ES, RELEVANTINFO_ES=RELEVANTINFO_ES)
            }
            
            # qwst: questionnaire within time frame (i.e. end_time is closer to the
            # prompteduled time x than it is to the prompteduled time x + 1)
            # Example: If between 2 prompteduled questionnaires 3 hours pass, then the end
            # time of a questionnaire should not exceed 1.5 hours after the prompteduled
            # start time of the questionnaire. If it exceeds 1.5 hours, then it would be
            # closer to the next prompteduled start time.
            # ---------------------------------------------------------------------------
            qwst <- apply(time_temp, MARGIN = 1, function(x) var(x[1:2]) == 0)
            # ---------------------------------------------------------------------------
            qwstTemp <- ifelse(qwst == TRUE, 1, 0)

            if(!is.null(dstDates)) {

                # Register daylight saving time
                # -----------------------------
                dst_temp <- daylightSaving(refDf[i,RELEVANTVN_REF[["REF_START_DATE"]]], refDf[i,RELEVANTVN_REF[["REF_END_DATE"]]], dstDates)
                dstVec_temp <- rep(0, times = length(LinesValid))

                if(!all(is.na(dst_temp))) {
                    idxDST <- which(
                        lubridate::ymd(esDfOrdReg[LinesValid,RELEVANTVN_ES[["ES_START_DATE"]]]) >=
                            lubridate::ymd(dst_temp[[2]]))
                    dstVec_temp[idxDST] <- 1
                }

            } else {

                dstVec_temp <- rep(NA, times = length(LinesValid))
            }

            # TIMEFRAME (adapt time frame if necessary)
            # absStart1's units is seconds. Therefore multiply by 60 to get minutes.
            # 30 minutes = 30 * 60 seconds
            tframe_temp <- ifelse(time_temp $ absStart1 <= promptTimeframe * 60, 1, 0)

            # Index vector for prompteduled start times (used here and at the end of the function)
            stStart1_temp <- time_temp[,"PROMPT"]
            # Index vector for prompteduled end times (used only at the end of the function)
            stEnd1_temp <- time_temp[,"PROMPTEND"]

            # Overall count variable for person i. Apply function 'overallCounter'.
            cvOverall_temp <- overallCounter(stStart1_temp, esDfOrdReg[LinesValid,RELEVANTVN_ES[["ES_START_DATE"]]], RELEVANTINFO_ES=RELEVANTINFO_ES)

            # Single ES day count variable for person i. Apply function 'dayCounter'.
            day_temp <- dayCounter(esDfOrdReg[LinesValid,RELEVANTVN_ES[["ES_START_DATE"]]])

            # Check duplicated lines of data (= ES questionnaire)
            # Use overall count variable and single ES day count variable in order to detect
            # only real duplicates (multiple questionnaires per prompteduled time and PER DAY).
            esMult_temp <- ifelse((duplicated(cvOverall_temp) & duplicated(day_temp))==TRUE, 1, 0)

            # ---------------------------------------------------------------------------
            # ADAPT to number of daily ES questionnaires in 'x[1:maxNumber]'
            # -------------------------------------------
            # Vector of prompteduled times for person i.
            stVec_temp <- as.character(refDf[i , RELEVANTVN_REF[["REF_ST"]]] [stStart1_temp])
            # ---------------------------------------------------------------------------

            # CV_ESUNIT -> It must not necessarily be concordant to a unique date, e.g. the last
            # ---------	   prompts can be at or after midnight and still belong to the ES unit.
            cv_esunitDiff <- c(1, diff(time_temp$PROMPT))
            cv_esunitDiffNeg <- unique(c(which(cv_esunitDiff < 0)-1, length(time_temp$PROMPT)))
            if(all(cv_esunitDiffNeg==FALSE)) {
            		cv_esunit <- rep(1, times=length(cv_esunitDiffNeg))
            } else {
            		cv_esunitDiffStart <- c(1, (cv_esunitDiffNeg[-length(cv_esunitDiffNeg)] + 1))
            		cv_esunitDf <- data.frame(cv_esunitDiffStart, cv_esunitDiffNeg)
            		unit <- 1
            		cv_esunit <- rep(0, times=length(cv_esunitDiffNeg))
            		for(esUnit in 1:nrow(cv_esunitDf)) {
            			cv_esunit[cv_esunitDf[esUnit,1]:cv_esunitDf[esUnit,2]] <- unit
            			unit <- unit + 1
            		}
            }

            esOptDf_temp <- esOptimum(
                person_i, refDf, RELEVANTVN_REF[["REF_ID"]],
                RELEVANTVN_REF[["REF_START_DATE"]], RELEVANTVN_REF[["REF_START_TIME"]],
                RELEVANTVN_REF[["REF_END_DATE"]], RELEVANTVN_REF[["REF_END_TIME"]],
                RELEVANTINFO_ES[["MAXPROMPT"]], RELEVANTVN_REF[["REF_ST"]])
            names(esOptDf_temp)

            # Compare actual ES week with optimum ES week of person i:
            # -------------------------------------------------------

            # BEWARE: At every prompteduled time point at least one questionnaire
            #		  must have been filled out during the ES week! If that
            #		  is not the case then 'actual_i' contains less values
            #		  than 'optim_i'. -> Problem has been solved by turning
            #		  stStart1_temp into a factor with levels 1:MAXPROMPT.

            # duplOut: Repeated ES questionnaires (per prompteduled time) must
            #		   be removed first.
            duplOut <- which(esMult_temp == 1)

            # -------------------------------------------
            if(is.integer0(duplOut)) {
                # No repeated ES questionnaires in person i.
                stStart1_tempFac <-
                    factor(stStart1_temp, levels = c(1:RELEVANTINFO_ES[["MAXPROMPT"]]))
            } else {
                # At least one repeated questionnaire in person i.
                stStart1_tempFac <-
                    factor(stStart1_temp[-duplOut], levels = c(1:RELEVANTINFO_ES[["MAXPROMPT"]]))
            }
            # ---------------------------------------------------------------------------

            tblActual_i <- table(stStart1_tempFac)
            actual_i <- as.numeric(tblActual_i)

            tblOptim_i <- table(esOptDf_temp[,"PROMPT"])
            optim_i <- as.numeric(tblOptim_i)
            
            # Efficiency for person i (ES commitment for person i); not rounded.
            efficiency_temp0 <- actual_i / optim_i * 100

            # Efficiency for person i (ES commitment for person i); rounded.
            efficiency_temp <- data.frame(rbind(round(efficiency_temp0, digits = 2)))
			
			avrgCompletionRate <- c(avrgCompletionRate, efficiency_temp0, mean(efficiency_temp0))
			
            # ---------------------------------------------------------------------------
            # ADAPT to number of daily ES questionnaires in '<- 1:maxNumber'
            # -------------------------------------------
            colnames(efficiency_temp) <- 1:RELEVANTINFO_ES[["MAXPROMPT"]]
            rownames(efficiency_temp) <- "%"
            # ---------------------------------------------------------------------------

            cat("Event sampling period - completion rates per prompt\n")
            print(efficiency_temp)
            cat("------------------------------------------------\n\n")
            # # ---------------------------------------------------------------------

            # ---------------------------------------------------------------------
            # Collect:
            # -------
            ID <- c(ID, rep(as.character(refDf[i,RELEVANTVN_REF[["REF_ID"]]]), times=length(LinesValid)))

            Lines <- c(Lines, LinesValid)

            CV_ES <- c(CV_ES, cvOverall_temp)

            if(midnightPrompt) {
				CV_ESDAY <- c(CV_ESDAY, cv_esunit)
			} else {
				CV_ESDAY <- c(CV_ESDAY, day_temp $ esDay)
			}

            CV_ESWEEKDAY <- c(CV_ESWEEKDAY, day_temp $ weekDay)

            ES_MULT <- c(ES_MULT, esMult_temp)

            PROMPT <- c(PROMPT, stStart1_temp)

            PROMPTEND <- c(PROMPTEND, stEnd1_temp)

            ST <- c(ST, stVec_temp)
            
            if(midnightPrompt == TRUE) {
            	STDATE <- c(STDATE, time_temp $ midnightDate)
            }
            
            # startLag with positive and negative values, depending on whether
            # actual start was prior to or after the scheduled time.
            startLag_temp <- with(time_temp, ifelse(lag_ba0 == 0, -absStart1, absStart1))
            LAG_MINS <- c(LAG_MINS, round(startLag_temp/60, digits=0))

            TFRAME <- c(TFRAME, tframe_temp)

            DST <- c(DST, dstVec_temp)

            QWST <- c(QWST, qwstTemp)
            
            esOptDf <- rbind(esOptDf, esOptDf_temp)

        # Else all ES questionnaires are event contingent. Therefore no time differences can
        # be computed as there are no prompteduled times.
        } else {

            # Single ES day count variable for person i. Apply function 'dayCounter'.
            day_temp <- dayCounter(esDfOrdReg[LinesValid,RELEVANTVN_ES[["ES_START_DATE"]]])

            # Collect:
            # -------
            ID <- c(ID, rep(as.character(refDf[i,RELEVANTVN_REF[["REF_ID"]]]), times=length(LinesValid)))

            Lines <- c(Lines, LinesValid)
            
            CV_ES <- c(CV_ES, 1:length(LinesValid))

            CV_ESDAY <- c(CV_ESDAY, day_temp $ esDay)

            CV_ESWEEKDAY <- c(CV_ESWEEKDAY, day_temp $ weekDay)
        }
    }

    if(any(is.na(match(refDf[,RELEVANTVN_REF[["REF_ID"]]], unique(ID)))) & assignAll==TRUE) {
        warning("Some person IDs couldn't be assigned. Is the current ES dataset most up to date and/or are the prompteduled dates of ALL persons correct?")
        idx_notAssigned <- which(is.na(match(refDf[,RELEVANTVN_REF[["REF_ID"]]], unique(ID))))
        cat("The following person(s) couldn't be assigned to the current ES dataset:\n")
        print(refDf[idx_notAssigned,RELEVANTVN_REF[["REF_ID"]]])
    }

    # After all persons in 'refDf' have been checked, collect and return the result:
    # ------------------------------------------------------------------------------

    if(prompted == TRUE) {

        # ---------------------------------------------------------------------
        
        if(midnightPrompt == TRUE) {
        	esDfOut <- data.frame(ID, CV_ES, CV_ESDAY, CV_ESWEEKDAY, esDfOrd [Lines,], PROMPT, PROMPTEND, ES_MULT, ST, STDATE, LAG_MINS, TFRAME, DST, QWST)
        } else {
        	esDfOut <- data.frame(ID, CV_ES, CV_ESDAY, CV_ESWEEKDAY, esDfOrd [Lines,], PROMPT, PROMPTEND, ES_MULT, ST, LAG_MINS, TFRAME, DST, QWST)
        }
		
		# Function cumsumReset generates variable ES_MULT2
		esDfOut1 <- cumsumReset(esDfOut, "ES_MULT")

        # If lines weren't assigned then variable 'lnna' contains them.
        # -------------------------
        vecES <- seq(1, nrow(esDfOrdReg))
        vecLines <- sort(Lines)
        idx_vec <- !(vecES %in% vecLines)
        lnna <- vecES [idx_vec]		# lnna: Line numbers not assigned

        ESrate <- data.frame(ID=unique(ID), matrix(avrgCompletionRate, nrow=length(unique(ID)), byrow=TRUE))
        colnames(ESrate)[2:ncol(ESrate)] <- c(paste0("PROMPT", 1:RELEVANTINFO_ES[["MAXPROMPT"]]), "MEAN")
        
        cat("!-!-!-!-!-!-!-!-!-!-!-!-!-!-!-!-!-!-!-!-!-!-!-!-!-!-!-!-!-!-!-!-!-!-!-!-!\n")
        cat("Output dataset no.4 (ESrate) is preliminary, since some of the current ESM questionnaires later might either be removed (see function intolerable) or be shifted to a neighboring prompt index (see functions suggestShift and makeShift).\n")
        cat("!-!-!-!-!-!-!-!-!-!-!-!-!-!-!-!-!-!-!-!-!-!-!-!-!-!-!-!-!-!-!-!-!-!-!-!-!\n")
        
        
        # Return a list with 4 data frames:
        list(ES = esDfOut1, ESopt = esOptDf, ESout = esDfOrd [lnna, ], ESrate = ESrate)
        # ---------------------------------------------------------------------

    } else {

        # ---------------------------------------------------------------------
        # B E W A R E: When there is not at least one prompteduled time then PROMPT, PROMPTEND, esMult, ST, TFRAME can't/won't be computed.

        esDfOut <- data.frame(ID, CV_ES, CV_ESDAY, CV_ESWEEKDAY, esDfOrd[Lines,])

        # If lines weren't assigned then variable 'lnna' contains them.
        # -------------------------
        vecES <- seq(1, nrow(esDfOrdReg))
        vecLines <- sort(Lines)
        idx_vec <- !(vecES %in% vecLines)
        lnna <- vecES[idx_vec]		# lnna: Line numbers not assigned

        # Return a list with 2 data frames:
        # --------------------------------
        list(ES = esDfOut, ESout = esDfOrd [lnna, ])
    }
}

Try the esmprep package in your browser

Any scripts or data that you put into this service are public.

esmprep documentation built on July 5, 2019, 5:03 p.m.