R-ukbiobank: UK Biobank Utilities

#' @name ukbiobank
#' @title ukbiobank
#' @description This package contains a collection of tools and functions to facilitate analysis of UK Biobank phenotype data.
#' @section Instancing: The UK Biobank records visits as separate "instances." As of this writing, there are 4 instances, labeled 0 through 3. At each instance, various information can be recorded or re-recorded. For example, blood pressure is typically recorded at most in-person evaluations. Therefore, there may be 4 separate columns for blood pressure recordings (in practice there can be more, because blood pressure may be recorded multiple times at each instance). Almost all of the functions in this package use instance numbers to specify the time points from which data should be retrieved. For example, we may want to know the state of ICD10 diagnoses *before* instance 2. In this case, we would specify `up_to_instance = 1` (search up to instance 1, inclusive) in functions that take this as an argument.
#'
#' Arguments like `up_to_instance` and `after_instance` can take a constant instance number as their value. They can also take the name of a column that contains an instance number, so that different instance limits can be used for each participant. For example, some participants undergo MRI at instance 2, and others at instance 3. If we want to know the state of a diagnosis up to and including the time of MRI, we would set `up_to_instance` to the name of the column that specifies the instance at which the MRI occurred. This column typically has to be generated by the user and attached to the data frame beforehand.
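#'
#' The difference between a constant limit and a per-participant limit can be sketched in base R as follows. This is only an illustration under stated assumptions: the toy columns (`dx_0` to `dx_3`, `mri_instance`) and the helper `any_in_window()` are hypothetical and are not part of this package, and plain vectors are used instead of data-masking.
#'
#' ```r
#' # Hypothetical toy data: one logical diagnosis flag per instance (0 to 3),
#' # plus a user-generated column giving the instance at which MRI occurred.
#' toy <- data.frame(
#'   dx_0 = c(FALSE, FALSE),
#'   dx_1 = c(FALSE, TRUE),
#'   dx_2 = c(TRUE, TRUE),
#'   dx_3 = c(TRUE, TRUE),
#'   mri_instance = c(2L, 3L)
#' )
#' instances <- 0:3
#' dx <- as.matrix(toy[paste0("dx_", instances)])
#'
#' # Hypothetical helper (an assumption, not a package function): TRUE if any
#' # flag is set at an instance i with after < i <= up_to. Each limit may be
#' # a single number or a vector with one value per row.
#' any_in_window <- function(flags, instances, after = -1, up_to = 3) {
#'   n <- nrow(flags)
#'   after <- rep_len(after, n)
#'   up_to <- rep_len(up_to, n)
#'   # One row per participant, one column per instance number
#'   inst <- matrix(instances, nrow = n, ncol = length(instances), byrow = TRUE)
#'   # Length-n vectors recycle down the rows, so each row gets its own limits
#'   in_window <- inst > after & inst <= up_to
#'   rowSums(flags & in_window) > 0
#' }
#'
#' # Constant limit: state of the diagnosis before instance 2
#' before_2 <- any_in_window(dx, instances, up_to = 1)
#'
#' # Per-participant limit: state up to and including each participant's MRI
#' at_mri <- any_in_window(dx, instances, up_to = toy$mri_instance)
#' ```
#'
#' With a constant limit every participant is cut off at the same instance; with a column, the first participant is evaluated through instance 2 and the second through instance 3.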
#'
#' @section Column names: Most functions in this package take column names (`*_col`) as optional arguments; if omitted, default column names are used. Each column name serves as a template for finding all other columns with the same field number but different instance and array numbers. These functions automatically find all matching instances (and arrays within each instance) within the specified parameters. Internally, a set of `expand_instance_*()` helper functions, which themselves rely on the `column_expansion_helper()` function, perform the work of searching for matching columns.
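#'
#' The idea of using one column name as a template can be sketched in base R. This is a minimal illustration, not the package's `column_expansion_helper()`: the helper `expand_template()` is hypothetical, and it assumes column names follow the `f.<field>.<instance>.<array>.<label>` layout seen in the default column names.
#'
#' ```r
#' # Hypothetical column names in the f.<field>.<instance>.<array>.<label> layout
#' cols <- c(
#'   "f.4080.0.0.Systolic_blood_pressure_automated_reading",
#'   "f.4080.0.1.Systolic_blood_pressure_automated_reading",
#'   "f.4080.1.0.Systolic_blood_pressure_automated_reading",
#'   "f.4080.2.0.Systolic_blood_pressure_automated_reading",
#'   "f.4079.0.0.Diastolic_blood_pressure_automated_reading"
#' )
#'
#' # Hypothetical helper (an assumption, not a package function): return every
#' # available column that shares the template's field number and whose
#' # instance number lies in [min_instance, max_instance].
#' expand_template <- function(template, available,
#'                             min_instance = 0, max_instance = 3) {
#'   parts <- strsplit(available, ".", fixed = TRUE)
#'   field <- vapply(parts, function(p) p[2], character(1))
#'   instance <- as.integer(vapply(parts, function(p) p[3], character(1)))
#'   template_field <- strsplit(template, ".", fixed = TRUE)[[1]][2]
#'   keep <- field == template_field &
#'     instance >= min_instance & instance <= max_instance
#'   available[keep]
#' }
#'
#' # All systolic columns (both arrays) for instances 0 and 1
#' matched <- expand_template(cols[1], cols, max_instance = 1)
#' ```
#'
#' Here `matched` holds the three field-4080 columns for instances 0 and 1; the instance 2 column and the field-4079 column are excluded.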
#'
#' @param data The primary data frame.
#'
#' This data frame includes all columns required to perform look-ups (e.g. ICD10 code columns, medication code columns, age, sex, etc.).
#'
#' @param instance_num An integer specifying the instance number, or the name of the column (using <[`data-masking`][dplyr_data_masking]> rules) containing instance numbers to use in the current function.
#' @param after_instance An integer specifying an instance number, or the name of the column (using <[`data-masking`][dplyr_data_masking]> rules) containing instance numbers to use as the **minimum instance (non-inclusive)**. Used to include all data **after but NOT including** the specified instance number. Defaults to `default_after_inst()`, which is typically -1 (i.e., include instances 0 and later).
#' @param up_to_instance An integer specifying an instance number, or the name of the column (using <[`data-masking`][dplyr_data_masking]> rules) containing instance numbers to use as the **maximum instance (inclusive)**. Used to include all data **up to and including** the specified instance number. Defaults to `default_up_to_inst()`, which as of this writing returns `3`.
#' @param combine_instances The method used by [instance_combiner()] or other Reduce-like functions when combining results of a lookup (e.g. a medication lookup, ICD10 lookup, or biomarker lookup) across multiple instances.
#'
#' For example, when looking up whether a participant is on a medication, the result may differ depending on the instance number. In such a case, one would want to apply the `"any"` method so that results for instance 2 will be [Reduce()]-ed with the `or` operator applied to the results of instances 0 and 1. In the case of numeric lookups (e.g. biomarkers that are recorded at multiple instances), one might want to use the `"mean"` method to average results across instances.
#'
#' Can be one of:
#' * `"any"` - use Boolean `or` (note: requires that lookup results are `logical`)
#' * `"min"` - use the minimum non-NA value
#' * `"max"` - use the maximum non-NA value
#' * `"first"` - use the first/earliest non-NA value
#' * `"last"` - use the last/latest non-NA value
#' * `"mean"` - use the mean of non-NA values
#'
#' Functions that accept `combine_instances` may restrict the choice of options (e.g. it doesn't make sense to apply `"any"` to numeric data).
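#'
#' What some of these methods compute can be sketched in base R. This is an illustration only and is not how [instance_combiner()] is implemented; the per-instance result vectors below are hypothetical.
#'
#' ```r
#' # Hypothetical per-instance results for three participants (instances 0 to 2)
#' on_med <- list(
#'   c(FALSE, FALSE, NA),
#'   c(TRUE, FALSE, NA),
#'   c(FALSE, FALSE, TRUE)
#' )
#' # "any": fold the instances together with logical OR
#' any_med <- Reduce(function(acc, x) acc | x, on_med)
#'
#' # Hypothetical numeric results (e.g. a biomarker) at two instances
#' sbp <- list(c(120, NA, 130), c(124, 110, NA))
#' # "mean": average the non-NA values across instances
#' mean_sbp <- rowMeans(do.call(cbind, sbp), na.rm = TRUE)
#' # "first": keep the earliest non-NA value
#' first_sbp <- Reduce(function(acc, x) ifelse(is.na(acc), x, acc), sbp)
#' ```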
#' @param combine_array If a measurement field has multiple array values (e.g. blood pressure recordings are made in duplicate), specify how these values should be combined. See `combine_instances` for details and options.
#' @param date Target date, as a string which is then passed to [lubridate::as_date()].
#' @param death_date_col  Template column name for date of death.\cr Default = `f.40000.0.0.Date_of_death`.
#' @param age_at_instance_col  Template column name for age at instance.\cr Default = `f.21003.0.0.Age_when_attended_assessment_centre`.
#' @param date_of_instance_col  Template column name for date of each instance.\cr Default = `f.53.0.0.Date_of_attending_assessment_centre`.
#' @param year_of_birth_col  Template column name for year of birth.\cr Default = `f.34.0.0.Year_of_birth`.
#' @param month_of_birth_col  Template column name for month of birth.\cr Default = `f.52.0.0.Month_of_birth`.
#' @param diagnosis_col  Template column name for self-reported diagnosis codes.\cr Default = `f.20002.0.0.Non_cancer_illness_code_self_reported`.
#' @param medication_col  Template column name for self-reported medication codes.\cr Default = `f.20003.0.0.Treatment_medication_code`.
#' @param icd10_col  Template column name for ICD10 codes.\cr Default = `f.41270.0.0.Diagnoses_ICD10`.
#' @param icd10_date_col  Template column name for dates of ICD10 diagnoses.\cr Default = `f.41280.0.0.Date_of_first_in_patient_diagnosis_ICD10`.
#' @param ethnicity_col  Template column name for ethnic background.\cr Default = `f.21000.0.0.Ethnic_background`.
#' @param measurement_col Template column name for general measurements.\cr Example = `f.4080.0.0.Systolic_blood_pressure_automated_reading`.
#' @param measurement_col_alt Alternate template column name, used specifically for manual readings of measurements when automated methods return NA.
#' @param height_col Template column name for height.\cr Default = `f.50.0.0.Standing_height`.
#' @param weight_col Template column name for weight.\cr Default = `f.21002.0.0.Weight`.
#'
#' @import dplyr
#' @import rlang
#' @importFrom magrittr %>% %<>%
#'
#' @docType package
#' @keywords internal
NULL