summarise_numerical_variables: Summarise numerical variables

View source: R/rowwise_summaries.R

summarise_numerical_variables    R Documentation

Summarise numerical variables

Description

Summarises numerical variables with repeated measurements either by field (i.e. across all available measurements) or by instance (i.e. across all measurements taken at each assessment visit). Currently available summary options are mean, minimum, maximum, sum and number of non-missing values.
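The difference between the two modes can be sketched in base R. This is an illustration only, not the package's implementation: the toy data frame and its column names (`bp_f4080_<instance>_<array>`) are hypothetical stand-ins for a UK Biobank field with repeated measurements.

```r
# Hypothetical toy data: FieldID 4080 measured twice at instance 0 and once at instance 1
toy <- data.frame(
  eid = c(1, 2),
  bp_f4080_0_0 = c(120, 110),
  bp_f4080_0_1 = c(130, NA),
  bp_f4080_1_0 = c(140, 100)
)

# By field: one summary per participant across every available measurement
field_cols <- grep("_f4080_", names(toy), value = TRUE)
toy$mean_bp_x4080 <- rowMeans(toy[field_cols], na.rm = TRUE)

# By instance: one summary per participant for each assessment visit
for (instance in c(0, 1)) {
  instance_cols <- grep(paste0("_f4080_", instance, "_"), names(toy), value = TRUE)
  toy[[paste0("mean_bp_x4080_", instance)]] <- rowMeans(toy[instance_cols], na.rm = TRUE)
}

toy
```

Summarising by field yields a single new column, whereas summarising by instance yields one new column per assessment visit.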

Usage

summarise_numerical_variables(
  ukb_main,
  data_dict = NULL,
  ukb_data_dict = get_ukb_data_dict(),
  summary_function = "mean",
  summarise_by = "Field",
  .drop = FALSE
)

Arguments

ukb_main

A UK Biobank main dataset data frame. Column names must match those under the descriptive_colnames column in data_dict.

data_dict

A data dictionary specific to the UKB main dataset file, created by make_data_dict.

ukb_data_dict

The UKB data dictionary (available online at the UK Biobank data showcase). This should be a data frame where all columns are of type character.

summary_function

The summary function to be applied. Options: "mean", "min", "max", "sum" or "n_values".

summarise_by

Whether to summarise by "Field" or by "Instance".

.drop

If TRUE, removes the original numerical variables from the result. Default value is FALSE.

Details

Note that when summary_function = "sum", missing values are converted to zero. Therefore, if all values in a set are missing, the sum will be summarised as 0. See the documentation for rowSums for further details.
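The behaviour described above can be reproduced directly with base R, assuming the sum is computed in the manner of `rowSums(..., na.rm = TRUE)` as the reference to rowSums suggests. The `measurements` data frame below is hypothetical.

```r
# Three repeated measurements for three hypothetical participants;
# the third participant has no measurements at all
measurements <- data.frame(
  bp_0 = c(120, NA, NA),
  bp_1 = c(130, 125, NA),
  bp_2 = c(NA, 135, NA)
)

# With na.rm = TRUE, missing values are dropped before summing,
# so a row containing only NA sums to 0 rather than NA
row_sums <- rowSums(measurements, na.rm = TRUE)

# Counting non-missing values per row distinguishes a genuine zero from "no data"
n_values <- rowSums(!is.na(measurements))

data.frame(row_sums, n_values)
```

One way to guard against misreading such zeros is to also summarise with "n_values" and treat sums where the count is 0 as missing.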

Value

A data frame with new columns summarising numerical variables. Each new column name is prefixed with the value of summary_function and ends with 'x' followed by the FieldID and, where applicable, the instance being summarised. For example, if summarising FieldID 4080 instance 0 with summary_function = "mean", the new column would be named 'mean_systolic_blood_pressure_automated_reading_x4080_0'.
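The naming pattern can be sketched as a small helper. `make_summary_colname` is a hypothetical function written for illustration and is not part of ukbwranglr. The field-level form (no instance suffix) is an assumption inferred from the description above.

```r
# Hypothetical helper illustrating the naming pattern; not a ukbwranglr function
make_summary_colname <- function(summary_function,
                                 field_description,
                                 field_id,
                                 instance = NULL) {
  # Suffix is 'x' + FieldID, plus '_' + instance when summarising by instance
  suffix <- paste0("x", field_id)
  if (!is.null(instance)) {
    suffix <- paste0(suffix, "_", instance)
  }
  paste(summary_function, field_description, suffix, sep = "_")
}

by_instance <- make_summary_colname(
  "mean", "systolic_blood_pressure_automated_reading", 4080, instance = 0
)
by_field <- make_summary_colname(
  "mean", "systolic_blood_pressure_automated_reading", 4080
)
```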

Examples

library(magrittr)
# get dummy UKB data and data dictionary
dummy_ukb_data_dict <- get_ukb_dummy("dummy_Data_Dictionary_Showcase.tsv")
dummy_ukb_codings <- get_ukb_dummy("dummy_Codings.tsv")

dummy_ukb_main <- read_ukb(
  path = get_ukb_dummy("dummy_ukb_main.tsv", path_only = TRUE),
  ukb_data_dict = dummy_ukb_data_dict,
  ukb_codings = dummy_ukb_codings
) %>%
  dplyr::select(eid, tidyselect::contains("systolic_blood_pressure")) %>%
  tibble::as_tibble()

# summarise mean values by Field, keep original variables
summarise_numerical_variables(
  dummy_ukb_main,
  ukb_data_dict = dummy_ukb_data_dict
)

# summarise mean values by Field, drop original variables
summarise_numerical_variables(
  dummy_ukb_main,
  ukb_data_dict = dummy_ukb_data_dict,
  .drop = TRUE
)

# summarise min values by instance, dropping original variables
summarise_numerical_variables(
  dummy_ukb_main,
  ukb_data_dict = dummy_ukb_data_dict,
  summary_function = "min",
  summarise_by = "Instance",
  .drop = TRUE
)

rmgpanw/ukbwranglr documentation built on April 30, 2024, 7:47 a.m.