get.adjust: Filter Study Data Set to Variables of Interest

View source: R/load-data.R View source: R/load-data copy.R

Description

This function filters the study data and creates new variables from it. It takes csv files that list the new variable names and the formulas that compute them from the variables in the Stata file.

Usage

get.adjust(stata, mods, filts = NULL)

Arguments

stata

Study dataset (ARIC or HUNT) read from a Stata file, for example with haven::read_dta()

mods

List of paths to formatted csv files that define the new variables. The csv format is detailed below. By convention, one file is named adjusted.csv and the other outcomes.csv. This parameter is required.

filts

List containing a single path to a formatted csv of the data filters to apply to the dataset. Optional; defaults to NULL.

Details

This function uses csv files to inject commands into the dplyr::mutate() and dplyr::filter() functions. The csv format needed to execute the desired commands is detailed below. The mods parameter is required and must be a list of one or more paths to csv files. By convention, one file holds the outcome variables (heart failure diagnosis variables and time-to-event variables) and another holds the adjustment variables, such as demographics and clinical history. The filter csv holds filtering conditions, such as excluding patients with prevalent heart failure; its format is also detailed below.
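
The mechanism can be illustrated with a minimal sketch. It is not the package's code: base R parse() and eval() stand in for the injection into dplyr::mutate(), and the toy data frame and the helper name apply_mods are hypothetical.

```r
# Toy stand-in for the Stata data (hypothetical values).
study <- data.frame(id = 1:3, v5age51 = c(70, 75, 81), race = c(1, 0, 1))

# Rows as they might be read from a mods csv (Variable and Expression columns).
mods <- data.frame(
  Variable   = c("id", "age", "race"),
  Expression = c("id", "v5age51", "as.factor(race == 1)"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: evaluate each expression string with the data columns
# in scope, then collect the results under the new variable names.
apply_mods <- function(data, mods) {
  out <- lapply(mods$Expression,
                function(e) eval(parse(text = e), envir = data))
  names(out) <- mods$Variable
  as.data.frame(out, stringsAsFactors = FALSE)
}

adjusted <- apply_mods(study, mods)
```

Because the expressions are executed as R code, csv files from untrusted sources should not be passed to a function built this way.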

Value

Tibble with data from all variables found in the mods csv files.

CSV Formatting

Adjustment and Outcome Variables

Files must include a header row with three columns: Variable, Expression, and Label. Each following row defines one new variable, its expression, and its label. Expressions are R code snippets that calculate the variable. Columns of the tibble can be referenced directly by name, as in tidyverse functions.

!! First row in adjustment variables must always be the study identifier for merging purposes. !!

Examples:

Adjustment Variables:
Variable   Expression             Label
id         id                     ARIC COHORT STUDY ID
age        v5age51                Visit 5 Age
bmi        v5_bmi51               Visit 5 BMI
race       as.factor(race == 1)   Subject Race (Black == 1)

This table is coded in csv as:

Variable, Expression, Label
id, id, ARIC COHORT STUDY ID
age, v5age51, Visit 5 Age
bmi, v5_bmi51, Visit 5 BMI
race, as.factor(race == 1), Subject Race (Black == 1)
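
As a hedged sketch, a file in this layout can be checked before it is passed to get.adjust(). The snippet below uses base R read.csv() on inline text. The checks are illustrative and not part of the package, and the identifier name "id" is an assumption taken from the example table.

```r
# Inline copy of the example csv; a real call would pass a file path instead.
csv_text <- "Variable, Expression, Label
id, id, ARIC COHORT STUDY ID
age, v5age51, Visit 5 Age
bmi, v5_bmi51, Visit 5 BMI
race, as.factor(race == 1), Subject Race (Black == 1)"

# strip.white drops the spaces that follow each comma.
adj <- read.csv(text = csv_text, strip.white = TRUE,
                stringsAsFactors = FALSE, check.names = FALSE)
names(adj) <- trimws(names(adj))

# Illustrative checks: expected header, and study identifier in the first row.
header_ok   <- identical(names(adj), c("Variable", "Expression", "Label"))
first_is_id <- adj$Variable[1] == "id"
```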

Outcome Variables:
Variable   Expression                          Label
hfdiag     !is.na(adjudhf_bwh)                 Incident Heart Failure Dx
fuptime    as.double(adjudhfdate - v5date51)   V5 HF Follow Up Time

This table is coded in csv as:

Variable, Expression, Label
hfdiag, !is.na(adjudhf_bwh), Incident Heart Failure Dx
fuptime, as.double(adjudhfdate - v5date51), V5 HF Follow Up Time

!! The first row must be the primary outcome, and the last row must be the primary time-to-event variable. This quirk may be patched in a coming version. !!
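
To show what the two example outcome expressions compute, here is a small sketch on made-up data. The column values are hypothetical, and reading the primary outcome and time variable by row position is an assumption based on the note above, not confirmed package behavior.

```r
# Toy data with the columns the example expressions refer to (made-up values).
study <- data.frame(
  adjudhf_bwh = c(1, NA, 1),
  adjudhfdate = as.Date(c("2015-03-01", NA, "2013-06-15")),
  v5date51    = as.Date(c("2012-01-01", "2012-02-01", "2013-06-15"))
)

outcomes <- data.frame(
  Variable   = c("hfdiag", "fuptime"),
  Expression = c("!is.na(adjudhf_bwh)", "as.double(adjudhfdate - v5date51)"),
  stringsAsFactors = FALSE
)

# Evaluate each expression with the data columns in scope.
values <- lapply(outcomes$Expression,
                 function(e) eval(parse(text = e), envir = study))
names(values) <- outcomes$Variable

# Assumed positional convention: first row is the outcome, last row the time.
primary_outcome <- outcomes$Variable[1]
time_to_event   <- outcomes$Variable[nrow(outcomes)]
```

Here hfdiag is TRUE wherever an adjudicated diagnosis is recorded, and fuptime is the number of days between the visit date and the diagnosis date, missing where no diagnosis date exists.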

Filters CSV Format:

These files filter the data according to exclusion criteria. They are most often used to remove subjects with prevalent heart failure.

Data Filters and Exclusions:

Files must include a header row with three columns: Variable, Operator, and Expression. These represent the variable to filter on, the comparison operator (==, >, >=, <, <=, etc.), and the value or expression to compare against, respectively.

Examples:

Variable      Operator   Expression
v5_prevhf52   ==         FALSE
fuptime       >          0

This table is coded in csv as:

Variable, Operator, Expression
v5_prevhf52, ==, FALSE
fuptime, >, 0
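
One way to picture how such rows become filter conditions is sketched below. This is an assumption about the general approach, not the package's implementation: each row is pasted into a condition string and evaluated in base R, as a stand-in for dplyr::filter(). The toy data is hypothetical.

```r
# Toy data (hypothetical values).
study <- data.frame(
  id          = 1:4,
  v5_prevhf52 = c(FALSE, TRUE, FALSE, FALSE),
  fuptime     = c(120, 300, 0, 45)
)

# Rows as they might be read from the filters csv.
filts <- data.frame(
  Variable   = c("v5_prevhf52", "fuptime"),
  Operator   = c("==", ">"),
  Expression = c("FALSE", "0"),
  stringsAsFactors = FALSE
)

# Build one condition string per row, e.g. "fuptime > 0".
conditions <- paste(filts$Variable, filts$Operator, filts$Expression)

# Keep only rows that satisfy every condition; rows where a condition
# evaluates to NA are dropped.
keep <- rep(TRUE, nrow(study))
for (cond in conditions) {
  hit  <- eval(parse(text = cond), envir = study)
  keep <- keep & !is.na(hit) & hit
}
filtered <- study[keep, ]
```

In this toy data the first condition excludes the subject with prevalent heart failure and the second excludes the subject with no follow-up time.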

Examples

## Not run: 
fifth.visit.study <-
  get.adjust(stata = haven::read_dta('data/ARICmaster_121820.dta'),
             mods  = list('data/visit-five/outcomes.csv',
                          'data/visit-five/adjusted.csv'),
             filts = list('data/visit-five/filters.csv'))

fifth.visit.echo <-
  get.adjust(stata = haven::read_dta('data/ARICmaster_121820.dta'),
             mods  = list('data/visit-five/echovars.csv'))

## End(Not run)

pranavdorbala/proteomicsHF documentation built on March 9, 2021, 12:22 a.m.