README.md

farrago

{farrago} is an R package serving as a collection of tools for data workflows and analysis, with focus on health surveillance data. Although {farrago} primarily serves as a personal collection of odds-and-ends picked-up or created over the past several years, it may assist wider audiences as well. The package is organized by general purpose/functionality, which may eventually be separated into discrete packages.

Installation

{farrago} is only available from GitHub and the latest version can be installed with:

# install.packages("devtools")
devtools::install_github("al-obrien/farrago")

Use-cases

{farrago} has a variety of functions available. They are roughly organized into the following categories:

  1. Calculations: provides algorithms for some routine processes such as…

    • Determining pregnancy trimesters

    • Assigning episode periods (e.g. for repeat infections)

    • Collapsing time-steps

    • Determining overlaps in time

    • Basic metrics such as rates, max/min, etc.

  2. Conversions: helper functions to convert between common formats in epidemiology

    • Replace all blank values to NA (e.g. when importing data from SAS)

    • Switch between flu and calendar weeks

    • Determine flu season from date

    • Quickly convert a table to image (png)

    • Basic conversions such as from numbers to percent, number to factor, etc.

  3. Creation: generate new content

    • Make multi-level factors similar to SAS ‘multi-label’ functionality

    • Create hypercubes (i.e. n-dimensional table including group summaries and totals)

    • Determine break points from set of values

  4. Transferal: methods to move objects and data

    • Easily stow() and retrieve() data-sets to make efficient use of RAM

    • File transfer using WinSCP wrapper

    • Locate files

    • Pass code and retrieve data from SAS (primarily for use with Classic 9.4)

  5. Plotting: helper functions for shared legends and less common plots such as bulls-eye charts and X-splines

  6. Miscellaneous

Example

This is a basic example using a sub-set of functions from {farrago}…

# Load libraries
library(farrago)
library(magrittr)
library(dplyr)
library(lubridate)
# Download from configured SFTP location
transfer_winscp(file ='my_rmt_file.csv'),
               direction = 'download',
               connection = 'sftp://myusername:mypwd@hostlocation.ca/'
               rmt_path = './location/',
               drop_location = 'C:/PATH/TO/DESIRED/FOLDER/')
# Non-sense data for example
my_rmt_file <- tibble::tribble(~grp_id, ~date, ~date_of_birth, ~condition, ~date_of_birth_child, 
                               1, '2020-01-01', '1970-06-04', 'alive', '1991-01-01',
                               1, '2020-01-01', '1980-04-05', '', '1990-02-04',
                               1, '2020-01-03', '1930-04-05', 'alive', '',
                               1, '2020-01-04', '1967-04-05', 'alive', '1998-01-21',
                               2, '2020-01-01', '1978-04-05', 'alive', '1998-06-21',
                               2, '2020-09-10', '1970-04-05', 'alive', '1992-09-13',
                               2, '2020-09-21', '1949-04-05', 'dead', '1987-01-03',
                               3, '2020-01-01', '1977-04-05', '', '1992-01-21',
                               3, '2020-01-02', '1944-04-05', 'alive', '',
                               3, '2020-01-21', '1943-06-05', 'alive', '1967-09-12',
                               3, '2020-01-22', '1969-07-05', 'alive', '2006-12-21',
                               3, '2020-04-22', '', NA, NA,
                               3, '2021-06-09', '1978-09-21', 'dead', '1992-01-21') %>%
  dplyr::mutate_at(vars(contains('date')), ymd)

# Remove blanks
my_rmt_file <- convert_blank2NA(my_rmt_file)

# Determine episode period based on first date by group
my_rmt_file$episode <- assign_episode(data = my_rmt_file,
                                      grp_id = grp_id,
                                      date = date,
                                      threshold = 10)

# Determine age and age group from date
my_rmt_file$age <- calculate_age(my_rmt_file$date_of_birth)
my_rmt_file$age_grp <- create_breaks(my_rmt_file$age, breaks = c(0, 10, 20, 30, 40, 50, 60, 70, 80, 90), format = TRUE)

# Calculate trimester based on dob (of child)
my_rmt_file <- calculate_trimesters(my_rmt_file, date_of_birth_child)
#> Warning in calculate_trimesters(my_rmt_file, date_of_birth_child): No variable
#> for gestation length was provided, all pregnancies will assume the average
#> pregnancy length of: 40

# View final dataset
knitr::kable(my_rmt_file)

| grp_id | date | date_of_birth | condition | date_of_birth_child | episode | age | age_grp | tri1_s | tri1_e | tri2_s | tri2_e | tri3_s | preterm | |--------:|:-----------|:----------------|:----------|:-----------------------|--------:|----:|:---------|:-----------|:-----------|:-----------|:-----------|:-----------|--------:| | 1 | 2020-01-01 | 1970-06-04 | alive | 1991-01-01 | 1 | 51 | 50-59 | 1990-03-27 | 1990-06-26 | 1990-06-27 | 1990-09-26 | 1990-09-27 | 0 | | 1 | 2020-01-01 | 1980-04-05 | NA | 1990-02-04 | 1 | 41 | 40-49 | 1989-04-30 | 1989-07-30 | 1989-07-31 | 1989-10-30 | 1989-10-31 | 0 | | 1 | 2020-01-03 | 1930-04-05 | alive | NA | 1 | 91 | >=90 | NA | NA | NA | NA | NA | NA | | 1 | 2020-01-04 | 1967-04-05 | alive | 1998-01-21 | 1 | 54 | 50-59 | 1997-04-16 | 1997-07-16 | 1997-07-17 | 1997-10-16 | 1997-10-17 | 0 | | 2 | 2020-01-01 | 1978-04-05 | alive | 1998-06-21 | 1 | 43 | 40-49 | 1997-09-14 | 1997-12-14 | 1997-12-15 | 1998-03-16 | 1998-03-17 | 0 | | 2 | 2020-09-10 | 1970-04-05 | alive | 1992-09-13 | 2 | 51 | 50-59 | 1991-12-08 | 1992-03-08 | 1992-03-09 | 1992-06-08 | 1992-06-09 | 0 | | 2 | 2020-09-21 | 1949-04-05 | dead | 1987-01-03 | 3 | 72 | 70-79 | 1986-03-29 | 1986-06-28 | 1986-06-29 | 1986-09-28 | 1986-09-29 | 0 | | 3 | 2020-01-01 | 1977-04-05 | NA | 1992-01-21 | 1 | 44 | 40-49 | 1991-04-16 | 1991-07-16 | 1991-07-17 | 1991-10-16 | 1991-10-17 | 0 | | 3 | 2020-01-02 | 1944-04-05 | alive | NA | 1 | 77 | 70-79 | NA | NA | NA | NA | NA | NA | | 3 | 2020-01-21 | 1943-06-05 | alive | 1967-09-12 | 2 | 78 | 70-79 | 1966-12-06 | 1967-03-07 | 1967-03-08 | 1967-06-07 | 1967-06-08 | 0 | | 3 | 2020-01-22 | 1969-07-05 | alive | 2006-12-21 | 2 | 52 | 50-59 | 2006-03-16 | 2006-06-15 | 2006-06-16 | 2006-09-15 | 2006-09-16 | 0 | | 3 | 2020-04-22 | NA | NA | NA | 3 | NA | NA | NA | NA | NA | NA | NA | NA | | 3 | 2021-06-09 | 1978-09-21 | dead | 1992-01-21 | 4 | 43 | 40-49 | 1991-04-16 | 1991-07-16 | 1991-07-17 | 1991-10-16 | 1991-10-17 | 0 |

# Save file for easy retrieval later
my_rmt_file_stowed <- stow(my_rmt_file, cleanup = TRUE)


al-obrien/farrago documentation built on April 14, 2023, 6:20 p.m.