\hypersetup{colorlinks=true, linkcolor=blue}
\listoftables
\clearpage
knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) options( xtable.comment = FALSE, datatable.verbose = FALSE, scipen = 10, knitr.kable.NA = '', knitr.table.format = 'latex', kableExtra.latex.load_packages = FALSE )
VISCfunctions is a collection of useful functions designed to assist in the analysis and creation of professional reports. The current VISCfunctions package can be broken down to the following sections:
Code to initially install the VISCfunctions package:
# Installing VISCfunctions from GitHub remotes::install_github("FredHutch/VISCfunctions")
Code to load in VISCfunctions and start using:
# Loading VISCfunctions library(VISCfunctions) # Loading dplyr for pipe library(dplyr)
\clearpage
The VISCtemplates package makes
extensive use of the VISCfunctions
package, and is a great way get started
making professional statistical reports.
Code to initially download VISCtemplates package:
# Installing VISCfunctions from GitHub remotes::install_github("FredHutch/VISCtemplates")
Once installed, in RStudio go to File -> New File -> R Markdown -> From
Template -> VISC PT Report (Generic) to start a new Markdown report using the
template. Within the template there is code to load and make use of most of the
VISCfunctions
functionality.
The exampleData_BAMA
, exampleData_NAb
, exampleData_ICS
, and CAVD812_mAB
datasets are assay example datasets used throughout this vignette and most
examples in the VISCfunctions
documentation. All example datasets have
associated documentation that can be viewed using ?
(i.e ?exampleData_BAMA
).
# Loading in example datasets data("exampleData_BAMA", "exampleData_NAb", "exampleData_ICS", "CAVD812_mAB") # Quick view of dataset tibble::glimpse(exampleData_BAMA)
# Loading in example datasets exampleData_BAMA <- exampleData_BAMA %>% mutate(antigen = stringr::str_replace_all(antigen, '_', ' '))
There are currently three testing functions performing the appropriate statistical tests, depending on the data and options, returning a p value.
There is also an estimate function for getting binary confidence intervals
(binom_ci()
).
trapz_sorted()
is used to create an estimate for the area under a curve while
making sure the x-axis is sorted so that both the x-axis and the area are
increasing and positive, respectively.
# Making Testing Dataset testing_data <- exampleData_BAMA %>% dplyr::filter(antigen == 'A1.con.env03 140 CF' & visitno == 1)
\clearpage
two_samp_bin_test()
is used for comparing a binary variable to a binary (two
group) variable, with options for Barnard, Fisher's Exact, Chi-Sq, and McNemar
tests. For the Barnard test specifically, there are many model options that can
be set that get passed to the Exact::exact.test()
function.
table(testing_data$response, testing_data$group) # Barnard Method two_samp_bin_test( x = testing_data$response, y = testing_data$group, method = 'barnard', alternative = 'two.sided' ) # Santner and Snell Variation two_samp_bin_test( x = testing_data$response, y = testing_data$group, method = 'barnard', barnard_method = 'santner and snell', alternative = 'two.sided') # Calling test multiple times exampleData_BAMA %>% group_by(antigen, visitno) %>% filter(visitno != 0) %>% summarise(p_val = two_samp_bin_test(response, group, method = 'barnard'))
\clearpage
two_samp_cont_test()
is used for comparing a continuous variable to a binary
(two group) variable, with parametric (t.test) and non-parametric (Wilcox
Rank-Sum) options. Also paired data is allowed, where there are parametric
(paired t.test) and non-parametric (Wilcox Signed-Rank) options.
by(testing_data$magnitude, testing_data$group, summary) two_samp_cont_test( x = testing_data$magnitude, y = testing_data$group, method = 'wilcox', alternative = 'two.sided' ) # Calling test multiple times exampleData_BAMA %>% group_by(antigen, visitno) %>% filter(visitno != 0) %>% summarise(p_val = two_samp_cont_test(magnitude, group, method = 'wilcox'))
\clearpage
cor_test()
is used for comparing two continuous variables, with Pearson,
Kendall, or Spearman methods.
If Spearman method is chosen and either variable has a tie, the approximate
distribution is use in the coin::spreaman_test()
function. This is usually the
preferred method over the asymptotic approximation, which is the method
stats:cor.test()
uses in cases of ties. If the Spearman method is chosen and
there are no ties, the exact method is used from stats:cor.test
.
# Making Testing Dataset cor_testing_data <- exampleData_BAMA %>% dplyr::filter(antigen %in% c('A1.con.env03 140 CF', 'B.MN V3 gp70'), visitno == 1, group == 1) %>% tidyr::pivot_wider(id_cols = pubID, names_from = antigen, values_from = magnitude) stats::cor(cor_testing_data$`A1.con.env03 140 CF`, cor_testing_data$`B.MN V3 gp70`, method = 'spearman') cor_test( x = cor_testing_data$`A1.con.env03 140 CF`, y = cor_testing_data$`B.MN V3 gp70`, method = 'spearman' )
trapz_sorted()
is used when you are not presorting your data so that it is
strictly increasing along the x-axis or when there are 'NA' values that are not
removed prior to estimating the area under the curve.
set.seed(93) n <- 10 # unsorted data x <- sample(1:n, n, replace = FALSE) y <- runif(n, 0, 42) pracma::trapz(x, y) trapz_sorted(x,y) # NA in data x <- c(1:6, NA, 8:10) y[2] <- NA pracma::trapz(x, y) trapz_sorted(x,y, na.rm = TRUE)
\clearpage
These are functions designed to produce professional output that can easily be printed in reports.
pretty_pvalues()
is used on p-values by rounding them to a specified number of
digits and using < for low p-values as opposed to scientific notation (i.e., "p
< 0.0001" if rounding to 4 digits), and by allowing options for emphasizing
p-values and specific characters for missing values.
pvalue_example = c(1, 0.06753, 0.004435, NA, 1e-16, 0.563533) # For simple p value display pretty_pvalues(pvalue_example, digits = 3, output_type = 'no_markup') # For display in report table_p_Values <- pretty_pvalues(pvalue_example, digits = 3, background = "yellow") kableExtra::kable( table_p_Values, escape = FALSE, linesep = "", col.names = c("P-values"), caption = 'Fancy P Values' ) %>% kableExtra::kable_styling(font_size = 8.5, latex_options = "hold_position")
You can also specify if you want p=
pasted on the front of the p values, using
the include_p
parameter.
\clearpage
stat_paste()
is used to combine two or three statistics together, allowing
for different rounding and bound character specifications. Common uses for this
function are for:
# Simple Examples stat_paste(stat1 = 2.45, stat2 = 0.214, stat3 = 55.3, digits = 2, bound_char = '[') stat_paste(stat1 = 6.4864, stat2 = pretty_pvalues(0.0004, digits = 3), digits = 3, bound_char = '(') exampleData_BAMA %>% filter(visitno == 1) %>% group_by(antigen, group) %>% summarise(`Magnitude Info (Median [Range])` = stat_paste(stat1 = median(magnitude), stat2 = min(magnitude), stat3 = max(magnitude), digits = 2, bound_char = '['), `Magnitude Info (Mean (SD))` = stat_paste(stat1 = mean(magnitude), stat2 = sd(magnitude), digits = 2, bound_char = '('))
\clearpage
paste_tbl_grp()
pastes together information, often statistics, from two groups.
There are two predefined combinations: mean(sd) and median[min,max], but user
may also paste any single measure together.
Example of summary information to be pasted together (partial output):
summary_info <- exampleData_BAMA %>% filter(visitno == 1) %>% mutate(group = paste0("Group", group)) %>% group_by(antigen, group) %>% summarise_at("magnitude", list( n = length, mean = mean, sd = sd, median = median, min = min, max = max )) %>% tidyr::pivot_longer(cols = n:max) %>% tidyr::unite(var, group, name) %>% tidyr::pivot_wider(names_from = var, values_from = value) %>% mutate(Group1 = "Group 1", Group2 = "Group 2") summary_info %>% select(antigen, Group1_max, Group1_mean, Group1_median, Group1_min, Group1_n)
summary_table <- summary_info %>% paste_tbl_grp( vars_to_paste = c('n', 'mean_sd', 'median_min_max'), first_name = 'Group1', second_name = 'Group2' ) kableExtra::kable( summary_table, booktabs = TRUE, linesep = "", caption = 'Summary Information Comparison' ) %>% kableExtra::kable_styling(font_size = 6.5, latex_options = "hold_position") %>% kableExtra::footnote('Summary Information for Group 1 vs. Group 2, by Antigen and for Visit 1')
\clearpage
escape()
protects control characters in a string for use in a latex table or
caption.
value_example <- c("testvalue", "test_value", "ampersand&") escape(value_example) escape("String_Entry %")
collapse_group_row()
is for replacing repeated rows in a data.frame into NA
for nice printing. This is an alternative to kableExtra::collapse_rows()
,
which does not work well for tables that span multiple pages (i.e. longtable =
TRUE
).
set.seed(341235432) sample_df <- data.frame( x = c(1, 1, 1, 2, 2, 2, 2), y = c('test1', 'test1', 'test2', 'test1', 'test2', 'test2', 'test1'), z = c(1, 2, 3, 4, 5, 6, 7), outputVal = runif(7) ) collapse_group_row(sample_df, x, y, z) %>% kableExtra::kable( booktabs = TRUE, linesep = "", row.names = FALSE ) %>% kableExtra::row_spec(3, hline_after = TRUE) %>% kableExtra::kable_styling(font_size = 8, latex_options = "hold_position")
\clearpage
pairwise_test_bin()
, pairwise_test_cont()
, cor_test_pairs()
are
functions that perform pairwise group (or any categorical variable) comparisons.
The function will go through all pairwise comparisons and output descriptive
statistics and relevant p values, calling the appropriate testing functions from
the Testing and Estimate Functions section.
Simple example using pairwise_test_bin
:
set.seed(1) x_example <- c(NA, sample(0:1, 50, replace = TRUE, prob = c(.75, .25)), sample(0:1, 50, replace = TRUE, prob = c(.25, .75))) group_example <- c(rep(1, 25), NA, rep(2, 25), rep(3, 25), rep(4, 25)) pairwise_test_bin(x_example, group_example) %>% rename(Separation = 'PerfectSeparation')
Group comparison example using pairwise_test_bin
:
## Group Comparison group_testing <- exampleData_ICS %>% filter(Population == 'IFNg Or IL2' & Group != 4) %>% group_by(Stim, Visit) %>% group_modify( ~ as.data.frame( pairwise_test_bin(x = .$response, group = .$Group, method = 'barnard', alternative = 'less', num_needed_for_test = 3, digits = 1, latex_output = TRUE, verbose = FALSE ) )) %>% ungroup() %>% # Getting fancy p values mutate( ResponseTest = pretty_pvalues( ResponseTest, output_type = 'latex', sig_alpha = .1, background = 'yellow' ) )
\clearpage
kableExtra::kable( group_testing, escape = FALSE, booktabs = TRUE, linesep = "", caption = 'Response Rate Comparisons Across Groups' ) %>% kableExtra::kable_styling( font_size = 6.5, latex_options = c("hold_position", "scale_down") ) %>% kableExtra::collapse_rows(c(1:2), row_group_label_position = 'identity', latex_hline = 'full')
\clearpage
Time point comparison example (paired) using pairwise_test_bin()
:
## Timepoint Comparison timepoint_testing <- exampleData_ICS %>% filter(Population == 'IFNg Or IL2' & Group != 4) %>% group_by(Stim, Group) %>% group_modify(~ as.data.frame( pairwise_test_bin(x = .$response, group = .$Visit, id = .$pubID, method = 'mcnemar', num_needed_for_test = 3, digits = 1, latex_output = TRUE, verbose = FALSE))) %>% ungroup() %>% # Getting fancy p values mutate(ResponseTest = pretty_pvalues(ResponseTest, output_type = 'latex', sig_alpha = .1, background = 'yellow')) kableExtra::kable(timepoint_testing, escape = FALSE, booktabs = TRUE, caption = 'Response Rate Comparisons Across Visits') %>% kableExtra::kable_styling(font_size = 6.5, latex_options = c("hold_position","scale_down")) %>% kableExtra::collapse_rows(c(1:2), row_group_label_position = 'identity', latex_hline = 'full')
\clearpage
Simple example using pairwise_test_cont()
:
set.seed(1) x_example <- c(NA, rnorm(50), rnorm(50, mean = 5)) group_example <- c(rep(1, 25), rep(2, 25), rep(3, 25), rep(4, 25), NA) pairwise_test_cont(x_example, group_example, digits = 1) %>% rename(Separation = 'PerfectSeparation')
\clearpage
Group comparison example using pairwise_test_cont()
:
## Group Comparison group_testing <- exampleData_ICS %>% filter(Population == 'IFNg Or IL2' & Group != 4) %>% group_by(Stim, Visit) %>% group_modify(~ as.data.frame( pairwise_test_cont(x = pmax(.$PercentCellNet, 0.00001), group = .$Group, method = 'wilcox', paired = FALSE, alternative = 'greater', num_needed_for_test = 3, digits = 3, log10_stats = TRUE))) %>% ungroup() %>% # Getting fancy p values mutate(MagnitudeTest = pretty_pvalues( MagnitudeTest, output_type = 'latex', sig_alpha = .1, background = 'yellow') ) %>% rename("Median (Range)" = Median_Min_Max, 'Mean (SD) (log10)' = log_Mean_SD) kableExtra::kable(group_testing, escape = FALSE, booktabs = TRUE, caption = 'Magnitude Comparisons Across Groups') %>% kableExtra::kable_styling(font_size = 6.5, latex_options = c("hold_position","scale_down")) %>% kableExtra::collapse_rows(c(1:2), row_group_label_position = 'identity', latex_hline = 'full')
\clearpage
Time point comparison example (paired) using pairwise_test_cont()
:
## Timepoint Comparison timepoint_testing <- exampleData_ICS %>% filter(Population == 'IFNg Or IL2' & Group != 4) %>% group_by(Stim, Group) %>% group_modify(~ as.data.frame( pairwise_test_cont(x = pmax(.$PercentCellNet, 0.00001), group = .$Visit, id = .$pubID, method = 'wilcox', paired = TRUE, num_needed_for_test = 3, digits = 3, log10_stats = TRUE))) %>% ungroup() %>% # Getting fancy p values mutate(MagnitudeTest = pretty_pvalues( MagnitudeTest, output_type = 'latex', sig_alpha = .1, background = 'yellow') ) %>% rename("Median (Range)" = Median_Min_Max, 'Mean (SD) (log10)' = log_Mean_SD) kableExtra::kable(timepoint_testing, escape = FALSE, booktabs = TRUE, caption = 'Magnitude Comparisons Across Visits') %>% kableExtra::kable_styling(font_size = 6.5, latex_options = c("hold_position","scale_down")) %>% kableExtra::collapse_rows(c(1:2), row_group_label_position = 'identity', latex_hline = 'full')
\clearpage
Simple example using cor_test_pairs
:
set.seed(1) x_example <- c(1, 1, rnorm(48), rnorm(49, mean = 5), NA) pair_example <- c(rep('antigen A', 25), rep('antigen B', 25), rep('antigen C', 25), rep('antigen D', 25)) id_example <- c(rep(1:25, 4)) cor_test_pairs(x_example, pair_example, id_example, digits = 4)
\clearpage
Correlation comparisons using cor_test_pairs
:
## Antigen Correlations antigen_testing <- exampleData_BAMA %>% filter(antigen %in% c('B.63521 D11gp120/293F', 'B.con.env03 140 CF', 'B.MN V3 gp70') & group != 4) %>% group_by(group, visitno) %>% group_modify(~ as.data.frame( cor_test_pairs(x = .x$magnitude, pair = .x$antigen, id = .x$pubID, method = 'spearman', n_distinct_value = 3))) %>% ungroup() %>% # Getting fancy p values mutate(CorrTest = pretty_pvalues(CorrTest, output_type = 'latex', sig_alpha = .1, background = 'yellow')) %>% rename("Corr P Value" = CorrTest) kableExtra::kable(antigen_testing, escape = FALSE, booktabs = TRUE, caption = 'Correlations between Antigen') %>% kableExtra::kable_styling(font_size = 6.5, latex_options = c("hold_position","scale_down")) %>% kableExtra::collapse_rows(c(1:2), row_group_label_position = 'identity', latex_hline = 'full')
\clearpage
create_step_curve()
creates survival probabilities from time and censoring
information and generates a risk table that includes the survival probabilities
and number at risk in addition to the data provided. This output can be used to
plot step line outcomes such as time-to-event (Kaplan-Meier curves), magnitude
breadth (MB) curves, and cumulative plots.
#Potency-breadth curves plot_data <- CAVD812_mAB %>% filter(virus != 'SVA-MLV') %>% tidyr::pivot_longer(cols = c(ic50, ic80)) %>% dplyr::group_by(name, product) %>% dplyr::group_modify(~ create_step_curve(x = pmin(.x$value, 100), event = as.numeric(.x$value < 50), flip_surv = TRUE, flip_top_x = 100)) ggplot2::ggplot(data = plot_data, ggplot2::aes(x = time, y = 1 - surv, color = product)) + ggplot2::geom_step(direction = 'hv', lwd = .35) + ggplot2::scale_x_log10() + ggplot2::scale_y_continuous('Viral Coverage (%)') + ggplot2::facet_grid(. ~ name) + ggplot2::theme_bw()
\clearpage
mb_results()
creates step curve info for magnitude breadth (MB) plots, with
options to include response status and have logged transformation for AUC-MB
calculation.
data_here <- exampleData_BAMA %>% filter(visitno == 2) group_results <- data_here %>% dplyr::group_by(group) %>% dplyr::group_modify(~ mb_results(magnitude = .x$magnitude , response = .x$response)) ind_results <- data_here %>% dplyr::group_by(group, pubID) %>% dplyr::group_modify(~ mb_results(magnitude = .x$magnitude , response = .x$response)) ggplot2::ggplot( data = group_results, ggplot2::aes(x = magnitude, y = breadth, color = factor(group))) + ggplot2::geom_step(data = ind_results, ggplot2::aes(group = pubID), linetype = "dashed", direction = 'hv', lwd = .35, alpha = .7) + ggplot2::geom_step(direction = 'hv', lwd = .65) + ggplot2::scale_x_log10('Response Magnitude', breaks = c(100,1000,10000, 22000), labels = c(expression("" <= 100),1000,10000, expression("" >= 22000))) + ggplot2::ylab('Magnitude Breadth (%)') + ggplot2::scale_color_discrete('Group') + ggplot2::coord_cartesian(xlim = c(95, 23000)) + ggplot2::theme_bw()
\clearpage
# AUC-MB plot AUC_MB <- dplyr::distinct(ind_results, group, pubID, aucMB) ggplot2::ggplot(AUC_MB, ggplot2::aes(x = factor(group), y = aucMB, color = factor(group))) + ggplot2::geom_boxplot(outlier.color = NA, show.legend = FALSE) + ggplot2::geom_point(position = ggplot2::position_jitter(width = 0.25, height = 0, seed = 1), size = 1.5, show.legend = FALSE) + ggplot2::scale_y_log10('AUC-MB') + ggplot2::xlab('Group') + ggplot2::theme_bw()
\clearpage
round_away_0()
is a function to properly perform mathematical rounding (i.e.
rounding away from 0 when tied), as opposed to the round()
function, which
rounds to the nearest even number when tied. Also round_away_0()
allows for
trailing zeros (i.e. 0.100 if rounding to 3 digits).
vals_to_round = c(NA,-3.5:3.5) vals_to_round round(vals_to_round) round_away_0(vals_to_round) round_away_0(vals_to_round, digits = 2, trailing_zeros = TRUE)
\clearpage
get_session_info()
produces reproducible tables, which are great to add to the
end of reports. The first table gives software session information and the
second table gives software package version information. get_full_name()
is a
function used by get_session_info()
to get the user's name, based on user's ID.
my_session_info <- get_session_info() kableExtra::kable( my_session_info$platform_table, booktabs = TRUE, linesep = '', caption = "Reproducibility Software Session Information" ) %>% kableExtra::kable_styling(font_size = 8, latex_options = "hold_position")
kableExtra::kable( my_session_info$packages_table, booktabs = TRUE, linesep = '', caption = "Reproducibility Software Package Version Information" ) %>% kableExtra::kable_styling(font_size = 8, latex_options = "hold_position")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.