Nothing
#' @title Explore Your Data (ExPanD)
#'
#' @description A shiny based web app that uses ExPanDaR functionality for
#' interactive data exploration. Designed for long-form panel data but works on
#' simple cross-sectional data as well.
#'
#' @param df A data frame or a list of data frames containing the data that
#' you want to explore. If NULL, ExPanD will start up with a file upload
#' dialog.
#' @param cs_id A character vector containing the names of the variables that
#' identify the cross-section in your data. If only \code{cs_id} and not
#' \code{ts_id} is provided, the data is treated as cross-sectional, and only
#' appropriate displays are included. \code{df_def} overrides if provided.
#' @param ts_id A character scalar identifying the name of
#' the variable that identifies the time series in your data. The according
#' variable needs to be coercible to an ordered vector.
#' If you provide a time series indicator that already is an ordered vector,
#' ExPanD will verify that it has the same levels for each data frame
#' and throw an error otherwise. If \code{cs_id} and \code{ts_id} are not
#' provided either directly of by \code{df_def}, the data is treated as
#' cross-sectional,observations are identified by row names and
#' only appropriate displays are included. \code{df_def} overrides if
#' provided.
#' @param df_def An optional dataframe (or a list of dataframes) containing
#' variable names, definitions and types. If NULL (the default) ExPanD
#' uses \code{cs_id} and \code{ts_id} to identify the data structure and
#' determines the variable types (factor, numeric, logical) based on the
#' classes of the data. See the details section for further information.
#' @param var_def If you specify here a dataframe containing variable names and
#' variable definitions, ExPanD will use these on the provided sample(s) to
#' create the analysis sample. See the details section
#' for the structure of the \code{var_def} dataframe. If NULL (default)
#' the sample(s) provided by \code{df} will be used as analysis sample(s)
#' directly.
#' @param config_list a list containing the startup configuration for ExPanD to
#' display. Take a look at \code{data(ExPanD_config_russell_3000)} for the
#' format. The easiest way to generate a config list is to customize the
#' display within the app and then save the configuration locally.
#' @param title the title to display in the shiny web app.
#' @param abstract An introductory text to display in the shiny web app. Needs
#' to be formatted as clean HTML.
#' @param df_name A character string or a vector of character strings
#' characterizing the dataframe(s) provided in \code{df} (will be used in the
#' selection menu of the app)
#' @param long_def If you set this to TRUE (default) and are providing a
#' \code{var_def} then ExPanD will add the definitions of the used variables
#' of the underlying dataframe to the definitions provided for the analysis
#' sample to make these more informative to the user.
#' If set to FALSE only the variable definitions provided in the
#' \code{var_def} sample will be provided to the user.
#' @param factor_cutoff ExPanD treats factors different from numerical variables.
#' Factors are available for sub-sampling data and for certain plots.
#' Each variable classified as such will be treated as a factor. In addition,
#' ExPanD classifies all logical values and all numerical values with less or
#' equal than \code{factor_cutoff} unique values as a factor.
#' @param components A named logical vector indicating the components that you want
#' ExPanD to generate and their order. See the function head of \code{ExpanD}
#' for the list of available components. By default, all components are reported.
#' You can also exclude selected components from the standard order by setting
#' then to \code{FALSE}. In addition, you can include an arbitrary number of
#' \code{html_block} components. Each block will render clean HTML code as contained in
#' the \code{html_blocks} parameter below. This allows you to customize your
#' ExPanD report.
#' @param html_blocks A character vector containing the clean HTML code for each
#' \code{html_block} that is included in \code{components}.
#' @param export_nb_option Do you want to give the user the option to download your
#' data and an R notebook containing code for the analyses that \code{ExPanD} displays?
#' Defaults to \code{FALSE}.
#' @param store_encrypted Do you want the user-side saved config files to be
#' encrypted? A security measure to avoid that users can inject arbitrary code
#' in the config list. Probably a good idea when you are hosting sensitive data
#' on a publicly available server.
#' @param key_phrase The key phrase to use for encryption.
#' Change this from the default if you want to encrypt the config files.
#' @param debug Do you want ExPanD to echo some debug timing information to the
#' console/log file and to store some diagnostics to the global environment?
#' Probably not.
#' @param ... Additional parameters that are passed on to
#' \code{\link[shiny]{runApp}}.
#'
#' @details
#'
#' If you start ExPanD without any options, it will start with an upload
#' dialog so that the user (e.g., you) can upload a data file
#' for analysis. Supported formats are as provided
#' by the \code{rio} package.
#'
#' When you start ExPanD with a dataframe as the only parameter, it will assume
#' the data to be cross-sectional and will use its row names as the
#' cross-sectional identifier.
#'
#' When you have panel data in long format, set the \code{ts_id} and
#' \code{cs_id} parameters to identify the variables that determine
#' the time series and cross-sectional dimensions.
#'
#' If you provide variable definitions in \code{df_def} and/or \code{var_def},
#' ExPanD displays these as tooltips in the descriptive table of the
#' ExPanD app. In this case, you need to identify the panel dimensions in the
#' variable definitions (see below).
#'
#' When you provide more than one data frame in \code{df}, make sure that all have
#' the same variables and variable types defined. If not, ExPanD will throw
#' an error. When you provide only one \code{df_def} for multiple data frames,
#' \code{df_def} will be recycled.
#'
#' When you provide \code{var_def}, ExPanD starts up in the "advanced mode". The
#' advanced mode uses (a) base sample(s) (the one(s) you provide via \code{df})
#' and the variable definitions in \code{var_def} to generate an analysis
#' sample based on the active base sample. In the advanced mode, the app user
#' can delete variables from the analysis sample within the app.
#'
#' A \code{df_def} or \code{var_def} dataframe can contain the following
#' variables
#' \describe{
#' \item{"var_name"}{Required: The names of the variables
#' that are provided by the base sample
#' or are to be calculated for the analysis sample}
#' \item{"var_def"}{Required: For a \code{var_def} data frame,
#' the code that is passed to the data frame
#' (grouped by cross-sectional units) in calls to
#' \code{\link[dplyr]{mutate}} as right hand side
#' to calculate the respective variable.
#' For a \code{data_def} data frame, a string
#' describing the nature of the variable.}
#' \item{"type"}{Required: One of the strings "cs_id", "ts_id", "factor",
#' "logical" or "numeric", indicating the type of the variable. Please note that
#' at least one variable has to be assigned as a cross-sectional identifier
#' ("cs_id") and exactly one variable that is coercible into an ordered factor
#' has to be assigned as the time-series identifier ("ts_id").}
#' \item{"can_be_na"}{Optional: If included,
#' then all variables with this value set to FALSE are required to be non
#' missing in the data set. This reduces the number of observations. If missing,
#' it defaults to being TRUE for all variables other than cs_id and ts_id.} }
#'
#' @examples
#' \dontrun{
#' ExPanD()
#'
#' # Use this if you want to read very large files via the file dialog
#' options(shiny.maxRequestSize = 1024^3)
#' ExPanD()
#'
#' # Explore cross-sectional data
#' ExPanD(mtcars)
#'
#' # Include the option to download notebook code and data
#' ExPanD(mtcars, export_nb_option = TRUE)
#'
#' # Use ExPanD on long-form panel data
#' data(russell_3000)
#' ExPanD(russell_3000, c("coid", "coname"), "period")
#' ExPanD(russell_3000, df_def = russell_3000_data_def)
#' ExPanD(russell_3000, df_def = russell_3000_data_def,
#' components = c(ext_obs = T, descriptive_table = T, regression = T))
#' ExPanD(russell_3000, df_def = russell_3000_data_def,
#' components = c(missing_values = F, by_group_violin_graph = F))
#' ExPanD(russell_3000, df_def = russell_3000_data_def,
#' components = c(html_block = T, descriptive_table = T,
#' html_block = T, regression = T),
#' html_blocks = c(
#' paste('<div class="col-sm-2"><h3>HTML Block 1</h3></div>',
#' '<div class="col-sm-10">',
#' "<p></p>This is a condensed variant of ExPanD with two additional HTML Blocks.",
#' "</div>"),
#' paste('<div class="col-sm-2"><h3>HTML Block 2</h3></div>',
#' '<div class="col-sm-10">',
#' "It contains only the descriptive table and the regression component.",
#' "</div>")))
#' data(ExPanD_config_russell_3000)
#' ExPanD(df = russell_3000, df_def = russell_3000_data_def,
#' config_list = ExPanD_config_russell_3000)
#' exploratory_sample <- sample(nrow(russell_3000), round(0.5*nrow(russell_3000)))
#' test_sample <- setdiff(1:nrow(russell_3000), exploratory_sample)
#' ExPanD(df = list(russell_3000[exploratory_sample, ], russell_3000[test_sample, ]),
#' df_def = russell_3000_data_def,
#' df_name = c("Exploratory sample", "Test sample"))
#' ExPanD(worldbank, df_def = worldbank_data_def, var_def = worldbank_var_def,
#' config_list = ExPanD_config_worldbank)
#'
#' }
#' @export
ExPanD <- function(df = NULL, cs_id = NULL, ts_id = NULL,
df_def = NULL, var_def = NULL, config_list = NULL,
title = "ExPanD - Explore your data!",
abstract = NULL,
df_name = deparse(substitute(df)),
long_def = TRUE,
factor_cutoff = 10L,
components = c(sample_selection = TRUE,
subset_factor = TRUE,
grouping = TRUE,
bar_chart = TRUE,
missing_values = TRUE,
udvars = TRUE,
descriptive_table = TRUE,
histogram = TRUE,
ext_obs = TRUE,
by_group_bar_graph = TRUE,
by_group_violin_graph = TRUE,
trend_graph = TRUE,
quantile_trend_graph = TRUE,
by_group_trend_graph = TRUE,
corrplot = TRUE,
scatter_plot = TRUE,
regression = TRUE),
html_blocks = NULL,
export_nb_option = FALSE,
store_encrypted = FALSE,
key_phrase = "What a wonderful key",
debug = FALSE, ...) {
orig_df_name <- deparse(substitute(df))
shiny_df_name <- df_name
if (!is.null(df) && !is.data.frame(df) && !is.list(df))
stop("df is neither a dataframe nor a list of dataframes")
if (!is.null(df) && !is.data.frame(df) &&
length(which(!sapply(df, is.data.frame))) > 0)
stop("df is a list containing non-dataframe members")
if (!is.null(df_def) && !is.data.frame(df_def) && !is.list(df_def))
stop("df_def is neither a dataframe nor a list of dataframes")
if (!is.null(df_def) && !is.data.frame(df_def) &&
length(which(!sapply(df_def, is.data.frame))) > 0)
stop("df_def is a list containing non-dataframe members")
if (!is.data.frame(df) && !is.null(df_def) && is.data.frame(df_def))
df_def <- rep(list(df_def), length(df))
if (length(factor_cutoff) != 1 && !is.integer(factor_cutoff))
stop("factor_cutoff needs to be an integer scalar.")
shiny_cs_data <- !is.null(df) && is.null(ts_id) && is.null(df_def)
if (!is.null(df_def)) {
if (is.data.frame(df_def)) shiny_cs_data <- ! "ts_id" %in% df_def$type
else shiny_cs_data <- ! "ts_id" %in% df_def[[1]][, "type"]
}
if (shiny_cs_data) {
ts_id <- "ts_id"
if (is.null(cs_id)) {
create_cs_id <- TRUE
cs_id <- "cs_id"
} else create_cs_id <- FALSE
}
if(!is.null(df_def)) cs_id <- NULL
if(!is.null(df)) {
if(!is.data.frame(df)) {
if (shiny_cs_data) {
if (create_cs_id) lapply(df, function(x) x[, "cs_id"] <- row.names(x))
lapply(df, function(x) x[, "ts_id"] <- 1)
if(!is.null(df_def)) {
if (create_cs_id) {
df_def <- lapply(
df_def,
function(x) rbind(
x,
list("cs_id", "Cross-sectional indicator", "cs_id", FALSE)
)
)
}
df_def <- lapply(
df_def,
function(x) rbind(
x,
list("ts_id", "Pseudo time series indicator", "ts_id", FALSE)
)
)
}
}
names_df <- lapply(df, names)
if (!is.null(df_def)) {
for(i in 1: length(names_df)) {
df_def[[i]][1:3] <- lapply(df_def[[i]][1:3], as.character)
if(!identical(names_df[[i]], df_def[[i]]$var_name))
stop ("Provided data definitions have different variable names than data frames")
}
ts_id <- as.character(df_def[[1]][df_def[[1]][, 3] == "ts_id", 1])
cs_id <- as.character(df_def[[1]][df_def[[1]][, 3] == "cs_id", 1])
}
if (! ts_id %in% names_df[[1]]) stop ("Time series identifier not included in data frames.")
if (! all(cs_id %in% names_df[[1]])) stop ("Not all cross sectional identifier(s) are included in data frames.")
for (i in 2:length(names_df)) {
if(!identical(names_df[[1]], names_df[[i]])) stop ("Provided data frames do not have identical variable names")
if(is.ordered(df[[1]][, ts_id]) &
!identical(levels(df[[1]][, ts_id]), levels(df[[i]][, ts_id]))) {
stop("Provided data frames' time series identifiers have different levels")
}
}
} else {
if (shiny_cs_data) {
if (create_cs_id) df$cs_id <- row.names(df)
df$ts_id <- 1
if (!is.null(df_def)) {
if (create_cs_id) {
df_def <- rbind(
df_def,
list("cs_id", "Cross-sectional indicator", "cs_id", FALSE)
)
}
df_def <- rbind(
df_def,
list("ts_id", "Pseudo time series indicator", "ts_id", FALSE)
)
}
}
if (!is.null(df_def)) {
df_def[1:3] <- lapply(df_def[1:3], as.character)
if(!identical(names(df), df_def$var_name)) stop ("Provided data definitions have different variable names than data frame")
} else {
if (! ts_id %in% names(df)) stop ("Time series identifier not included in data frame.")
if (! all(ts_id %in% names(df))) stop ("Cross sectional identifier(s) not all included in data frame.")
}
}
tdf <- df
tdf_def <- df_def
if(!is.data.frame(df)) {
tdf <- df[[1]]
tdf_def <- df_def[[1]]
}
if (!is.null(tdf_def)) {
if(length(which(tdf_def$type == "numeric")) < 2) stop ("Less than two numerical variables contained in data frame. At least two are required.")
} else {
sdf <- tdf[, -which(names(tdf) %in% c(cs_id, ts_id))]
num_numeric <- length(which(sapply(sdf, is.numeric)))
if(num_numeric < 2) stop ("Less than two numerical variables contained in data frame. At least two are required.")
}
}
comp_names <- c("sample_selection", "subset_factor", "grouping",
"bar_chart", "missing_values", "udvars", "descriptive_table",
"histogram", "ext_obs", "by_group_bar_graph",
"by_group_violin_graph", "trend_graph",
"quantile_trend_graph", "by_group_trend_graph", "corrplot",
"scatter_plot", "regression")
if (!is.vector(components) | !is.logical(components)) stop("components needs to be a vector of logical values")
# The following legacy code is for the old calling style in Version 0.2.0
# using unnamed vectors
if (is.null(names(components)) & length(components) == 12) {
components <- c(T, T, T, components[1:2], T, components[3:9], T, components[10:12])
names(components) <- comp_names
}
if (is.null(names(components))) stop(sprintf("Component vector has missing names and is not of valid length %d", length(comp_names)))
if (!all(names(components) %in% c(comp_names, "html_block"))) stop(paste("Component vector has invalid names. Valid names are:", paste(c(comp_names, "html_block"), collapse = ", ")))
if (all(components == FALSE)) {
rem_components <- comp_names[!(comp_names %in% names(components))]
components <- rep(TRUE, length(rem_components))
names(components) <- rem_components
}
if (any("html_block" %in% names(components))) {
if (sum(names(components) == "html_block") != length(html_blocks))
stop("Number of html_blocks texts provided does not match number of html_block tags in components")
} else if (!is.null(html_blocks)) stop("html_blocks provided but no html_block tag found in components")
if(shiny_cs_data) {
components <- components[!names(components) %in%
c("missing_values", "trend_graph",
"quantile_trend_graph", "by_group_trend_graph")]
}
if (!is.logical(export_nb_option))
stop("export_nb_option needs to be a logical value")
if(!is.null(var_def)) var_def[1:3] <- lapply(var_def[1:3], as.character)
shiny_df <- df
shiny_cs_id <- cs_id
shiny_ts_id <- ts_id
shiny_df_def <- df_def
shiny_var_def <- var_def
shiny_config_list <- config_list
shiny_title <- title
shiny_abstract <- abstract
if (is.data.frame(df)) {
shiny_df_id <- orig_df_name
} else {
shiny_df_id <- names(df)
if (is.null(shiny_df_id)) shiny_df_id <- paste0("df list member ", 1:length(df))
}
if (!is.data.frame(df) && length(df_name) == 1) shiny_df_name <- paste(df_name, 1:length(df))
shiny_long_def <- long_def
shiny_factor_cutoff <- factor_cutoff
shiny_key_phrase <- key_phrase
shiny_store_encrypted <- store_encrypted
shiny_debug <- debug
shiny_components <- components[components]
shiny_html_blocks <- html_blocks
shiny_export_nb_option <- export_nb_option
pkg_app_dir <- system.file("application", package = "ExPanDaR")
file.copy(pkg_app_dir, tempdir(), recursive=TRUE)
app_dir <- paste0(tempdir(), "/application")
save(list = ls(pattern = "shiny"), file = paste0(app_dir, "/shiny_data.Rda"))
on.exit(unlink(app_dir, recursive = TRUE))
try(shiny::runApp(appDir = app_dir, ...))
}
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.