In AntonioJBT/episcout: Quickly Clean, Explore and Visualise Large Epidemiological Datasets

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Welcome to episcout!

episcout has many functions that can be used to quickly explore data sets. It is particularly useful during cleaning, describing and visualising data sets of tens of thousands of rows with tens to hundreds of columns.

It was developed using a combination of tidyverse packages, base R functions and data.table.

episcout lists many packages as suggested dependencies rather than importing them all, because any one of its many functions needs only a few of them.

If you want to install all of them anyway, run:

# Remove 'eval = FALSE' to run
install.packages(c(
  "dplyr",
  "tibble",
  "tidyr",
  "data.table",
  "compare",
  "stringi",
  "stringr",
  "lubridate",
  "purrr",
  "e1071",
  "Hmisc",
  "ggplot2",
  "cowplot",
  "scales",
  "ggthemes",
  "future",
  "doFuture",
  "foreach",
  "iterators",
  "magrittr",
  "reshape2"
))
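If you would rather install only what is missing, a minimal base R sketch like the following checks which packages are already available. The `suggested` vector here is a shortened, illustrative subset of the list above, and the `install.packages()` call is left commented out on purpose.

```r
# Illustrative subset of the packages listed above; extend as needed
suggested <- c("dplyr", "data.table", "ggplot2")

# requireNamespace() returns TRUE if a package can be loaded, without attaching it
is_installed <- vapply(suggested, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- suggested[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install
} else {
  message("All listed packages are installed")
}
```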

Many parameters have default values. These admittedly reflect quite a bit of personal preference, but the aim is to speed up the processing of hundreds of thousands of observations across hundreds of variables from multiple data sets, so convenience and standardisation are often preferred. If the defaults do not suit your case, it is simple enough to defer to your preferred R packages.

Below are a number of examples of how to use episcout functions.

All functions start with "epi_".

library(episcout)

Currently there are functions for:

- pre-processing: epi_clean_*
- descriptive statistics: epi_stats_*
- visualising: epi_plot_*
- various: epi_read(), epi_write(), etc.
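To see which functions fall under each prefix, one option is to group the exported names by their first two components. The helper `epi_family()` below is hypothetical (it is not part of episcout), and the lookup only runs if episcout is installed.

```r
# Hypothetical helper: reduce "epi_clean_get_dups" to "epi_clean";
# names without a second underscore (e.g. "epi_read") are returned unchanged
epi_family <- function(fun_names) {
  sub("^(epi_[a-z]+)_.*$", "\\1", fun_names)
}

# Illustration with two function names mentioned in this vignette
epi_family(c("epi_clean_get_dups", "epi_read"))

# Count exported functions per family, only if the package is available
if (requireNamespace("episcout", quietly = TRUE)) {
  exported <- getNamespaceExports("episcout")
  epi_funs <- sort(grep("^epi_", exported, value = TRUE))
  print(table(epi_family(epi_funs)))
}
```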

A few examples with dummy data

# Test data frame: 10 IDs, each measured twice (Pre and Post)
n <- 20
df <- data.frame(
  var_id = rep(1:(n / 2), each = 2),
  var_to_rep = rep(c("Pre", "Post"), n / 2),
  x = rnorm(n),
  y = rbinom(n, 1, 0.50),
  z = rpois(n, 2)
)
df

Checking duplicates

epi_clean_get_dups() identifies duplicate rows based on a chosen column.

epi_clean_get_dups(df, "var_id", 1)
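For comparison, the same kind of check can be sketched in base R. This is not how epi_clean_get_dups() is implemented; it only illustrates what "duplicate rows based on a chosen column" means for the dummy data, where every var_id appears twice and so every row is flagged.

```r
# Recreate the relevant columns of the dummy data so the sketch runs on its own
n <- 20
df <- data.frame(
  var_id = rep(1:(n / 2), each = 2),
  var_to_rep = rep(c("Pre", "Post"), n / 2),
  x = rnorm(n)
)

# Flag every row whose var_id occurs more than once,
# scanning from both ends so the first occurrence is included too
is_dup <- duplicated(df$var_id) | duplicated(df$var_id, fromLast = TRUE)
dups <- df[is_dup, ]

nrow(dups)
head(dups)
```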

Numeric summaries

epi_stats_numeric() calculates descriptive statistics for a numeric vector.

summary_stats <- epi_stats_numeric(df$x)
summary_stats
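As a rough cross-check, a handful of common descriptive statistics can be computed with base R alone. The statistics chosen here are an assumption for illustration and may not match what epi_stats_numeric() returns.

```r
# Stand-in numeric vector; the seed only makes this sketch reproducible
set.seed(1)
x <- rnorm(20)

# Basic descriptive statistics, ignoring missing values
basic_stats <- c(
  n         = length(x),
  n_missing = sum(is.na(x)),
  mean      = mean(x, na.rm = TRUE),
  sd        = sd(x, na.rm = TRUE),
  median    = median(x, na.rm = TRUE),
  min       = min(x, na.rm = TRUE),
  max       = max(x, na.rm = TRUE)
)
basic_stats

# Lower and upper quartiles
quantile(x, probs = c(0.25, 0.75), na.rm = TRUE)
```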

Plotting

epi_plot_hist() quickly draws a histogram of a numeric column.

epi_plot_hist(df, "x")
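If you prefer to defer to base graphics, a comparable histogram can be drawn with hist(). This is a plain base R sketch rather than the plot epi_plot_hist() produces, and the title, label and colour are arbitrary choices.

```r
# Stand-in numeric vector; the seed only makes this sketch reproducible
set.seed(1)
x <- rnorm(20)

# hist() draws the plot and invisibly returns the bin breaks and counts
h <- hist(x, main = "Histogram of x", xlab = "x", col = "grey80")

# Every observation falls into exactly one bin
sum(h$counts)
```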

This vignette only scratches the surface of what episcout can do. For more details, see the documentation for each function.

AntonioJBT/episcout documentation built on June 11, 2025, 7:26 p.m.
