
elucidate

Project Status


This package is currently maintained by Craig Hutton, a Data Scientist working with the Research Branch of the British Columbia Ministry of Social Development & Poverty Reduction.

Why elucidate?

elucidate provides a collection of convenience functions that make exploratory data analysis in R easier and more accessible for researchers.

Inspired by tidyverse naming conventions, the core functions of elucidate are organized into sets that share a common root (e.g. describe*, plot_*), so that they all appear together as autocomplete suggestions while you are coding in RStudio.

Similarly, many elucidate functions accept a data object as the first argument and return a data or plotting object (e.g. ggplot2 or plotly), so they are compatible with the pipe operator from the magrittr package and integrate easily into data processing pipelines.
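
As a rough illustration of why the data-first convention matters, the sketch below pipes a data frame through a small data-first helper. It uses only base R: the native pipe `|>` (R 4.1 or later) stands in for the magrittr pipe, and both `count_rows_by()` and the `toy` data frame are hypothetical, written for this example rather than taken from elucidate.

```r
# Hypothetical data-first helper (not part of elucidate): it takes a data
# frame as its first argument and returns a data frame, so it can be piped.
count_rows_by <- function(data, col) {
  counts <- table(data[[col]])
  data.frame(
    value = names(counts),
    n = as.integer(counts),
    stringsAsFactors = FALSE
  )
}

# Small made-up data set, for illustration only.
toy <- data.frame(
  g = c("a", "b", "a", "c", "a"),
  y = c(1.5, 2.0, 3.5, 4.0, 5.5)
)

# Because the data comes first, each step reads left to right.
result <- toy |>
  subset(y > 1.5) |>
  count_rows_by("g")

print(result)
```

Any function that takes the data as its first argument and returns a data or plotting object can be chained in the same way.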

For a comprehensive introduction to the package see the vignette via vignette("elucidate").

Installation

You can install the development version of elucidate from this repository with:

# use the remotes package to install from a GitHub repository
# only run this first line if you haven't installed remotes before
install.packages("remotes")

remotes::install_github("bcgov/elucidate")

The authors of elucidate gratefully acknowledge the authors of the tidyverse packages, data.table, and the other dependency packages used to build elucidate. Without their effort and ingenuity, elucidate would mostly have remained a collection of ideas instead of functions.

Usage

dupes() can tell you how many rows are duplicated based on one or more variables (default is all of them).

library(elucidate)

# list any number of variables to use when searching for duplicates after the
# data argument
dupes(pdata, d)
# in this case we search for duplicates based on the "d" (date) column
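
To make "duplicated based on one or more variables" concrete, here is a minimal base-R sketch of the same kind of check. It is not how dupes() is implemented; the `toy` data frame and the `count_dupes()` helper are made up for this example.

```r
# Made-up data: two rows share the date "2021-01-01".
toy <- data.frame(
  id = c(1, 2, 3, 4),
  d = c("2021-01-01", "2021-01-02", "2021-01-01", "2021-01-03")
)

# Hypothetical helper: count rows whose values in `cols` occur more than once.
# Every copy is counted, including the first occurrence.
count_dupes <- function(data, cols = names(data)) {
  key <- data[, cols, drop = FALSE]
  is_dupe <- duplicated(key) | duplicated(key, fromLast = TRUE)
  sum(is_dupe)
}

n_by_date <- count_dupes(toy, "d")  # checks only the "d" column
n_all_cols <- count_dupes(toy)      # default: all columns

print(c(by_date = n_by_date, all_cols = n_all_cols))
```

Here two rows count as duplicates when only the date column is checked, and none do when every column is compared, because the `id` values differ.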

describe() a single variable in a data frame or a vector of values.

#set random generator seed for reproducibility
set.seed(1234)

#using a numeric vector as input
describe(data = rnorm(1000, 100, 5))

describe_all() all variables in a data frame.

describe_all(pdata)

Use plot_var() to produce a class-appropriate ggplot2 graph of a single variable in a data frame or a vector of values.

plot_var(data = rnorm(1000, 100, 5))
# in this case we get a density plot with a normal density curve added for
# reference (dashed line).

To generate class-appropriate ggplot2 graphs for all variables in a data frame and combine them into a multiple-panel figure with the patchwork package, use plot_var_all(). You can also limit the graphing to a subset of columns with the "cols" argument, which accepts a character vector of column names.

plot_var_all(pdata, cols = c("y1", "y2", "g", "even"))
#density plots for numeric variables and bar graphs for categorical variables

Learn more

These examples highlight only a few of the many things elucidate can do. To learn more, see the package vignette via vignette("elucidate").

Reporting an Issue

To report bugs or request feature changes, open an issue in the package's GitHub repository. When you raise an issue, please include a reproducible example (reprex) of the problem you're encountering.

Requesting Features and/or Changes

To suggest changes or code improvements, please submit a pull request.

License

Copyright 2021 Province of British Columbia

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at:

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.



bcgov/elucidate documentation built on Sept. 3, 2022, 7:16 p.m.