README.md

cursory

Badges: Travis build status, Codecov test coverage, Lifecycle: maturing

The goal of cursory is to make it easier to summarize data and inspect your variables. It builds on dplyr and purrr, and it is also compatible with dbplyr and remote data.

Installation

You can install the released version of cursory from CRAN with:

install.packages("cursory")

And the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("halpo/cursory")

Example

These basic examples summarize the built-in iris data set: every variable by group, numeric variables only, and a selected set of variables.

library(dplyr)
library(cursory)
data(iris)

## basic summary statistics for each variable in a data frame.
cursory_all(group_by(iris, Species), lst(mean, median)) %>% ungroup() 

| Variable     | Species    |  mean | median |
| :----------- | :--------- | ----: | -----: |
| Sepal.Length | setosa     | 5.006 |   5.00 |
| Sepal.Length | versicolor | 5.936 |   5.90 |
| Sepal.Length | virginica  | 6.588 |   6.50 |
| Sepal.Width  | setosa     | 3.428 |   3.40 |
| Sepal.Width  | versicolor | 2.770 |   2.80 |
| Sepal.Width  | virginica  | 2.974 |   3.00 |
| Petal.Length | setosa     | 1.462 |   1.50 |
| Petal.Length | versicolor | 4.260 |   4.35 |
| Petal.Length | virginica  | 5.552 |   5.55 |
| Petal.Width  | setosa     | 0.246 |   0.20 |
| Petal.Width  | versicolor | 1.326 |   1.30 |
| Petal.Width  | virginica  | 2.026 |   2.00 |


## summary statistics for only numeric variables. 
cursory_if(iris, is.numeric, lst(Mean = mean, 'Std. Deviation' = sd))

| Variable     |     Mean | Std. Deviation |
| :----------- | -------: | -------------: |
| Sepal.Length | 5.843333 |      0.8280661 |
| Sepal.Width  | 3.057333 |      0.4358663 |
| Petal.Length | 3.758000 |      1.7652982 |
| Petal.Width  | 1.199333 |      0.7622377 |


## summary statistics for specific variables. 
cursory_at(iris, vars(ends_with("Length")), var)

| Variable     |       var |
| :----------- | --------: |
| Sepal.Length | 0.6856935 |
| Petal.Length | 3.1162779 |
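As a cross-check on the last two results, the same statistics can be recomputed in base R. This is only a sketch for comparison and does not use cursory; the names `numeric_cols`, `mean_sd`, `length_cols`, and `variances` are introduced here for illustration.

```r
# Base R cross-check of the cursory_if() and cursory_at() results above.
# This recomputes the same statistics without cursory or dplyr.
data(iris)

# Numeric columns only, mirroring cursory_if(iris, is.numeric, ...).
numeric_cols <- iris[vapply(iris, is.numeric, logical(1))]
mean_sd <- data.frame(
  Variable         = names(numeric_cols),
  Mean             = vapply(numeric_cols, mean, numeric(1)),
  `Std. Deviation` = vapply(numeric_cols, sd, numeric(1)),
  check.names      = FALSE,  # keep the space in "Std. Deviation"
  row.names        = NULL    # use plain integer row names
)
print(mean_sd)

# Columns whose names end in "Length", mirroring vars(ends_with("Length")).
length_cols <- iris[endsWith(names(iris), "Length")]
variances <- vapply(length_cols, var, numeric(1))
print(variances)
```

The values should agree with the tables above up to printing precision.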

table_1

The cursory package also provides a table_1 function that allows for describing variables of a dataset for different subsets automatically. This is useful in creating the very common demographics “table 1”.

table_1(iris, Species)

| Variable     | Level  | (All) | setosa | versicolor | virginica |
| :----------- | :----- | :---- | :----- | :--------- | :-------- |
| Sepal.Length | Min    | 4.300 | 4.300  | 4.900      | 4.900     |
|              | Median | 5.800 | 5.000  | 5.900      | 6.500     |
|              | Mean   | 5.843 | 5.006  | 5.936      | 6.588     |
|              | Max    | 7.900 | 5.800  | 7.000      | 7.900     |
|              | SD     | 0.828 | 0.352  | 0.516      | 0.636     |
| Sepal.Width  | Min    | 2.000 | 2.300  | 2.000      | 2.200     |
|              | Median | 3.000 | 3.400  | 2.800      | 3.000     |
|              | Mean   | 3.057 | 3.428  | 2.770      | 2.974     |
|              | Max    | 4.400 | 4.400  | 3.400      | 3.800     |
|              | SD     | 0.436 | 0.379  | 0.314      | 0.322     |
| Petal.Length | Min    | 1.000 | 1.000  | 3.000      | 4.500     |
|              | Median | 4.300 | 1.500  | 4.300      | 5.500     |
|              | Mean   | 3.758 | 1.462  | 4.260      | 5.552     |
|              | Max    | 6.900 | 1.900  | 5.100      | 6.900     |
|              | SD     | 1.765 | 0.174  | 0.470      | 0.552     |
| Petal.Width  | Min    | 0.100 | 0.100  | 1.000      | 1.400     |
|              | Median | 1.300 | 0.200  | 1.300      | 2.000     |
|              | Mean   | 1.199 | 0.246  | 1.326      | 2.026     |
|              | Max    | 2.500 | 0.600  | 1.800      | 2.500     |
|              | SD     | 0.762 | 0.105  | 0.198      | 0.275     |
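To make the layout of this table concrete, the Sepal.Length block can be reproduced by hand in base R. This is a rough sketch of the computation only, not how table_1 is implemented; `describe`, `subsets`, and `sepal_length` are hypothetical names used for illustration.

```r
# Sketch of the per-subset statistics for one variable, using base R only.
# `describe` is a hypothetical helper; it is not part of cursory.
describe <- function(x) {
  c(Min = min(x), Median = median(x), Mean = mean(x), Max = max(x), SD = sd(x))
}

data(iris)

# One column per subset: all rows first, then each level of Species.
subsets <- c(
  list(`(All)` = iris$Sepal.Length),
  split(iris$Sepal.Length, iris$Species)
)

# sapply() returns a matrix with statistics in rows and subsets in columns.
sepal_length <- round(sapply(subsets, describe), 3)
print(sepal_length)
```

The resulting matrix has the five statistics as rows and the subsets as columns, matching the Sepal.Length rows of the table.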

The table_1() function also tags the Variable column with the dontrepeat class, which suppresses repeated values when the table is formatted, so that tables are easier to read.
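The effect of suppressing repeated values can be illustrated with a small base R helper. This is a minimal sketch of the idea, assuming that "repeated" means equal to the value directly above; `blank_repeats` is a hypothetical function and is not how cursory implements the dontrepeat class.

```r
# `blank_repeats` is a hypothetical helper that illustrates the idea only;
# it is not cursory's dontrepeat implementation.
blank_repeats <- function(x) {
  x <- as.character(x)
  # Positions whose value equals the one immediately before it.
  # which() drops NA comparisons, so missing values are left untouched.
  repeated <- which(x[-1] == x[-length(x)]) + 1
  x[repeated] <- ""
  x
}

variable <- rep(c("Sepal.Length", "Sepal.Width"), each = 3)
print(blank_repeats(variable))
```

Only the first value of each run is kept, which is the pattern visible in the Variable column of the table above.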




cursory documentation built on Aug. 22, 2019, 9:03 a.m.