knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
library(descriptr)
library(dplyr)
This document introduces you to a basic set of functions that describe continuous data. The other two vignettes introduce you to functions that describe categorical data and to visualization options.
We have modified the mtcars data to create a new data set, mtcarz. The only
difference between the two data sets is related to the variable types.
str(mtcarz)
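To make the difference in variable types concrete, here is a minimal base R sketch of how a data set like mtcarz could be derived from mtcars. The choice of columns converted to factors (cyl, vs, am, gear, carb) is an assumption for illustration, and mtcarz_sketch is a hypothetical name; check str(mtcarz) for the actual types.

```r
# Sketch only: rebuild a mtcarz-like data set from mtcars.
# Assumption: the categorical columns are cyl, vs, am, gear and carb.
mtcarz_sketch <- mtcars
cat_vars <- c("cyl", "vs", "am", "gear", "carb")

# Convert the assumed categorical columns to factors; the rest stay numeric.
mtcarz_sketch[cat_vars] <- lapply(mtcarz_sketch[cat_vars], as.factor)

str(mtcarz_sketch)
```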
The ds_screener() function screens a data set and returns the following:
- Column/Variable Names
- Data Type
- Levels (in case of categorical data)
- Number of missing observations
- % of missing observations
ds_screener(mtcarz)
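The items listed above can be approximated in base R. The following is a rough sketch of the idea, not the implementation used by ds_screener(); screen_sketch and demo are hypothetical names, and the missing values are injected only so that the missing-value columns show something.

```r
# Sketch only: a base R approximation of a data screening table.
screen_sketch <- function(data) {
  data.frame(
    column = names(data),
    # First class of each column, e.g. "numeric" or "factor"
    type = vapply(data, function(x) class(x)[1], character(1)),
    # Factor levels collapsed into one string; NA for non-factors
    levels = vapply(
      data,
      function(x) if (is.factor(x)) paste(levels(x), collapse = " ") else NA_character_,
      character(1)
    ),
    missing = vapply(data, function(x) sum(is.na(x)), integer(1)),
    missing_pct = vapply(data, function(x) 100 * mean(is.na(x)), numeric(1)),
    row.names = NULL
  )
}

# Demo data: mtcars with one factor column and two injected missing values.
demo <- mtcars
demo$cyl <- as.factor(demo$cyl)
demo$mpg[c(1, 5)] <- NA

screened <- screen_sketch(demo)
screened
```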
The ds_summary_stats() function returns a comprehensive set of statistics
including measures of location, variation, symmetry and extreme observations.
ds_summary_stats(mtcarz, mpg)
You can pass multiple variables as shown below:
ds_summary_stats(mtcarz, mpg, disp)
If you do not specify any variables, it will detect all the continuous variables in the data set and return summary statistics for each of them.
The ds_freq_table() function creates frequency tables for continuous variables.
The default number of intervals is 5.
ds_freq_table(mtcarz, mpg, 4)
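For intuition, a frequency table for a continuous variable can be sketched in base R by cutting the variable into intervals and counting. This sketch assumes equal-width intervals spanning the observed range; the interval rule used by ds_freq_table() may differ.

```r
# Sketch only: bin mpg into 4 equal-width intervals and count observations.
mpg <- mtcars$mpg
n_bins <- 4

# Equal-width break points from the minimum to the maximum (an assumption).
breaks <- seq(min(mpg), max(mpg), length.out = n_bins + 1)

# include.lowest = TRUE keeps the minimum value in the first interval.
bins <- cut(mpg, breaks = breaks, include.lowest = TRUE)
freq <- table(bins)

freq_table <- data.frame(
  interval = names(freq),
  frequency = as.vector(freq),
  percent = round(100 * as.vector(freq) / length(mpg), 2),
  cumulative = cumsum(as.vector(freq))
)
freq_table
```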
A plot() method has been defined which will generate a histogram.
k <- ds_freq_table(mtcarz, mpg, 4)
plot(k)
If you want to view summary statistics and frequency tables of all or a subset
of variables in a data set, use ds_auto_summary_stats().
ds_auto_summary_stats(mtcarz, disp, mpg)
The ds_group_summary() function returns descriptive statistics of a continuous
variable for the different levels of a categorical variable.
k <- ds_group_summary(mtcarz, cyl, mpg)
k
The object returned by ds_group_summary() includes a tibble, tidy_stats, which
can be used for further analysis.
k$tidy_stats
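As a point of comparison, the grouped statistics can be approximated in base R by splitting the continuous variable on the levels of the categorical one. This is a sketch using mtcars so that it runs without descriptr. The statistics shown are an illustrative subset, and the column names are not those of tidy_stats.

```r
# Sketch only: descriptive statistics of mpg for each level of cyl.
by_cyl <- split(mtcars$mpg, mtcars$cyl)

group_stats <- data.frame(
  cyl = names(by_cyl),
  n = vapply(by_cyl, length, integer(1)),
  min = vapply(by_cyl, min, numeric(1)),
  max = vapply(by_cyl, max, numeric(1)),
  mean = vapply(by_cyl, mean, numeric(1)),
  median = vapply(by_cyl, median, numeric(1)),
  sd = vapply(by_cyl, sd, numeric(1)),
  row.names = NULL
)
group_stats
```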
A plot() method has been defined for comparing distributions.
k <- ds_group_summary(mtcarz, cyl, mpg)
plot(k)
If you want grouped summary statistics for multiple variables in a data set, use
ds_auto_group_summary().
ds_auto_group_summary(mtcarz, cyl, gear, mpg)
To look at the descriptive statistics of a continuous variable for different
combinations of levels of two or more categorical variables, use
ds_group_summary_interact().
ds_group_summary_interact(mtcarz, mpg, cyl, gear)
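The idea of summarising over combinations of levels can be sketched in base R with aggregate(). This illustrates the grouping logic on mtcars only; it is not how ds_group_summary_interact() is implemented, and it reports just the count and mean.

```r
# Sketch only: count and mean of mpg for each observed cyl x gear combination.
combo_n <- aggregate(mpg ~ cyl + gear, data = mtcars, FUN = length)
combo_mean <- aggregate(mpg ~ cyl + gear, data = mtcars, FUN = mean)

# Merge the two summaries; the suffixes distinguish the two mpg columns.
combo_stats <- merge(
  combo_n, combo_mean,
  by = c("cyl", "gear"),
  suffixes = c("_n", "_mean")
)
combo_stats
```

Combinations with no observations (for example 8 cylinders with 4 gears in mtcars) do not appear in the result.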
The ds_tidy_stats() function returns summary/descriptive statistics for
variables in a data frame/tibble.
ds_tidy_stats(mtcarz, mpg, disp, hp)
If you want to view the measures of location, variation, symmetry, percentiles
and extreme observations as tibbles, use the functions below. All of them,
except for ds_extreme_obs(), will work with single or multiple variables. If
you do not specify the variables, they will return the results for all the
continuous variables in the data set.
ds_measures_location(mtcarz)
ds_measures_variation(mtcarz)
ds_measures_symmetry(mtcarz)
ds_percentiles(mtcarz)
ds_extreme_obs(mtcarz, mpg)
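To show what these groups of measures refer to, here is a base R sketch for a single variable. The particular statistics, the 5% trim, and the percentile levels are assumptions chosen for illustration and may not match what the descriptr functions report. Measures of symmetry (skewness and kurtosis) are omitted because base R has no built-in functions for them.

```r
# Sketch only: location, variation, percentiles and extremes for mpg.
x <- mtcars$mpg

# Measures of location (5% trimmed mean is an illustrative choice)
location <- c(
  mean = mean(x),
  trimmed_mean = mean(x, trim = 0.05),
  median = median(x)
)

# Measures of variation
variation <- c(
  range = diff(range(x)),
  iqr = IQR(x),
  variance = var(x),
  sd = sd(x)
)

# Selected percentiles (levels chosen for illustration)
pcts <- quantile(x, probs = c(0.01, 0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95, 0.99))

# Five lowest and five highest observations
sorted <- sort(x)
extremes <- list(lowest = head(sorted, 5), highest = tail(sorted, 5))

location
variation
pcts
extremes
```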