Continuous Data

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(descriptr)
library(dplyr)

Introduction

This document introduces a basic set of functions that describe continuous data. The other two vignettes introduce functions that describe categorical data and visualization options.

Data

We have modified the mtcars data to create a new data set, mtcarz. The two data sets differ only in their variable types.

str(mtcarz)

Data Screening

The ds_screener() function will screen a data set and return the following:

- Column/Variable Names
- Data Type
- Levels (in case of categorical data)
- Number of missing observations
- % of missing observations

ds_screener(mtcarz)
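
To make the screener's report concrete, here is a rough base R sketch that builds the same kind of per-column summary. It is an illustration only, not descriptr's implementation: the helper name screen_columns() is hypothetical, and it runs on the built-in mtcars data with cyl converted to a factor and one value blanked out (both assumptions made for the example).

```r
# Hypothetical helper (not part of descriptr): one row per column with its
# type, factor levels, number of missing values and percentage missing.
screen_columns <- function(data) {
  n_missing <- vapply(data, function(x) sum(is.na(x)), integer(1))
  data.frame(
    column      = names(data),
    type        = vapply(data, function(x) class(x)[1], character(1)),
    levels      = vapply(data, function(x) {
      if (is.factor(x)) paste(levels(x), collapse = " ") else NA_character_
    }, character(1)),
    missing     = n_missing,
    missing_pct = round(100 * n_missing / nrow(data), 2),
    row.names   = NULL,
    stringsAsFactors = FALSE
  )
}

dat <- mtcars
dat$cyl <- factor(dat$cyl)  # assumption: cyl is treated as categorical
dat$mpg[1] <- NA            # introduce one missing value for illustration

screening <- screen_columns(dat)
screening
```

Each row of the result describes one column, so it can be filtered like any other data frame, for example to list only the columns that contain missing values.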

Summary Statistics

The ds_summary_stats function returns a comprehensive set of statistics including measures of location, variation, symmetry and extreme observations.

ds_summary_stats(mtcarz, mpg)

You can pass multiple variables as shown below:

ds_summary_stats(mtcarz, mpg, disp)

If you do not specify any variables, it will detect all the continuous variables in the data set and return summary statistics for each of them.
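
One way to picture this detection step is to keep only the numeric columns of a data frame. The base R sketch below assumes that "continuous" means "numeric column", which may not match descriptr's internal rule exactly; the conversion of cyl and gear to factors is likewise an assumption made so the example has something to exclude.

```r
dat <- mtcars
# Assumption: these two columns are categorical, so store them as factors
dat$cyl  <- factor(dat$cyl)
dat$gear <- factor(dat$gear)

# Keep the names of the columns that are stored as numeric
is_continuous   <- vapply(dat, is.numeric, logical(1))
continuous_vars <- names(dat)[is_continuous]
continuous_vars
```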

Frequency Distribution

The ds_freq_table function creates frequency tables for continuous variables. The default number of intervals is 5.

ds_freq_table(mtcarz, mpg, 4)
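
The idea behind a frequency table for a continuous variable can be sketched in base R by cutting the observed range into equal-width intervals and counting the observations in each one. This is an illustration only: whether ds_freq_table() places its interval boundaries exactly this way is an assumption, and the object names are made up for the example.

```r
x <- mtcars$mpg
n_bins <- 4

# Equal-width break points spanning the observed range
breaks <- seq(min(x), max(x), length.out = n_bins + 1)

# include.lowest = TRUE keeps the minimum value in the first interval
bins <- cut(x, breaks = breaks, include.lowest = TRUE)

# Count observations per interval, then add relative and cumulative shares
freq <- as.data.frame(table(bins), responseName = "frequency")
freq$percent <- round(100 * freq$frequency / sum(freq$frequency), 2)
freq$cumulative_percent <- round(100 * cumsum(freq$frequency) / sum(freq$frequency), 2)
freq
```

Changing n_bins changes the interval width, which is the same trade-off the third argument of ds_freq_table() controls in the call above.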

Histogram

A plot() method has been defined which will generate a histogram.

k <- ds_freq_table(mtcarz, mpg, 4)
plot(k)

Auto Summary

If you want to view summary statistics and frequency tables for all or a subset of the variables in a data set, use ds_auto_summary_stats().

ds_auto_summary_stats(mtcarz, disp, mpg)

Group Summary

The ds_group_summary() function returns descriptive statistics of a continuous variable for the different levels of a categorical variable.

k <- ds_group_summary(mtcarz, cyl, mpg)
k

The object returned by ds_group_summary() includes a tibble, tidy_stats, which can be used for further analysis.

k$tidy_stats
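
For readers who want to see what a grouped summary involves, the following base R sketch splits a continuous variable by the levels of a categorical one and computes a few statistics per group. It is a minimal illustration under stated assumptions, not the computation descriptr performs: the choice of statistics is arbitrary, and cyl is converted to a factor by hand because the sketch uses the built-in mtcars data.

```r
dat <- mtcars
dat$cyl <- factor(dat$cyl)  # assumption: cyl is treated as categorical

# Split mpg by the levels of cyl and compute a few statistics per group
group_stats <- do.call(rbind, lapply(split(dat$mpg, dat$cyl), function(v) {
  data.frame(n = length(v), mean = mean(v), median = median(v),
             sd = sd(v), min = min(v), max = max(v))
}))
group_stats
```

The row names of group_stats are the levels of cyl, so a single group can be picked out with, for example, group_stats["6", ].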

Box Plot

A plot() method has been defined for comparing distributions.

k <- ds_group_summary(mtcarz, cyl, mpg)
plot(k)

Multiple Variables

If you want grouped summary statistics for multiple variables in a data set, use ds_auto_group_summary().

ds_auto_group_summary(mtcarz, cyl, gear, mpg)

Combination of Categories

To look at the descriptive statistics of a continuous variable for different combinations of levels of two or more categorical variables, use ds_group_summary_interact().

ds_group_summary_interact(mtcarz, mpg, cyl, gear)
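
The notion of "combinations of levels" can be illustrated in base R with interaction(), which builds one grouping factor out of several. The sketch below is only an approximation of the idea, using the built-in mtcars data with cyl and gear converted to factors (an assumption) and reporting just the count and mean for each combination.

```r
dat <- mtcars
# Assumption: cyl and gear are treated as categorical
dat$cyl  <- factor(dat$cyl)
dat$gear <- factor(dat$gear)

# One grouping factor per observed cyl/gear combination, e.g. "4_3";
# drop = TRUE discards combinations that never occur in the data
combo <- interaction(dat$cyl, dat$gear, sep = "_", drop = TRUE)

combo_n    <- tapply(dat$mpg, combo, length)
combo_mean <- tapply(dat$mpg, combo, mean)

combo_stats <- data.frame(
  combination = names(combo_mean),
  n           = as.vector(combo_n),
  mean        = as.vector(combo_mean)
)
combo_stats
```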

Multiple Variable Statistics

The ds_tidy_stats() function returns summary/descriptive statistics for variables in a data frame/tibble.

ds_tidy_stats(mtcarz, mpg, disp, hp)

Measures

If you want to view the measures of location, variation and symmetry, the percentiles, or the extreme observations as tibbles, use the functions below. All of them, except ds_extreme_obs(), work with single or multiple variables. If you do not specify the variables, they return the results for all the continuous variables in the data set.

Measures of Location

ds_measures_location(mtcarz)

Measures of Variation

ds_measures_variation(mtcarz)

Measures of Symmetry

ds_measures_symmetry(mtcarz)

Percentiles

ds_percentiles(mtcarz)

Extreme Observations

ds_extreme_obs(mtcarz, mpg)
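
As a rough picture of what an extreme-observations report contains, the base R sketch below lists the largest and smallest values of a variable together with their row positions. It assumes that "extreme" simply means the values at either end of the sorted data; the number of values shown per end (n_show) and the column names are choices made for this example, not taken from descriptr.

```r
x <- mtcars$mpg
n_show <- 5  # values to show at each end (an arbitrary choice here)

ord <- order(x)                    # row positions from smallest to largest
low_idx  <- head(ord, n_show)      # positions of the smallest values
high_idx <- rev(tail(ord, n_show)) # positions of the largest values, largest first

extremes <- rbind(
  data.frame(type = "high", value = x[high_idx], index = high_idx),
  data.frame(type = "low",  value = x[low_idx],  index = low_idx)
)
extremes
```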


descriptr documentation built on Dec. 15, 2020, 5:37 p.m.