describe_df: Describe and visualize the statistical distribution of a...

Description

A family of functions that analyze a data.frame and return descriptive statistics, visualizations, or both.

Usage

describe_df(data, qual_vars, quan_vars, nrow = 1, ncol = 1, ...)

cal_df_distrib(data, qual_vars, quan_vars, ...)

vis_df_distrib(data, qual_vars, quan_vars, nrow = NULL, ncol = NULL,
  ...)

Arguments

data

a data.frame

qual_vars

variable(s) indicating the qualitative variable(s). It accepts five forms:

  • missing (default), the function will automatically identify the qualitative variables

  • character vector indicating variable names, e.g., c("cyl", "gear")

  • integer vector indicating variable index, e.g., c(2, 10)

  • quosures yielded using vars(), e.g., vars(cyl, gear)

  • NULL indicating that the function will not process qualitative variables

quan_vars

variable(s) indicating the quantitative variable(s). It accepts five forms:

  • missing (default), the function will automatically identify the quantitative variables

  • character vector indicating variable names, e.g., c("drat", "mpg")

  • integer vector indicating variable index, e.g., c(1, 5)

  • quosures yielded using vars(), e.g., vars(drat, mpg)

  • NULL indicating that the function will not process quantitative variables
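The name and index examples above appear to refer to the built-in mtcars data set; that is an assumption based on the Examples section, not something this page states. Under that assumption, a short base R check (it does not call any aseskit function) shows that the two forms select the same columns:

```r
# Assumption: the e.g. values above refer to the built-in mtcars data set.
# Base R only; no aseskit function is called here.
qual_by_name  <- c("cyl", "gear")
qual_by_index <- names(mtcars)[c(2, 10)]

quan_by_name  <- c("drat", "mpg")
quan_by_index <- names(mtcars)[c(1, 5)]

# Columns 2 and 10 of mtcars are cyl and gear, in that order
same_qual <- identical(qual_by_name, qual_by_index)

# Columns 1 and 5 are mpg and drat: the same set, listed in a different order
same_quan <- setequal(quan_by_name, quan_by_index)

print(c(same_qual = same_qual, same_quan = same_quan))
```

Names are usually the more robust choice, since an index silently points at a different variable if the column order changes.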

nrow

NULL or positive integer. Only applicable for vis_df_distrib, determining the number of rows in the plot grid. See facet_wrap for more details.

ncol

NULL or positive integer. Only applicable for vis_df_distrib, determining the number of columns in the plot grid. See facet_wrap for more details.
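As a rough illustration of how nrow and ncol shape a facet_wrap grid, the sketch below builds a two-panel histogram directly with ggplot2. It does not call vis_df_distrib; the long-format reshaping, the bins value, and the free scales are assumptions made for the example, not the package's implementation.

```r
library(ggplot2)

# Assumption: one panel per quantitative variable, built from long-format data.
long <- data.frame(
  variable = rep(c("mpg", "wt"), each = nrow(mtcars)),
  value    = c(mtcars$mpg, mtcars$wt)
)

# nrow = 1 and ncol = 2 place the two panels side by side;
# nrow = 2 and ncol = 1 would stack them vertically instead.
p <- ggplot(long, aes(x = value)) +
  geom_histogram(bins = 10, color = "white") +
  facet_wrap(~ variable, nrow = 1, ncol = 2, scales = "free")

print(p)
```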

...

other arguments to pass to describe_df, cal_df_distrib and vis_df_distrib. Accepts the following:

  • pass to cal_df_distrib

    argument for choose_col_idx

    diversity_threshold

  • pass to vis_df_distrib

    help_on_dots

    logical. If TRUE, shows help information for the ... arguments.

    ellipsis arguments for geom_histogram

    hist.title, hist.tag, hist.subtitle, hist.stat, hist.position, hist.bins, hist.color, hist.inherit.aes

    ellipsis arguments for histograms and geom_bar

    bar.stat, bar.position, bar.width, bar.binwidth, bar.fill, bar.color, bar.size, bar.alpha, bar.na.rm, bar.show.legend, bar.inheirt.aes

    ellipsis arguments for geom_vline

    vline.color, vline.size, vline.alpha, vline.show.legend, vline.na.rm

Details

describe_df is a wrapper around cal_df_distrib and vis_df_distrib.

Value

describe_df(): a list of the results returned by cal_df_distrib and vis_df_distrib

cal_df_distrib(): two lists of tibbles

'qual'

a tibble with 4 columns: <varname>, 'value', 'freq', 'prop'; or the character string "no match columns"

'quan'

a tibble with 11 columns: <varname>, 'count', 'n_na', 'p_na', 'mean', 'sd', 'min', 'lower', 'median', 'higher', 'max'; or the character string "no match columns"
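To make the column lists above concrete, here is a minimal base R sketch that produces tables with the same column names for one variable each. The helpers summarise_qual and summarise_quan are hypothetical and not part of aseskit. The sketch assumes that 'lower' and 'higher' are the first and third quartiles and that 'count' is the total number of observations; this page confirms neither. It returns plain data.frames rather than tibbles.

```r
# Hypothetical helpers for illustration only; not aseskit's implementation.

summarise_qual <- function(x, varname) {
  freq <- table(x, useNA = "ifany")
  data.frame(
    varname = varname,
    value   = names(freq),
    freq    = as.vector(freq),
    prop    = as.vector(freq) / sum(freq)
  )
}

summarise_quan <- function(x, varname) {
  n_na <- sum(is.na(x))
  # Assumption: 'lower' and 'higher' are the 25% and 75% quantiles
  q <- quantile(x, c(0.25, 0.5, 0.75), na.rm = TRUE, names = FALSE)
  data.frame(
    varname = varname,
    count   = length(x),          # assumption: counts all rows, including NA
    n_na    = n_na,
    p_na    = n_na / length(x),
    mean    = mean(x, na.rm = TRUE),
    sd      = sd(x, na.rm = TRUE),
    min     = min(x, na.rm = TRUE),
    lower   = q[1],
    median  = q[2],
    higher  = q[3],
    max     = max(x, na.rm = TRUE)
  )
}

qual <- summarise_qual(mtcars$cyl, "cyl")
quan <- summarise_quan(mtcars$mpg, "mpg")
print(qual)
print(quan)
```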

vis_df_distrib(): two sets of plots

Examples

## Not run: 
library(dplyr)  # for vars(), as in the other examples
describe_df(mtcars, vars(am, cyl, mpg, wt), NULL)

# you can also describe the distribution of data copied to the clipboard
describe_df(parse_clipb())

## End(Not run)
## Not run: 
library(dplyr)
cal_df_distrib(iris)  ## or
cal_df_distrib(iris, quan_vars=vars(Sepal.Length, Petal.Length))

## End(Not run)
## Not run: 
library(dplyr)
vis_df_distrib(mtcars, qual_vars=vars(am, cyl), quan_vars=vars(mpg, wt))

## End(Not run)

madlogos/aseskit documentation built on June 26, 2019, 12:17 a.m.