knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
A reimplementation of the infer R package, which offers a tidy approach to statistical inference built on top of the Tidyverse.
The infer package streamlines reshuffling and bootstrapping samples, calculating summary statistics and confidence intervals, and performing hypothesis tests. It does this with a small set of functions that emphasize clear, expressive code and a statistical grammar that makes explicit how values are calculated and how tests are evaluated.
With this package as the inspiration, rfer provides four main functions in its first iteration: specify, generate, calculate, and get_ci. Given a data frame and a specified response variable, these functions calculate summary statistics and confidence intervals for that variable. Further details follow in the description of each function below.
To show how the rfer package works, we'll walk through an example using the familiar iris dataset. Boring, we know, but it will get you up to speed with the package more easily than other datasets, and it's fairly straightforward to interpret.
library(rfer)
library(dplyr)
set.seed(41)
iris_df <- iris %>% mutate(Species = factor(Species))
# Rough method to get a value of the point estimate
hp_point <- mtcars %>%
  specify(response = "hp") %>%
  generate(n_samples = 1) %>%
  calculate(column = "hp", stat = "mean")
hp_point_estimate <- hp_point[[2]]
The specify function creates the data frame used in the remainder of the pipeline. It contains the response variable under study and, optionally, some explanatory variables.
Sep_Width <- iris_df %>% specify(response="Sepal.Width")
Sep_Width
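To make the idea concrete, here is a minimal base R sketch of what a specify-style step might do: check that the requested columns exist and keep only those. The function name specify_sketch and its explanatory argument are hypothetical and used only for illustration; this is not rfer's actual implementation.

```r
# Hypothetical stand-in for specify(); not rfer's actual implementation.
specify_sketch <- function(data, response, explanatory = NULL) {
  cols <- c(response, explanatory)
  missing_cols <- setdiff(cols, names(data))
  if (length(missing_cols) > 0) {
    stop("Columns not found in data: ", paste(missing_cols, collapse = ", "))
  }
  # drop = FALSE keeps a data frame even when only one column is selected
  data[, cols, drop = FALSE]
}

sep_width_sketch <- specify_sketch(iris, response = "Sepal.Width")
str(sep_width_sketch)
```

Returning a data frame rather than a bare vector keeps the result usable by later steps that expect named columns.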
The generate function creates resampled versions of the data. The n_samples parameter sets how many samples are generated.
Sep_width_resamples <- Sep_Width %>% generate(n_samples = 20)
head(Sep_width_resamples)
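For readers unfamiliar with resampling, the sketch below shows one way bootstrap samples could be drawn in base R: each resample picks as many rows as the original data, with replacement, and is tagged with an identifier. The function name generate_sketch and the sample_id column are assumptions made for this illustration; the layout rfer actually returns may differ.

```r
set.seed(41)

# Hypothetical stand-in for generate(); rfer's real output layout may differ.
generate_sketch <- function(data, n_samples) {
  n <- nrow(data)
  resamples <- lapply(seq_len(n_samples), function(i) {
    # Draw n row indices with replacement: one bootstrap resample
    idx <- sample.int(n, size = n, replace = TRUE)
    data.frame(sample_id = i, data[idx, , drop = FALSE], row.names = NULL)
  })
  # Stack all resamples into one long data frame
  do.call(rbind, resamples)
}

resamples_sketch <- generate_sketch(iris["Sepal.Width"], n_samples = 20)
dim(resamples_sketch)
```

With 20 resamples of the 150-row iris data, the stacked result has 20 * 150 rows.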
The calculate function computes a statistic for each of the resampled groups. As of this release, only the 'mean' statistic is available.
Sep_width_means <- Sep_width_resamples %>% calculate(column="Sepal.Width",stat="mean")
Sep_width_means
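The per-group calculation can be pictured with base R alone. The sketch below builds 20 bootstrap resamples in long format and takes the mean of each one; the sample_id column name is a hypothetical choice for this illustration and not necessarily what rfer uses.

```r
set.seed(41)

# Build 20 bootstrap resamples of Sepal.Width in long format.
# sample_id is a hypothetical column name, not necessarily rfer's.
n <- nrow(iris)
resamples <- data.frame(
  sample_id = rep(1:20, each = n),
  Sepal.Width = sample(iris$Sepal.Width, size = 20 * n, replace = TRUE)
)

# One mean per resample: together these approximate the bootstrap
# distribution of the sample mean
boot_means <- aggregate(Sepal.Width ~ sample_id, data = resamples, FUN = mean)
boot_means
```

The result has one row per resample, which is the shape a confidence-interval step needs as input.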
The get_ci function finds the confidence interval across the resampled groups. The user can set the confidence level to any value strictly between 0 and 1.
Sep_width_CI <- Sep_width_means %>% rfer::get_ci(column="Sepal.Width",confidence_level = 0.9)
Sep_width_CI
Note that the point estimate is N/A above. It was not specified by the user, and it is not required when the interval is calculated with the percentile method. If specified, its value is displayed in the output.
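The reason the percentile method needs no point estimate is that it reads the interval bounds directly off the quantiles of the resampled statistics. A minimal base R sketch of that idea follows; percentile_ci_sketch is a hypothetical helper written for this illustration, not rfer's get_ci, and the 1000 resamples are an arbitrary choice.

```r
set.seed(41)

# Bootstrap distribution of the mean of Sepal.Width (1000 resamples here;
# the count is an arbitrary choice for this illustration)
boot_means <- replicate(1000, mean(sample(iris$Sepal.Width, replace = TRUE)))

# Hypothetical stand-in for a percentile-type get_ci()
percentile_ci_sketch <- function(stats, confidence_level = 0.9) {
  # The level must lie strictly between 0 and 1
  stopifnot(confidence_level > 0, confidence_level < 1)
  alpha <- 1 - confidence_level
  # Cut off alpha / 2 of the distribution in each tail
  bounds <- quantile(stats, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
  data.frame(lower = bounds[1], upper = bounds[2])
}

percentile_ci_sketch(boot_means, confidence_level = 0.9)
```

For a 90% level, the bounds are the 5th and 95th percentiles of the bootstrap means.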
Ultimately, the objective of the rfer package is to combine all of the above functions into a streamlined way to calculate confidence intervals (and eventually other estimates of interest) once a column/variable is specified. In the example below, we use the mtcars dataset (just to shake things up a little) to arrive at the 90% confidence interval for the mean hp across all the cars.
mtcars %>%
  specify(response = "hp") %>%
  generate(n_samples = 10, type = "bootstrap") %>%
  calculate(column = "hp", stat = "mean") %>%
  rfer::get_ci(column = "hp", confidence_level = 0.9, point_estimate = hp_point_estimate, type = "percentile")