describe_ci | R Documentation |
describe_ci extends the functionality of stat_ci by letting you obtain confidence intervals for a summary statistic split by any number of grouping variables. As with stat_ci, you can specify any function that operates on a numeric variable and returns a single value (e.g. mean, median, sd, se). Unlike the other elucidate package *_ci functions, however, this one always returns either a data.table or a tibble instead of a named vector.
Confidence intervals for the mean are calculated with reference to the theoretical normal (Gaussian) distribution for speed. For all other statistics bootstrapping is used, with an option to use parallel processing on multicore machines, which can speed things up considerably for larger samples.
stat_ci may be more useful than describe_ci if you need to pass additional arguments to the chosen summary statistic function, since that is what stat_ci uses its ... argument for (describe_ci uses ... for grouping variables instead). To get confidence intervals for all numeric variables in a data frame, use describe_ci_all instead.
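As a rough illustration of the normal-theory approach described above, the base R sketch below computes a confidence interval for the mean as mean ± z × SE. The helper name normal_mean_ci is hypothetical, and the exact formula describe_ci uses (for example z versus t critical values) is an assumption here rather than something this page documents.

```r
# Hypothetical helper (not part of elucidate): normal-theory CI for the mean,
# assuming the formula mean +/- z * sd / sqrt(n).
normal_mean_ci <- function(x, ci_level = 0.95, na.rm = TRUE) {
  if (na.rm) x <- x[!is.na(x)]            # drop missing values first
  n <- length(x)
  m <- mean(x)
  se <- sd(x) / sqrt(n)                   # standard error of the mean
  z <- qnorm(1 - (1 - ci_level) / 2)      # about 1.96 for a 95% interval
  c(mean = m, lower = m - z * se, upper = m + z * se)
}

set.seed(1)
y <- rnorm(500, mean = 10, sd = 2)        # simulated data for illustration
normal_mean_ci(y)
```

No resampling is involved, which is why this kind of interval is fast to compute even for very large samples.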
describe_ci(
  data,
  y = NULL,
  ...,
  stat = mean,
  replicates = 2000,
  ci_level = 0.95,
  ci_type = c("perc", "bca", "basic", "norm"),
  parallel = FALSE,
  cores = NULL,
  na.rm = TRUE,
  output = c("dt", "tibble")
)
data |
Either a numeric vector, or a data frame or tibble containing the numeric variable to be described (y) and any grouping variables (...). |
y |
If data is a data frame, this is the unquoted name of the numeric variable for which you wish to obtain confidence intervals. |
... |
If data is a data frame, this special argument accepts any number of unquoted grouping variable names (also present in data) to use for subsetting, separated by commas (e.g. high_low in the grouped example below). |
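To illustrate what splitting by grouping variables means, here is a hypothetical base R sketch (grouped_mean_ci is not an elucidate function) that computes a normal-theory confidence interval for the mean within each group combination. describe_ci takes unquoted names via ...; for simplicity this sketch takes quoted column names, and the simulated data only borrow the y1 and high_low names from the examples.

```r
# Hypothetical sketch: per-group normal-theory CIs for the mean in base R.
grouped_mean_ci <- function(df, y, groups, ci_level = 0.95) {
  z <- qnorm(1 - (1 - ci_level) / 2)
  # One numeric vector per observed combination of the grouping columns
  pieces <- split(df[[y]], df[groups], drop = TRUE)
  out <- lapply(names(pieces), function(g) {
    x <- pieces[[g]]
    x <- x[!is.na(x)]
    m <- mean(x)
    se <- sd(x) / sqrt(length(x))
    data.frame(group = g, n = length(x), mean = m,
               lower = m - z * se, upper = m + z * se,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, out)                     # stack the per-group rows
}

set.seed(42)
df <- data.frame(
  y1 = rnorm(300),
  high_low = sample(c("high", "low"), 300, replace = TRUE),
  stringsAsFactors = FALSE
)
grouped_mean_ci(df, "y1", "high_low")
```

With several grouping columns, split() forms their interaction, so each output row corresponds to one combination of group levels.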
stat |
The unquoted name (e.g. mean, not "mean") of the summary statistic function to calculate confidence intervals for. Only functions that operate on a numeric variable and return a single value are currently supported. |
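The requirement that stat return a single value can be checked directly. The helper below is hypothetical (not part of elucidate) and simply calls a candidate function on a small numeric vector to see whether it yields one numeric value.

```r
# Hypothetical helper: does `stat` return exactly one numeric value?
returns_single_value <- function(stat, x = c(1, 4, 2, 8, 5)) {
  out <- stat(x)
  is.numeric(out) && length(out) == 1L
}

returns_single_value(mean)      # TRUE
returns_single_value(median)    # TRUE
returns_single_value(range)     # FALSE: returns the minimum and maximum
returns_single_value(quantile)  # FALSE: returns five quantiles by default
```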
replicates |
The number of bootstrap replicates used to construct confidence intervals for statistics other than the sample mean. Default is 2,000, as recommended by Efron & Tibshirani (1993). For publications, or if you need more precise estimates, more replicates (e.g. >= 5,000) are recommended, although they take longer to run. If you get the error "estimated adjustment 'a' is NA" when ci_type = "bca", try again with more replicates. |
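The underlying boot.ci function (mentioned under ci_type) works on the output of boot() from the boot package. The sketch below shows, in plain boot code, what replicates controls: R, the number of resamples drawn. It illustrates the technique only and is not describe_ci's internal code; the median statistic and the simulated data are assumptions.

```r
library(boot)

# boot() calls the statistic with the data and a vector of resampled indices
median_stat <- function(x, idx) median(x[idx])

set.seed(123)
y <- rexp(200, rate = 0.5)   # simulated skewed data for illustration

# Draw 2,000 bootstrap replicates of the median
b <- boot(y, statistic = median_stat, R = 2000)

# Percentile interval at the 95% level; columns 4 and 5 hold the limits
ci <- boot.ci(b, conf = 0.95, type = "perc")
ci$percent[1, 4:5]
```

Increasing R makes the estimated interval limits more stable from run to run, at a roughly proportional cost in run time.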
ci_level |
The confidence level to use for constructing confidence
intervals. Default is 0.95 (i.e. 95% confidence intervals). |
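A confidence level maps onto a pair of tail probabilities: with alpha = 1 - ci_level split equally between the two tails, the limits come from the alpha/2 and 1 - alpha/2 quantiles of the normal distribution (for the normal approximation) or of the bootstrap distribution (for percentile intervals). The helper name tail_probs below is hypothetical.

```r
# Hypothetical helper: tail probabilities implied by a confidence level
tail_probs <- function(ci_level) {
  alpha <- 1 - ci_level
  c(lower = alpha / 2, upper = 1 - alpha / 2)
}

tail_probs(0.95)          # lower 0.025, upper 0.975
qnorm(tail_probs(0.95))   # approximately -1.96 and 1.96
tail_probs(0.90)          # lower 0.05, upper 0.95
```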
ci_type |
The type of confidence intervals to calculate from the
bootstrap samples. Most of the options available in the underlying boot.ci
function are implemented (except for studentized intervals): "norm" for an
approximation based on the normal distribution, "perc" for percentile,
"basic" for basic, and "bca" for bias-corrected and accelerated. Percentile
intervals are the default since these are typically sufficient when working
with large data sets (e.g. >= 100,000 rows of data) and are faster to
calculate than BCa intervals. However, BCa intervals (the default for the
more primitive |
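The sketch below calls boot.ci directly to compare the four interval types that ci_type exposes, using bootstrapped standard deviations of simulated skewed data. boot.ci stores the results in components named normal, basic, percent and bca; how describe_ci maps ci_type onto these components is an assumption here.

```r
library(boot)

set.seed(2024)
y <- rlnorm(150)                        # simulated right-skewed data
sd_stat <- function(x, idx) sd(x[idx])  # statistic evaluated on each resample

# BCa intervals need enough replicates to estimate the acceleration
b <- boot(y, statistic = sd_stat, R = 2000)
ci <- boot.ci(b, conf = 0.95, type = c("norm", "basic", "perc", "bca"))

# The normal component has three columns (level, lower, upper);
# the others have five, with the limits in columns 4 and 5
ci_table <- rbind(
  norm  = ci$normal[1, 2:3],
  basic = ci$basic[1, 4:5],
  perc  = ci$percent[1, 4:5],
  bca   = ci$bca[1, 4:5]
)
ci_table
```

For a skewed statistic like this, the normal and basic intervals can differ noticeably from the percentile and BCa intervals, which is one reason the type is worth choosing deliberately.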
parallel |
Set to TRUE to use multiple cores, or FALSE (the default) to process sequentially. Note that there is some processing overhead involved when operating in parallel, so speed gains may not be noticeable for smaller samples, and parallel processing may even take longer than sequential processing. Due to the nature of the underlying parallelization architecture, performance gains will likely be greater on non-Windows machines that can use the "multicore" implementation instead of "snow". This option only works on machines with more than one logical processing core. |
cores |
If parallel is set to TRUE, this determines the number of cores to use. To see how many cores are available on your machine, use parallel::detectCores(). If cores is unspecified, the number of available cores minus 1 is used. |
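The parallel and cores settings correspond to what boot() itself offers through its parallel and ncpus arguments. The sketch below assumes, rather than confirms, that describe_ci forwards its settings in a similar way. It selects "multicore" (which relies on forking, unavailable on Windows) on Unix-alikes and "snow" on Windows, and uses all but one detected core.

```r
library(boot)
library(parallel)

# All but one logical core; fall back to 1 if detection fails
n_cores <- detectCores()
if (is.na(n_cores)) n_cores <- 1L
n_cores <- max(1L, n_cores - 1L)

# Forking on Unix-alikes, socket clusters on Windows
backend <- if (.Platform$OS.type == "windows") "snow" else "multicore"

set.seed(99)
y <- rgamma(1000, shape = 2)             # simulated data for illustration
mean_stat <- function(x, idx) mean(x[idx])

# boot() only parallelizes when ncpus > 1
b <- boot(y, statistic = mean_stat, R = 2000,
          parallel = backend, ncpus = n_cores)
boot.ci(b, type = "perc")
```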
na.rm |
Should missing values be removed before attempting to calculate the chosen statistic and confidence intervals? Default is TRUE. |
output |
Output type: "dt" (the default) for a data.table or "tibble" for a tibble. |
Craig P. Hutton, craig.hutton@gov.bc.ca
Efron, B. (1987). Better bootstrap confidence intervals. Journal of the American Statistical Association, 82(397), 171-185.
Efron, B., & Tibshirani, R. J. (1993). An introduction to the bootstrap. New York: Chapman & Hall.
boot, boot.ci, mean_ci, median_ci, stat_ci, describe_ci_all
library(elucidate)

describe_ci(pdata, y1, stat = mean) # stat = mean is the default

## Not run:
# using a single core (sequential processing)
describe_ci(pdata[1:1000, ], y1, stat = median) # bootstrapped CIs for the median
describe_ci(pdata, y1, high_low, stat = mean) # split by a grouping variable

# using multiple cores (parallel processing)
describe_ci(pdata[1:1000, ], y1, stat = sd, parallel = TRUE, cores = 2)
## End(Not run)