infer: Statistical Inference with Permutation

View source: R/infer.R

infer    R Documentation

Description

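Runs simulation-based statistical inference on a data frame: a null distribution is generated for the hypothesis test and a bootstrap distribution for the confidence interval. Additional arguments are passed to infer::calculate.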

Usage

infer(
  data,
  formula,
  stat,
  paired = NULL,
  reps = 2000L,
  level = 0.95,
  direction = "two-sided",
  null = NULL,
  type = NULL,
  null_reps = reps,
  ci_reps = reps,
  ...,
  variables = NULL,
  response = NULL,
  explanatory = NULL,
  success = NULL,
  p = NULL,
  mu = NULL,
  med = NULL,
  sigma = NULL
)

Arguments

data

A data frame that can be coerced into a tibble.

formula

A formula with the response variable on the left and the explanatory variable on the right. Alternatively, supply the response and explanatory arguments.

stat

A string giving the type of the statistic to calculate. Current options include "mean", "median", "sum", "sd", "prop", "count", "diff in means", "diff in medians", "diff in props", "Chisq" (or "chisq"), "F" (or "f"), "t", "z", "ratio of props", "slope", "odds ratio", "ratio of means", and "correlation". infer only supports theoretical tests on one or two means via the "t" distribution and one or two proportions via the "z".

paired

A column in the data that identifies the observation (sample id). If not NULL, a paired comparison is applied.
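The idea behind a paired comparison can be sketched in base R. This is only an illustration, assuming the paired test works on within-pair differences and builds its null distribution by randomly flipping their signs; the vectors before and after are made-up data, and none of this is taken from the function's source.

```r
# Hypothetical paired measurements: one pair per observation id
set.seed(1)
before <- c(40, 38, 45, 50, 42, 36, 44, 41)
after  <- c(42, 37, 49, 53, 45, 36, 47, 44)

# Work on the within-pair differences
d <- after - before
obs <- mean(d)

# Under the null, the order of subtraction does not matter,
# so each difference is equally likely to carry either sign
null_stats <- replicate(
  2000,
  mean(d * sample(c(-1, 1), length(d), replace = TRUE))
)

# Share of sign-flipped means at least as extreme as the observed one
p_value <- mean(abs(null_stats) >= abs(obs))
```

Flipping signs keeps each pair intact, which is what distinguishes this from shuffling the pooled values.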

reps

Shortcut for setting both null_reps and ci_reps. Default value is 2000.

level

A numerical value between 0 and 1 giving the confidence level. Default value is 0.95.

direction

A character string. Options are "less", "greater", or "two-sided". The aliases "left", "right", "both", "two_sided", "two sided", and "two.sided" are also accepted.
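How the direction affects a simulation-based p-value can be sketched as follows. The helper p_from_null is hypothetical, and doubling the smaller tail for the two-sided case is one common convention assumed here, not something this page states.

```r
# Hypothetical helper: p-value from a vector of null statistics
p_from_null <- function(null_stats, obs, direction = "two-sided") {
  left  <- mean(null_stats <= obs)  # share of null values at or below obs
  right <- mean(null_stats >= obs)  # share of null values at or above obs
  if (direction %in% c("less", "left")) {
    left
  } else if (direction %in% c("greater", "right")) {
    right
  } else if (direction %in%
             c("two-sided", "both", "two_sided", "two sided", "two.sided")) {
    # Assumed convention: twice the smaller tail, capped at 1
    min(1, 2 * min(left, right))
  } else {
    stop("unknown direction: ", direction)
  }
}

null_stats <- c(-2, -1, 0, 1, 2)
p_less    <- p_from_null(null_stats, obs = 1, direction = "less")
p_greater <- p_from_null(null_stats, obs = 1, direction = "right")
p_both    <- p_from_null(null_stats, obs = 1, direction = "two.sided")
```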

null

The null hypothesis. Options include "independence", "point", and "paired independence".

  • independence: Should be used with both a response and explanatory variable. Indicates that the values of the specified response variable are independent of the associated values in explanatory.

  • point: Should be used with only a response variable. Indicates that a point estimate based on the values in response is associated with a parameter. Sometimes requires supplying one of p, mu, med, or sigma.

  • paired independence: Should be used with only a response variable giving the pre-computed difference between paired observations. Indicates that the order of subtraction between paired values does not affect the resulting distribution.
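As a rough illustration of the "independence" null, the base R sketch below shuffles the response so that any association with the explanatory variable is broken, then recomputes the statistic many times. The data, the helper diff_in_means, and the group labels "a" and "b" are made up for this example; it is not the function's actual implementation.

```r
# Made-up data: a numeric response and a two-level explanatory variable
set.seed(1)
group <- rep(c("a", "b"), each = 20)
y <- c(rnorm(20, mean = 0), rnorm(20, mean = 1))

# Statistic: difference in group means, "b" minus "a"
diff_in_means <- function(y, group) {
  mean(y[group == "b"]) - mean(y[group == "a"])
}

obs <- diff_in_means(y, group)

# Null distribution: permute the response, keep the group labels fixed
null_stats <- replicate(2000, diff_in_means(sample(y), group))

# One-sided ("greater") p-value: share of permuted statistics >= observed
p_greater <- mean(null_stats >= obs)
```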

type

A string giving the method used to create the confidence interval. The default is "percentile". The other options are "se", which builds the interval from (multiplier * standard error), and "bias-corrected", which gives a bias-corrected interval.
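The difference between the "percentile" and "se" methods can be sketched in base R. The sample x is made up, and using a normal quantile as the multiplier is an assumption of this sketch; the bias-corrected method is not shown.

```r
# Made-up sample and confidence level
set.seed(1)
x <- rexp(50, rate = 1 / 40)
level <- 0.95
alpha <- 1 - level

# Bootstrap distribution of the sample mean
boot_means <- replicate(2000, mean(sample(x, replace = TRUE)))

# "percentile": middle `level` share of the bootstrap distribution
ci_percentile <- quantile(boot_means, c(alpha / 2, 1 - alpha / 2),
                          names = FALSE)

# "se": observed statistic +/- multiplier * bootstrap standard error
multiplier <- qnorm(1 - alpha / 2)  # assumed multiplier
ci_se <- mean(x) + c(-1, 1) * multiplier * sd(boot_means)
```

The percentile interval follows the shape of the bootstrap distribution, while the se interval is always symmetric around the observed statistic.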

null_reps

Number of replicates used to generate the null distribution. Defaults to reps.

ci_reps

Number of bootstrap replicates used to calculate the confidence interval. Defaults to reps.

...

Other arguments passed to infer::calculate.

  • order: A character vector specifying the order in which the levels of the explanatory variable are used for subtraction (or division for ratio-based statistics), where order = c("first", "second") means ("first" - "second"), or the analogue for ratios. Needed for inference on difference in means, medians, proportions, ratios, t, and z statistics.

  • ...: To pass options like na.rm = TRUE into functions like mean(), sd(), etc. Can also be used to supply hypothesized null values for the "t" statistic or additional arguments to stats::chisq.test().
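The effect of order on a difference-based statistic can be shown with a few made-up values. The helper diff_in_means below is hypothetical and only mirrors the ("first" - "second") rule described above.

```r
# Made-up data
sex   <- c("female", "male", "female", "male", "female")
hours <- c(38, 45, 40, 50, 36)

# Hypothetical helper: mean of the first level minus mean of the second
diff_in_means <- function(y, g, order) {
  mean(y[g == order[1]]) - mean(y[g == order[2]])
}

d_female_male <- diff_in_means(hours, sex, order = c("female", "male"))
d_male_female <- diff_in_means(hours, sex, order = c("male", "female"))
```

Reversing order flips the sign of the statistic, so it also swaps which tail "less" and "greater" refer to.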

variables

A set of unquoted column names in the data to permute (independently of each other). Defaults to only the response variable. Note that any derived effects that depend on these columns (e.g., interaction effects) will also be affected.
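Permuting several columns independently of each other can be sketched in base R as follows. The data frame and the helper permute_cols are made up for illustration, and the helper takes column names as strings rather than unquoted names.

```r
# Made-up data frame
set.seed(1)
df <- data.frame(y = rnorm(6), a = 1:6, b = letters[1:6])

# Hypothetical helper: shuffle each listed column with its own permutation
permute_cols <- function(df, cols) {
  for (col in cols) {
    df[[col]] <- sample(df[[col]])
  }
  df
}

shuffled <- permute_cols(df, c("a", "b"))
```

Each column keeps its own set of values, but the row-wise link between a, b, and y is broken.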

response

The variable name in data that will serve as the response. This is an alternative to using the formula argument.

explanatory

The variable name in data that will serve as the explanatory variable. This is an alternative to using the formula argument.

success

The level of response that will be considered a success, as a string. Needed for inference on one proportion, a difference in proportions, and corresponding z stats.

p

The true proportion of successes (a number between 0 and 1). To be used with point null hypotheses when the specified response variable is categorical.

mu

The true mean (any numerical value). To be used with point null hypotheses when the specified response variable is continuous.

med

The true median (any numerical value). To be used with point null hypotheses when the specified response variable is continuous.

sigma

The true standard deviation (any numerical value). To be used with point null hypotheses.

Value

A data.frame

See Also

https://infer.tidymodels.org/articles/observed_stat_examples.html

Examples

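data(gss, package = "infer")

# One numeric variable, with a hypothesized mean or median of 40
infer(gss, response = hours, stat = "mean", mu = 40)
infer(gss, response = hours, stat = "t", mu = 40)
infer(gss, response = hours, stat = "median", med = 40)

# One categorical variable, with a hypothesized proportion of 0.5
infer(gss, response = sex, success = "female", stat = "prop", p = .5)
infer(gss, response = sex, success = "female", stat = "z", p = .5)

# Two categorical variables: difference in proportions, female minus male
infer(gss, college ~ sex,
    success = "no degree",
    stat = "diff in props",
    order = c("female", "male")
)

# Permute more than one column via `variables`
infer(gss, hours ~ age + college, variables = c(age, college))

# Paired comparison: simulate a second measurement, reshape to long
# format, and use .id to identify the pairs
gss$hours_previous <- gss$hours + 5 - rpois(nrow(gss), 4.8)
gss$.id <- seq_len(nrow(gss))
gss_paired <- tidyr::pivot_longer(gss, cols = c(hours, hours_previous))
infer(gss_paired, value ~ name,
    stat = "mean", paired = .id
)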

Yunuuuu/yjtools documentation built on Jan. 29, 2024, 5:30 a.m.