infer: Statistical Inference with Permutation
In Yunuuuu/yjtools: Toolkit for Yun and Jin (yjtools)

infer

R Documentation

Statistical Inference with Permutation

Description

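Performs statistical inference by resampling: a null distribution is generated (`null_reps` replicates) for hypothesis testing, and a bootstrap distribution (`ci_reps` replicates) is used for the confidence interval.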

Usage

infer(
  data,
  formula,
  stat,
  paired = NULL,
  reps = 2000L,
  level = 0.95,
  direction = "two-sided",
  null = NULL,
  type = NULL,
  null_reps = reps,
  ci_reps = reps,
  ...,
  variables = NULL,
  response = NULL,
  explanatory = NULL,
  success = NULL,
  p = NULL,
  mu = NULL,
  med = NULL,
  sigma = NULL
)

Arguments

`data`	A data frame that can be coerced into a tibble.
`formula`	A formula with the response variable on the left and the explanatory on the right. Alternatively, a `response` and `explanatory` argument can be supplied.
`stat`	A string giving the type of the statistic to calculate. Current options include `"mean"`, `"median"`, `"sum"`, `"sd"`, `"prop"`, `"count"`, `"diff in means"`, `"diff in medians"`, `"diff in props"`, `"Chisq"` (or `"chisq"`), `"F"` (or `"f"`), `"t"`, `"z"`, `"ratio of props"`, `"slope"`, `"odds ratio"`, `"ratio of means"`, and `"correlation"`. `infer` only supports theoretical tests on one or two means via the `"t"` distribution and one or two proportions via the `"z"`.
`paired`	A column in the data identifying the observation (sample ID). If not `NULL`, a paired comparison is applied.
`reps`	Shortcut for setting both `null_reps` and `ci_reps`. Default value is `2000L`.
`level`	A numerical value between 0 and 1 giving the confidence level. Default value is 0.95.
`direction`	A character string. Options are `"less"`, `"greater"`, or `"two-sided"`. The aliases `"left"`, `"right"`, `"both"`, `"two_sided"`, `"two sided"`, and `"two.sided"` are also accepted.
`null`	The null hypothesis. Options are `"independence"`, `"point"`, and `"paired independence"`. `"independence"`: Should be used with both a `response` and an `explanatory` variable. Indicates that the values of the specified `response` variable are independent of the associated values in `explanatory`. `"point"`: Should be used with only a `response` variable. Indicates that a point estimate based on the values in `response` is associated with a parameter. Sometimes requires supplying one of `p`, `mu`, `med`, or `sigma`. `"paired independence"`: Should be used with only a `response` variable giving the pre-computed difference between paired observations. Indicates that the order of subtraction between paired values does not affect the resulting distribution.
`type`	A string giving the method used to create the confidence interval. Options are `"percentile"` (the default), `"se"` (multiplier * standard error), and `"bias-corrected"` (bias-corrected interval).
`null_reps`	Number of replicates used to generate the null distribution. Defaults to `reps`.
`ci_reps`	Number of bootstrap replicates used to calculate the confidence interval. Defaults to `reps`.
`...`	Other arguments passed to `infer::calculate()`. `order`: A string vector specifying the order in which the levels of the explanatory variable should be ordered for subtraction (or division for ratio-based statistics), where `order = c("first", "second")` means `("first" - "second")`, or the analogue for ratios. Needed for inference on difference in means, medians, proportions, ratios, t, and z statistics. `...`: To pass options like `na.rm = TRUE` into functions like `mean()`, `sd()`, etc. Can also be used to supply hypothesized null values for the `"t"` statistic or additional arguments to `stats::chisq.test()`.
`variables`	A set of unquoted column names in the data to permute (independently of each other). Defaults to only the response variable. Note that any derived effects that depend on these columns (e.g., interaction effects) will also be affected.
`response`	The variable name in `data` that will serve as the response. This is an alternative to using the `formula` argument.
`explanatory`	The variable name in `data` that will serve as the explanatory variable. This is an alternative to using the `formula` argument.
`success`	The level of `response` that will be considered a success, as a string. Needed for inference on one proportion, a difference in proportions, and corresponding z stats.
`p`	The true proportion of successes (a number between 0 and 1). To be used with point null hypotheses when the specified response variable is categorical.
`mu`	The true mean (any numerical value). To be used with point null hypotheses when the specified response variable is continuous.
`med`	The true median (any numerical value). To be used with point null hypotheses when the specified response variable is continuous.
`sigma`	The true standard deviation (any numerical value). To be used with point null hypotheses.
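
To make the roles of `null_reps`, `ci_reps`, `level`, `direction`, and `order` concrete, here is a minimal base-R sketch of a permutation test under the `"independence"` null combined with a percentile bootstrap interval. It is an illustration under stated assumptions, not the implementation used by this package or by infer: the helper `perm_diff_means()` and its output column names are hypothetical, and the two-sided p-value is taken as twice the smaller tail, capped at 1.

```r
# Hypothetical helper, not part of yjtools or infer: permutation test for a
# difference in means plus a percentile bootstrap confidence interval.
perm_diff_means <- function(y, g, order, null_reps = 2000L, ci_reps = 2000L,
                            level = 0.95, direction = "two-sided") {
  g <- as.character(g)
  # Statistic: mean(order[1]) - mean(order[2]), mirroring the `order` argument
  stat_fun <- function(y, g) mean(y[g == order[1]]) - mean(y[g == order[2]])
  obs <- stat_fun(y, g)

  # Null distribution under independence: shuffle the response across groups
  null_dist <- replicate(null_reps, stat_fun(sample(y), g))
  left <- mean(null_dist <= obs)
  right <- mean(null_dist >= obs)
  p_value <- switch(direction,
    "less" = left,
    "greater" = right,
    "two-sided" = min(1, 2 * min(left, right)),
    stop("unknown direction: ", direction)
  )

  # Percentile bootstrap interval: resample observations with replacement
  boot_dist <- replicate(ci_reps, {
    idx <- sample.int(length(y), replace = TRUE)
    stat_fun(y[idx], g[idx])
  })
  alpha <- 1 - level
  ci <- unname(quantile(boot_dist, c(alpha / 2, 1 - alpha / 2)))

  data.frame(estimate = obs, p_value = p_value,
             lower_ci = ci[1], upper_ci = ci[2])
}

# Simulated data (assumed for illustration only)
set.seed(1)
y <- c(rnorm(30, mean = 0), rnorm(30, mean = 1))
g <- rep(c("a", "b"), each = 30)
res <- perm_diff_means(y, g, order = c("b", "a"))
res
```

Keeping `null_reps` and `ci_reps` separate, as the arguments above do, lets the null distribution and the bootstrap distribution use different numbers of replicates.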

Value

A data.frame

Examples

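# Load the General Social Survey sample shipped with the infer package
data(gss, package = "infer")

# One-sample inference against a point null
infer(gss, response = hours, stat = "mean", mu = 40)
infer(gss, response = hours, stat = "t", mu = 40)
infer(gss, response = hours, stat = "median", med = 40)
infer(gss, response = sex, success = "female", stat = "prop", p = .5)
infer(gss, response = sex, success = "female", stat = "z", p = .5)

# Difference in proportions between two groups (female - male)
infer(gss, college ~ sex,
    success = "no degree",
    stat = "diff in props",
    order = c("female", "male")
)

# Several explanatory variables, permuting both age and college
infer(gss, hours ~ age + college, variables = c(age, college))

# Paired comparison: simulate a second measurement per respondent,
# reshape to long format, and pair observations by .id
gss$hours_previous <- gss$hours + 5 - rpois(nrow(gss), 4.8)
gss$.id <- seq_len(nrow(gss))
gss_paired <- tidyr::pivot_longer(gss, cols = c(hours, hours_previous))
infer(gss_paired, value ~ name,
    stat = "mean", paired = .id
)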
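
The `"paired independence"` null described under `null` says that the order of subtraction within each pair does not matter. One common way to simulate that null is to flip the sign of each paired difference at random. The sketch below shows the idea in base R; `paired_perm_mean()` is a hypothetical helper written for illustration and is not claimed to match what `infer()` does internally.

```r
# Hypothetical helper, not part of yjtools or infer: sign-flip permutation
# test for the mean of paired differences.
paired_perm_mean <- function(d, null_reps = 2000L) {
  obs <- mean(d)
  # Under the null, each difference is equally likely to be +d or -d
  null_dist <- replicate(null_reps, {
    signs <- sample(c(-1, 1), length(d), replace = TRUE)
    mean(signs * d)
  })
  # Two-sided p-value: twice the smaller tail, capped at 1
  p_value <- min(1, 2 * min(mean(null_dist <= obs), mean(null_dist >= obs)))
  list(estimate = obs, p_value = p_value)
}

# Simulated paired measurements (assumed for illustration only)
set.seed(1)
before <- rnorm(40, mean = 40, sd = 5)
after <- before + rnorm(40, mean = 1, sd = 2)
out <- paired_perm_mean(after - before)
out
```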

Yunuuuu/yjtools documentation built on Jan. 29, 2024, 5:30 a.m.