assume | R Documentation |
This function allows the user to define a null distribution based on
theoretical methods. In many infer pipelines, assume() can be used in
place of generate() and calculate() to create a null distribution. Rather
than outputting a data frame containing a distribution of test statistics
calculated from resamples of the observed data, assume() outputs a more
abstract type of object just containing the distributional details
supplied in the distribution and df arguments. However, assume() output
can be passed to visualize(), get_p_value(), and
get_confidence_interval() in the same way that simulation-based
distributions can.
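As a rough illustration of the paragraph above, the following sketch builds a null distribution for the same two-sample t-test twice: once by simulation with generate() and calculate(), and once with assume(). The number of replicates (200), the seed, and the order argument are illustrative choices, not values prescribed by this page.

```r
library(infer)

# Observed t statistic, shared by both approaches
obs_t <- gss %>%
  specify(age ~ college) %>%
  calculate(stat = "t", order = c("degree", "no degree"))

# Simulation-based null distribution: permute, then compute t per resample
set.seed(1)  # arbitrary seed, only for reproducibility of this sketch
null_sim <- gss %>%
  specify(age ~ college) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 200, type = "permute") %>%
  calculate(stat = "t", order = c("degree", "no degree"))

# Theory-based null distribution: no resampling, only distributional details
null_theory <- gss %>%
  specify(age ~ college) %>%
  hypothesize(null = "independence") %>%
  assume("t")

# Both objects are accepted by the same helper
p_sim <- get_p_value(null_sim, obs_t, direction = "both")
p_theory <- get_p_value(null_theory, obs_t, direction = "both")
```

The simulation-based object is a data frame with one row per replicate, whereas the assume() output carries only the distribution name and its degrees of freedom.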
To define a theoretical null distribution (for use in hypothesis testing),
be sure to provide a null hypothesis via hypothesize(). To define a
theoretical sampling distribution (for use in confidence intervals),
provide the output of specify(). Sampling distributions (only implemented
for t and z) lie on the scale of the data, and will be recentered and
rescaled to match the corresponding stat given in calculate() to
calculate the observed statistic.
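A minimal sketch of the confidence-interval case described above, assuming a 95% level (an illustrative choice). The sampling distribution is built from specify() output alone, and the point estimate computed with calculate() is what recenters it on the scale of the data.

```r
library(infer)

# Point estimate: the observed sample mean of `hours`
x_bar <- gss %>%
  specify(response = hours) %>%
  calculate(stat = "mean")

# Theoretical sampling distribution: specify() output, no hypothesize()
sampling_dist <- gss %>%
  specify(response = hours) %>%
  assume("t")

# The point estimate must be supplied for a theoretical distribution
ci <- get_confidence_interval(
  sampling_dist,
  level = 0.95,
  point_estimate = x_bar
)
ci
```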
assume(x, distribution, df = NULL, ...)
x | The output of specify() or hypothesize(). |
distribution | The distribution in question, as a string. One of "F", "Chisq", "t", or "z". |
df | Optional. The degrees of freedom parameter(s) for the distribution supplied. |
... | Currently ignored. |
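The df argument can also be supplied explicitly. The sketch below passes n - 1 for a one-sample t-test, where n is the number of non-missing hours values; that choice of df is an assumption made for illustration, and the package may issue a message if it differs from the value it would have picked itself.

```r
library(infer)

# Number of non-missing observations of the response
n <- sum(!is.na(gss$hours))

# One-sample t null distribution with explicitly supplied degrees of freedom
null_dist <- gss %>%
  specify(response = hours) %>%
  hypothesize(null = "point", mu = 40) %>%
  assume(distribution = "t", df = n - 1)

null_dist
```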
Note that the assumption being expressed here, for use in theory-based
inference, only extends to distributional assumptions: the null
distribution in question and its parameters. Statistical inference with
infer, whether carried out via simulation (i.e. based on pipelines using
generate() and calculate()) or theory (i.e. with assume()), always
involves the condition that observations are independent of each other.
infer only supports theoretical tests on one or two means via the t
distribution and one or two proportions via the z.
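For the proportion case, a one-proportion z-test might look like the following sketch. The null value p = 0.5 is an illustrative assumption, not something this page specifies.

```r
library(infer)

# Observed z statistic for the proportion of `sex` equal to "female"
obs_z <- gss %>%
  specify(response = sex, success = "female") %>%
  hypothesize(null = "point", p = 0.5) %>%
  calculate(stat = "z")

# Theoretical null distribution (standard normal)
null_z <- gss %>%
  specify(response = sex, success = "female") %>%
  hypothesize(null = "point", p = 0.5) %>%
  assume("z")

# Two-sided p-value
p_z <- get_p_value(null_z, obs_z, direction = "both")
p_z
```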
For tests comparing two means, if n1 is the group size for one level of
the explanatory variable, and n2 is that for the other level, infer will
recognize the following degrees of freedom (df) arguments:

- min(n1 - 1, n2 - 1)
- n1 + n2 - 2
- The "parameter" entry of the analogous stats::t.test() call
- The "parameter" entry of the analogous stats::t.test() call with
  var.equal = TRUE

By default, the package will use the "parameter" entry of the analogous
stats::t.test() call with var.equal = FALSE (the default).
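The four candidate values listed above can be computed directly in base R. This sketch uses the built-in mtcars data (mpg grouped by am) purely as a stand-in for a two-group comparison; it is not data used elsewhere on this page.

```r
# Group sizes for the two levels of the explanatory variable
n1 <- sum(mtcars$am == 0)
n2 <- sum(mtcars$am == 1)

# Conservative choice: the smaller of the two per-group df
df_conservative <- min(n1 - 1, n2 - 1)

# Pooled choice: total sample size minus two
df_pooled <- n1 + n2 - 2

# Welch-Satterthwaite df, as reported by t.test() with var.equal = FALSE
df_welch <- stats::t.test(mpg ~ am, data = mtcars)$parameter

# Equal-variance df, as reported by t.test() with var.equal = TRUE
df_equal_var <- stats::t.test(mpg ~ am, data = mtcars,
                              var.equal = TRUE)$parameter

c(conservative = df_conservative, pooled = df_pooled,
  welch = unname(df_welch), equal_var = unname(df_equal_var))
```

The Welch value is generally not an integer and falls between the conservative and pooled values, while the equal-variance value coincides with n1 + n2 - 2.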
An infer theoretical distribution that can be passed to helpers like
visualize(), get_p_value(), and get_confidence_interval().
# load the package (provides `gss`, `%>%`, and the functions used below)
library(infer)

# construct theoretical distributions ---------------------------------

# F distribution
# with the `partyid` explanatory variable
gss %>%
  specify(age ~ partyid) %>%
  assume(distribution = "F")

# Chi-squared goodness of fit distribution
# on the `finrela` variable
gss %>%
  specify(response = finrela) %>%
  hypothesize(null = "point",
              p = c("far below average" = 1/6,
                    "below average" = 1/6,
                    "average" = 1/6,
                    "above average" = 1/6,
                    "far above average" = 1/6,
                    "DK" = 1/6)) %>%
  assume("Chisq")

# Chi-squared test of independence
# on the `finrela` and `sex` variables
gss %>%
  specify(formula = finrela ~ sex) %>%
  assume(distribution = "Chisq")

# T distribution
gss %>%
  specify(age ~ college) %>%
  assume("t")

# Z distribution
gss %>%
  specify(response = sex, success = "female") %>%
  assume("z")

## Not run:
# each of these distributions can be passed to infer helper
# functions alongside observed statistics!

# for example, a 1-sample t-test -------------------------------------

# calculate the observed statistic
obs_stat <- gss %>%
  specify(response = hours) %>%
  hypothesize(null = "point", mu = 40) %>%
  calculate(stat = "t")

# construct a null distribution
null_dist <- gss %>%
  specify(response = hours) %>%
  assume("t")

# juxtapose them visually
visualize(null_dist) +
  shade_p_value(obs_stat, direction = "both")

# calculate a p-value
get_p_value(null_dist, obs_stat, direction = "both")

# or, an F test ------------------------------------------------------

# calculate the observed statistic
obs_stat <- gss %>%
  specify(age ~ partyid) %>%
  hypothesize(null = "independence") %>%
  calculate(stat = "F")

# construct a null distribution
null_dist <- gss %>%
  specify(age ~ partyid) %>%
  assume(distribution = "F")

# juxtapose them visually
# (F tests are right-tailed, so shade the upper tail only)
visualize(null_dist) +
  shade_p_value(obs_stat, direction = "greater")

# calculate a p-value
get_p_value(null_dist, obs_stat, direction = "greater")
## End(Not run)