samplingsimulatorr
is an R package intended to assist those teaching
or learning basic statistical inference.

| Name             | GitHub       |
| ---------------- | ------------ |
| Holly Williams   | hwilliams10  |
| Lise Braaten     | lisebraaten  |
| Tao Guo          | tguo9        |
| Yue (Alex) Jiang | YueJiangMDSV |

This package allows users to generate virtual populations and sample from them in order to compare sample distributions with sampling distributions for different sample sizes. Users can also sample from the generated virtual population (or any other population), plot the distributions, and view summaries of the parameters of interest.
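
You can see the difference between a sample distribution and a sampling distribution without the package. The base-R sketch below is illustrative only. The population size, seed, number of replicates, and sample sizes are arbitrary choices, not package defaults. It draws from a standard normal population and compares two spreads. The spread of a single sample stays close to the population's. The spread of the sample means shrinks roughly like 1/sqrt(n).

```r
set.seed(2020)  # arbitrary seed so the sketch is reproducible

# A virtual population of 10,000 standard normal values
population <- rnorm(10000, mean = 0, sd = 1)

for (n in c(10L, 50L, 100L)) {
  # One sample of size n: its sd approximates the population sd (about 1)
  one_sample <- sample(population, n)
  # 500 samples of size n: the sd of their means estimates the standard error
  sample_means <- replicate(500, mean(sample(population, n)))
  cat(sprintf("n = %3d  sample sd = %.2f  sd of sample means = %.3f  (1/sqrt(n) = %.3f)\n",
              n, sd(one_sample), sd(sample_means), 1 / sqrt(n)))
}
```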
You can install the development version of samplingsimulatorr from GitHub with:
```r
# install.packages("devtools")
devtools::install_github("UBC-MDS/samplingsimulatorr")
```

The package provides the following functions:

- `generate_virtual_pop`: creates a virtual population. It takes a distribution function (e.g. `rnorm`, `rexp`), the parameters required by that distribution function, and the size of the population.
- `draw_samples`: generates samples of different sizes.
- `plot_sample_hist`: creates sample distributions for different sample sizes.
- `plot_sampling_hist`: creates sampling distributions for different sample sizes. It takes the samples generated by the `draw_samples` function, the variable of interest, a vector of the sample sizes, and the number of replications for each sample size.
- `stat_summary`: returns a summary of the statistical parameters of interest.

To the best of our knowledge, no existing R package has the specific functionality to create virtual populations and produce the sample and sampling distributions described above. We use many existing R packages and build more specific functions on top of them. These include:

- built-in R distribution functions such as `rnorm` to sample from distributions
- `rep_sample_n` to generate random samples
- `ggplot2` to create plots

Python `pandas` already includes some summary statistics functions such as `.describe()`; however, our package is more customizable. Our summary includes only the statistical parameters of interest and compares the sample, sampling, and true population parameters.

`generate_virtual_pop`

```r
library(samplingsimulatorr)
generate_virtual_pop(N, var_name, dist, ...)
```

Arguments:

- `N`: the size of the population
- `var_name`: the name of the variable to create
- `dist`: the distribution function used to generate the population (e.g. `rnorm`)
- `...`: the arguments required by the distribution function

Example:

```r
pop <- generate_virtual_pop(100, "height", rnorm, 0, 1)
```
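
Conceptually, generating a virtual population means calling the supplied distribution function with `N` and the extra arguments, then storing the result in a column named `var_name`. The following is a hypothetical base-R sketch of that idea, not the package's source. `sketch_virtual_pop` is an illustrative name and is not part of samplingsimulatorr. It also returns a plain data frame, whereas the package returns a tibble.

```r
# Hypothetical sketch (not the package implementation): build a one-column
# data frame by calling the supplied distribution function.
sketch_virtual_pop <- function(N, var_name, dist, ...) {
  values <- dist(N, ...)   # e.g. rnorm(N, 0, 1) or rexp(N, 2)
  pop <- data.frame(values)
  names(pop) <- var_name   # name the column, e.g. "height"
  pop
}

set.seed(1)
pop_sketch <- sketch_virtual_pop(100, "height", rnorm, 0, 1)
head(pop_sketch)
```

Because `dist` is just a function, the same sketch works with other generators, for example `sketch_virtual_pop(50, "wait", rexp, 2)`.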

`draw_samples`

```r
library(samplingsimulatorr)
draw_samples(pop, reps, sample_size)
```

Arguments:

- `pop`: the virtual population, as a tibble
- `reps`: the number of replications for each sample size, as an integer
- `sample_size`: the sample sizes to draw, as a vector

Example:

```r
samples <- draw_samples(pop, 3, c(1, 10))
```
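
The usage output later in this README shows samples in a long format. There is one row per sampled value, with `replicate`, `size` and `rep_size` columns. Below is a hypothetical base-R sketch of how such a table could be built. It is not the package's implementation, and `sketch_draw_samples` is an illustrative name. The sketch makes two assumptions: each sample is drawn without replacement, and replicates are numbered separately within each sample size.

```r
# Hypothetical sketch: `reps` samples for each size in `sample_size`,
# stacked into one long data frame.
sketch_draw_samples <- function(pop, reps, sample_size) {
  pieces <- list()
  for (n in sample_size) {
    for (r in seq_len(reps)) {
      rows <- sample(nrow(pop), n)          # assumption: without replacement
      piece <- pop[rows, , drop = FALSE]    # keep all population columns
      piece$replicate <- r                  # assumption: numbered per size
      piece$size <- n
      piece$rep_size <- reps
      pieces[[length(pieces) + 1]] <- piece
    }
  }
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

set.seed(1)
pop <- data.frame(height = rnorm(1000))
samples_sketch <- sketch_draw_samples(pop, 3, c(1, 10))
table(samples_sketch$size)  # 3 rows of size 1, 30 rows of size 10
```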

`plot_sample_hist`

```r
library(samplingsimulatorr)
plot_sample_hist(pop, samples, var_name, sample_size)
```

Arguments:

- `pop`: the virtual population, as a tibble
- `samples`: the samples, as a tibble
- `var_name`: the name of the column holding the generated variable
- `sample_size`: a vector of the sample sizes (each sample size must be present in `samples`)

Example:

```r
plot_sample_hist(pop, samples, height, c(1, 10))
```
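
The package builds its plots with ggplot2. The sketch below is a rough base-graphics analogue of the same idea: a histogram of the population next to a histogram of one sample per size. It is hypothetical and only illustrates what the panels represent. It draws each sample directly from the population without replacement, rather than taking it from a `samples` table.

```r
set.seed(1)
pop <- data.frame(height = rnorm(1000))
sizes <- c(10, 50, 100)

# Population histogram followed by one sample histogram per sample size
old_par <- par(mfrow = c(1, length(sizes) + 1))
hist(pop$height, main = "Population", xlab = "height")
sample_hists <- lapply(sizes, function(n) {
  one_sample <- pop$height[sample(nrow(pop), n)]
  hist(one_sample, main = paste("n =", n), xlab = "height")
})
par(old_par)  # restore the previous plotting layout
```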

`plot_sampling_hist`

```r
library(samplingsimulatorr)
plot_sampling_hist(samples, var_name, sample_size)
```

Arguments:

- `samples`: the samples, as a tibble
- `var_name`: the name of the column holding the generated variable
- `sample_size`: a vector of the sample sizes (each sample size must be present in `samples`)

Example:

```r
plot_sampling_hist(samples, height, c(1, 10))
```
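
A sampling distribution histogram shows one statistic per replicate rather than the raw sampled values. The hypothetical base-R sketch below assumes that statistic is the sample mean. It builds its own long table of samples, computes the per-replicate means with `aggregate()`, and shows that their standard deviation falls as the sample size grows. It is not the package's code.

```r
set.seed(1)
pop <- data.frame(height = rnorm(1000))

# Long table of samples: `reps` replicates for each sample size
reps <- 100
sizes <- c(10, 50, 100)
samples <- do.call(rbind, lapply(sizes, function(n) {
  do.call(rbind, lapply(seq_len(reps), function(r) {
    data.frame(replicate = r, height = sample(pop$height, n), size = n)
  }))
}))

# One mean per (size, replicate) pair: these values form the sampling distribution
means <- aggregate(height ~ size + replicate, data = samples, FUN = mean)

# Spread of the sample means for each sample size
sds <- tapply(means$height, means$size, sd)
sds
```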

`stat_summary`

```r
library(samplingsimulatorr)
stat_summary(pop, samples, parameter)
```

Arguments:

- `pop`: the virtual population
- `samples`: the drawn samples
- `parameter`: the parameter(s) of interest

Example:

```r
stat_summary(pop, samples, c(mean, median))
```
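
Below is a hypothetical sketch of the kind of comparison `stat_summary` is described as providing. It puts the true population parameters next to the average of each sample statistic across replicates. It assumes the parameters are supplied as functions, as in the `c(mean, median)` example, and uses base R only. The table layout is illustrative and need not match the package's output.

```r
set.seed(1)
pop <- data.frame(height = rnorm(1000))
stats <- list(mean = mean, median = median)  # parameters of interest

# True population parameters
pop_values <- sapply(stats, function(f) f(pop$height))

# For each sample size, the average of each statistic over 100 samples
sizes <- c(10, 50, 100)
sample_values <- t(sapply(sizes, function(n) {
  draws <- replicate(100, sample(pop$height, n))  # n x 100 matrix
  sapply(stats, function(f) mean(apply(draws, 2, f)))
}))
rownames(sample_values) <- paste0("n=", sizes)

comparison <- rbind(population = pop_values, sample_values)
comparison
```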

Usage:

```r
library(samplingsimulatorr)

# generate population
pop <- generate_virtual_pop(1000, "height", rnorm, 0, 1)
head(pop)
#> # A tibble: 6 x 1
#> height
#> <dbl>
#> 1 1.07
#> 2 0.707
#> 3 0.853
#> 4 -1.63
#> 5 -0.512
#> 6 -0.833

# create samples
samples <- draw_samples(pop, 100, c(1, 10, 50, 100))
head(samples)
#> # A tibble: 6 x 4
#> # Groups: replicate [6]
#> replicate height size rep_size
#> <int> <dbl> <dbl> <dbl>
#> 1 1 0.804 1 100
#> 2 2 -0.883 1 100
#> 3 3 -1.69 1 100
#> 4 4 0.000492 1 100
#> 5 5 1.17 1 100
#> 6 6 -0.738 1 100

# plot sample histograms
plot_sample_hist(pop, samples, height, c(10, 50, 100))
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

# plot sampling distributions
plot_sampling_hist(samples, height, c(10, 50, 100))
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
```
The official documentation is hosted on pkgdown. You can also refer to our html vignette.