samplingsimulatorr

samplingsimulatorr is an R package intended to assist those teaching or learning basic statistical inference.

Authors

| Name             | GitHub       |
| ---------------- | ------------ |
| Holly Williams   | hwilliams10  |
| Lise Braaten     | lisebraaten  |
| Tao Guo          | tguo9        |
| Yue (Alex) Jiang | YueJiangMDSV |

Overview

This package lets users generate virtual populations and draw repeated samples from them, in order to compare and contrast sample distributions with sampling distributions across different sample sizes. Users can sample from a generated virtual population (or from any other population), plot the resulting distributions, and view summaries of the parameters of interest.
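To make the sample vs sampling distinction concrete, here is a minimal base-R sketch that does not use this package at all. The names `population`, `one_sample`, and `sample_means` are illustrative assumptions, not part of the package's API.

```r
set.seed(2020)  # fixed seed so the sketch is reproducible

# A virtual population of 10,000 values from a standard normal distribution.
population <- rnorm(10000, mean = 0, sd = 1)

# Sample distribution: the observed values in ONE sample of size 50.
one_sample <- sample(population, size = 50)

# Sampling distribution: the means of MANY samples of size 50.
sample_means <- replicate(1000, mean(sample(population, size = 50)))

# Sample means vary far less than individual observations do.
sd(one_sample)
sd(sample_means)
```

The second standard deviation should be much smaller than the first, which is the behaviour the package's plots and summaries are meant to help learners see.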

Installation

You can install the development version of samplingsimulatorr from GitHub with:

# install.packages("devtools")
devtools::install_github("UBC-MDS/samplingsimulatorr")

Function Descriptions

How do these fit into the R ecosystem?

To the best of our knowledge, there is currently no existing R package with the specific functionality to create virtual populations and produce the sample and sampling distributions described above. We make use of several existing R packages and functions and build on them to provide more specialised functions. These include:

- built-in R distribution functions, such as rnorm, to sample from distributions
- rep_sample_n to generate random samples
- ggplot2 to create plots

Python's pandas already includes some summary statistics functions such as .describe(); however, our package will be more customizable. Our summary will include only the statistical parameters of interest and will provide a comparison between the sample, sampling, and true population parameters.

Dependencies

Usage

generate_virtual_pop

library(samplingsimulatorr)
generate_virtual_pop(N, var_name, dist, ... )

Arguments:

Example:

pop <- generate_virtual_pop(100, "height", rnorm, 0, 1)
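The trailing arguments in the example (`0, 1`) are passed on to the distribution function. As a rough illustration of that pattern, the sketch below uses a hypothetical helper, `make_pop`, written in base R. It is an assumption about the general approach, not the package's actual implementation, and it returns a plain data frame rather than a tibble.

```r
# Hypothetical helper, NOT the package's implementation.
# `dist` is any random-generation function whose first argument is the
# number of draws; `...` carries that function's own parameters.
make_pop <- function(N, var_name, dist, ...) {
  values <- dist(N, ...)      # e.g. rnorm(N, 0, 1)
  pop <- data.frame(values)
  names(pop) <- var_name      # name the single column after the variable
  pop
}

# Normal population with mean 0 and standard deviation 1.
pop_norm <- make_pop(100, "height", rnorm, 0, 1)

# Exponential population; parameters can also be passed by name.
pop_exp <- make_pop(100, "wait_time", rexp, rate = 2)
```

Forwarding `...` this way means any distribution function with the same calling convention (`rnorm`, `rexp`, `runif`, and so on) can be used without changing the helper.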

draw_samples

library(samplingsimulatorr)
draw_samples(pop, reps, sample_size)

Arguments:

Example:

samples <- draw_samples(pop, 3, c(1, 10))
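For readers who want to see what repeated sampling at several sample sizes involves, here is a base-R sketch. The helper `draw_reps` and the object names are hypothetical; the output columns mirror those shown in the usage scenario in this README (`replicate`, the variable, `size`, `rep_size`), but the package itself may build them differently.

```r
set.seed(1)
pop_sketch <- data.frame(height = rnorm(100))

# Hypothetical helper: draw `reps` samples (without replacement) for each
# sample size in `sample_sizes` and stack them into one data frame.
draw_reps <- function(pop, reps, sample_sizes) {
  pieces <- list()
  for (n in sample_sizes) {
    for (r in seq_len(reps)) {
      rows <- sample(nrow(pop), size = n)
      pieces[[length(pieces) + 1]] <- data.frame(
        replicate = r,                  # which replicate this row belongs to
        pop[rows, , drop = FALSE],      # the sampled observations
        size = n,                       # sample size used for this draw
        rep_size = reps,                # total number of replicates
        row.names = NULL
      )
    }
  }
  do.call(rbind, pieces)
}

samples_sketch <- draw_reps(pop_sketch, reps = 3, sample_sizes = c(1, 10))
nrow(samples_sketch)  # 3 * 1 + 3 * 10 = 33 rows
```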

plot_sample_hist

library(samplingsimulatorr)
plot_sample_hist(pop, samples, var_name, sample_size)

Arguments:

Example:

plot_sample_hist(pop, samples, height, c(1, 10))

plot_sampling_hist

library(samplingsimulatorr)
plot_sampling_hist(samples, var_name, sample_size)

Arguments:

Example:

plot_sampling_hist(samples, height, c(10, 50))
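As a point of comparison, a sampling-distribution histogram per sample size can be sketched with base graphics alone. This is not how the package draws its plots (it relies on ggplot2); the names `means_by_size` and `x_range` are illustrative assumptions.

```r
set.seed(7)
population <- rnorm(1000)
sizes <- c(10, 50)

# For each sample size, compute the means of 100 replicate samples.
means_by_size <- lapply(sizes, function(n) {
  replicate(100, mean(sample(population, n)))
})
names(means_by_size) <- paste0("n = ", sizes)

# One histogram per sample size, on a shared x-axis for easy comparison.
x_range <- range(unlist(means_by_size))
old_par <- par(mfrow = c(1, length(sizes)))
for (label in names(means_by_size)) {
  hist(means_by_size[[label]], xlim = x_range,
       main = label, xlab = "sample mean")
}
par(old_par)  # restore the previous plotting layout
```

With a shared x-axis, the histogram for the larger sample size should appear visibly narrower.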

stat_summary

library(samplingsimulatorr)
stat_summary(pop, samples, parameter)

Arguments:

Example:

stat_summary(pop, samples, c(mean, median))
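The kind of comparison such a summary supports (population vs a single sample vs the sampling distribution) can be sketched in base R as follows. The layout of `summary_sketch` and the choice to report the average of the replicate statistics are assumptions made for illustration; the package's actual output may differ.

```r
set.seed(42)
population <- rnorm(1000)

# 200 replicate samples of size 50; each column of the matrix is one sample.
reps <- replicate(200, sample(population, 50))

summary_sketch <- data.frame(
  source = c("population", "one sample", "sampling distribution"),
  mean = c(
    mean(population),
    mean(reps[, 1]),                # a single sample's mean
    mean(colMeans(reps))            # average of the 200 sample means
  ),
  median = c(
    median(population),
    median(reps[, 1]),              # a single sample's median
    mean(apply(reps, 2, median))    # average of the 200 sample medians
  )
)

summary_sketch
```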

Example Usage Scenario

library(samplingsimulatorr)

# generate population
pop <- generate_virtual_pop(1000, "height", rnorm, 0, 1)
head(pop)
#> # A tibble: 6 x 1
#>   height
#>    <dbl>
#> 1  1.07 
#> 2  0.707
#> 3  0.853
#> 4 -1.63 
#> 5 -0.512
#> 6 -0.833
# create samples
samples <- draw_samples(pop, 100, c(1, 10, 50, 100))
head(samples)
#> # A tibble: 6 x 4
#> # Groups:   replicate [6]
#>   replicate    height  size rep_size
#>       <int>     <dbl> <dbl>    <dbl>
#> 1         1  0.804        1      100
#> 2         2 -0.883        1      100
#> 3         3 -1.69         1      100
#> 4         4  0.000492     1      100
#> 5         5  1.17         1      100
#> 6         6 -0.738        1      100
# plot sample histogram
plot_sample_hist(pop, samples, height, c(10, 50, 100))
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

plot_sampling_hist(samples, height, c(10, 50, 100))
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Documentation

The official documentation is hosted on pkgdown. You can also refer to our HTML vignette.



tguo9/samplingsimulatorr documentation built on May 5, 2020, 12:10 a.m.