In mahendra-mariadassou/learningr: Introduction to R for FdV bachelor students

library(learnr)
library(tidyverse)
set.seed(42)
can_container <- rnorm(1e6, mean = 330, sd = 15)
checker <- function(label, user_code, check_code, envir_result, evaluate_result, ...) {
  list(message = check_code, correct = TRUE, location = "append")
}
compute_estimate <- function(n) {
  can_sample <- sample(x = can_container, size = n, replace = FALSE)
  estimate <- mean(can_sample)
  return(estimate)
}
estimates_data <- crossing(sample_size = c(10, 50, 100, 500, 1000, 5000), 
                           estimate_number = 1:100) %>% 
  mutate(estimate_value = map_dbl(sample_size, compute_estimate))
plot_ci <- function(n_estimates = 100, sample_size = 500, alpha = 0.05) {
  t_alpha <- qt(1 - alpha/2, df = sample_size - 1)
  tibble(estimate_nb = 1:n_estimates) %>% 
    mutate(sample    = map(estimate_nb, ~ sample(can_container, sample_size)), 
           mu_hat    = map_dbl(sample, mean), 
           sigma_hat = map_dbl(sample, sd), 
           ymin      = mu_hat - t_alpha * sigma_hat / sqrt(sample_size), 
           ymax      = mu_hat + t_alpha * sigma_hat / sqrt(sample_size), 
           covers    = if_else(ymin <= 330 & ymax >= 330, TRUE, FALSE)) %>% 
    select(-sample) %>% 
    ggplot(aes(x = estimate_nb, y = mu_hat, 
               ymin = ymin, ymax = ymax, color = covers)) + 
    geom_point() + 
    geom_linerange() +
    geom_hline(yintercept = 330, color = "red") + 
    scale_color_manual(values = c("TRUE" = "black", "FALSE" = "darkred"), name = expression(CI(1-alpha)~covers~mu)) + 
    labs(x = "Repetition", 
         y = expression(hat(mu)%+-%frac(hat(sigma), sqrt(n))*t[list(n-1,1-alpha/2)])) +
    theme(legend.background = element_rect(fill = "transparent"), 
          legend.position = "bottom") + 
    ggtitle(bquote(.(n_estimates) ~ "CI at level 1-" ~ alpha== .(1-alpha) ~ "for samples of size" ~ .(sample_size)))
}
tutorial_options(exercise.timelimit = 60, exercise.checker = checker)
knitr::opts_chunk$set(echo = FALSE)

Welcome

In this case study, you will investigate the properties of sampling using soda cans as an examples. The goal is to estimate the container capacity (advertised to be 330 mL) of a can, noted $\mu$. To do so, we have access to a container filled 1,000,000 cans (stored as can_container). Each element of can_container is the capacity of a different can.

We could get a very good estimate $\hat{\mu}$ of $\mu$ by measuring the capacity of all cans ($\Leftrightarrow$ accessing all values of can_container). Unfortunately, opening a can to measure its capacity means we can't sell it anymore...

We'll thus get a feeling of the sample size (usually called $n$) required to get a good estimate: hopefully, $n \ll 1,000,000$.

Estimator

In this study, we'll use a very simple estimator, introduced in Tristan's slides: the sample average (which turns out to also be the Maximum Likelihood estimator in this problem).

To be a bit a formal, let $Y$ be the capacity of a can and $Y_1, \dots, Y_n$ a random sample of size $n$ (obtained here from Simple Random Sampling) of capacities. You can create $Y_1, \dots, Y_n$ by picking $n$ cans at random from the container and measuring their capacity. We are interested in the average capacity $\mu = E[Y]$ and will estimate $\mu$ using the estimator $\bar{Y} = \frac{1}{n} \sum_{1}^n Y_i$ to get the corresponding estimate $\hat{\mu} = \bar{y} = \frac{1}{n} \sum_{1}^n y_i$

Computing an estimate

Write a function compute_estimate that create a sample $(y_1, \dots, y_n)$ and computes the corresponding estimate.

compute_estimate <- function(n) {
  can_sample <- ### complete here
  estimate <- ### complete here
  return(estimate)
}

"Use the functions `sample()` and `mean()`"

compute_estimate <- function(n) {
  can_sample <- sample(x = can_container, size = n, replace = FALSE)
  estimate <- mean(can_sample)
  return(estimate)
}

Looking at the estimates

Use your function to get estimates $\hat{\mu}$ of $\mu$ for various sample sizes (ranging from $n = 10$ to $n = 10000$). Get more than one estimate for each sample size.

compute_estimate <- function(n) {
  can_sample <- sample(x = can_container, size = n, replace = FALSE)
  estimate <- mean(can_sample)
  return(estimate)
}

compute_estimate(n = 10)

replicate(7, compute_estimate(n = 10)) ## call compute_estimate(10) 7 times
replicate(7, compute_estimate(n = 100)) ## call compute_estimate(100) 7 times
replicate(7, compute_estimate(n = 1000)) ## call compute_estimate(1000) 7 times

question(
  "What happens to $\\hat{\\mu}$ when $n$ increases?",
  answer("The estimates are more dispersed.", correct = FALSE, message = "Not really, take a closer look at the rang of estimates obtained for a given sample sizes"),
  answer("The estimates are less dispersed.", correct = TRUE, message = "Yes, different estimates become similar when n increases"),
  allow_retry = TRUE
)

Creating many estimates

We just illustrated something you've seen in class, the estimate depends on the sample. Since compute_estimate creates a new sample each times it's called, it will never (or only rarely) produce exactly the same estimate.

Our estimator $\bar{Y}$ is a random variable, if we had access to its distribution we could compute its bias and and variance. Here we don't know (yet) the exact distribution of $\bar{Y}$ yet. We're thus going to look only at its distribution instead by creating many estimates (say 100, numbered from 1 to 100) for each sample size in ${10, 50, 100, 500, 1000, 5000}$.

We first create a tibble with all combination and then compute an estimate with our function compute_estimate for each combination. The syntax is a bit involved but don't worry too much about it as it's not the main purpose.

estimates_data <- crossing(sample_size = c(10, 50, 100, 500, 1000, 5000), 
                           estimate_number = 1:100) %>% 
  mutate(estimate_value = map_dbl(sample_size, compute_estimate))
estimates_data

We just created 100 estimates for each of the 6 sample sizes considered.

Distibution of the estimates

You're now going to explore the distribution of those estimates. Use ggplot to explore the distribution of the data, make sure to separate the estimates by sample size.

ggplot(estimates_data, ...)

"Use `geom_density()` or `geom_histogram()` and `facet_wrap()`"

ggplot(estimates_data, aes(x = estimate_value)) + 
  geom_histogram(binwidth = 0.5) + 
  geom_vline(xintercept = 330, color = "red") + ## advertised capacity
  facet_wrap(~sample_size) + ## separate the data by sample_size
  labs(x = "Estimate", y = "Density") ## Nice labels

Characterizing the distribution

The previous graph illustrates a very general result in statistics: the larger the sample, the more accurate the estimate. There are of course exceptions (especially if the estimator is biased) but as a good rule of thumbs, more is better. Let's characterize the distribution of the estimates for each class size.

Compute the minimium, maximum, average and standard deviation of the estimate values for each sample size.

estimates_data %>% ...

"Use `group_by()` and `summarize()`"

estimates_summary <- estimates_data %>% 
  group_by(sample_size) %>% 
  summarize(estimate_sd   = sd(estimate_value), 
            estimate_mean = mean(estimate_value),
            estimate_min  = min(estimate_value), 
            estimate_max  = max(estimate_value))
estimates_summary

"Good job! Note that the range and standard deviation of the estimate values decrease with sample size."

Understanding the distribution

The previous graphs and tables tell us that:

the estimates are centered around the same value (~ 330) for all sample sizes
the spread of the estimates decreases as sample size increases.

Indeed when using only 10 cans to assess the capacity, the estimate $\hat{\mu}$ can be anywhere between 320 and 340 whereas it is very concentrated around 330 when 5000 cans.

Using the previous results, find a scaling relation between estimate_sd and sample_size

estimates_summary <- estimates_data %>% 
  group_by(sample_size) %>% 
  summarize(estimate_sd   = sd(estimate_value), 
            estimate_mean = mean(estimate_value),
            estimate_min  = min(estimate_value), 
            estimate_max  = max(estimate_value))

"Try a scatter plot of `estimate_sd` against `sample_size`"

"Maybe on a log scale"

ggplot(estimates_summary, aes(x = log(sample_size), y = log(estimate_sd))) + 
  geom_point()

question(
  "What is the scaling of `estimate_sd` with respect to $n$?", 
  answer("It increases with $n$"), 
  answer("It decreases with $n$", correct = TRUE),
  answer("It decreases as $1/n$", message = "Check carefully the scale of the x- and y-axes"), 
  answer("It decreases as $1/\\sqrt{n}$", correct = TRUE), 
  allow_retry = TRUE
  )

question(
  "What sample size would you use to get an estimate with accuracy 1", 
  answer("10"), 
  answer("100"),
  answer("500", message = "Not quite, since the standard deviation is ~ 0.66, the typical spread will be at least 1.32 and you won't get accuracy 1"),
  answer("1000", correct = TRUE, message = "Since the standard deviation is ~ 0.45, the typical spread will be 0.9, lower than 1"), 
  answer("5000", correct = TRUE, message = "Since the standard deviation is ~ 0.208, the typical spread will be 0.42, lower than 1"), 
  allow_retry = TRUE
  )

Bias and Variance

To go further and compute the bias and variance of our estimate, we need to know the theoretical distribution of our estimate.

Assume that the can capacitites are independant and follow a normal distribution $\mathcal{N}(\mu, \sigma^2)$ with mean $\mu$ and variance $\sigma^2$.

The theory informs us that

$$ \bar{Y} = \frac{1}{n} \sum_{1}^n Y_i \sim \mathcal{N}\left(\mu, \frac{\sigma^2}{n}\right) $$

In particular, $E[\bar{Y}] = \mu$ and $V[\bar{Y}] = \frac{\sigma^2}{n}$.

As $n$ increases, the variance of $\bar{Y}$ decreases as $\frac{1}{n}$ and thus is standard deviation as $\frac{1}{\sqrt{n}}$. This is in line with our findings from the previous section.

The formula also tells us that $\bar{Y} \to \mu$ when $n \to \infty$.

Convergence of $\bar{Y}$

We're going to investigate this convergence by looking at estimates computed on increasingly large samples: the first 10 cans, the first 11 cans, etc.

Write a function to compute the capacity estimate from the first $n$ cans in can_container (and not from a random sample):

compute_estimate_2 <- function(n) {
  estimate <- ### complete here
  return(estimate)
}

compute_estimate_2 <- function(n) {
  estimate <- mean(can_container[1:n])
  return(estimate)
}

You can then look at the evolution of the estimate when you increase the sample size from $n = 1$ to $n = 50,000$ (there is no real point looking at larger sample sizes).

compute_estimate_2 <- function(n) {
  estimate <- mean(can_container[1:n])
  return(estimate)
}

estimates_convergence <- tibble(sample_size = seq(from = 10, to = 50000, by = 1)) %>%
  mutate(estimate_value = map_dbl(sample_size, compute_estimate_2))
estimates_convergence

estimates_convergence <- tibble(sample_size = seq(from = 10, to = 50000, by = 1)) %>%
  mutate(estimate_value = cumsum(can_container)[sample_size] / sample_size)
estimates_convergence

And plot the convergence of $\hat{\mu}$ to $\mu$ as a $n \to \infty$.

ggplot(...)

ggplot(estimates_convergence, aes(x = sample_size, y = estimate_value)) + 
  geom_line() +
  geom_hline(yintercept = 330, color = "red")

Confidence interval

In this section, we'll illustrate a few properties of confidence intervals.

Known variance

Assume for now that $\sigma^2$ is knonw. We learned in the previous section that $\bar{Y} \sim \mathcal{N}(\mu, \frac{\sigma^2}{n})$ or equivalently that the pivotal statistic $$ Z = \frac{\bar{Y} -\mu}{\sigma/\sqrt{n}} \sim \mathcal{N}(0, 1) $$

In particular, the distribution of $Z$ depends on neither $\mu$ nor $\sigma$ and can be used to construct confidence intervals (see Tristan's slides for details).

In that simple case, to compute a confidence interval at level $1 - \alpha$, we simply need to find $a$ and $b$ such that $$ P(a \leq Z \leq b) = 1 - \alpha $$

You've seen in class that the optimal solution was:

$a = q_{\alpha/2}$
$b = q_{1 - \alpha/2}$

where $q_\beta$ is the quantile or order $\beta$ of the normal distribution, computed with the function qnorm().

In fact, for every $\beta \leq \alpha$ we could set $a = q_{\beta}$ and $b = q_{1 - \alpha + \beta}$ to obtain a solution as: $$ P(a \leq Z \leq b) = P(Z \leq q_{1 - \alpha + \beta}) - P(Z \leq q_{\beta}) = (1 - \alpha + \beta) - \beta = 1 - \alpha $$ The condition $\beta \leq \alpha$ ensures that $1 - \alpha + \beta \leq 1$ and that the corresponding quantile is properly defined and the *optimalù choice corresponds to $\beta = \alpha/2$.

But why is it optimal? The function plot_norm_interval() helps you visualize the interval $[q_{\beta}, q_{1 - \alpha + \beta}]$ for different values of $\beta$.

plot_norm_interval <- function(beta, alpha = 0.05) {
  a <- qnorm(beta, mean = 0, sd = 1) ## q_beta
  b <- qnorm(1 - alpha + beta, mean = 0, sd = 1) ## q_{1 - alpha + beta}
  x <- c(a, b, seq(min(-5, a), max(5, b), length.out = 101)) %>% unique() %>% sort()
  df <- tibble(x = x, 
               d = dnorm(x), 
               side = case_when(
                 x <= a ~ "left", 
                 x >= b ~ "right",
                 TRUE   ~ "center"))
  ggplot(df, aes(x = x, y = d)) + 
    geom_area(aes(y = d, fill = side)) + 
    scale_fill_manual(values = c(left = "red", center = "transparent", right = "darkred"), guide = "none") + 
    annotate(x = a, y = dnorm(a), hjust = 0.75, vjust = 0, geom = 'text', label = "beta", parse = TRUE, color = "red") + 
    annotate(x = b, y = dnorm(b), hjust = 0.25, vjust = 0, geom = 'text', label = "1 - alpha + beta", parse = TRUE, color = "darkred") + 
    geom_segment(aes(x = a, xend = b, y = 0, yend = 0), color = "darkgreen") + 
    annotate(x = (a+b)/2, y = 0, hjust = 0.5, vjust = 0, geom = 'text', label = 'group("[",list(q[beta], q[1-alpha+beta]),"]")', parse = TRUE, color = "darkgreen") +
    annotate(x = -Inf, y = Inf, geom = "text", hjust = 0, vjust = 1,
             label = paste0("q[1-alpha+beta] - q[beta] ==", round(b-a, digits = 3)), parse = TRUE) + 
    geom_line()
}

plot_norm_interval(beta = 0.025, alpha = 0.05)

question("How is the value $\\beta = \\alpha/2$ optimal?", 
         answer("It leads to a symmetric interval $[a, b]$", message = "It's true here but it's not always the case"), 
         answer("It leads to the smallest interval $[a, b]$", correct = TRUE), 
         allow_retry = TRUE
         )

Unknown variance

In the previous section, the pivotal statistic $Z$ depends on $\sigma$ being known, but $\sigma$ is usually unknown in practice...

We can replace it with the classical estimator $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(Y_i - \bar{Y})^2$ and plug it in the definition of $Z$ to obtain a new statistic. $$ T = \frac{\bar{Y} - \mu}{S / \sqrt{n}} $$

But is it still a pivotal statistic? The surprising answer is yes. $$ \begin{align} T & = \frac{\bar{Y} - \mu}{S / \sqrt{n}} = \frac{\bar{Y} - \mu}{\sigma / \sqrt{n}} \times \frac{\sigma}{S} \ & = \underbrace{\frac{\bar{Y} - \mu}{\sigma / \sqrt{n}}}{\sim \mathcal{N}(0, 1)} \times \left( \frac{1}{n-1}\underbrace{\frac{\sum{i=1}^n (Y_i - \bar{Y})^2}{\sigma^2}}_{\sim \chi^2(n-1)} \right)^{-1/2} \sim \frac{\mathcal{N}(0, 1)}{\sqrt{\frac{\chi^2(n-1)}{n-1}}} \sim \mathcal{T}(n-1) \end{align} $$

Thanks to the properties of independent $\chi^2$ and gaussian variables (we've not properly shown that $S$ and $(\bar{Y} - \mu)/\sigma$ are independent, you'll have to trust me on that one), our statistic is pivotal and follows a Student's t-distribution with $n-1$ degrees of freedom.

Let's compare the $\mathcal{N}(0, 1)$ and $\mathcal{T}(n-1)$ distributions for various values of $n$.

df <- tibble(x    = seq(-4, 4, length.out = 1000), 
             `N(0, 1)` = dnorm(x), 
             `T(1)`  = dt(x, df = 1), 
             `T(2)`  = dt(x, df = 2), 
             `T(5)`  = dt(x, df = 5), 
             `T(10)`  = dt(x, df = 10), 
             `T(100)`  = dt(x, df = 100)) %>% 
  pivot_longer(-x, names_to = "distribution", values_to = "d") %>% 
  mutate(distribution = factor(distribution, 
                               levels = rev(c(paste0("T(", c(1, 2, 5, 10, 100), ")"), "N(0, 1)"))))
ggplot(df, aes(x = x, y = d, color = distribution)) + 
  geom_line() + 
  scale_color_brewer() + 
  theme_dark() +
  theme(legend.position = c(0, 1), legend.justification = c(0, 1), 
        legend.background = element_rect(fill = "transparent"), 
        legend.key = element_rect(fill = "transparent"))

question("Which statements are true?", 
         answer("The variance of $\\mathcal{T}(n)$ increases when $n$ increases.",
                message = "Really? Take a closer look."),
         answer("The distribution of $\\mathcal{T}(n)$ is symmetric.", 
                correct = TRUE), 
         answer("The distribution of $\\mathcal{T}(n)$ converges to the one of $\\mathcal{N}(0, 1)$ as $n$ increases.", 
                correct = TRUE), 
         answer("Confidence intervals of $\\mathcal{T}(n)$ are smaller than the same ones for $\\mathcal{N}(0, 1)$", message = "Really? Take a closer look."), 
         allow_retry = TRUE, 
         post_message = "You can think of $\\mathcal{T}(n)$ as a noisy version of $\\mathcal{N}(0,1)$ with a larger spread (and higher variance). Since its variance is larger, it has wider confidence interval. However, as $n$ increases, $\\mathcal{T}(n)$ converges to $\\mathcal{N}(0,1)$ and at the limit we have $\\mathcal{T}(\\infty) = \\mathcal{N}(0,1)$"
         )

The quantile of Student's t distribution are computed with qt(). It works like qnorm() but requires an extra parameter df (degree of freedom).

Compute the quantile of order 0.975 of a $\mathcal{T}(2)$ random variable:

"Use qt()"

qt(p = 0.975, df = 2)

"Well done! You now have the knowledge required to compute confidence intervals. Note the quantile for $\\mathcal{T}(2)$ is 4.30, more than twice as high as the same quantile for $\\mathcal{N}(0, 1)$ and will result in much wider confidence intervals.

Understanding confidence intervals (I)

In the previous section, we learned how to replace $\sigma^2$ with its estimator to obtain a pivotal statistics. If we note $t_{n, \beta}$ the quantile of order $\beta$ of $\mathcal{T}(n)$. We can build a confidence interval of level $1-\alpha$ for $\mu$ as follows:

$$ \begin{align} & P\left(t_{n-1, \alpha/2} \leq T \leq t_{n-1, 1 - \alpha/2} \right) = 1 - \alpha \ \Rightarrow & P\left(t_{n-1, \alpha/2} \leq \frac{\bar{Y} - \mu}{S/\sqrt{n}} \leq t_{n-1, 1 - \alpha/2} \right) = 1 - \alpha \ \Rightarrow & P\left( \frac{S}{\sqrt{n}}t_{n-1, \alpha/2} \leq \bar{Y} - \mu \leq \frac{S}{\sqrt{n}} t_{n-1, 1 - \alpha/2} \right) = 1 - \alpha \ \Rightarrow & P\left( \frac{S}{\sqrt{n}}t_{n-1, \alpha/2} - \bar{Y} \leq -\mu \leq \frac{S}{\sqrt{n}} t_{n-1, 1 - \alpha/2} - \bar{Y} \right) = 1 - \alpha \ \Rightarrow & P\left( \frac{S}{\sqrt{n}}t_{n-1, 1 - \alpha/2} + \bar{Y} \leq \mu \leq - \frac{S}{\sqrt{n}} t_{n-1, \alpha/2} + \bar{Y} \right) = 1 - \alpha \ \end{align} $$ Since $\mathcal{T}(n-1)$ has a symmetric distribution, $t_{n-1, \frac{\alpha}{2}} = - t_{n-1, 1 - \frac{\alpha}{2}}$ and the previous equality is often written $$ P\left( \mu \in \left[ \bar{Y} \pm \frac{S}{\sqrt{n}} t_{n-1, 1 - \frac{\alpha}{2}} \right] \right) = 1 - \alpha $$

Consider the sample of the first $1000$ cans: can_sample <- can_container[1:1000]. Build a 95% confidence interval of the can capacity for that sample.

can_sample <- can_container[1:1000]

"Start with an estimate $\\hat{\\mu}$ of $\\mu$ and $\\hat{\\sigma}$ of $\\sigma$"

can_sample <- can_container[1:1000]
mu <- mean(can_sample)
sigma <- sd(can_sample)
t_alpha2 <- qt(0.975, df = 999)

can_sample <- can_container[1:1000]
mu <- mean(can_sample)
sigma <- sd(can_sample)
t_alpha2 <- qt(0.975, df = 999)
## lower bound
mu - sigma * t_alpha2 / sqrt(1000)
## upper bound
mu + sigma * t_alpha2 / sqrt(1000)

Understanding confidence intervals (II)

Confidence interval are a bit tricky to understand:

the interval $CI(1-\alpha) = \left[ \bar{Y} \pm \frac{S}{\sqrt{n}} t_{n-1, 1 - \frac{\alpha}{2}} \right]$ is a random variable
but $\mu$ is a real number

Therefore, when replacing $\bar{Y}$ and $S$ with $\hat{\mu}$ and $\hat{\sigma}$ to compute the confidence interval, $\mu$ is either inside $\left[ \hat{\mu} \pm \frac{\hat{\sigma}}{\sqrt{n}} t_{n-1, 1 - \frac{\alpha}{2}} \right]$ or outside. The proper way to understand confidence intervals is to remember that $IC(1-\alpha)$ covers $\mu$ with probability 1-$\alpha$. Stated differently, on average a fraction $1 - \alpha$ of the confidence intervals you compute will contain the real value $\mu$.

Let's illustrate that with 100 estimates $\hat{\mu}$ of $\mu$ and the corresponding confidence interval of level 0.95. Each estimate $\hat{\mu}$ is a point, each confidence interval is a linerange, the true value of $\mu$ is indicated by the red line and the confidence intervals are colored in black if they cover the true value and in red if they don't. In this (perfect) example, 5% of the 95% CI do not cover $\mu$.

set.seed(44)
plot_ci(100, 500, 0.05)

plot_ci() (used for the previous graph) allows you to compute and plot n_estimates intervals of confidence level 1 - alpha from samples of size sample_size ($n$). Use it to explore the relation between $n$, $\alpha$, the length of $CI(1-\alpha)$ and the probability that it covers $\mu$.

plot_ci(n_estimates = 200, sample_size = 500, alpha = 0.05)

question("The confidence interval is always centered around $\\hat{\\mu}$.", 
         answer("true"), 
         answer("false", correct = TRUE), 
         post_message = "As seen earlier, you can build asymetric CI if you don't choose $\\beta == \\alpha/2$. Furthermore, even if doing so, the resulting interval may not be centered around $\\hat{\\mu}$ (remember the $\\chi^2$ distribution.")

question("In this example, the optimal confidence interval is centered around $\\hat{\\mu}$.", 
         answer("true", correct = TRUE), 
         answer("false"), 
         post_message = "Indeed, the optimal (here narrowest) confidence interval is achieved when choosing $\\beta == \\alpha/2$ and results in symmetric intervals.")

question("If I build 100 $CI(0.95)$ from 100 different random samples, only 5 of them do not cover $\\mu$", 
         answer("true", correct = FALSE), 
         answer("false", correct = TRUE), 
         post_message = "On average, 5% of the 95% confidence interval do not cover $\\mu$ but it's only true on average, not for all sets of confidence intervals.")

mahendra-mariadassou/learningr documentation built on March 26, 2023, 2:26 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

mahendra-mariadassou/learningr
Introduction to R for FdV bachelor students

In mahendra-mariadassou/learningr: Introduction to R for FdV bachelor students

Welcome

Estimator

Computing an estimate

Looking at the estimates

Creating many estimates

Distibution of the estimates

Characterizing the distribution

Understanding the distribution

Bias and Variance

Convergence of $\bar{Y}$

Confidence interval

Known variance

Unknown variance

Understanding confidence intervals (I)

Understanding confidence intervals (II)

R Package Documentation

Browse R Packages

We want your feedback!

mahendra-mariadassou/learningr Introduction to R for FdV bachelor students

In mahendra-mariadassou/learningr: Introduction to R for FdV bachelor students

Welcome

Estimator

Computing an estimate

Looking at the estimates

Creating many estimates

Distibution of the estimates

Characterizing the distribution

Understanding the distribution

Bias and Variance

Convergence of $\bar{Y}$

Confidence interval

Known variance

Unknown variance

Understanding confidence intervals (I)

Understanding confidence intervals (II)

R Package Documentation

Browse R Packages

We want your feedback!

mahendra-mariadassou/learningr
Introduction to R for FdV bachelor students