Shiny dashboard "Statistical foundations of machine learning"

Estimation

Common left panel:

point estimation (1D Gaussian)

Goal: visualize the bias/variance of the sample average and sample variance estimators.

Data generating process: normal random variable ${\bf z} \sim {\mathcal N} (\mu,\sigma^2)$. Estimation of mean and variance.
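A minimal Monte Carlo sketch of what this panel shows (plain R, independent of the dashboard code; the values of $\mu$, $\sigma$, the sample size and the number of trials are illustrative):

```r
## Sampling distribution of the sample average and sample variance
set.seed(0)
mu <- 1; sigma <- 2        # parameters of the Gaussian (illustrative)
N <- 20                    # sample size
R <- 10000                 # number of Monte Carlo trials

mu.hat  <- numeric(R)
var.hat <- numeric(R)
for (r in 1:R) {
  z <- rnorm(N, mean = mu, sd = sigma)
  mu.hat[r]  <- mean(z)    # sample average
  var.hat[r] <- var(z)     # sample variance (unbiased, divides by N-1)
}

mean(mu.hat)  - mu         # bias of the sample average: close to 0
mean(var.hat) - sigma^2    # bias of the sample variance: close to 0
var(mu.hat)                # variance of the sample average: close to sigma^2/N
```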

Suggested manipulations:

point estimation (1D Uniform)

Goal: visualize the bias/variance of the sample average and sample variance estimators. Show that the unbiasedness of those estimators is independent of the distribution.

Data generating process: uniform random variable ${\bf z} \sim {\mathcal U} (a,b)$. Estimation of mean and variance.
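The same Monte Carlo exercise can be repeated for the uniform case, where $E[{\bf z}]=(a+b)/2$ and $\text{Var}[{\bf z}]=(b-a)^2/12$; a sketch with illustrative values of $a$ and $b$:

```r
## Unbiasedness does not depend on Gaussianity
set.seed(0)
a <- -1; b <- 3
mu.true  <- (a + b) / 2     # mean of U(a,b)
var.true <- (b - a)^2 / 12  # variance of U(a,b)
N <- 20; R <- 10000

mu.hat  <- replicate(R, mean(runif(N, a, b)))
var.hat <- replicate(R, var(runif(N, a, b)))

mean(mu.hat)  - mu.true     # close to 0: still unbiased
mean(var.hat) - var.true    # close to 0: still unbiased
```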

Suggested manipulations:

confidence interval (1D Gaussian)

Goal: visualize the notion of a confidence interval.

Data generating process: normal random variable ${\bf z} \sim {\mathcal N} (\mu,\sigma^2)$. Estimation of confidence interval of the mean.
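Assuming the dashboard uses the known-variance interval $\bar{z} \pm z_{1-\alpha/2}\,\sigma/\sqrt{N}$, a sketch checking its empirical coverage (all numeric values are illustrative):

```r
## Empirical coverage of a confidence interval for the mean (known sigma)
set.seed(0)
mu <- 0; sigma <- 1; N <- 30; alpha <- 0.1; R <- 10000

half <- qnorm(1 - alpha / 2) * sigma / sqrt(N)  # half-width of the interval
covered <- replicate(R, {
  z.bar <- mean(rnorm(N, mu, sigma))
  (z.bar - half <= mu) && (mu <= z.bar + half)
})
mean(covered)   # close to 1 - alpha = 0.9
```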

Suggested manipulations:

Likelihood (1 par)

Goal: visualize the relation between the accuracy of the estimation and the log-likelihood.

Data generating process: normal random variable ${\bf z} \sim {\mathcal N} (\mu,\sigma^2)$. Maximum likelihood estimation of the mean (known $\sigma^2$). We denote by $\hat{\mu}_{ml}$ the maximum-likelihood (m.l.) estimator.
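A sketch of the log-likelihood as a function of $\mu$: its maximizer coincides with the sample average, which is the closed-form $\hat{\mu}_{ml}$ (data and grid are illustrative):

```r
## Log-likelihood of mu with known sigma
set.seed(0)
sigma <- 1; N <- 50
z <- rnorm(N, mean = 2, sd = sigma)

loglik <- function(m) sum(dnorm(z, mean = m, sd = sigma, log = TRUE))
mu.grid <- seq(0, 4, by = 0.01)
ll <- sapply(mu.grid, loglik)

mu.grid[which.max(ll)]  # numerical maximizer of the log-likelihood
mean(z)                 # analytical ML estimator: the sample average
plot(mu.grid, ll, type = "l", xlab = expression(mu), ylab = "log-likelihood")
```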

Suggested manipulations:

Likelihood (2 pars)

Goal: visualize the relation between the accuracy of the estimation and the log-likelihood.

Data generating process: normal random variable ${\bf z} \sim {\mathcal N} (\mu,\sigma^2)$. Maximum likelihood estimation of both mean and variance. We denote by $\theta=[\mu,\sigma^2]$ the parameter vector and by $\hat{\theta}_{ml}$ the m.l. estimator.
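In this case the closed-form solution is $\hat{\mu}_{ml}=\bar{z}$ and $\hat{\sigma}^2_{ml}=\frac{1}{N}\sum_{i=1}^N (z_i-\bar{z})^2$ (note the division by $N$, which makes the ML variance estimator slightly biased). A numerical check, with illustrative data:

```r
## Two-parameter maximum likelihood for the Gaussian
set.seed(0)
N <- 100
z <- rnorm(N, mean = 2, sd = 1.5)

negloglik <- function(theta) {     # theta = c(mu, sigma2)
  if (theta[2] <= 0) return(Inf)   # keep the variance positive
  -sum(dnorm(z, mean = theta[1], sd = sqrt(theta[2]), log = TRUE))
}

fit <- optim(c(0, 1), negloglik)   # numerical ML estimate
fit$par                            # close to the closed-form solution below
c(mean(z), mean((z - mean(z))^2))  # sample mean and (biased) ML variance
```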

Suggested manipulations:

point estimation (2D Gaussian)

Goal: visualization of the multivariate variance (i.e., covariance) of the sampling distribution of a multivariate estimator.

Data generating process: normal 2D random vector ${\bf z} \sim {\mathcal N} (\mu,\Sigma^2)$, where $\mu=[0,0]^T$ is a $[2,1]$ vector and the covariance $\Sigma^2$ is a $[2,2]$ matrix.

Note that the identity covariance matrix corresponds to the eigenvalues $\lambda_1=\lambda_2=1$ and the rotation angle $\theta=0$.
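A sketch of this parameterization: the covariance matrix is built from the eigenvalues $\lambda_1,\lambda_2$ and a rotation angle $\theta$ as $R(\theta)\,\text{diag}(\lambda_1,\lambda_2)\,R(\theta)^T$, which reduces to the identity for $\lambda_1=\lambda_2=1$ and $\theta=0$ (the numeric values below are illustrative):

```r
## Covariance matrix from eigenvalues and rotation angle
lambda1 <- 2; lambda2 <- 0.5; theta <- pi / 6
Rot <- matrix(c(cos(theta), sin(theta),
               -sin(theta), cos(theta)), 2, 2)  # rotation by theta
Sigma <- Rot %*% diag(c(lambda1, lambda2)) %*% t(Rot)
Sigma                       # the resulting covariance matrix

## Sampling from the 2D Gaussian via the Cholesky factor
set.seed(0)
N <- 500
Z <- matrix(rnorm(2 * N), N, 2) %*% chol(Sigma)
cov(Z)                      # close to Sigma
plot(Z, asp = 1, xlab = "z1", ylab = "z2")
```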

Suggested manipulations:

Linear Regression

Goal: visualization of the sampling distribution (bias/variance) of the least-squares estimators versus the true parameters, illustrating the unbiasedness of least squares.

Data generating process: conditional distribution given by ${\bf y}=\beta_0+\beta_1 x+{\bf w}$, where ${\bf w} \sim {\mathcal N} (0,\sigma_w^2)$ and ${\bf x}$ is uniformly distributed.

Top left sliders:

Top middle: 3D visualization of the joint density $p({\bf x}=x,{\bf y}=y)$.

Top right: sampling distribution of the fitted lines, the function $f$ and its value.

Bottom left: sampling distribution of the estimator of the conditional expectation $E[{\bf y}|x]$ (in green).

Bottom right: sampling distributions of the estimators of:

  1. $\beta_0$
  2. $\beta_1$
  3. $\sigma^2_{w}$
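A Monte Carlo sketch of the sampling distributions visualized here, refitting a line with `lm` on each simulated dataset (all parameter values are illustrative):

```r
## Sampling distribution of the least-squares estimators
set.seed(0)
beta0 <- 1; beta1 <- -2; sigma.w <- 0.5
N <- 30; R <- 2000

est <- t(replicate(R, {
  x <- runif(N, -1, 1)                       # uniformly distributed input
  y <- beta0 + beta1 * x + rnorm(N, sd = sigma.w)
  fit <- lm(y ~ x)
  c(coef(fit), sigma2 = summary(fit)$sigma^2)
}))

colMeans(est)       # close to (beta0, beta1, sigma.w^2): unbiasedness
apply(est, 2, var)  # variance of each estimator
```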

Suggested manipulations:

Nonlinear Regression

Goal: visualization of the sampling distribution (bias/variance) of the predicted versus the true conditional expectation.

Data generating process: ${\bf y}=f(x)+{\bf w}$, where ${\bf w} \sim {\mathcal N} (0,\sigma_w^2)$. Estimator: $h(x)=\hat{\beta}_0 +\sum_{i=1}^m \hat{\beta}_i x^i$. Parameters are estimated by least squares.

Top left sliders:

Top right: 3D visualization of the joint density $p({\bf x}=x,{\bf y}=y)$

Bottom left: sampling distribution of the estimator of the conditional expectation $E[{\bf y}|x]$ (in green) for different $x$ values. The mean of the estimated conditional expectation is in blue.

Bottom right: sampling distribution of the estimator of the conditional expectation $E[{\bf y}|x]$ for a given $x$.
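A sketch of the bottom-right panel's content: the sampling distribution of the polynomial predictor $h(x)$ at a fixed query point, for an arbitrary nonlinear $f$ and an illustrative degree $m$:

```r
## Bias/variance of a degree-m polynomial predictor at a query point x0
set.seed(0)
f <- function(x) sin(2 * pi * x)  # illustrative nonlinear target
sigma.w <- 0.3; N <- 50; m <- 3; R <- 2000
x0 <- 0.25                        # query point

h.x0 <- replicate(R, {
  x <- runif(N)
  y <- f(x) + rnorm(N, sd = sigma.w)
  fit <- lm(y ~ poly(x, m, raw = TRUE))  # least-squares polynomial fit
  predict(fit, newdata = data.frame(x = x0))
})

mean(h.x0) - f(x0)  # bias of the predictor at x0
var(h.x0)           # variance of the predictor at x0
```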

Suggested manipulations:


