knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library("exams2moodle")

Data generating functions

Statistical functions

Univariate descriptive statistics

Bivariate descriptive statistics

Simple linear regression

Classical univariate times series

Probability theory

Combinatorics

Unconditional probabilities

prob_

Conditional probabilities

prob_

Random variables

rv_

Distributions

Tests

test_

Confindence intervals

ci_

Introduction

If you are generate exercise for a student you usually encounter two problems:

  1. Not all data sets are suitable since the students may round intermediate results which may lead to quite different, but true, solutions.
  2. You do not have access to the intermediate values which might be necessary to explain the solution.

The second problem just requires the extension of the existing routines.

You can encounter problem 1 only if the students are not allowed to use computers to solve the exercises. Personally I believe that students only understand a statistical coefficient or graphic if they are able to calculate the required value(s) by hand; it is like learning the times tables by heart with pupils. Otherwise most of them will be just random button pushers in the future.

To solve the problems I decided to generate appropriate (full) data sets such that I have control over it. The general approach is

library("exams")
library("exams2moodle")
repeat {
  ... # some data generation
  if (condition_holds) break
}

For example for the compution of the median from five observations $x_i$ we know that the third sorted observation $x_{(3)}$) is the solution. Of course, we have to avoid that by chance the third sorted observation is equal to the third observation, otherwise a student may miss an important step in computing the median.

library("exams")
library("exams2moodle")
repeat {
  x  <- sample(1:10, size=5)
  sx <- sort(x)
  if (x[3]!=sx[3]) break
}
x

Level or measurement - to_choice

For determining the correct level of measurement of a variable we use an Excel file with two columns with the name of the variable and the level of measurement.

# subset of variables we use, variable names are in German  
data("skalenniveau")
skalen <- c("nominal", "ordinal", "metrisch")
stopifnot(all(skalenniveau$type %in% skalen))  # protect against typos
skala  <- sample(skalenniveau$type, 1)
exvars <- sample(nrow(skalenniveau), 8)
tf     <- (skalenniveau$type[exvars]==skala)
sc     <- to_choice(skalenniveau$name[exvars], tf)
# Additional answer: Does none fit?
sc$questions <- c(sc$questions, "Keine der Variablen hat das gewünschte Skalenniveau") 
sc$solutions <- c(sc$solutions, !any(tf))                                              
sc

The to_choice function generates a object such that can be used in answerlist and mchoice2string . The first parameter is either a vector or data frame. The second parameter is a logical vector containing TRUE if the element in the vector (or row in the data frame) contains a true answer.

The parameter shuffle samples from the correct and false answers. The following example could replace the main code from the example above.

# subset of variables we use, variable names are in German  
data("skalenniveau")  
skalen <- c("nominal", "ordinal", "metrisch")
skala  <- sample(skalenniveau$type, 1)
exvars <- sample(nrow(skalenniveau), 8)
tf     <- (skalenniveau$type[exvars]==skala)
# select one true and four false answers
sc     <- to_choice(skalenniveau$name[exvars], tf, shuffle=c(1,4))
sc

By default the answers are ordered, the parameter order which function is used to order the answers (default: order). To use the ordering given set order=NULL.

Descriptive statistics: mean - means_choice, to_choice

means_choice computes a list of mean values for a given data vector x:

If the parameter trim and/or winsor set to NA then these means will not be computed.

digits <- 2  # round to two digits
repeat {
  x  <- round(runif(7, min=165, max=195), digits)
  ms <- means_choice(x, digits)         
  if (attr(ms, "mindiff")>0.1) break   # make sure that all values are different by 0.1
}
ms <- unlist(ms)
sc <- to_choice(ms, names(ms)=='mean') # arithmetic mean is the correct solution
str(sc)

The attribute mindiff gives the minimal distance between two mean values. This might be important for setting extol the tolerance for numeric solutions.

Descriptive statistics: arithmetic mean - num_result

For computing the arithmetic mean we just need to ensure that data are rounded to a certain level since the computation with pocket calculator is simple.

library("exams2moodle")
n    <- sample(4:8, size=1)
x    <- round(runif(n, min=1.65, max=1.85), 2)
sc   <- num_result (c(mean(x), x), digits = 2, tolmult = 2)

The function num_result creates a list with the elements

The tolerance is computed as tolmult*10^(-digits).

Statistical inference: Continuous random variables

General assistance functions

Rounding

fractions, is_terminal

To overcome the rounding problem there is a simple approach: try to use (terminal) fractions. A terminal fraction generates a number with a finite number of digits, for example $\frac{1}{10}=0.1$.

The command fractions calls simply MASS::fractions() to avoid that you have explictly to load the library MASS. The result of calling fractions has an attribute fracswhich contains a (approximate) fraction as $\frac{numerator}{denominator}$ representation.

x <- c(1/5, 1/6)
x
fractions(x)
str(fractions(x))

Therefore is_terminal tests if all entries are terminal fractions which means the denominators must be dividable by two and five only.

x <- c(1/5, 1/6)
is_terminal(x)

Unfortunately we use a decimal numeral system limiting the number possible of denominators which lead to terminal numbers; the ancient babylonian cultures using a sexagesimal numeral system had a larger number of denominators which would lead to terminal numbers.

Solution - num_result, int_result

Tables - html_matrix_sk

Since creating tables for Moodle is still a pain, we used two approaches

  1. replace the table by an enumeration (see as_string, as_obs), or
  2. create a specific table for HTML (Moodle) and LaTeX (PDF).

decided to replace a table with the $x$ values with a enumeration in LaTeX

Math formulas as LaTeX

as_string

x <- round(runif(5), 2)
as_string(x)
as_string(x, collapse="+", last="+")

as_obs

x <- round(runif(5), 2)
as_obs(x)
as_obs(x, "y")
as_obs(sort(x), "y", sorted=TRUE)

lsumprod

x <- round(2*runif(5)-1, 2)
y <- round(2*runif(5)-1, 2)
lsumprod(x,y)

Multiple stories

To make it more difficult for students to recognize similar tasks we change the story for a task. Fortunately the internet provides us with a lot of collections with statistics exercises and the stories ;)

The file are stored in ./subdir/exercisename_num.Rmd, ./subdir/exercisename_1.Rmd, ./subdir/exercisename_2.Rmd, and so on. The framework in ./subdir/exercisename_num.Rmd is shown below. Note that all story files have to create the same variables for the solution part.

```{r, echo = FALSE, results = "hide"}
library("exams")
library("exams2moodle")
ed      <- exams:::.exams_get_internal("xexams_dir_exercises")
stories <- list.files(pattern="exercisename\\_[0-9]*.Rmd", path=paste0(ed, "/some_subdir"), full.names = TRUE)
story   <- sample(stories, size=1)
```

Question
========

```{r, child=story} 
```

Some additional common question text, e.g. for rounding the result(s).

Solution
=========

Some solution text which is basically the same for all exercisename_X.Rmd's.

Meta-information
================
extype: ...
exsolution: ...
extol:  ...
exname: `r if(exists("story")) story else knitr::current_input()`

Note: if you copy the source before then you will find a zero-width space after the first backticks which will be visible in RStudio and needs to be deleted.

Task identification - now

If we randomize the task and the stories then we may have a lot of different tasks. If questions arise then we need to identify the exact task a student has.

Therefore we embed a

substring(now(), 10)

The now function uses gsub('.', '', sprintf("%.20f", as.numeric(Sys.time())), fixed=TRUE) and ensures that every time called a different number is returned.

Data generation

add_data

add_data adds data point(s) to the left and/or the right of a given data vector x.

  1. A box and its width is determined by
    • box="range" gives a box width of width=max(x)-min(x) and two points xleft=min(x) and xright=max(x)
    • box="box" gives a box width of width=IQR(x) and two points xleft=quantile(x, 0.25) and xright=quantile(x, 0.75)
    • box=c(xleft, xright) gives a box width of width=xright-xleft and two points xleft and xright
  2. The numbers of additional data points is determined by n
    • n=c(nleft, nright) gives the number of points to generate at the left and right
    • n=1 is a short form of c(0,1) (the default)
  3. Within the interval [xleft-range[2]*width; xleft-range[1]*width] are nleft points uniformly drawn and within the interval [xright+range[1]*width; xright+rang[2]*width] are nleft points uniformly drawn (both intervals are colored in red)
par(mar=c(0,0,0,0))
plot(c(0, 1), c(0.15,1.15), axes=FALSE, type="n", xlab="", ylab="")
rect(0.25, 0.25, 0.75, 0.75)
text(0.25, 0.25, labels="xleft", pos=1)
text(0.75, 0.25, labels="xright", pos=1)
text(0.5, 0.75, labels="width", pos=3)
arrows(0.25, 0.8, 0.75, 0.8, code=3, length=0.1)
arrows(0.0, 0.5, 0.2, 0.5, code=3, col="red", length=0.1)
arrows(0.8, 0.5, 1.0, 0.5, code=3, col="red", length=0.1)
text(0.8, 0.8, "xright+range[1]*width", col="red", srt=90)
text(1, 0.8, "xright+range[2]*width", col="red", srt=90)
text(0, 0.8, "xleft-range[2]*width", col="red", srt=90)
text(0.2, 0.8, "xleft-range[1]*width", col="red", srt=90)
x <- runif(7, 165, 195)
xr <- add_data(x, "range", n=c(0,1), range=c(1,1.5)) 
round(xr)
xb <- add_data(x, "box", n=c(0,1), range=c(1,1.5)) 
round(xb)
x1 <- add_data(x, box=c(165,195), n=c(0,1), range=c(1,1.5)) 
round(x1)

scale_to

Given a numeric vector it uses a linear transformation to rescale the data to a given mean and standard deviation. The defult is to standardize the data

x <- runif(30)
mean(x)
sd(x)
#
y <- scale_to(x, mean=2, sd=0.5)
mean(y)
sd(y)

ddiscrete

ddiscrete generates a finite one-dimensional discrete probability distribution. If the length of x is one then x is the number of elements. Otherwise x is considered as a starting distribution and length of x is the number of elements.

The parameter zero determines if the final distribution can contain the probability entry zero or not. Since for computation of exercises based on a one-dimensional discrete probability distribution it favorably that the entries are fractions have the same denominator the parameter unit can be used for this. Thus, if the smallest non-zero denominator should be 1/7 then use unit=7; the default is a power of 10.

ddiscrete(6) # fair dice
x <- runif(6)
ddiscrete(x)
ddiscrete(x, zero=TRUE)
ddiscrete(x, unit=15)
fractions(ddiscrete(x, unit=15))

ddiscrete2

ddiscrete2 generates a finite two-dimensional discrete probability distribution. The generation needs two steps:

  1. Generate two marginal finite two-dimensional discrete probability distributions. Based on this a joint probability for two independent distributions is generated.
  2. Define target measure for association and target value for the association for the joint distribution.

The current available association measure are:

r <- ddiscrete(6)
c <- ddiscrete(6)
ddiscrete2(r, c)
ddiscrete2(r, c, FUN=nom.cc, target=0.4)
ddiscrete2(r, c, FUN=nom.cc, target=1)

The units are determined as units of r multiplied with the units of c. Since a iterative process is used the parameter maxitis set to 500. If the attribute iterations is equal to maxit then the iterative process has not been finished. The attribute target gives the asscociation value obtained.

distribution

An object of class distribution holds a distribution (of a random variable). It is specified by a name and the distribution values. The name is used create quantile (paste0("q", name)) and cumulative distribution functions (paste0("p", name)), for example

The names of the above distributions can be abbreviated, for all other the exact name must be given.

d <- distribution("t", df=15)
quantile(d, c(0.025, 0.975))
d <- distribution("norm", mean=0, sd=1)
cdf(d, c(-1.96, +1.96))
d <- distribution("binom", size=9, prob=0.5)
pdf(d, 5)

Approximation check for distributions

The routines codifies rules for the approximation of distributions. If you generate dynamically exercises then you might want to avoid that an approximation is possible. However, only a part of the requirements can be checked, e.g. for the central limit theorem if more than $c$ variables are added.

t2norm

A $t$ distribution can be approximated by a standard normal distribution when the degree of freedom is large enough. The default is $n>30$. The default value of $30$ can be overwritten with options(distribution.t2norm=50) or explicitly set.

t2norm(40)
t2norm(40, c=50) 

binom2norm

A binomial distribution can be approximated by a normal distribution if:

The default value of $c=9$ can be overwritten with options(distribution.binom2norm=5) or explicitly set.

binom2norm(30, 0.5)
binom2norm(50, 0.5)
binom2norm(10, 0.5, type="double")
binom2norm(30, 0.5, type="double")

clt2norm

The central limit theorem allows the approximation of a sum of random variables with a normal distribution. The approximation holds if enough variables are added. The default value of $c=30$ can be overwritten with options(distribution.clt2norm=5) or explicitly set.

clt2norm(50)
clt2norm(50, c=100)

Statistics helper

means_choice

means_choice computes a list of mean values for a given data vector x:

If the parameter trim and/or winsor set to NA then these means will not be computed.

x <- runif(7, min=165, max=195)
str(means_choice(x, 2))

The attribute mindiff gives the minimal distance between two mean values.

mcval

The function computes the mode (most common value) of data. Note that in case if the mode is not unique than all values will be returned.

# numeric
x <- sample(1:5, size=25, replace = TRUE)
table(x)
mcval(x)
# character
x <- sample(letters[1:5], size=25, replace = TRUE)
table(x)
mcval(x)
# histogram
x <- hist(runif(100), plot=FALSE)
mcval(x)
mcval(x, exact=TRUE)

histdata

histdata computes data about the corresponding histogram to a vector like hist, but returns more information which might be necessary for exercises. In contrast to hist histdata requires that breaks covers the entire range of x.

histdata has the additional parameterprobs. If breaks="quantiles" then it determines which quantiles are used.

x  <- runif(25) 
h1 <- hist(x, plot=FALSE)
str(h1)
h2 <- histdata(x)
str(h2)

The returned list contains the following elements:

You can compute mean, quantile, median and mode for a histogram:

x <- runif(25) 
h <- histdata(x)
# mean
mean(h)
# median & quantile
median(h)
quantile(h)
# mode
mcval(h)
mcval(h, exact=TRUE)

histx

Creates a data set for a given set of breaks (class borders) and number of class observations. The data are generated such that the data points are uniformly distributed in each class.

breaks <- seq(1.6, 2.1, by=0.1)
x <- histx (breaks, sample(5:15, length(breaks)-1))
hist(x, breaks)
rug(x)

histwidth

Creates histogram data sampled from a set of class widths with following properties:

hw <- histwidth(1.6, 2.1, widths=0.05*(1:4))
str(hw)
x  <- histx (hw$breaks, hw$n)
hist(x, hw$breaks)
rug(x)

combinatorics, permutation, variation, and combination

Computes all results for permutation, variation and combination with and without reptition.

variation(7,3)         # without replication
variation(7,3, TRUE)   # with replication
combination(7,3)       # without replication
combination(7,3, TRUE) # with replication
permutation(7)
permutation(7, c(2,1,4)) # three groups with indistinguishable elements
combinatorics(7, 3)

The attribute mindiff gives the minimal distance between two values.

Association measures

A set function which compute which compute association measure based on a contingency table:

tab <- matrix(round(10*runif(15)), ncol=5)
nom.cc(tab)
nom.cc(tab, correct=TRUE)
nom.cramer(tab)
ord.spearman(tab)
ord.kendall(tab)

Output helper

affix, math

Adds a prefix and/or suffix to a (character) vector.

x <- runif(5)
affix(x, "$", "$")
math(x)

as_string, as_obs

Converts a (character) vector in a single string. If the vector is not of class character then it will be converted to character.

x <- round(runif(5),2)
as_string(x) 
as_string(x, last=" und ")                # as german
as_string(x, collapse=" + ", last= " + ") # as sum
# observations
as_obs(x)
as_obs(sort(x), name="y", sorted=TRUE)

num_result, num2str

num_result creates a list with a solution x, a (rounded) text representation fx, and a tolerance for the result.

str(num_result(pi, 3))
str(num_result(pi, 6))
str(num_result(pi, 6, tolmult=5))
str(num_result(pi, 6, tolmult=5, tolerance=1e-6))

num2str computes a list of string representation of a numeric vector using as.character.

x <- 1
str(num2str(x))
y <- 2
str(num2str(x, y))
str(num2str(x, y, z=c(x,y)))

as_table

Converts a vector into a horizontal table. The parameters are the same as in xtable which is used internally. Intended for the use as (class) frequency table.

x   <- runif(5)
tab <- vec2mat(x, colnames=1:length(x))
as_table(tab)
tab <- vec2mat(x, colnames=sprintf("%.0f-%0.f", 0:4, 1:5))
as_table(tab)

html_matrix, zebra, tooltip, toHTML

Returns a HTML representation of a matrix as table. If you create exercises for Moodle then you can embed the HTML code in your exercise since it will be translated by exams2moodle to HTML, too.

# Copy and paste program to console
library("tools")
library("magrittr")
library("exams2moodle")
x  <- matrix(1:12, ncol=3)
hm <- html_matrix(x)
toHTML(hm)
hm <- html_matrix(x) %>% zebra() %>% 
        tooltip(sprintf("Table has %.0f rows and %.0f columns", nrow(.), ncol(.)))
toHTML(hm)

With parameters the appearance of the table can be influenced:

General helper

all_different

For solutions in multiple choice exercises you want to ensure that the numerical results are not too near to each other. Therefore, all_different checks if the differences between the entries in obj are larger than some given value tol.

x <- runif(20)
all_different(x, 1)    # minimal distance is at least 1
all_different(x, 1e-4) # minimal distance is at least 0.0001

equal

Compares two numeric values if they are equal given a tolerance (default: 1e-6) by abs(x-y)<tol).

x <- pi
y <- pi+1e-4
equal(x, y)
equal(x, y, tol=1e-3)

firstmatch

firstmatch seeks matches for the elements of its first argument among those of its second. If multiple matches are found then the first match is returned, for further details see charmatch.

firstmatch("d", c("chisq", "cauchy"))
firstmatch("c", c("chisq", "cauchy"))
firstmatch("ca", c("chisq", "cauchy"))

fractions

fractions is a copy of MASS::fractions to compute from a numeric values fractions.

d <- ddiscrete(6)
fractions(d)

hyperloop, unique.elem

For generating answers for multiple choice exercises it is helpful to run the same routine several times with different input parameters. For example students may forget to divide by n-1 or divide by n instead of n. hyperloop runs about all parameter combinations.

ttest_num is a routine which computes all information required for exercises with a $t$-test.

library("exams2moodle")
x <- runif(100)
correct <- ttest_num(x=x, mu0=0.5, sigma=sqrt(1/12))
str(correct)

Now, let us run many $t$-tests (up to 384) with typical student errors. We extract all different test statistic and choose seven wrong answers and one correct answer with the condition that all solutions differ at least by 0.051.

res <- hyperloop(ttest_num, 
                 n           = list(1, correct$n, correct$n+1),
                 mu0         = list(correct$mu0, correct$mean),
                 mean        = list(correct$mu0, correct$mean), 
                 sigma       = list(correct$sigma, correct$sd, sqrt(correct$sigma), sqrt(correct$sd)),
                 sd          = list(correct$sigma, correct$sd, sqrt(correct$sigma), sqrt(correct$sd)),
                 norm        = list(TRUE, FALSE)
                )
# extract all unique test statistics
stat <- unlist(unique_elem(res, "statistic"))
# select 7 wrong test statistic such that the difference 
# between all possible test statistics is at least 0.01
repeat {
  sc <- to_choice(stat, stat==correct$statistic, shuffle=c(1,7))
  if (all_different(sc$questions, 0.005)) break
}
# show possible results for a MC questions
sc$questions
sc$solutions

Recreate exercises in readable form from a JSON file

  1. Go into Moodle to the test, choose 'Options' -> 'Assessment' -> 'Detailed answers'
  2. Download the reports of the students in JSON format. Note: Images will be not downloaded from Moodle, only the text of the alt attribute is shown!
  3. Start R and type
library("exams2moodle")
# replace the next line with
# file <- "location_of_your_downloaded_JSON_file"
file <- system.file("json", "Test-Klausur-Statistik.json", package="exams2moodle")
# this will create in the directory with your JSON file a PDF file
json2report(file)

Named parameters which can be used:

In case that you develop your own templates you may have further parameters which you might want to replace in the template. For example if you use a named parameter subtitle then all {{subtitle}} will be replaced in the template file by the content of the parameter subtitle.

Templates

hu_german.Rmd

Access the template with system.file("templates", "hu_german.Rmd", package="exams2moodle"). Note that the real work is done within the template, not in the json2report function.

cat(paste0(readLines(system.file('template', 'hu_german.Rmd', package='exams2moodle')), collapse="\n"))

question_german.Rmd

Access the template with system.file("templates", "question_german.Rmd", package="exams2moodle").

cat(paste0(readLines(system.file('template', 'question_german.Rmd', package='exams2moodle')), collapse="\n"))


sigbertklinke/exams2moodle documentation built on July 6, 2023, 3:26 p.m.