knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
suppressPackageStartupMessages(library("mmstat4"))
This package was designed with the aim of distributing educational resources for statistics courses targeted at students.
Once the teaching materials have been downloaded, the primary functions of this package include:
With this feature, students can access various educational materials such as interactive apps, R code, data files, and other resources that can be helpful in learning statistical concepts. By providing easy access to these materials, the package aims to facilitate the learning process for students and make it more interactive and engaging.
GitHub allows you to download the repository as a ZIP file. You can find the option to download under the Code button (Download ZIP) in the repository. mmstat4 works with this ZIP file, but you can also use one of your own ZIP files.
In my courses, I assume that all R programs run in a freshly started R environment, meaning there are no path dependencies, and all necessary libraries are loaded within the R program. My repositories contain not only the example programs for the students but also the programs I use to create images and tables, as well as the Shiny Apps I demonstrate.
You can install mmstat4
from CRAN using:
install.packages("mmstat4")
Alternatively, you can install the development version from GitHub using devtools
:
devtools::install_github("sigbertklinke/mmstat")
A component of the package includes a small ZIP file containing educational materials. Initially, we need to instruct mmstat4
to utilize this ZIP file instead of the larger ZIP file for my Data Analysis I and II courses.
ghget('local') ghopen("example_mcnemar.R") # open a R example file
ghget
returns the key (local
) associated with the currently active ZIP file. ghopen
launches the example file in RStudio. To access the equivalent Python script, use:
ghopen("example_mcnemar.py") # open a Python example file
Note: To run Python scripts, ensure local Python installation. Scripts execute within mmstat4.xxxx
virtual environment, created upon script run or open. User approval is crucial. Upon setup, script checks for init_py.R
in ZIP file. If found, it's executed, often installing Python modules with reticulate::py_install('module name')
.
Shiny apps can also be launched in RStudio and run locally.
ghopen("pca_best_line/app.R") # open a Shiny app
Data files can be loaded with:
x <- ghload("TelefonDaten.csv") # load a data set head(x)
x <- ghget("local") x <- ghload("TelefonDaten.csv") # load a data set head(x)
HTML and PDF files will open in the default application:
ghopen("Formelsammlung.pdf") # open a PDF file
ghget
A ZIP file or repository can be stored locally or in the internet.
A key-value approach can be used to determine the location of the source ZIP file. If no key is defined then ghget
uses the base name of the source ZIP file as the key.
ghget(dummy="https://github.com/sigbertklinke/mmstat4.dummy/archive/refs/heads/main.zip")
Three repository keys are predefined: hu.data
, hu.stat
and dummy
. You can retrieve them via
ghget('dummy') ghget('hu.stat') ghget('hu.data')
If you do not use a key, the programme will create one and return it as result.
ghget(system.file("zip/mmstat4.dummy.zip", package = "mmstat4")) ghget("https://github.com/sigbertklinke/mmstat4.dummy/archive/refs/heads/main.zip") # tries https://github.com/my/github_repo/archive/refs/heads/[main|master].zip ghget("my/github_repo") # will fail # ghget() # uses 'hu.data'
ghget
downloads the ZIP file, saves it to a temporary location and unpacks it. For non-temporary locations, see the FAQ.
In addition, unique short names, related to the ZIP file content, are generated from the path components.
After unpacking the ZIP file, unique short names are generated for these files.
ghget('dummy') gd <- ghdecompose(ghlist(full.names=TRUE)) head(gd)
The file name is split into four parts. The last two parts, minpath
and filename
, are used to create short names:
/tmp/RtmpXXXXXX/mmstat4.dummy-main/LICENSE
is LICENSE
. There was no other file named LICENSE
in the ZIP file. Therefore, it is sufficient to address this file in the ZIP file. /tmp/RtmpXXXXXX/mmstat4.dummy-main/data/BANK2.sav
is data/BANK2.sav
. There is another file called BANK2.sav
in the ZIP file, but to address it uniquely, data/BANK2.sav
is sufficient for this file in the ZIP file (the other is dbscan/BANK2.sav
). Currently, no check is made whether two files with identical basenames are also identical in content.ghlist("BANK2", full.names=TRUE) # full names ghlist("BANK2") # short names
ghopen
, ghload
, ghsource
The short names (or the full names) can be used to work with the files
x <- ghload("data/BANK2.sav") # load data via rio::import ghopen("univariate/example_ecdf.R") # open file in RStudio editor ghsource("univariate/example_ecdf.R") # execute file via source ghlist("example_ecdf") # "univariate/" was unnecessary
ghlist
, ghquery
With ghlist
you can get a list of unique (short) names for all files or a subset based on a regular expression pattern
in the repository
str(ghlist()) # get all short names ghlist("\\.pdf$") # get all short names of PDF files
With ghquery
you can query the list of unique (short) names for all files based on the overlap distance.
ghlist("bnk") # pattern = "bnk ghquery("bnk") # nearest string matching to "bnk"
ghfile
, ghpath
, ghdecompose
ghfile
tries to find a unique match for a given file and returns the full path. If there is no unique match, an error is returned with some possible matches.
ghdecompose
builds a data frame and decomposes the full names of the files into
outpath
the path part which is the same for all files (basically the place where the ZIP file is extraced to),inpath
the path part that is not used in minpath
, but in the ZIP file,minpath
the minimum path part, so that all files are uniquely addressable,filename
the base name of the file, andsource
the input for shortpath.The short names for the files are built from the components minpath
and filename
.
ghpath
builds up the short name with various path components from a ghdecompose
object.
ghfile('data/BANK2.sav') ghget(local=system.file("zip", "mmstat4.dummy.zip", package="mmstat4")) fnf <- ghlist(full.names=TRUE) dfn <- ghdecompose(fnf) head(dfn) head(ghpath(dfn))
The package comes with two RStudio addins (see under Addins -> MMSTAT4
):
Open a file from a zip file (ghopenAddin
), which gives access to the unzipped zip file and opens the selected file in an RStudio editor window.
Execute a Shiny app from a zip file (ghappAddin
), which extracts all directories containing Shiny apps and opens the selected app in a web browser (using the default browser).
Currently there are the following routines to support R/Python code snippets:
pkglist
or modlist
, which extracts all library
/require
/import
calls from code snippets and returns a frequency table of the packages or and modules called.ghget(local=system.file("zip", "mmstat4.dummy.zip", package="mmstat4")) files <- ghlist(pattern="*.R$", full.names = TRUE) cat(head(pkglist(files, repos="https://cloud.r-project.org"), 12))
Note that the line for CHAID
is commented out. The package cannot be found in CRAN, but you can install it from R-Forge.
cat(head(pkglist(files, repos=c("https://cloud.r-project.org", "http://R-Forge.R-project.org")), 12))
You can add a file init_R.R
or init_py.R
to your ZIP file, which installs the necessary R packages or Python modules.
checkFiles
checks whether each R code snippet runs smoothly in a freshly started R.
# just check the last files from the list # Note that the R console will show more output (warnings etc.) checkFile(files, start=435) # alternatively: Rsolo
Three modes are available for checking a file
:
exist
: Does the source file exist?parse
: Is parse(file)
or python -m "file"
successful? (default)run
: Is Rscript "file"
or python3 "file"
successful?dupFiles
uses checksums to check whether files exist twice.
files <- ghlist(full.names = TRUE) head(dupFiles(files)) # alternatively: Rdups
Note: there is also an error message if the necessary libraries are not installed!
Once you created your ZIP file you need to know under which names a specific file can be
accessed. In the example we use a ZIP file which comes with the package mmstat4
:
ghget(local=system.file("zip", "mmstat4.dummy.zip", package="mmstat4")) ghnames <- ghdecompose(ghlist(full.names=TRUE)) ghnames[58,]
The shortest possible name is determined by minpath
and filename
.
But all other paths determined by uniquepath
, minpath
and filename
should also work.
For file number 58, the following access names are possible:
BANK2.sav
will not work since more than one file named BANK2.sav
in the ZIP file.dbscan/BANK2.sav
will work since this the shortest possible name.cluster/dbscan/BANK2.sav
, data/cluster/dbscan/BANK2.sav
, and examples/data/cluster/dbscan/BANK2.sav
will work.x1 <- ghload("BANK2.sav") x2 <- ghload("dbscan/BANK2.sav") x3 <- ghload("cluster/dbscan/BANK2.sav") x4 <- ghload("data/cluster/dbscan/BANK2.sav") x5 <- ghload("examples/data/cluster/dbscan/BANK2.sav")
Please email me at sigbert@hu-berlin.de
. You can also try the current development version of the package from GitHub:
# install.packages("devtools") devtools::install_github("sigbertklinke/mmstat4")
No, this is not supported.
ghget("dummy", .force=TRUE)
ghget("dummy", .tempdir=FALSE) # install non-temporarily ghget("dummy", .tempdir="~/mmstat4") # install non-temporarily to ~/mmstat4 ghget("dummy", .tempdir=TRUE) # install again temporarily
Note: If a repository was installed permanently and you switch back to temporarily storage then the downloaded files will not be deleted.
ghget("dummy", .tempdir=TRUE) ghlist(pattern="/(app|server)\\.R$") ghopen("dbscan") # open the app
csv
data files?ghget("dummy", .tempdir=TRUE) ghlist(pattern="\\.csv$", ignore.case=TRUE, full.names=TRUE) # use mmstat4::ghload for importing ghlist(pattern="\\.csv$") pechstein <- ghload("pechstein.csv") str(pechstein)
For Ubuntu (Linux) install:
sudo apt-get install python3 python3-dev python3-pip python3-venv libbz2-dev
Note: mmstat4
installs these Python modules numpy
, scipy
, statsmodels
, pandas
, scikit-learn
, matplotlib
, and seaborn
by default.
init_py.R
is only called if the virtual environment is created. Can I force a new call?Yes, delete the virtual environment and recreate it
reticulate::virtualenv_remove('mmstat4') ghinstall('py', force=TRUE)
The package recognises three standard repositories: dummy
, hu.stat
, and hu.data
.
tabl <- " | Repository | Size | ZIP file location | | :--------------- | :-------------| :--------| | `dummy` | 3 MB | `https://github.com/sigbertklinke/mmstat4.dummy/archive/refs/heads/main.zip` | | `hu.data` | 29 MB | `https://github.com/sigbertklinke/mmstat4.data/archive/refs/heads/main.zip` | | `hu.stat` | 31 MB | `https://github.com/sigbertklinke/mmstat4.stat/archive/refs/heads/main.zip` | " cat(tabl)
dummy
is small subsample of hu.stat
and hu.data
which is intended for examples and test purposes.
Mathematische Grundlagen - Einführung - Grundbegriffe - Univariate Verteilungen - Parameter univariater Verteilungen - Bivariate Verteilungen - Parameter bivariater Verteilungen - Regressionanalyse - Zeitreihenanalyse - Indexzahlen - Wahrscheinlichkeitsrechnung - Zufallsvariablen - So lügt man mit Statistik - Wichtige Verteilungsmodelle - Stichprobentheorie - Statistische Schätzverfahren - Regressionsmodell - Konfidenzintervalle - Statistische Testverfahren - Parameterische Tests - Nichtparametrische Tests
ghget("hu.stat") ghopen("Statistik.pdf") ghopen("Aufgaben.pdf") ghopen("Loesungen.pdf") ghopen("Formelsammlung.pdf")
General - R - Basics and data generation - Test and estimation theory - Parameter of distributions - Distribution - Transformations - Robust statistics - Missing values - Subgroup analysis - Correlation and association - Multivariate graphics - Principal component analysis - Exploratory factor analysis - Reliability - Cluster analysis - Regression analysis - Linear regression - Nonparametric regression - Classification and regression trees - Neural networks
ghget("hu.data") ghopen("dataanalysis.pdf")
Einführung - Entdeckung und Identifikation von Ausreißern - Prüfung der Verteilungsform von Variablen - Parametervergleiche bei unbhängigen Stichproben - Anhänge A-D, Literaturverzeichnis, Index
ghget("hu.data") ghopen("cs1_roenz.pdf")
Vorwort - Überprüfung von Zusammenhängen - Regressionsanalyse - Reliabilitäts- und Homogenitätsanalyse von Konstrukten - Anhänge A-H, Literaturverzeichnis, Stichwortverzeichnis
ghget("hu.data") ghopen("cs2_roenz.pdf")
Einführung - Verallgemeinerte lineare Modelle (generalized linear models, GLM) - Modellierung binärer Daten - Das multinomiale Logit Modell - Modellierung multinomialer Daten (log-lineare Modelle) - Literaturverzeichnis, Index
ghget("hu.data") ghopen("glm_roenz.pdf")
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.