This package primarily provides data that more closely resembles what
many people would come across in the wild, or that is simply more
interesting or accessible (in my opinion), and more useful for
instruction and workshops. Far too often examples use iris, mtcars,
etc. for convenience, but these are actually inconvenient for
demonstrating common data and modeling problems, or are too small to
even be realistic.
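As a quick illustration of the size point, the following sketch checks the dimensions and missingness of the two data sets named above. It uses only base R; the object names `sizes` and `n_missing` are made up for this example.

```r
# iris and mtcars ship with base R (the 'datasets' package).
# Collect their dimensions into one small matrix for comparison.
sizes <- rbind(iris = dim(iris), mtcars = dim(mtcars))
colnames(sizes) <- c("rows", "columns")
print(sizes)

# Count missing values in each; neither contains any, so they cannot
# be used to demonstrate handling of incomplete data.
n_missing <- c(iris = sum(is.na(iris)), mtcars = sum(is.na(mtcars)))
print(n_missing)
```

With 150 and 32 rows respectively and no missing values, neither is a good stand-in for the kind of data most people actually have to work with.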
This package will provide larger and messier data. The bias is towards data that can be understood regardless of discipline or background. In addition, each data set should have at least several hundred observations, and often many more, but not so many that a demonstration of analysis or data processing would take an inordinate amount of time. However, it should have relatively few columns (unless it is meant to demonstrate a ‘large p’ type of problem or analysis, e.g. penalized regression).
In general the goals are:
In most cases the data has been cleaned up to make it easier to use and understand.
Right now it has:
- gapminder_2019: a 2019 pull from gapminder.org/data.
- star_wars: several data sets based on the Star Wars API.
- instructor_evaluations: a nice-sized data set for mixed/multi-level
  modeling taken from the lme4 package.
- fish: Number of fish caught on camping trips.
- pisa: OECD’s Programme for International Student Assessment with
  international scores for math, science, and reading, covering years
  2000-2015.
- world_happiness: Multiyear data set with country level scores of
  ‘happiness’. From the 2019 World Happiness Report, and includes data
  from 2005-2018.
- sp500: Daily S&P 500 data for a 10 year period covering roughly 5
  years before and after the Great Recession low.
- wine_reviews, wine_quality: Two data sets regarding wine reviews
  that can be used for a wide range of standard statistical and
  machine learning approaches.
- google_apps: Ratings and other information for Google Play Store
  apps.
- fashion_train, fasion_test: The ‘Fashion MNIST’. Image data for
  clothing items.
- gender_gap, gender_gap_2018: Country level data regarding the
  World Bank Gender Gap Index.
- kiva: Lending information from the kiva.org online crowdfunding
  platform.
- water_risk, water_risk_province: Country and province level data
  regarding water risk.
- big_five: Big Five personality traits.
- heart_disease: The UCI heart disease data.
- retirement: Data on retirement plan participation rate of
  employees.
- movielens: 1 million samples from MovieLens data.

This package is not on CRAN. To install:
devtools::install_github('m-clark/noiris')
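Once the install step above has finished, something like the following sketch can be used to see what is available and to look at one data set. It assumes the data sets can be loaded with `data()` under the names listed earlier, which I have not verified for every entry; the object names `has_noiris`, `available`, and `env` are just for illustration.

```r
# requireNamespace() returns FALSE (rather than erroring) when the
# package is not installed, so this block is safe to run either way.
has_noiris <- requireNamespace("noiris", quietly = TRUE)

if (has_noiris) {
  # List every data set bundled with the installed package.
  available <- data(package = "noiris")$results[, "Item"]
  print(available)

  # Load one data set into its own environment and inspect its structure.
  # Assumption: "gapminder_2019" is loadable under exactly this name.
  env <- new.env()
  data("gapminder_2019", package = "noiris", envir = env)
  str(env$gapminder_2019)
}
```

Loading into a separate environment keeps the global workspace clean, which is handy when comparing several of the larger data sets in one session.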
To do:
Note to self, see flexmix, poLCA, and other packages. Maybe add classic biochemists for another count data set. Article pub for link models and related.