foodOfTheWorld: 26 world cuisines are described by their cooking ingredients.

foodOfTheWorld    R Documentation

26 world cuisines are described by their cooking ingredients.

Description

foodOfTheWorld: 26 world cuisines are described by their cooking ingredients. The long version of the data comprises 159 ingredients. A shorter version includes the 68 most relevant ingredients, and a compact version includes only 22 ingredients.

Usage

data("foodOfTheWorld")
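
A minimal sketch of loading and inspecting the data, assuming that the data4PCCAR package is installed and that data() creates a list named foodOfTheWorld (suggested by the Usage line above, but an assumption here). The guard lets the snippet run without error when the package is absent.

```r
# Assumption: the data set ships with the data4PCCAR package.
pkg_ok <- requireNamespace("data4PCCAR", quietly = TRUE)

if (pkg_ok) {
  # Assumption: this call creates an object named foodOfTheWorld.
  data("foodOfTheWorld", package = "data4PCCAR")
  print(names(foodOfTheWorld))      # names of the objects in the list
  print(dim(foodOfTheWorld$CT))     # dimensions of the full contingency table
} else {
  message("data4PCCAR is not installed; skipping the example.")
}
```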

Format

A list containing 9 objects:

  1. CT: a 26 cuisines by 159 ingredients data frame containing the contingency table;

  2. logCT: a 26 cuisines by 159 ingredients data frame containing the "logged" contingency table;

  3. logCT.integer: a 26 cuisines by 159 ingredients data frame containing the rounded "logged" contingency table;

  4. small.CT: a 26 cuisines by 68 ingredients data frame containing the "small" contingency table;

  5. small.logCT: a 26 cuisines by 68 ingredients data frame containing the "small" "logged" contingency table;

  6. small.logCT.integer: a 26 cuisines by 68 ingredients data frame containing the "small" rounded "logged" contingency table;

  7. compact.CT: a 26 cuisines by 22 ingredients data frame containing the "compact" contingency table;

  8. compact.logCT: a 26 cuisines by 22 ingredients data frame containing the "compact" "logged" contingency table; and

  9. compact.logCT.integer: a 26 cuisines by 22 ingredients data frame containing the "compact" rounded "logged" contingency table.

Author(s)

Herve Abdi & Jyothi

Details

These data sets are derived from the work of blogger Jyothi (see https://www.r-bloggers.com/a-visualization-of-world-cuisines/). Here the main data consist of a list with three main data frames:

  1. CT: a contingency table listing the number of recipes (from the publicly available database Epicurious) that mention a given ingredient for a given cuisine;

  2. small.CT: a short version that uses only the 68 most discriminant ingredients; and

  3. compact.CT: a compact version (the one given in the original post) with only the 22 most informative ingredients.

Because the counts vary greatly, the contingency tables are also rescaled by taking the logarithm of each count (with log(0) set to 0). The log tables are also available as a pseudo-integer version (obtained by multiplying the "logged" contingency table by 10 and rounding). The names of the log-rescaled tables contain log before CT (e.g., logCT, small.logCT); the names of the "log-rounded" tables additionally end with the suffix .integer (e.g., logCT.integer).
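
The rescaling described above can be sketched on a toy table. The table toy.CT and the helper log_with_zero are hypothetical names made up for this illustration, and the natural logarithm is an assumption (the base of the logarithm is not stated here).

```r
# Hypothetical toy contingency table (cuisines by ingredients); not the real data.
toy.CT <- data.frame(
  garlic = c(120, 0, 35),
  butter = c(4, 210, 1),
  ginger = c(0, 2, 98),
  row.names = c("cuisineA", "cuisineB", "cuisineC")
)

# Take the log of the positive counts only; zero counts stay at 0.
# Assumption: natural logarithm.
log_with_zero <- function(ct) {
  m <- as.matrix(ct)
  out <- m
  out[m > 0] <- log(m[m > 0])
  as.data.frame(out)
}

toy.logCT <- log_with_zero(toy.CT)

# Pseudo-integer version: multiply the logged table by 10, then round.
toy.logCT.integer <- round(10 * toy.logCT)

print(toy.logCT.integer)
```

Note that, under this convention, a count of 1 and a count of 0 both map to 0, because log(1) = 0.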

The original post used these data (the compact version) to illustrate principal component analysis, but because these data are counts, correspondence analysis is more appropriate.
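
As an illustration of what correspondence analysis does with a table of counts, here is a minimal base-R sketch using the textbook singular value decomposition of the standardized residuals. It runs on a hypothetical toy table (toy.CT, not the real data) and is not necessarily the computation performed by any particular package.

```r
# Hypothetical toy contingency table (3 cuisines by 4 ingredients).
toy.CT <- matrix(
  c(120,   0, 35, 10,
      4, 210,  1, 60,
      0,   2, 98, 25),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("cuisineA", "cuisineB", "cuisineC"),
                  c("garlic", "butter", "ginger", "onion"))
)

# Correspondence matrix and its row and column masses.
P <- toy.CT / sum(toy.CT)
row.mass <- rowSums(P)
col.mass <- colSums(P)

# Standardized residuals from the independence model.
S <- diag(1 / sqrt(row.mass)) %*%
  (P - outer(row.mass, col.mass)) %*%
  diag(1 / sqrt(col.mass))

# The SVD gives the dimensions; squared singular values are the inertias.
dec <- svd(S)
eig <- dec$d^2

# Principal coordinates of the rows (cuisines).
row.coords <- diag(1 / sqrt(row.mass)) %*% dec$u %*% diag(dec$d)
rownames(row.coords) <- rownames(toy.CT)

print(round(eig, 4))
print(round(row.coords[, 1:2], 3))
```

The sum of the inertias equals the chi-square statistic of the table divided by the total count, which is why this method suits count data: it analyzes departures from independence rather than raw variances.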

The large, small, and compact data sets give roughly the same first dimension(s), but the versions with more descriptors yield more significant (relevant) dimensions than the versions with fewer variables.


HerveAbdi/data4PCCAR documentation built on Sept. 11, 2022, 4:19 p.m.