foodOfTheWorld | R Documentation |
foodOfTheWorld
: 26 world cuisines are described by
their cooking ingredients. The long version of the data comprises
159 ingredients.
A shorter version includes the 68 most relevant
ingredients.
data("foodOfTheWorld")
A list containing 9 objects:
CT
: a 26 cuisines by 159 ingredients data frame
containing the contingency table;
logCT
: a 26 cuisines by 159 ingredients
data frame containing the "logged" contingency table;
logCT.integer
: a 26 cuisines by 159 ingredients
data frame containing the rounded "logged" contingency table;
small.CT
: a 26 cuisines by 68 ingredients data frame
containing the "small" contingency table;
small.logCT
: a 26 cuisines by 68 ingredients data frame
containing the "small" "logged" contingency table;
small.logCT.integer
:
a 26 cuisines by 68 ingredients data frame
containing the "small" rounded "logged" contingency table;
compact.CT
: a 26 cuisines by 22 ingredients data frame
containing the "compact" contingency table;
compact.logCT
: a 26 cuisines by 22 ingredients data frame
containing the "compact" "logged" contingency table; and
compact.logCT.integer
:
a 26 cuisines by 22 ingredients data frame
containing the "compact" rounded "logged" contingency table.
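The nine components listed above can be inspected by name once the data are loaded. The snippet below is a minimal sketch: it assumes the data set ships with a package called data4PCCAR (suggested by the keywords of this page, but not confirmed here), and it skips the loading step when that package is not installed.

```r
# Names of the nine list components, as documented above
expectedNames <- c("CT", "logCT", "logCT.integer",
                   "small.CT", "small.logCT", "small.logCT.integer",
                   "compact.CT", "compact.logCT", "compact.logCT.integer")

# Assumption: the data set lives in the data4PCCAR package
if (requireNamespace("data4PCCAR", quietly = TRUE)) {
  data("foodOfTheWorld", package = "data4PCCAR")
  # Compare the documented names with what the list actually holds
  print(setdiff(expectedNames, names(foodOfTheWorld)))
  # Dimensions of each table (cuisines by ingredients)
  print(sapply(foodOfTheWorld[expectedNames], dim))
}
```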
@keywords datasets data4PCCAR
@author Herve Abdi & Jyothi
These data sets are derived from the work of blogger
Jyothi
(see https://www.r-bloggers.com/a-visualization-of-world-cuisines/).
Here the main data
consist of a list with three main data frames:
CT
: a contingency table listing
the number of recipes (from the publicly available
database Epicurious) that mention a given ingredient
for a given cuisine;
small.CT
: a short version that uses only 68
(most discriminant) ingredients;
and compact.CT
: a compact version
(that was given in the original post) with only 22
(most informative) ingredients.
Because the counts vary greatly,
the contingency tables are also re-scaled by taking the
logarithm of the contingency table
(with log(0) set to 0).
The log tables are also available as a pseudo-integer version
(obtained by multiplying the "logged"
contingency table by 10 and rounding). The names of the log rescaled tables
have log placed before CT (e.g., logCT, small.logCT); the names of the
"log-rounded" tables are, in addition, followed by the suffix .integer
(e.g., logCT.integer).
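The rescaling described above can be sketched on a toy table. The counts and the object names (toyCT, toyLogCT, toyLogCT.integer) are hypothetical, and the natural logarithm is an assumption, since the base of the logarithm is not stated here.

```r
# Hypothetical 3 cuisines by 3 ingredients contingency table
toyCT <- matrix(c(120,   0,  35,    # garlic
                    0, 210,   4,    # soy_sauce
                   15,   1, 300),   # butter
                nrow = 3,
                dimnames = list(c("Italian", "Chinese", "French"),
                                c("garlic", "soy_sauce", "butter")))

# "Logged" table: natural log (assumed base), with log(0) set to 0
logM <- log(toyCT)          # log(0) gives -Inf in R
logM[toyCT == 0] <- 0       # replace those entries by 0
toyLogCT <- as.data.frame(logM)

# Pseudo-integer version: multiply by 10, then round
toyLogCT.integer <- as.data.frame(round(10 * logM))

print(toyLogCT.integer)
```

Note that under this convention a count of 1 and a count of 0 both map to 0, because log(1) is 0.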
The original post used these data (the compact version) to illustrate principal component analysis, but because these data are counts, correspondence analysis is more appropriate.
The large, small, and compact data sets give roughly the same first dimension(s), whereas versions with more descriptors give more significant/relevant dimensions than data sets with fewer variables.
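To illustrate why correspondence analysis suits count data, here is a minimal base-R sketch of the method on a hypothetical toy table (the counts and all object names are illustrative assumptions, not part of the data set). It computes the analysis through the singular value decomposition of the standardized residuals; dedicated packages offer fuller implementations.

```r
# Hypothetical 3 cuisines by 3 ingredients contingency table
toyCT <- matrix(c(120,   0,  35,
                    0, 210,   4,
                   15,   1, 300),
                nrow = 3,
                dimnames = list(c("Italian", "Chinese", "French"),
                                c("garlic", "soy_sauce", "butter")))

P       <- toyCT / sum(toyCT)     # correspondence matrix (proportions)
rowMass <- rowSums(P)             # row masses
colMass <- colSums(P)             # column masses

# Standardized residuals from the independence model
S <- diag(1 / sqrt(rowMass)) %*%
     (P - rowMass %o% colMass) %*%
     diag(1 / sqrt(colMass))

dec <- svd(S)
eig <- dec$d^2                    # inertia carried by each dimension

# Row principal coordinates (cuisines)
rowCoord <- diag(1 / sqrt(rowMass)) %*% dec$u %*% diag(dec$d)
rownames(rowCoord) <- rownames(toyCT)

# Percentage of inertia per dimension
print(round(100 * eig / sum(eig), 1))
print(rowCoord[, 1:2])
```

The total inertia, sum(eig), equals the chi-square statistic of the table divided by the grand total, which is what ties the method to contingency tables.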