| readingSkills | R Documentation |
A toy data set illustrating the spurious correlation between reading skills and shoe size in school-children.
data("readingSkills")
A data frame with 200 observations on the following 4 variables.
nativeSpeaker: a factor with levels no and yes, where yes indicates that the child is a native speaker of the language of the reading test.
age: age of the child in years.
shoeSize: shoe size of the child in cm.
score: raw score on the reading test.
In this artificial data set, which was generated by means of a linear model,
age and nativeSpeaker are actual predictors of the
score, while the spurious correlation between score and
shoeSize arises only because both depend on age.
The true predictors can be identified, e.g., by means of partial correlations, standardized beta coefficients in linear models or the conditional random forest variable importance, but not by means of the standard random forest variable importance (see example).
library("party")  # provides cforest(), cforest_unbiased() and varimp()
set.seed(290875)
readingSkills.cf <- cforest(score ~ ., data = readingSkills,
    control = cforest_unbiased(mtry = 2, ntree = 50))
# standard importance
varimp(readingSkills.cf)
# the same modulo random variation
varimp(readingSkills.cf, pre1.0_0 = TRUE)
# conditional importance, may take a while...
varimp(readingSkills.cf, conditional = TRUE)
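The details mention that partial correlations and standardized beta coefficients can also identify the true predictors. The following is a minimal sketch of both, not part of the original example. It assumes the party package is installed, so that the data set can be loaded from it. The residual-based computation of the partial correlation and all object names (raw_cor, partial_cor, std, std_fit) are illustrative choices.

```r
# Assumption: the 'party' package is installed; it ships the readingSkills data.
data("readingSkills", package = "party")

# Marginal correlation between score and shoe size (the spurious one)
raw_cor <- cor(readingSkills$score, readingSkills$shoeSize)

# Partial correlation: correlate what remains of each variable
# after regressing out age and nativeSpeaker
res_score <- resid(lm(score ~ age + nativeSpeaker, data = readingSkills))
res_shoe  <- resid(lm(shoeSize ~ age + nativeSpeaker, data = readingSkills))
partial_cor <- cor(res_score, res_shoe)

# Standardized beta coefficients: scale the numeric variables, then refit
std <- readingSkills
num <- c("age", "shoeSize", "score")
std[num] <- lapply(std[num], function(x) as.numeric(scale(x)))
std_fit <- lm(score ~ age + shoeSize + nativeSpeaker, data = std)

print(c(raw = raw_cor, partial = partial_cor))
print(round(coef(std_fit), 3))
```

If the data behave as described above, the partial correlation should be much closer to zero than the marginal one, and the standardized coefficient of shoeSize should be small relative to that of age.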