createCensusIncome: Census Income prediction (Adult) dataset


Description

Task: Classification: formula(Income ~ . - fnlwgt), i.e. predict Income from all other columns except fnlwgt.
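In R's formula syntax, `.` stands for every column other than the response, and `- fnlwgt` removes the survey weight from the predictors. The sketch below checks this on a small toy data frame. The columns `age` and `hours` and all values are made up for illustration; they are not the real Adult schema.

```r
# Toy stand-in for the dataset: the columns `age` and `hours` and all values
# are hypothetical; only `Income` and `fnlwgt` mirror the task formula.
toy <- data.frame(
  age    = c(25, 40, 33, 58),
  fnlwgt = c(100000, 200000, 150000, 250000),
  hours  = c(40, 50, 20, 45),
  Income = factor(c("<=50K", ">50K", "<=50K", ">50K"))
)

f <- formula(Income ~ . - fnlwgt)

# terms() expands `.` against the data's columns, then drops fnlwgt
predictors <- attr(terms(f, data = toy), "term.labels")
print(predictors)

# The design matrix a model would be fitted on contains no fnlwgt column
print(colnames(model.matrix(f, data = toy)))
```

The same formula can be handed straight to a modelling function, for example `glm(Income ~ . - fnlwgt, data = d, family = binomial)`, where `d` stands for the table returned by `createCensusIncome()`.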

Usage

createCensusIncome(file = getfilepath("censusincome.rds"), write = TRUE,
  read = TRUE)

Arguments

file

character; path/filename of the data file to read from and write to

write

logical; should the dataset be written to disk for later use? (default: TRUE)

read

logical; should we try to read the dataset from the specified location first? (default: TRUE)
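Together, `read` and `write` describe a simple on-disk cache stored at `file`. The following is a rough illustration using a hypothetical helper, `cached_dataset()`, written only for this page. It assumes the cache is stored with `saveRDS()`/`readRDS()`, as the `.rds` default filename suggests. It is not the createdatasets source code.

```r
# Hypothetical helper illustrating the documented read/write semantics;
# this is NOT the implementation used by createdatasets.
cached_dataset <- function(file, build, write = TRUE, read = TRUE) {
  # read = TRUE: reuse a previously saved copy if one exists at `file`
  if (read && file.exists(file)) {
    return(readRDS(file))
  }
  data <- build()          # otherwise construct the dataset from scratch
  if (write) {
    saveRDS(data, file)    # write = TRUE: store it for later calls
  }
  data
}

path <- tempfile(fileext = ".rds")
calls <- 0
build_toy <- function() {
  calls <<- calls + 1      # count how often the expensive build runs
  data.frame(x = 1:3)
}

first  <- cached_dataset(path, build_toy)  # builds the data and writes the file
second <- cached_dataset(path, build_toy)  # reads the cached copy instead
print(calls)
```

Under this scheme, `read = FALSE` forces a rebuild even when the file exists, and `write = FALSE` leaves no file behind.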

Details

From UCI: "Description of fnlwgt (final weight): The weights on the CPS files are controlled to independent estimates of the civilian noninstitutional population of the US. These are prepared monthly for us by Population Division here at the Census Bureau. We use 3 sets of controls. These are: 1. A single cell estimate of the population 16+ for each state. 2. Controls for Hispanic Origin by age and sex. 3. Controls by Race, age and sex. We use all three sets of controls in our weighting program and "rake" through them 6 times so that by the end we come back to all the controls we used. The term estimate refers to population totals derived from CPS by creating "weighted tallies" of any specified socio-economic characteristics of the population. People with similar demographic characteristics should have similar weights. There is one important caveat to remember about this statement. That is that since the CPS sample is actually a collection of 51 state samples, each with its own probability of selection, the statement only applies within state."
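"Raking" is iterative proportional fitting. The weights are rescaled to match each set of control totals in turn, and the cycle is repeated. Below is a minimal base-R sketch on a made-up 2 x 2 weight table with one set of row controls and one set of column controls. It illustrates the general technique only and is not the Census Bureau's weighting program. The default of six passes mirrors the quoted text.

```r
# Generic raking (iterative proportional fitting): rescale a weight table
# so its row sums and column sums match the given control totals.
rake <- function(w, row_targets, col_targets, passes = 6) {
  for (i in seq_len(passes)) {
    # Scale each row so the row sums hit the row controls
    w <- sweep(w, 1, row_targets / rowSums(w), `*`)
    # Scale each column so the column sums hit the column controls
    w <- sweep(w, 2, col_targets / colSums(w), `*`)
  }
  w
}

# Made-up starting weights and controls; both sets of controls must add up
# to the same population total (here 100).
w0 <- matrix(c(10, 20, 30, 40), nrow = 2)
row_targets <- c(60, 40)
col_targets <- c(35, 65)

raked <- rake(w0, row_targets, col_targets)
print(round(raked, 2))
print(rowSums(raked))
print(colSums(raked))
```

After the final column step the column totals match exactly, while the row totals match only approximately. Each extra pass shrinks the remaining gap, which is why the procedure is repeated several times.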

Value

The dataset as a data.table

References

Kohavi, Ron. "Scaling Up the Accuracy of Naive-Bayes Classifiers: A Decision-Tree Hybrid." KDD. 1996.

See Also

https://archive.ics.uci.edu/ml/datasets/Census+Income


jkrijthe/createdatasets documentation built on May 19, 2019, 12:44 p.m.