insurance: insurance data set

insurance    R Documentation

insurance data set

Description

The insurance dataset contains 1338 records and 7 variables. The target variable is charge; the remaining 6 variables are predictors. The dataset is simulated on the basis of demographic statistics from the US Census Bureau.

Usage

 data(insurance) 

Format

The insurance dataset is a data frame with 1338 rows (customers) and 7 columns (variables). The 7 variables are:

  • age: age of primary beneficiary.

  • bmi: body mass index (kg / m ^ 2), an objective measure of whether body weight is high or low relative to height, computed as weight divided by height squared; ideally 18.5 to 24.9.

  • children: number of children (dependents) covered by health insurance.

  • smoker: smoking status, a factor with 2 levels: yes, no.

  • gender: gender of the insurance contractor: female, male.

  • region: the beneficiary's residential area in the US: northeast, southeast, southwest, northwest.

  • charge: individual medical costs billed by health insurance.
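
The bmi definition above (weight in kg divided by squared height in m, with 18.5 to 24.9 as the ideal range) can be sketched in a few lines of base R. The weights, heights, and the names weight_kg, height_m, and bmi_class are hypothetical and are not part of the insurance dataset, which stores only the resulting bmi value.

```r
# Hypothetical measurements for illustration; not taken from the insurance data.
weight_kg <- c(50, 70, 95)
height_m  <- c(1.75, 1.75, 1.75)

# BMI is weight (kg) divided by squared height (m).
bmi <- weight_kg / height_m^2

# Label each value against the 18.5 to 24.9 range quoted above.
# Assumption: values from 24.9 up to (but not including) 25 count as within range.
bmi_class <- cut(
  bmi,
  breaks = c(-Inf, 18.5, 25, Inf),
  labels = c("below range", "within range", "above range"),
  right  = FALSE
)

data.frame(weight_kg, height_m, bmi = round(bmi, 1), bmi_class)
```

The same cut() call could be applied to insurance$bmi to group customers by range, assuming the column is numeric.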

Details

For more information related to the dataset see:
https://www.kaggle.com/mirichoi0218/insurance

Source

This dataset comes from:
https://github.com/stedy/Machine-Learning-with-R-datasets

References

Brett Lantz (2019). Machine Learning with R: Expert techniques for predictive modeling. Packt Publishing Ltd.

Reza Mohammadi (2025). Data Science Foundations and Machine Learning with R: From Data to Decisions. https://book-data-science-r.netlify.app.

See Also

bank, churn_mlc, churn, churn_tel, adult, risk, cereal, advertising, marketing, drug, house, house_price, red_wines, white_wines, caravan, fertilizer, corona

Examples

data(insurance)
str(insurance)
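
Because the Description names charge as the target and the other six variables as predictors, a regression is a natural first model. The following is a minimal sketch, assuming the liver package is installed; the choice of lm() from base R and the object name fit are illustrative and are not prescribed by this help page.

```r
# Assumes the liver package is installed; load the dataset from it explicitly.
data(insurance, package = "liver")

# Regress charge on all remaining variables (age, bmi, children,
# smoker, gender, region) with an ordinary linear model.
fit <- lm(charge ~ ., data = insurance)

# Inspect the estimated coefficients and overall fit.
summary(fit)
```

The formula charge ~ . uses every other column as a predictor, which matches the split given in the Description; individual terms can be listed explicitly instead, for example charge ~ age + bmi + smoker.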

liver documentation built on Feb. 19, 2026, 1:07 a.m.