This is survey data collected by the US National Center for Health Statistics (NCHS), which has conducted a series of health and nutrition surveys since the early 1960s. Since 1999, approximately 5,000 individuals of all ages have been interviewed in their homes every year and have completed the health examination component of the survey. The health examination is conducted in a mobile examination centre (MEC).
Data frames with raw and resampled versions of the NHANES data. See below for details and descriptions of the variables.
The NHANES target population is "the non-institutionalized civilian resident population of the United States". NHANES (the National Health and Nutrition Examination Survey) uses a complex survey design (see http://www.cdc.gov/nchs/data/series/sr_02/sr02_162.pdf) that oversamples certain subpopulations, such as racial minorities. Naive analysis of the original NHANES data can therefore lead to mistaken conclusions. For example, the percentages of people from each racial group in the data are quite different from the corresponding percentages in the population.
NHANES and NHANESraw each include 75 variables available for the 2009-2010 and 2011-2012 sample years. NHANESraw has 20,293 observations of these variables plus four additional variables that describe the sample weighting scheme employed. NHANES contains 10,000 rows of data resampled from NHANESraw to undo these oversampling effects. NHANES can be treated, for educational purposes, as if it were a simple random sample from the American population.
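The exact resampling procedure used to build NHANES from NHANESraw is not described on this page. As a rough illustration of the idea only, the base-R sketch below draws rows with replacement, with probability proportional to a survey weight. The toy data frame raw and its weight column are hypothetical, not NHANES values, and this is not claimed to be the package's own method.

```r
# Minimal sketch: weight-proportional resampling to undo oversampling.
# The toy data and weights below are hypothetical, not NHANES values.
set.seed(1)

# Toy "raw" sample: group B fills half the rows, but each A row stands
# for 4 people and each B row for 1, so B is only 20% of the population.
raw <- data.frame(
  group  = rep(c("A", "B"), times = c(500, 500)),
  weight = rep(c(4, 1), times = c(500, 500))
)

# Draw rows with replacement, with probability proportional to weight
resample_by_weight <- function(df, n, weight_col = "weight") {
  idx <- sample(nrow(df), size = n, replace = TRUE, prob = df[[weight_col]])
  df[idx, , drop = FALSE]
}

resampled <- resample_by_weight(raw, n = 10000)

# Group shares before and after resampling
prop.table(table(raw$group))        # 0.5 and 0.5
prop.table(table(resampled$group))  # roughly 0.8 and 0.2
```

The resampled shares move toward the population shares implied by the weights, which is the sense in which a resampled data set can be treated like a simple random sample.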
A list of the variables in the data sets appears below, along with variable descriptions and links to the original NHANES documentation.
The following warning comes directly from the NHANES web site:
For NHANES datasets, the use of sampling weights and sample design variables is recommended for all analyses because the sample design is a clustered design and incorporates differential probabilities of selection. If you fail to account for the sampling parameters, you may obtain biased estimates and overstate significance levels.
Please note that the data sets provided in this package are derived from the NHANES database and have been adapted for educational purposes. As such, they are NOT suitable for use as a research database. For research purposes, you should download the original data files from the NHANES website and follow the analysis instructions given there. Further details and relevant documentation can be found on the following NHANES websites
Which survey the participant participated in.
Participant identifier.
For more information on these demographic variables, see http://www.cdc.gov/nchs/nhanes/nhanes2009-2010/DEMO_F.htm or http://www.cdc.gov/nchs/nhanes/nhanes2011-2012/DEMO_G.htm.
Gender (sex) of study participant, coded as male or female.
Age in years at screening of study participant. Note: Subjects 80 years or older were recorded as 80.
Categorical variable derived from age, with levels 0-9, 10-19, ... 70+.
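Decade categories like these can be derived from age in years with base R's cut(). This is a sketch assuming bins closed on the left (0-9, 10-19, ..., 70+); the label strings are illustrative and may not match the package's factor levels exactly. Because subjects aged 80 or older are recorded as 80, they all land in the 70+ bin.

```r
# Sketch: derive decade categories 0-9, 10-19, ..., 70+ from age in years.
# Labels are illustrative; the package's own level strings may differ.
age_decade <- function(age) {
  cut(age,
      breaks = c(0, 10, 20, 30, 40, 50, 60, 70, Inf),
      labels = c("0-9", "10-19", "20-29", "30-39",
                 "40-49", "50-59", "60-69", "70+"),
      right = FALSE)  # intervals [0,10), [10,20), ..., [70, Inf)
}

ages <- c(0, 9, 10, 45, 79, 80)  # 80 stands for "80 or older"
age_decade(ages)
```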
Age in months at screening of study participant. Reported for participants aged 0 to 79 years for 2009-2010 data, and for participants aged 0 to 2 years for 2011-2012 data.
Reported race of study participant: Mexican, Hispanic, White, Black, or Other.
Reported race of study participant, including non-Hispanic Asian category: Mexican, Hispanic, White, Black, Asian, or Other. Not available for 2009-10.
Educational level of study participant. Reported for participants aged 20 years or older. One of 8thGrade, 9-11thGrade, HighSchool, SomeCollege, or CollegeGrad.
Marital status of study participant. Reported for participants aged 20 years or older. One of Married, Widowed, Divorced, Separated, NeverMarried, or LivePartner (living with partner).
Total annual gross income for the household in US dollars. One of 0 - 4999, 5000 - 9999, 10000 - 14999, 15000 - 19999, 20000 - 24999, 25000 - 34999, 35000 - 44999, 45000 - 54999, 55000 - 64999, 65000 - 74999, 75000 - 99999, or 100000 or More.
Numerical version of HHIncome, derived from the middle income in each category.
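One way to turn a band label of the form "low - high" into a single number is to take the midpoint of its bounds. The sketch below assumes that arithmetic midpoint; the package may round differently. The open-ended band 100000 or More has no upper bound, so the sketch returns NA for it rather than guess the value the package assigns.

```r
# Sketch: midpoint of an income band written as "low - high".
# Open-ended bands (e.g. "100000 or More") return NA in this sketch.
band_midpoint <- function(band) {
  # drop thousands separators and outer spaces, then split on the dash
  parts <- strsplit(gsub(",", "", trimws(band)), "\\s*-\\s*")
  vapply(parts, function(p) {
    if (length(p) != 2) return(NA_real_)
    mean(as.numeric(p))
  }, numeric(1))
}

band_midpoint(c("0 - 4999", "5000 - 9,999", "75000 - 99999", "100000 or More"))
```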
A ratio of family income to poverty guidelines. Smaller numbers indicate more poverty.
Number of rooms in the home of study participant (counting kitchen but not bathroom). A value of 13 means 13 or more rooms.
One of Home, Rent, or Other, indicating whether the home of study participant or someone in their family is owned, rented, or occupied by some other arrangement.
For more information on body measurements, see http://www.cdc.gov/nchs/nhanes/nhanes2009-2010/BMX_F.htm and http://www.cdc.gov/nchs/nhanes/nhanes2011-2012/BMX_G.htm.
Weight in kg
Recumbent length in cm. Reported for participants aged 0 - 3 years.
Head circumference in cm. Reported for participants aged 0 years (0 - 6 months).
Standing height in cm. Reported for participants aged 2 years or older.
Body mass index (weight/height² in kg/m²). Reported for participants aged 2 years or older.
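Since weight is recorded in kg and standing height in cm, height has to be converted to metres before squaring. A minimal sketch of the formula:

```r
# Sketch: BMI from weight (kg) and standing height (cm).
# Height is converted from cm to m before squaring.
bmi <- function(weight_kg, height_cm) {
  height_m <- height_cm / 100
  weight_kg / height_m^2
}

round(bmi(70, 175), 1)  # 22.9
```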
Body mass index category. Reported for participants aged 2 to 19 years. One of UnderWeight (BMI < 5th percentile), NormWeight (BMI 5th to < 85th percentile), OverWeight (BMI 85th to < 95th percentile), or Obese (BMI >= 95th percentile).
Body mass index category. Reported for participants aged 2 years or older. One of 12.0_18.4, 18.5_24.9, 25.0_29.9, or 30.0_plus.
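The four adult BMI categories can be reproduced from a numeric BMI with cut(). This sketch infers the cut points 18.5, 25.0 and 30.0 from the category labels and assumes each band is closed on the left; any value below 18.5 (including one under 12.0) is placed in the lowest band here.

```r
# Sketch: map numeric BMI to the four categories listed above.
# Cut points are inferred from the labels; bands are closed on the left.
bmi_who <- function(bmi) {
  cut(bmi,
      breaks = c(-Inf, 18.5, 25, 30, Inf),
      labels = c("12.0_18.4", "18.5_24.9", "25.0_29.9", "30.0_plus"),
      right = FALSE)  # [.., 18.5), [18.5, 25), [25, 30), [30, ..)
}

bmi_who(c(17.2, 18.5, 24.99, 25, 31.4))
```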
60-second pulse rate
Combined systolic blood pressure reading, following the procedure outlined for BPXSAR.
Combined diastolic blood pressure reading, following the procedure outlined for BPXDAR.
Systolic blood pressure in mm Hg – first reading
Diastolic blood pressure in mm Hg – first reading (consecutive readings)
Systolic blood pressure in mm Hg – second reading (consecutive readings)
Diastolic blood pressure in mm Hg – second reading
Systolic blood pressure in mm Hg – third reading (consecutive readings)
Diastolic blood pressure in mm Hg – third reading (consecutive readings)
Testosterone total (ng/dL). Reported for participants aged 6 years or older. Not available for 2009-2010.
For more information on these variables, see http://www.cdc.gov/nchs/nhanes/nhanes2009-2010/HDL_F.htm or http://www.cdc.gov/nchs/nhanes/nhanes2011-2012/HDL_G.htm.
Direct HDL cholesterol in mmol/L. Reported for participants aged 6 years or older.
Total HDL cholesterol in mmol/L. Reported for participants aged 6 years or older.
Urine volume in mL – first test. Reported for participants aged 6 years or older.
Urine flow rate (urine volume/time since last urination) in mL/min – first test. Reported for participants aged 6 years or older.
Urine volume in mL – second test. Reported for participants aged 6 years or older.
Urine flow rate (urine volume/time since last urination) in mL/min – second test. Reported for participants aged 6 years or older.
Study participant told by a doctor or health professional that they have diabetes. Reported for participants aged 1 year or older as Yes or No.
Age of study participant when first told they had diabetes. Reported for participants aged 1 year or older.
Self-reported rating of participant's health in general. Reported for participants aged 12 years or older. One of Excellent, Vgood, Good, Fair, or Poor.
Self-reported number of days participant's physical health was not good out of the past 30 days. Reported for participants aged 12 years or older.
Self-reported number of days participant's mental health was not good out of the past 30 days. Reported for participants aged 12 years or older.
Self-reported number of days where participant had little interest in doing things. Reported for participants aged 18 years or older. One of None, Several, Majority (more than half the days), or AlmostAll.
Self-reported number of days where participant felt down, depressed or hopeless. Reported for participants aged 18 years or older. One of None, Several, Majority (more than half the days), or AlmostAll.
How many times participant has been pregnant. Reported for female participants aged 20 years or older.
How many of participant's deliveries resulted in live births. Reported for female participants aged 20 years or older.
Pregnancy status at the time of the health examination was ascertained for females 8-59 years of age. Due to disclosure risks, pregnancy status was only released for women 20-44 years of age. The information used included urine pregnancy test results and self-reported pregnancy status. Urine pregnancy tests were performed prior to the dual energy x-ray absorptiometry (DXA) exam. Persons who reported they were pregnant at the time of exam were assumed to be pregnant. As a result, if the urine test was negative but the subject reported they were pregnant, the status was coded as "Yes". If the urine pregnancy results were negative and the respondent stated that they were not pregnant, the respondent was coded as "No". If the urine pregnancy results were negative and the respondent did not know her pregnancy status, the respondent was coded as "unknown". Persons who were interviewed but not examined also have a value of "unknown". In addition, there are missing values.
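The coding rules just described can be written as a small function. This is a sketch only: the input encodings (urine_test as "negative", "positive" or NA; self_report as "Yes", "No" or "DontKnow"; examined as TRUE/FALSE) and the function name are hypothetical, the 20-44 release restriction is not modelled, and combinations the text does not cover (for example, a positive test with a self-report of "No") are returned as NA.

```r
# Sketch of the pregnancy-status coding rules described above.
# Input encodings are hypothetical; undescribed combinations give NA.
code_pregnancy <- function(urine_test, self_report, examined) {
  out  <- rep(NA_character_, length(self_report))
  exam <- examined %in% TRUE          # %in% treats NA as FALSE
  neg  <- urine_test %in% "negative"
  out[exam & neg & self_report %in% "No"]       <- "No"
  out[exam & neg & self_report %in% "DontKnow"] <- "unknown"
  out[exam & self_report %in% "Yes"]            <- "Yes"      # self-report overrides a negative test
  out[!exam]                                    <- "unknown"  # interviewed but not examined
  out
}

code_pregnancy(
  urine_test  = c("negative", "negative", "negative", NA),
  self_report = c("Yes", "No", "DontKnow", "No"),
  examined    = c(TRUE, TRUE, TRUE, FALSE)
)
```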
Age of participant at time of first live birth. 14 years or under = 14, 45 years or older = 45. Reported for female participants aged 20 years or older.
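Bottom- and top-coding of this kind (and the similar caps on age at 80 and on home rooms at 13) can be reproduced with pmin() and pmax(). A minimal sketch; the helper name is hypothetical.

```r
# Sketch: bottom- and top-code values (<= 14 -> 14, >= 45 -> 45).
# pmin/pmax work element-wise and keep NA values as NA.
censor_range <- function(x, lower, upper) {
  pmin(pmax(x, lower), upper)
}

censor_range(c(12, 14, 27, 45, 51, NA), lower = 14, upper = 45)
```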
Self-reported number of hours of sleep study participant usually gets at night on weekdays or workdays. Reported for participants aged 16 years and older.
Participant has told a doctor or other health professional that they had trouble sleeping. Reported for participants aged 16 years and older. Coded as Yes or No.
More information about these variables is available at http://www.cdc.gov/nchs/nhanes/nhanes2009-2010/SMQ_F.htm or http://www.cdc.gov/nchs/nhanes/nhanes2011-2012/SMQ_G.htm.
Participant does moderate or vigorous-intensity sports, fitness or recreational activities (Yes or No). Reported for participants 12 years or older.
Number of days in a typical week that participant does moderate or vigorous-intensity activity. Reported for participants 12 years or older.
Number of hours per day on average participant watched TV over the past 30 days. Reported for participants 2 years or older. One of 0_to_1hr, 1_hr, 2_hr, 3_hr, 4_hr, or More_4_hr. Not available for 2009-2010.
Number of hours per day on average participant used a computer or gaming device over the past 30 days. Reported for participants 2 years or older. One of 0_hrs, 0_to_1hr, 1_hr, 2_hr, 3_hr, 4_hr, or More_4_hr. Not available for 2009-2010.
Number of hours per day on average participant watched TV over the past 30 days. Reported for participants 2 to 11 years. Not available for 2011-2012.
Number of hours per day on average participant used a computer or gaming device over the past 30 days. Reported for participants 2 to 11 years old. Not available for 2011-2012.
Participant has consumed at least 12 drinks of any type of alcoholic beverage in any one year. Reported for participants 18 years or older as Yes or No.
Average number of drinks consumed on days that participant drank alcoholic beverages. Reported for participants aged 18 years or older.
Estimated number of days over the past year that participant drank alcoholic beverages. Reported for participants aged 18 years or older.
Study participant currently smokes cigarettes regularly. Reported for participants aged 20 years or older as Yes or No, provided they answered Yes to having smoked 100 or more cigarettes in their lifetime. All subjects who have not smoked 100 or more cigarettes are listed as NA here.
Study participant has smoked at least 100 cigarettes in their entire life. Reported for participants aged 20 years or older as Yes or No.
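Because the current-smoking question is skipped for anyone who has not smoked 100 cigarettes, a single smoking status needs both questions. The package examples build such a status with mosaic; the base-R sketch below does the same job without extra packages. The argument names smoke100 and smoke_now are hypothetical stand-ins for the two Yes/No columns.

```r
# Sketch: combine the two smoking questions into one status.
# smoke_now is NA whenever smoke100 is "No" (the question is skipped),
# so both are needed to tell never-smokers from missing answers.
smoking_status <- function(smoke100, smoke_now) {
  status <- rep(NA_character_, length(smoke100))
  status[smoke100 %in% "No"]                         <- "Never"
  status[smoke100 %in% "Yes" & smoke_now %in% "No"]  <- "Former"
  status[smoke100 %in% "Yes" & smoke_now %in% "Yes"] <- "Current"
  factor(status, levels = c("Current", "Former", "Never"))
}

smoking_status(
  smoke100  = c("Yes", "Yes", "No", NA),
  smoke_now = c("Yes", "No",  NA,   NA)
)
```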
Age study participant first started to smoke cigarettes fairly regularly. Reported for participants aged 20 years or older.
Participant has tried marijuana. Reported for participants aged 18 to 59 years as Yes or No.
AgeFirstMarij
Age participant first tried marijuana. Reported for participants aged 18 to 59 years.
Participant has been/is a regular marijuana user (used at least once a month for a year). Reported for participants aged 18 to 59 years as Yes or No.
Age of participant when first started regularly using marijuana. Reported for participants aged 18 to 59 years.
Participant has tried cocaine, crack cocaine, heroin or methamphetamine. Reported for participants aged 18 to 69 years as Yes or No.
Participant has had vaginal, anal, or oral sex. Reported for participants aged 18 to 69 years as Yes or No.
Age of participant when had sex for the first time. Reported for participants aged 18 to 69 years.
Number of opposite sex partners participant has had any kind of sex with over their lifetime. Reported for participants aged 18 to 69 years.
Number of opposite sex partners participant has had any kind of sex with over the past 12 months. Reported for participants aged 18 to 59 years.
Participant has had any kind of sex with a same-sex partner. Reported for participants aged 18 to 69 years as Yes or No.
Participant's sexual orientation (self-described). Reported for participants aged 18 to 59 years. One of Heterosexual, Homosexual, or Bisexual.
Sample weighting variables (NHANESraw only). For more details, see one of the following.
These data were originally assembled by Michelle Dalrymple of Cashmere High School and Chris Wild of the University of Auckland, New Zealand for use in teaching statistics.
# Due to the sampling design, some races were over/under-sampled.
library(NHANES)
rbind(
  NHANES = table(NHANES$Race1) / nrow(NHANES),
  NHANESraw = table(NHANESraw$Race1) / nrow(NHANESraw),
  # diff is the difference in counts scaled by nrow(NHANESraw),
  # not the difference between the two rows of proportions above
  diff = (table(NHANES$Race1) - table(NHANESraw$Race1)) / nrow(NHANESraw)
)
# SmokeNow is only asked of people who answer Yes to Smoke100
if (require(mosaic)) {
  nhanes <-
    NHANES %>%
    mutate(
      SmokingStatus = derivedFactor(
        Current = SmokeNow == "Yes",
        Former = SmokeNow == "No",
        Never = Smoke100 == "No"
      )
    )
  tally( ~SmokingStatus, data = nhanes )
}
Black Hispanic Mexican White Other
NHANES 0.1197000 0.06100000 0.1015000 0.63720000 0.08060000
NHANESraw 0.2286503 0.10885527 0.1842507 0.36431282 0.11393091
diff -0.1696644 -0.07879564 -0.1342335 -0.05031292 -0.07421278
SmokingStatus
Current Former Never <NA>
1466 1745 4024 2765