nhanes3: NHANES III data
In dardisco/LogisticDx: Diagnostic Tests for Logistic Regression Models


NHANES III data

A data.frame with 17030 observations (rows) and 16 variables (columns).

A subset of data from the third National Health and Nutrition Examination Survey (NHANES III). Only subjects aged >= 20 are included.
For the full survey, a sample of 39,695 subjects was selected, representing more than 250 million people living in the USA. Data were collected 1988-1994.

49 pseudo-strata were created, with 2 pseudo-PSUs (primary sampling units) in each stratum.

This is a subset of the original dataset.

Columns are:

SEQN

Respondent sequence number.

SDPPSU6

Pseudo-PSU (primary sampling unit).

SDPSTRA6

Pseudo stratum.

WTPFHX6

Statistical weight. Range 225.93 to 139744.9.

HSAGEIR

Age (years).

HSSEX

Gender (a factor):

0: female
1: male

DMARACER

Race (a factor):

1: white
2: black
3: other

BMPWTLBS

Body weight (lbs).

BMPHTIN

Standing height (inches).

PEPMNK1R

Average Systolic BP.

PEPMNK5R

Average Diastolic BP.

HAR1

Has respondent smoked >100 cigarettes in life? (a factor):

1: yes
2: no

HAR3

Does respondent smoke cigarettes now? (a factor):

1: yes
2: no

SMOKE

Smoking (a factor):

1: never (HAR1 = 2)
2: >100 cigs, not smoking now (HAR1 = 1 & HAR3 = 2)
3: current (HAR1 = 1 & HAR3 = 1)

TCP

Serum cholesterol (mg/100ml).

HBP

High blood pressure? (a factor):

1: yes (PEPMNK1R > 140)
2: no (PEPMNK1R <= 140)
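
The two derived columns, SMOKE and HBP, follow directly from the rules listed above. The following is a minimal sketch of those rules applied to a small made-up data.frame (toy is hypothetical and is not part of the package). The toy columns are numeric for brevity, whereas HAR1 and HAR3 are factors in nhanes3.

```r
## toy data (made up); codes as above: 1 = yes, 2 = no
toy <- data.frame(HAR1 = c(2, 1, 1, 2),
                  HAR3 = c(2, 2, 1, 2),
                  PEPMNK1R = c(118, 140, 141, 165))

## SMOKE: start with NA, then fill in each level in turn
smoke <- rep(NA_integer_, nrow(toy))
smoke[toy$HAR1 == 2] <- 1L                    # never
smoke[toy$HAR1 == 1 & toy$HAR3 == 2] <- 2L    # >100 cigs, not smoking now
smoke[toy$HAR1 == 1 & toy$HAR3 == 1] <- 3L    # current
toy$SMOKE <- factor(smoke, levels = 1:3)

## HBP: 1 = yes (systolic > 140), 2 = no (systolic <= 140)
toy$HBP <- factor(ifelse(toy$PEPMNK1R > 140, 1L, 2L), levels = 1:2)
toy
```

Note that a reading of exactly 140 falls in the "no" group, since the cut-off is strict.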

Taken from:
ANALYTIC AND REPORTING GUIDELINES: The Third National Health and Nutrition Examination Survey, NHANES III (1988-94).

In the NHANES III, 89 survey locations were randomly divided into 2 sets or phases, the first consisting of 44 and the other, 45 locations. One set of primary sampling units (PSUs) was allocated to the first 3-year survey period (1988-91) and the other set to the second 3-year period (1991-94).
Therefore, unbiased national estimates of health and nutrition characteristics can be independently produced for each phase as well as for both phases combined. Computation of national estimates from both phases combined (i.e. total NHANES III) is the preferred option; individual phase estimates may be highly variable. In addition, individual phase estimates are not statistically independent.

It is also difficult to evaluate whether differences in individual phase estimates are real or due to methodological differences. That is, differences may be due to changes in sampling methods or data collection methodology over time. At this time, there is no valid statistical test for examining differences between phase 1 and phase 2.

NHANES III is based on a complex multistage probability sample design. Several aspects of the NHANES design must be taken into account in data analysis, including the sampling weights and the complex survey design. Appropriate sampling weights are needed to estimate prevalence, means, medians, and other statistics. Sampling weights are used to produce correct population estimates because each sample person does not have an equal probability of selection. The sampling weights incorporate the differential probabilities of selection and include adjustments for noncoverage and nonresponse.

With the large oversampling of young children, older persons, black persons, and Mexican Americans in NHANES III, it is essential that the sampling weights be used in all analyses. Otherwise, misinterpretation of results is highly likely.
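
To illustrate why the weights matter, here is a small sketch with made-up numbers (chol and wt are hypothetical vectors, not taken from nhanes3). Three subjects stand in for an oversampled group with small weights and one subject stands in for a group with a large weight. It uses only weighted.mean from base R.

```r
## toy data (made up): three oversampled subjects, one heavily weighted subject
chol <- c(180, 190, 200, 240)
wt   <- c(500, 500, 500, 4500)   # hypothetical sampling weights

## ignoring the weights treats every subject as equally representative
unweighted <- mean(chol)
## using the weights lets each subject count for the people they represent
weighted   <- weighted.mean(chol, w = wt)

c(unweighted = unweighted, weighted = weighted)
```

The unweighted mean is pulled towards the oversampled group. With nhanes3 itself, the weight column WTPFHX6 would play the role of wt.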

Other aspects of the design that must be taken into account in data analyses are the strata and PSU pairings from the sample design. These pairings should be used to estimate variances and test for statistical significance.
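
As a rough sketch of how the strata and PSU pairings can enter a variance estimate, the code below computes a weighted total and a simple between-PSU (with-replacement) variance estimate for it in base R. This is one common textbook estimator and is an assumption here; it is not necessarily the method used by the packages named in these guidelines. The data.frame toy and its column names are made up.

```r
## toy data (made up): 2 pseudo-strata, each with 2 pseudo-PSUs
toy <- data.frame(
  strat = c(1, 1, 1, 1, 2, 2, 2, 2),
  psu   = c(1, 1, 2, 2, 1, 1, 2, 2),
  wt    = c(10, 10, 10, 10, 20, 20, 20, 20),
  y     = c(1, 0, 1, 1, 0, 0, 1, 0))

## weighted total of y within each PSU
toy$wy <- toy$wt * toy$y
z <- aggregate(wy ~ strat + psu, data = toy, FUN = sum)

## with exactly 2 PSUs per stratum, the contribution of a stratum
## to the variance of the total is the squared difference of its PSU totals
vStrat <- tapply(z$wy, z$strat, function(x) diff(x)^2)

total <- sum(toy$wy)        # estimated population total
se <- sqrt(sum(vStrat))     # standard error of that total
c(total = total, se = se)
```

For means, proportions and regression coefficients the calculation is more involved, so dedicated survey software is the safer route for final analyses.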

For weighted analyses, analysts can use special computer software packages that use an appropriate method for estimating variances for complex samples such as SUDAAN (Shah 1995) and WesVarPC (Westat 1996).

Although initial exploratory analyses may be performed on unweighted data with standard statistical packages assuming simple random sampling, final analyses should be done on weighted data using appropriate sampling weights.

Originally taken from H&L 2nd ed. via their publisher's site at ftp://ftp.wiley.com/public/sci_tech_med/logistic

H&L 2nd ed. Page 215. Table 6.3.

National Center for Health Statistics (US) and others 1996. NHANES III reference manuals and reports. National Center for Health Statistics. CDC (free)

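## use simpler column names
## new names follow the column order listed above:
## SDPPSU6 (pseudo-PSU) comes before SDPSTRA6 (pseudo-stratum)
data("nhanes3", package="LogisticDx")
n1 <- c("ID", "pPSU", "pStrat", "sWt", "age", "sex",
        "race", "bWt", "h", "sysBP", "diasBP", "sm100",
        "smCurr", "smok", "chol", "htn")
## check there is one new name per column before renaming
stopifnot(length(n1) == ncol(nhanes3))
names(nhanes3) <- n1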