Planning: Dataset for practicing cleaning, labelling and recoding
In epiDisplay: Epidemiological Data Display Package

Data for cleaning

R Documentation

Description

The data come from clients of a family planning clinic.

For all variables except ID, the values 9, 99, 99.9, 888 and 999 represent missing values.
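
The missing-value codes listed above can be converted to NA before analysis. The following is a minimal base R sketch, assuming a small made-up data frame `toy` (hypothetical values, not rows from Planning) and a helper `recode_missing` that is not part of epiDisplay:

```r
# Hypothetical toy data using the same missing-value convention (not real Planning rows)
toy <- data.frame(
  id  = c(1, 2, 3, 9),           # id is exempt: 9 here is a genuine ID
  age = c(25, 99, 31, 28),       # 99 = missing
  wt  = c(52.5, 99.9, 999, 48),  # 99.9 and 999 = missing
  ht  = c(155, 160, 999, 888)    # 999 and 888 = missing
)

missing_codes <- c(9, 99, 99.9, 888, 999)

# Replace the missing-value codes with NA in every column except 'id'
# (recode_missing is an illustrative helper, not an epiDisplay function)
recode_missing <- function(df, codes, skip = "id") {
  cols <- setdiff(names(df), skip)
  df[cols] <- lapply(df[cols], function(x) {
    x[x %in% codes] <- NA
    x
  })
  df
}

toy_clean <- recode_missing(toy, missing_codes)
colSums(is.na(toy_clean))  # number of missing values per column
```

Because a code such as 9 or 99 could in principle be a legitimate value for some variables, it may be worth tabulating each variable before applying a blanket recode like this.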

Usage

data(Planning)

Format

A data frame with 251 observations on the following 11 variables.

ID: a numeric vector: ID code
AGE: a numeric vector
RELIG: a numeric vector: Religion

	1	= Buddhist
	2	= Muslim

PED: a numeric vector: Patient's education level

	1	= none
	2	= primary school
	3	= secondary school
	4	= high school
	5	= vocational school
	6	= university
	7	= other

INCOME: a numeric vector: Monthly income in Thai Baht

	1	= nil
	2	= < 1,000
	3	= 1,000-4,999
	4	= 5,000-9,999
	5	= >= 10,000

AM: a numeric vector: Age at marriage
REASON: a numeric vector: Reason for family planning

	1	= birth spacing
	2	= enough children
	3	= other

BPS: a numeric vector: systolic blood pressure
BPD: a numeric vector: diastolic blood pressure
WT: a numeric vector: weight (kg)
HT: a numeric vector: height (cm)
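
The coded variables above (RELIG, PED, INCOME, REASON) can be labelled by converting them to factors. The following is a minimal base R sketch, assuming a made-up data frame `toy` (hypothetical values, not rows from Planning); only the code-to-label mappings are taken from the list above:

```r
# Hypothetical coded values (not real Planning rows)
toy <- data.frame(
  RELIG  = c(1, 2, 1, 9),  # 9 = missing-value code
  REASON = c(2, 1, 3, 9)
)

# Code-to-label mappings taken from the Format section
relig_labels  <- c("Buddhist", "Muslim")
reason_labels <- c("birth spacing", "enough children", "other")

# factor() turns any value not listed in 'levels' (here the code 9) into NA
toy$RELIG  <- factor(toy$RELIG,  levels = 1:2, labels = relig_labels)
toy$REASON <- factor(toy$REASON, levels = 1:3, labels = reason_labels)

table(toy$RELIG, useNA = "ifany")  # counts per label, including NA
```

Listing the valid codes explicitly in `levels` means that unexpected codes become NA instead of silently forming extra categories.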

Examples

data(Planning)
des(Planning)

# Change variable names to lowercase
names(Planning) <- tolower(names(Planning))
.data <- Planning
des(.data)
# Check for duplication of 'id'
attach(.data)
any(duplicated(id))
duplicated(id)
id[duplicated(id)] # 215

# Which one(s) are missing?
setdiff(min(id):max(id), id) # 216

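# Correct the wrong one
# (with .data attached, this assignment creates a corrected copy of 'id'
# in the workspace; .data itself is unchanged)
id[duplicated(id)] <- 216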
detach(.data)
rm(list=ls())
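
The duplicate check and correction above can also be written without attach(), so that the corrected ID is stored in the data frame itself. The following is a minimal base R sketch, assuming a made-up data frame `toy` (hypothetical values, not rows from Planning) in which one ID is entered twice and one is absent:

```r
# Hypothetical IDs: 3 is entered twice and 4 is absent
toy <- data.frame(id = c(1, 2, 3, 3, 5), age = c(25, 31, 28, 40, 35))

dup_rows <- which(duplicated(toy$id))                 # rows repeating an earlier id
gap_ids  <- setdiff(min(toy$id):max(toy$id), toy$id)  # ids absent from the sequence

# Assumption: with exactly one duplicate and one gap, the repeated entry is taken
# to be a mistyped version of the missing ID (in practice, check the source records)
if (length(dup_rows) == 1 && length(gap_ids) == 1) {
  toy$id[dup_rows] <- gap_ids
}

any(duplicated(toy$id))  # check again after the correction
```

Indexing with `toy$id` keeps the correction inside the data frame, whereas an assignment to an attached variable only changes a copy in the workspace.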

epiDisplay documentation built on May 18, 2022, 5:11 p.m.