data_for_EC: Simulated Household Survey Assets Data for 'EconomicClusters'

data_for_EC    R Documentation

Description

This data set contains simulated household survey assets data: binary and multi-level categorical asset variables, plus the weighted number of household members. It is meant for use with the functions 'EconomicClusters' and 'EC_time'.

Usage

data(data_for_EC)

Format

A data frame with 100 observations on the following 11 variables.

Weights

a numeric vector of weighted number of household members

V2

a binary 0/1 variable with probability 0.4 of value 1

V5

a binary 0/1 variable with probability 0.6 of value 1

V6

a binary 0/1 variable with probability 0.8 of value 1

V7

a categorical variable with the following probabilities: p(V7=1)=0.4, p(V7=2)=0.3, p(V7=3)=0.2, p(V7=4)=0.1

V8

a categorical variable with the following probabilities: p(V8=1)=0.4, p(V8=2)=0.3, p(V8=3)=0.2, p(V8=4)=0.1

V9

a categorical variable with the following probabilities: p(V9=1)=0.4, p(V9=2)=0.3, p(V9=3)=0.2, p(V9=4)=0.1. V9 is highly correlated with V11 (correlation coefficient=0.95)

V10

a categorical variable with the following probabilities: p(V10=1)=0.4, p(V10=2)=0.3, p(V10=3)=0.2, p(V10=4)=0.1

V11

a categorical variable with the following probabilities: p(V11=1)=0.4, p(V11=2)=0.3, p(V11=3)=0.2, p(V11=4)=0.1. V11 is highly correlated with V9 (correlation coefficient=0.95)

V12

a categorical variable with the following probabilities: p(V12=1)=0.4, p(V12=2)=0.3, p(V12=3)=0.2, p(V12=4)=0.1

V13

a categorical variable with the following probabilities: p(V13=1)=0.4, p(V13=2)=0.3, p(V13=3)=0.2, p(V13=4)=0.1

Details

This data set contains a sample data frame for running the functions 'EconomicClusters' and 'EC_time'. The data frame 'data_for_EC' was simulated in a format similar to assets data collected in a large-scale household survey. Such data sets generally include binary variables (e.g. does your household own a cell phone?) and multi-level categorical variables (e.g. what type of water source does your household use?). In 'data_for_EC', each row represents a household, and each household's responses to the assets questions are coded as factors.

Binary variables were generated using the function 'rbinom' with varying probabilities. Multi-level categorical variables were generated using the function 'ordsample' from the package 'GenOrd'. Note that the variable names are not contiguous, because we assume that some rare assets variables were already eliminated from the data set using the function 'EC_vars'. The first column represents the weighted number of household members, as generated by the function 'EC_DHSwts'.
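The following is a minimal sketch of the kind of simulation described above, not the authors' actual script. It uses base R 'sample' in place of 'ordsample' from 'GenOrd', so it does not reproduce the correlation between V9 and V11. The seed, the object name 'sim', and the uniformly drawn 'Weights' column are assumptions made purely for illustration.

```r
# Illustrative stand-in for the simulation; not the package authors' code.
set.seed(1)   # hypothetical seed, only for reproducibility
n <- 100      # number of households, as in 'data_for_EC'

# Binary assets drawn with rbinom, using the documented probabilities
V2 <- rbinom(n, size = 1, prob = 0.4)
V5 <- rbinom(n, size = 1, prob = 0.6)
V6 <- rbinom(n, size = 1, prob = 0.8)

# Four-level categorical asset. sample() replaces GenOrd::ordsample here,
# so no correlation between variables is induced.
level_probs <- c(0.4, 0.3, 0.2, 0.1)
V7 <- sample(1:4, size = n, replace = TRUE, prob = level_probs)

# Hypothetical weights; the real column is produced by 'EC_DHSwts'
Weights <- runif(n, min = 1, max = 10)

sim <- data.frame(Weights, V2, V5, V6, V7)

# Code the asset responses as factors, keeping all levels explicit
binary_cols <- c("V2", "V5", "V6")
sim[binary_cols] <- lapply(sim[binary_cols], factor, levels = 0:1)
sim$V7 <- factor(sim$V7, levels = 1:4)

str(sim)
```

Only one categorical variable is drawn here to keep the sketch short; the remaining ones would follow the same pattern.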

Source

This data set was simulated by the package authors in order to demonstrate the functionality of the 'EconomicClusters' package.

See Also

EC_vars, EC_DHSwts, EC_time, EconomicClusters

Examples

#We want to know how much computing time we need to run 'EconomicClusters' on our data set 
#to select 5 variables with cluster numbers ranging from 10 to 20.
#Our computer has 2 cores that we can devote to running the algorithm.

data(data_for_EC)
EC_time(data_for_EC, nvars=5, kmin=10, kmax=20, ncores=2)

#Let's say running 'EconomicClusters' on two variable combinations in parallel took 20 seconds 
#(results differ depending on your computer).
#Thus, to run all 252 combinations of variables 
#(choose(10, 5): 5 of the 10 asset variables), 
#we need 252 combinations * (20 seconds / 2 combinations) * (1 minute / 60 seconds) 
#= 42 minutes of computing time.
#Note: To run this analysis on a full household survey data set will take much longer. 
#Don't worry! We have a free and publicly available solution for you.
#Please see the help file for 'EconomicClusters-package' to find out more.
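
#The computing-time estimate above can be reproduced with a few lines of arithmetic.
#This is only a sketch: the 20-second timing is the hypothetical figure from the comments,
#and the object names (n_asset_vars, secs_per_batch, total_minutes) are illustrative, 
#not part of the package.

```r
n_asset_vars <- 10     # V2 and V5 to V13: every column except Weights
nvars <- 5             # number of variables to select
ncores <- 2            # variable combinations evaluated in parallel
secs_per_batch <- 20   # hypothetical time for one parallel batch

# Number of distinct 5-variable subsets of the 10 asset variables
n_combos <- choose(n_asset_vars, nvars)

# Each batch of 'ncores' combinations takes 'secs_per_batch' seconds
total_minutes <- n_combos * (secs_per_batch / ncores) / 60

n_combos
total_minutes
```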

#Now, let's use 'EconomicClusters' to select 
#the 5 variables and number of clusters that best cluster our population.
#We set the range of cluster numbers from 10 to 20.

data(data_for_EC)
EC<-EconomicClusters(data_for_EC, nvars=5, kmin=10, kmax=20, ncores=2)
EC

#To interpret the strength of the clusters identified by 'EconomicClusters' using the ASW_max, 
#we recommend the guidelines proposed by Kaufman and Rousseeuw (2009)[3].

#To view a data frame consisting of the cluster medoids' responses to the model-defining variables:

EC$Medoid_dataframe

#To view a vector of cluster membership (as denoted by the row index for the cluster medoid) 
#for all observations in the original data set:

EC$Cluster

#To interpret what being a member of each economic cluster means, 
#look at the distribution of all assets in each population cluster. 
#We recommend interpreting the clusters in conjunction 
#with researchers familiar with the economic context 
#in the country whose data you are using,
#if you are not personally familiar with this context.
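
#One possible way to look at the distribution of an asset within each cluster 
#is a weighted cross-tabulation. The sketch below uses a made-up toy data frame 
#in place of 'data_for_EC' and 'EC$Cluster'. Weighting by the 'Weights' column 
#is our assumption, and the names 'toy', 'counts' and 'shares' are illustrative only.

```r
# Toy stand-in: six households in two clusters (all values are made up)
toy <- data.frame(
  Weights = c(2, 3, 1, 4, 2, 3),
  V2      = factor(c(0, 1, 1, 0, 1, 1)),
  Cluster = c(5, 5, 5, 9, 9, 9)   # row index of each household's medoid
)

# Weighted counts of each V2 response within each cluster
counts <- xtabs(Weights ~ Cluster + V2, data = toy)

# Convert to within-cluster proportions, so that each row sums to 1
shares <- prop.table(counts, margin = 1)
round(shares, 2)
```

#Repeating this for every asset variable gives a profile of each cluster.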

#We have now defined our economic model. 
#To assign new patients to economic clusters based on this model, 
#use the function 'EC_patient'.

Lauren-Eyler/EconomicClusters documentation built on March 22, 2022, 1:21 a.m.