In lindsayevanslee/whomds: Calculate Results from WHO Model Disability Survey Data

library(knitr)
library(tidyr)
library(dplyr)
library(whomds)

opts_chunk$set(warning=FALSE, 
               message=FALSE, 
               eval=FALSE, 
               out.width = "80%",
               fig.align = "center",
               collapse = TRUE,
               comment = "#>",
               survey.lonely.psu = "adjust")

Joining scores with original data

After you have finished with Rasch Analysis, the score is outputted in the file Data_final.csv in the column called rescaled. This file will only contain the individuals included in the analysis. Any individual who had too many missing values (NA) will not be in this file. It is often advisable to merge the original data with all individuals with the new scores. Any individual who did not have a score calculated will have an NA in this column.

This merge can be accomplished with the following code. First, open the library called tidyverse to access the necessary functions. Next, read in the Data_final.csv file and select only the columns you need: ID (or whatever the name of the individual ID column is in your data) and rescaled. The code below assumes that the file is in your working directory. You will have to include the full path to the file if it is not currently in your working directory. Finally, you can create an object merged_data that merges your original data, here represented with the object original_data, with the new score in a column renamed to "DisabilityScore" with the following code:

library(tidyverse)
new_score <- read_csv("Data_final.csv") %>% 
  select(c("ID", "rescaled"))
merged_data <- original_data %>% 
  left_join(new_score) %>% 
  rename("DisabilityScore" = "rescaled")

The sample data included in the whomds package called df_adults already has a Rasch score merged with it, in the column disability_score.

Descriptive analysis

After calculating the disability scores using Rasch Analysis, you are now ready to analyze the results of the survey by calculating descriptive statistics. The whomds package contains functions to create tables and figures of descriptive statistics. This section will go over these functions.

Tables

Descriptive statistics functions included in the whomds package are:

table_weightedpct() - produces weighted tables of N or %
table_unweightedpctn() - produces unweighted tables of N and %
table_basicstats() - computes basic statistics of the number of members per group per household.

The arguments of each of these codes will be described below.

`table_weightedpct()`

whomds contains a function called table_weightedpct() which calculates weighted results tables from the survey, disaggregated by specified variables. The arguments of this function are passed to functions in the package dplyr.

Below are the arguments of the function:

df - the data frame with all the variables of interest
vars_ids - variable names of the survey cluster ids
vars_strata - variable names of the survey strata
vars_weights - variable names of the weights
formula_vars - vector of the column names of variables you would like to print results for
... - captures expressions for filtering or transmuting the data. See the description of the argument willfilter below for more details
formula_vars_levels - numeric vector of the factor levels of the variables in formula_vars. By default, the function assumes the variables have two levels: 0 and 1
by_vars - the variables to disaggregate by
pct - a logical variable indicating whether or not to calculate weighted percentages. Default is TRUE for weighted percentages. Set to FALSE for weighted N.
willfilter - a variable that tells the function whether or not to filter the data by a particular value.
- For example, if your formula_vars have response options of 0 and 1 but you only want to show the values for 1, then you would say willfilter = TRUE. Then at the end of your argument list you write an expression for the filter. In this case, you would say resp==1.
- If you set willfilter = FALSE, then the function will assume you want to "transmute" the data, in other words manipulate the columns in some way, which for us often means to collapse response options. For example, if your formula_vars have 5 response options, but you only want to show results for the sum of options "Agree" and "StronglyAgree", (after setting spread_key="resp" to spread the table by the response options) you could set willfilter=FALSE, and then directly after write the expression for the transmutation, giving it a new column name--in this case the expression would be NewColName=Agree+AgreeStrongly. Also write the names of the other columns you would like to keep in the final table.
- If you leave willfilter as its default of NULL, then the function will not filter or transmute data.
add_totals - a logical variable determining whether to create total rows or columns (as appropriate) that demonstrate the margin that sums to 100. Keep as the default FALSE to not include totals.
spread_key - the variable to spread the table horizontally by. Keep as the default NULL to not spread the table horizontally.
spread_value - the variable to fill the table with after a horizontal spread. By default this argument is "prop", which is a value created internally by the function, and generally does not need to be changed.
arrange_vars - the list of variables to arrange the table by. Keep as default NULL to leave the arrangement as is.
include_SE - a logical variable indicating whether to include the standard errors in the table. Keep as the default FALSE to not include standard errors. As of this version of whomds, does not work when adding totals (add_totals is TRUE), spreading (spread_key is not NULL) or transmutting (willfilter is FALSE).

Here are some examples of how table_weightedpct() would be used in practice. Not all arguments are explicitly set in each example, which means they are kept as their default values.

Example 1: long table, one level of disaggregation

Let's say we want to print a table of the percentage of people in each disability level who gave each response option for a set of questions about the general environment. We would set the arguments of table_weightedpct() like this, and the first few rows of the table would look like this:

#Remove NAs from column used for argument by_vars
df_adults_noNA <- df_adults %>% 
  filter(!is.na(disability_cat))

table_weightedpct(
  df = df_adults_noNA,
  vars_ids = "PSU",
  vars_strata = "strata",
  vars_weights = "weight",
  formula_vars = paste0("EF", 1:12),
  formula_vars_levels = 1:5,
  by_vars = "disability_cat",
  spread_key = NULL,
  spread_value = "prop",
  arrange_vars = NULL,
  willfilter = NULL
  )

The outputted table has 4 columns: the variable we disaggregated the data by (disability_cat, in other words the disability level), the item (item), the response option (resp), and the proportion (prop).

Example 2: wide table, one level of disaggregation

This long table from the above example is great for data analysis, but not great for reading with the bare eye. If we want to make it nicer, we convert it to "wide format" by "spreading" by a particular variable. Perhaps we want to spread by disability_cat. Our call to table_weightedpct() would now look like this, and the outputted table would be:

table_weightedpct(
  df = df_adults_noNA,
  vars_ids = "PSU",
  vars_strata = "strata",
  vars_weights = "weight",
  formula_vars = paste0("EF", 1:12),
  formula_vars_levels = 1:5,
  by_vars = "disability_cat",
  spread_key = "disability_cat",
  spread_value = "prop",
  arrange_vars = NULL,
  willfilter = NULL
  )

Now we can see our prop column has been spread horizontally for each level of disability_cat.

Example 3: wide table, one level of disaggregation, filtered

Perhaps, though, we are only interested in the proportions of the most extreme response option of 5. We could now add a filter to our call to table_weightedpct() like so:

table_weightedpct(
  df = df_adults_noNA,
  vars_ids = "PSU",
  vars_strata = "strata",
  vars_weights = "weight",
  formula_vars = paste0("EF", 1:12),
  formula_vars_levels = 1:5,
  by_vars = "disability_cat",
  spread_key = "disability_cat",
  spread_value = "prop",
  arrange_vars = NULL,
  willfilter = TRUE,
  resp == 5
  )

Now you can see only the proportions for the response option of 5 are given.

Example 4: wide table, multiple levels of disaggregation, filtered

With table_weightedpct(), we can also add more levels of disaggregation by editing the argument by_vars. Here we will produce the same table as in Example 3 above but now disaggregated by disability level and sex:

table_weightedpct(
  df = df_adults_noNA,
  vars_ids = "PSU",
  vars_strata = "strata",
  vars_weights = "weight",
  formula_vars = paste0("EF", 1:12),
  formula_vars_levels = 1:5,
  by_vars = c("disability_cat", "sex"),
  spread_key = "disability_cat",
  spread_value = "prop",
  arrange_vars = NULL,
  willfilter = TRUE,
  resp == 5
  )

Example 5: wide table, multiple levels of disaggregation, transmuted

Perhaps we are still interested not only in response option 5, but the sum of 4 and 5 together. We can do this by "transmuting" our table. To do this, we first choose to "spread" by resp by setting spread_key="resp". This will convert the table to a wide format as in Example 2, but now each column will represent a response option. Then we set the transmutation by setting willfilter=FALSE, and adding expressions for the transmutation on the next line. We name all the columns we would like to keep and give an expression for how to create the new column of the sum of proportions for response options 4 and 5, here called problems:

table_weightedpct(
  df = df_adults_noNA,
  vars_ids = "PSU",
  vars_strata = "strata",
  vars_weights = "weight",
  formula_vars = paste0("EF", 1:12),
  formula_vars_levels = 1:5,
  by_vars = c("disability_cat", "sex"),
  spread_key = "resp",
  spread_value = "prop",
  arrange_vars = NULL,
  willfilter = FALSE,
  disability_cat, sex, item, problems = `4`+`5`
  )

If we would like to modify the table again so that disability_cat represents the columns again, we can feed this table into another function that will perform the pivot The function to pivot tables is called pivot_wider(), and it is in the tidyr package. To perform a second pivot, write the code like this:

table_weightedpct(
  df = df_adults_noNA,
  vars_ids = "PSU",
  vars_strata = "strata",
  vars_weights = "weight",
  formula_vars = paste0("EF", 1:12),
  formula_vars_levels = 1:5,
  by_vars = c("disability_cat", "sex"),
  spread_key = "resp",
  spread_value = "prop",
  arrange_vars = NULL,
  willfilter = FALSE,
  disability_cat, sex, item, problems = `4`+`5`
  ) %>% 
    pivot_wider(names_from = disability_cat, values_from = problems)

The names_from argument of the function pivot_wider() tells R which variable to use as the columns, and values_from tells R what to fill the columns with. The operator %>% is commonly referred to as a "pipe". It feeds the object before it into the first argument of the function after it. For example, if you have an object x and a function f, writing x %>% f() would be the equivalent as writing f(x). People use "pipes" because they make long sequences of code easier to read.

`table_unweightedpctn()`

whomds contains a function called table_unweightedpctn() that produces unweighted tables of N and %. This is generally used for demographic tables. Its arguments are as follows:

df - the data frame with all the variables of interest
vars_demo - vector with the names of the demographic variables for which the N and % will be calculated
group_by_var - name of the variable in which the statistics should be stratified (e.g. "disability_cat")
spread_by_group_by_var - logical determining whether to spread the table by the variable given in group_by_var. Default is FALSE.
group_by_var_sums_to_100 - logical determining whether percentages sum to 100 along the margin of group_by_var, if applicable. Default is FALSE.
add_totals - a logical variable determining whether to create total rows or columns (as appropriate) that demonstrate the margin that sums to 100. Keep as the default FALSE to not include totals.

Here is an example of how it is used:

table_unweightedpctn(df_adults_noNA, 
                     vars_demo = c("sex", "age_cat", "work_cat", "edu_cat"), 
                     group_by_var = "disability_cat", 
                     spread_by_group_by_var = TRUE)

`table_basicstats()`

The function table_basicstats() computes basic statistics of the number of member per group per household. Its arguments are:

df - a data frame of household data where the rows represent members of the households in the sample
hh_id - string (length 1) indicating the name of the variable in df uniquely identifying households
group_by_var - string (length 1) with name of variable in df to group results by

Here is an example of how it is used:

table_basicstats(df_adults_noNA, "HHID", "age_cat")

Figures

Descriptive statistics figure functions included in the whomds package are:

fig_poppyramid() - produces a population pyramid figure for the sample
fig_dist() - produces a plot of the distribution of a score
fig_density() - produces a plot of the density of a score

The arguments of each of these codes will be described below.

`fig_poppyramid()`

whomds contains a function called fig_poppyramid() that produces a population pyramid figure for the sample. This function takes as arguments:

df - the data where each row is a member of the household from the household roster
var_age - the name of the column in df with the persons' ages
var_sex - the name of the column in df with he persons' sexes
x_axis - a string indicating whether to use absolute numbers or sample percentage on the x-axis. Choices are "n" (default) or "pct".
age_plus - a numeric value indicating the age that is the first value of the oldest age group. Default is 100, for the last age group to be 100+
age_by - a numeric value indicating the width of each age group, in years. Default is 5.

Running this function produces a figure like the one below:

include_graphics("Images/pop_pyramid.png")

`fig_dist()`

whomds contains a function called fig_dist() that produces a plot of the distribution of a score. WHO uses this function to show the distribution of the disability scores calculated with Rasch Analysis. Its arguments are:

df - data frame with the score of interest
score - character variable of score variable name ranging from 0 to 100; ex. "disability_score"
score_cat - character variable of score categorization variable name, ex. "disability_cat"
cutoffs - a numeric vector of the cut-offs for the score categorization
x_lab - a string giving the x-axis label. Default is "Score"
y_max - maximum value to use on the y-axis. If left as the default NULL, the function will calculate a suitable maximum automatically.
pcent - logical variable indicating whether to use percent on the y-axis or frequency. Leave as default FALSE for frequency and give TRUE for percent.
pal - a string specifying the type of color palette to use, passed to the function RColorBrewer::brewer.pal(). Default is "Blues".
binwidth - a numeric value giving the width of the bins in the histograph. Default is 5.

Running this function produces a figure like the one below.

include_graphics("Images/distribution.png")

`fig_density()`

whomds contains a function similar to fig_dist() called fig_density() that produces a plot of the density of a score. WHO uses this function to show the density distribution of the disability scores calculated with Rasch Analysis. Its arguments are:

df - data frame with the score of interest
score - character variable of score variable name ranging from 0 to 100; ex. "disability_score"
var_color - a character variable of the column name to set color of density lines by. Use this variable if you could like to print the densities of different groups onto the same plot. Default is NULL.
var_facet - a character variable of the column name for the variable to create a ggplot2::facet_grid() with, which will plot densities of different groups in side-by-side plots. Default is NULL.
cutoffs - a numeric vector of the cut-offs for the score categorization
x_lab - a string giving the x-axis label. Default is "Score"
pal - a string specifying either a manual color to use for the color aesthetic, a character vector explictly specifying the colors to use for the color scale, or as the name of a palette to pass to RColorBrewer::brewer.pal() with the name of the color palette to use for the color scale. Default is "Paired"
adjust - a numeric value to pass to adjust argument of ggplot2::geom_density(), which controls smoothing of the density function. Default is 2.
size - a numeric value to pass to size argument of ggplot2::geom_density(), which controls the thickness of the lines. Default is 1.5.

Running this function produces a figure like the one below.

include_graphics("Images/density.png")

Descriptive statistics templates

WHO also provides a template for calculating many descriptive statistics tables for use in survey reports, also written in R. If you would like a template for your country, please contact us (see DESCRIPTION for contact info).

lindsayevanslee/whomds documentation built on Sept. 9, 2023, 10:54 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

lindsayevanslee/whomds
Calculate Results from WHO Model Disability Survey Data

In lindsayevanslee/whomds: Calculate Results from WHO Model Disability Survey Data

Joining scores with original data

Descriptive analysis

Tables

`table_weightedpct()`

Example 1: long table, one level of disaggregation

Example 2: wide table, one level of disaggregation

Example 3: wide table, one level of disaggregation, filtered

Example 4: wide table, multiple levels of disaggregation, filtered

Example 5: wide table, multiple levels of disaggregation, transmuted

`table_unweightedpctn()`

`table_basicstats()`

Figures

`fig_poppyramid()`

`fig_dist()`

`fig_density()`

Descriptive statistics templates

R Package Documentation

Browse R Packages

We want your feedback!

lindsayevanslee/whomds Calculate Results from WHO Model Disability Survey Data

In lindsayevanslee/whomds: Calculate Results from WHO Model Disability Survey Data

Joining scores with original data

Descriptive analysis

Tables

table_weightedpct()

Example 1: long table, one level of disaggregation

Example 2: wide table, one level of disaggregation

Example 3: wide table, one level of disaggregation, filtered

Example 4: wide table, multiple levels of disaggregation, filtered

Example 5: wide table, multiple levels of disaggregation, transmuted

table_unweightedpctn()

table_basicstats()

Figures

fig_poppyramid()

fig_dist()

fig_density()

Descriptive statistics templates

R Package Documentation

Browse R Packages

We want your feedback!

lindsayevanslee/whomds
Calculate Results from WHO Model Disability Survey Data

`table_weightedpct()`

`table_unweightedpctn()`

`table_basicstats()`

`fig_poppyramid()`

`fig_dist()`

`fig_density()`